Google Vertex AI Integration

Complete guide to using Google Vertex AI with Bastio.


Access Google Gemini and partner models (Claude, Mistral, Llama) through a single GCP credential with Bastio's full security protection.

Overview

Google Vertex AI provides a unified gateway to multiple AI providers through Google Cloud's infrastructure. With Bastio, you can:

  • One credential, four vendors - Access Google, Anthropic, Mistral, and Meta models with a single GCP service account
  • Enterprise-grade security - VPC-SC, CMEK encryption, audit logging, compliance certifications
  • Unified billing - All usage consolidated in your Google Cloud bill
  • Full security coverage - All Bastio security features work across all providers
  • Regional control - Deploy in specific regions for data residency requirements

Why Vertex AI?

Unlike direct provider integrations, Vertex AI offers unique advantages:

| Feature | Direct Providers | Vertex AI |
| --- | --- | --- |
| Credentials needed | 4 separate API keys | 1 GCP service account |
| Billing | 4 separate invoices | 1 GCP invoice |
| Compliance | Varies by provider | GCP certifications (SOC, HIPAA, FedRAMP) |
| Network security | Public internet | VPC-SC, private endpoints |
| Model access | Individual agreements | Model Garden marketplace |

Supported Models

Google Gemini Models

Native Google models available directly through Vertex AI:

Gemini 3.x (Preview)

| Model | Context | Max Output | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| gemini-3-pro-preview | 1M tokens | 64K tokens | $2.00/1M | $12.00/1M |

Gemini 2.5 (GA)

| Model | Context | Max Output | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| gemini-2.5-pro | 1M tokens | 64K tokens | $1.25/1M | $10.00/1M |
| gemini-2.5-flash | 1M tokens | 64K tokens | $0.30/1M | $2.50/1M |
| gemini-2.5-flash-lite | 1M tokens | 64K tokens | $0.10/1M | $0.40/1M |

Gemini 2.0 (GA)

| Model | Context | Max Output | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| gemini-2.0-flash | 1M tokens | 8K tokens | $0.10/1M | $0.40/1M |
| gemini-2.0-flash-lite | 1M tokens | 8K tokens | $0.075/1M | $0.30/1M |

Gemini 1.5 (Legacy)

| Model | Context | Max Output | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| gemini-1.5-pro | 2M tokens | 8K tokens | $1.25/1M | $5.00/1M |
| gemini-1.5-flash | 1M tokens | 8K tokens | $0.075/1M | $0.30/1M |
| gemini-1.5-flash-8b | 1M tokens | 8K tokens | $0.0375/1M | $0.15/1M |

Anthropic Claude Models (via Model Garden)

| Model | Context | Max Output | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| claude-opus-4-5 | 200K tokens | 32K tokens | $15.00/1M | $75.00/1M |
| claude-sonnet-4-5 | 200K tokens | 32K tokens | $3.00/1M | $15.00/1M |
| claude-haiku-3-5 | 200K tokens | 8K tokens | $0.80/1M | $4.00/1M |
| claude-3-opus | 200K tokens | 4K tokens | $15.00/1M | $75.00/1M |
| claude-3-sonnet | 200K tokens | 4K tokens | $3.00/1M | $15.00/1M |

Mistral AI Models (via Model Garden)

| Model | Context | Max Output | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| mistral-large | 128K tokens | 8K tokens | $2.00/1M | $6.00/1M |
| mistral-small | 128K tokens | 8K tokens | $0.20/1M | $0.60/1M |
| codestral | 32K tokens | 8K tokens | $0.20/1M | $0.60/1M |
| mistral-nemo | 128K tokens | 8K tokens | $0.15/1M | $0.15/1M |

Meta Llama Models (via Model Garden)

| Model | Context | Max Output | Input Price | Output Price | Vision |
| --- | --- | --- | --- | --- | --- |
| llama-4-maverick | 128K tokens | 8K tokens | $0.40/1M | $1.20/1M | Yes |
| llama-4-scout | 128K tokens | 8K tokens | $0.20/1M | $0.60/1M | Yes |
| llama-3.3-70b | 128K tokens | 8K tokens | $0.20/1M | $0.20/1M | No |
| llama-3.2-90b-vision | 128K tokens | 8K tokens | $0.30/1M | $0.90/1M | Yes |

Quick Start

Prerequisites

  1. Google Cloud Platform (GCP) account
  2. GCP project with billing enabled
  3. Vertex AI API enabled
  4. Service Account with Vertex AI User role

Step 1: Enable Vertex AI API

  1. Go to the Google Cloud Console
  2. Select your project (or create a new one)
  3. Navigate to Vertex AI > Dashboard
  4. Click Enable All Recommended APIs

Step 2: Create a Service Account

  1. Go to IAM & Admin > Service Accounts
  2. Click Create Service Account
  3. Name it (e.g., bastio-vertex-ai)
  4. Grant it the Vertex AI User role (roles/aiplatform.user)
  5. Click Done

Step 3: Generate a JSON Key

  1. Click on the newly created service account
  2. Go to the Keys tab
  3. Click Add Key > Create new key
  4. Select JSON and click Create
  5. Save the downloaded file securely

Step 4: Configure in Bastio

  1. Go to Dashboard > Proxies > Create New Proxy
  2. Select Google Vertex AI as provider
  3. Paste the contents of your JSON key file
  4. Click Create Proxy

BYOK Mode (Bring Your Own Key)

Use your own GCP credentials with Bastio.

Via Dashboard

  1. Go to Dashboard > Proxies > Create New Proxy
  2. Select Google Vertex AI as provider
  3. Choose Your API Keys (BYOK) mode
  4. Enter your GCP Service Account JSON:
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "key-id...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "bastio-vertex-ai@your-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
  5. Click Create Proxy

Via API

# Create Vertex AI proxy
curl -X POST https://api.bastio.ai/proxy \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Vertex AI",
    "provider": "vertex",
    "llm_mode": "byok",
    "model_behavior": "passthrough"
  }'

# Add GCP credentials
curl -X POST https://api.bastio.ai/keys/provider \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "vertex",
    "key_name": "GCP Production",
    "api_key": "{\"type\":\"service_account\",\"project_id\":\"...\",\"private_key\":\"...\"}"
  }'
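
The same two calls can be scripted in Python. This is only a sketch: the endpoint paths, field names, and the `YOUR_TOKEN` placeholder are taken from the curl examples above, and the network call itself is left commented out.

```python
import json

API_BASE = "https://api.bastio.ai"  # base URL from the curl examples above

def build_proxy_payload(name: str) -> dict:
    """Request body for creating a Vertex AI proxy (fields from the curl example)."""
    return {
        "name": name,
        "provider": "vertex",
        "llm_mode": "byok",
        "model_behavior": "passthrough",
    }

def build_provider_key_payload(key_name: str, service_account: dict) -> dict:
    """Request body for attaching GCP credentials.

    The service account JSON travels as an escaped string in `api_key`,
    matching the curl example above.
    """
    return {
        "provider": "vertex",
        "key_name": key_name,
        "api_key": json.dumps(service_account),
    }

payload = build_proxy_payload("Production Vertex AI")
print(json.dumps(payload, indent=2))
# To send, e.g. with requests:
#   requests.post(f"{API_BASE}/proxy",
#                 headers={"Authorization": "Bearer YOUR_TOKEN"},
#                 json=payload)
```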

Partner Model Setup Guides

Partner models (Claude, Mistral, Llama) require additional enablement in your GCP project through Model Garden.

Enabling Anthropic Claude

Step 1: Access Model Garden

  1. Go to Vertex AI Model Garden
  2. Search for "Claude"
  3. Click on Claude 3.5 Sonnet (or your preferred model)

Step 2: Enable the Model

  1. Click Enable on the model card
  2. Review and accept Anthropic's terms and conditions
  3. Wait for provisioning (usually instant)

Step 3: Verify Enablement

Test with a direct API call:

# Get access token
ACCESS_TOKEN=$(gcloud auth print-access-token)

# Test Claude endpoint
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/anthropic/models/claude-3-5-sonnet:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'

Common Claude Issues

"Model not found" error

  • Verify you enabled the model in Model Garden
  • Check the region supports Claude (use us-central1)
  • Ensure you accepted Anthropic's terms

"Permission denied" error

  • Service account needs roles/aiplatform.user
  • The model must be enabled in your project

Enabling Mistral AI

Step 1: Access Model Garden

  1. Go to Vertex AI Model Garden
  2. Search for "Mistral"
  3. Click on Mistral Large (or your preferred model)

Step 2: Enable the Model

  1. Click Enable on the model card
  2. Accept Mistral's terms and conditions
  3. Wait for provisioning

Step 3: Verify Enablement

ACCESS_TOKEN=$(gcloud auth print-access-token)

curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/mistralai/models/mistral-large:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-large",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'

Common Mistral Issues

"Endpoint not found" error

  • Use correct publisher: mistralai (not mistral)
  • Check region availability

Enabling Meta Llama

Step 1: Access Model Garden

  1. Go to Vertex AI Model Garden
  2. Search for "Llama"
  3. Click on Llama 4 Maverick (or your preferred model)

Step 2: Accept Meta License

  1. Click Enable on the model card
  2. Important: Review and accept Meta's license agreement
  3. This is a separate agreement from Google's terms
  4. Wait for provisioning

Step 3: Verify Enablement

ACCESS_TOKEN=$(gcloud auth print-access-token)

curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/meta/models/llama-4-maverick:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'

Vision Models Note

For Llama vision models (llama-4-maverick, llama-4-scout, llama-3.2-90b-vision), you can include images:

{
  "model": "llama-4-maverick",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    ]
  }]
}
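
As a sketch, the base64 data URL in the payload above can be built like this in Python; the tiny byte string at the end stands in for real image data.

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL for the image_url field."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

def vision_message(question: str, image_bytes: bytes) -> dict:
    """Build one user message mixing a text part and an inline image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
        ],
    }

# Pass as `messages=[vision_message(...)]` with an OpenAI-compatible client.
msg = vision_message("What's in this image?", b"\xff\xd8\xff")  # JPEG magic bytes as a stand-in
print(msg["content"][1]["image_url"]["url"][:30])
```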

Making Requests

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key"
)

# Using Gemini
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.choices[0].message.content)

JavaScript Example

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.bastio.ai/v1/guard/{PROXY_ID}/v1',
  apiKey: process.env.BASTIO_API_KEY,
});

// Using Claude via Vertex
const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [
    { role: 'user', content: 'Write a haiku about AI' }
  ],
});

console.log(response.choices[0].message.content);

Streaming Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key"
)

# Streaming with Mistral
stream = client.chat.completions.create(
    model="mistral-large",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Using Different Providers

Same proxy, different models - Bastio routes automatically:

# Google Gemini
client.chat.completions.create(model="gemini-2.5-flash", ...)

# Anthropic Claude
client.chat.completions.create(model="claude-sonnet-4-5", ...)

# Mistral AI
client.chat.completions.create(model="mistral-large", ...)

# Meta Llama
client.chat.completions.create(model="llama-4-maverick", ...)

Model Routing

Bastio automatically routes requests to the correct Vertex AI endpoint based on the model name:

  • Gemini models (gemini-*): Routed to Google's native Vertex AI endpoint
  • Claude models (claude-*): Routed to Anthropic's Model Garden endpoint
  • Mistral models (mistral-*, codestral): Routed to Mistral's Model Garden endpoint
  • Llama models (llama-*): Routed to Meta's Model Garden endpoint

No configuration needed - just specify the model name and Bastio handles the rest.
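
The prefix rules above can be sketched as a simple lookup. The publisher names match the Model Garden endpoints shown in the verification commands earlier; the function itself is illustrative, not Bastio's actual routing code.

```python
def route_publisher(model: str) -> str:
    """Map a model name to its Vertex AI publisher, mirroring the rules above."""
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("mistral-") or model.startswith("codestral"):
        return "mistralai"
    if model.startswith("llama-"):
        return "meta"
    raise ValueError(f"No Vertex AI route for model: {model}")

# The publisher slots into the endpoint path used in the verification curls:
#   .../publishers/{publisher}/models/{model}:rawPredict
print(route_publisher("gemini-2.5-flash"))   # google
print(route_publisher("claude-sonnet-4-5"))  # anthropic
```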

Credential Format

Service Account JSON Structure

Your GCP Service Account JSON file contains these fields:

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "abc123...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account@project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}

Required Fields

  • type: Must be "service_account"
  • project_id: Your GCP project ID
  • private_key: The private key for authentication
  • client_email: Service account email
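
A quick local sanity check on the JSON before pasting it into Bastio can catch truncated keys and wrong file types. The checks mirror the required fields listed above; this is only a sketch, not Bastio's server-side validation.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")

def validate_service_account(raw: str) -> list[str]:
    """Return a list of problems in a service account JSON string (empty = looks OK)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not data.get(f)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f'type must be "service_account", got {data["type"]!r}')
    key = data.get("private_key", "")
    if key and not (key.startswith("-----BEGIN PRIVATE KEY-----")
                    and key.rstrip().endswith("-----END PRIVATE KEY-----")):
        problems.append("private_key looks truncated or malformed")
    return problems

print(validate_service_account('{"type": "user"}'))
```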

Supported Regions

Vertex AI is available in multiple regions. Common regions with full model support:

| Region | Location | Gemini | Claude | Mistral | Llama |
| --- | --- | --- | --- | --- | --- |
| us-central1 | Iowa | Yes | Yes | Yes | Yes |
| us-east4 | Virginia | Yes | Yes | Yes | Yes |
| europe-west1 | Belgium | Yes | Yes | Yes | Yes |
| europe-west4 | Netherlands | Yes | Yes | Yes | Yes |
| asia-northeast1 | Tokyo | Yes | Limited | Limited | Limited |

Recommendation: Use us-central1 for best model availability.

Pricing & Cost Tracking

Pricing Comparison

Vertex AI pricing is generally identical to direct provider pricing:

| Provider | Model | Direct | Vertex AI |
| --- | --- | --- | --- |
| Google | Gemini 2.5 Flash | N/A | $0.30/$2.50 |
| Anthropic | Claude Sonnet 4.5 | $3/$15 | $3/$15 |
| Mistral | Mistral Large | $2/$6 | $2/$6 |
| Meta | Llama 4 Maverick | $0.40/$1.20 | $0.40/$1.20 |

Prices per 1M tokens (input/output)
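
Per-request cost follows directly from these rates: (tokens ÷ 1,000,000) × price per 1M. A small helper, using the Gemini 2.5 Flash rates from the table above as an example:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated USD cost for one request at the given per-1M-token rates."""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)

# Gemini 2.5 Flash rates from the table: $0.30 in / $2.50 out per 1M tokens
cost = estimate_cost(10_000, 2_000, 0.30, 2.50)
print(f"${cost:.4f}")  # $0.0080
```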

Cost Tracking Features

Bastio automatically tracks costs across all providers:

  • Dashboard - Real-time spending across all models
  • Analytics - Historical cost analysis by model, user, time
  • Billing - Detailed breakdowns by provider and model
  • Alerts - Set spending limits and notifications

Troubleshooting

Permission Denied

Error: PERMISSION_DENIED or Access denied

Solutions:

  1. Verify Service Account has roles/aiplatform.user role
  2. For partner models, check they're enabled in Model Garden
  3. Ensure billing is enabled on your GCP project
  4. Test credentials with gcloud auth activate-service-account --key-file=key.json

Quota Exceeded

Error: RESOURCE_EXHAUSTED or Quota exceeded

Solutions:

  1. Go to IAM & Admin > Quotas in GCP Console
  2. Filter by "Vertex AI"
  3. Request quota increases for specific models
  4. Consider using different regions for better availability

Partner Model Not Found

Error: Model not found or Endpoint not available

Solutions:

  1. Verify model is enabled in Model Garden
  2. Check you've accepted the provider's terms/license
  3. Confirm the region supports the model
  4. Use exact model name (e.g., claude-sonnet-4-5, not claude-4.5)

Service Account Issues

Error: Invalid credentials or Could not parse service account

Solutions:

  1. Verify JSON is valid (no trailing commas, proper escaping)
  2. Check the private key hasn't been truncated
  3. Ensure type field is "service_account"
  4. Generate a new key if the current one is corrupted

Region Availability

Error: Region not supported or endpoint errors

Solutions:

  1. Use us-central1 for best compatibility
  2. Check Vertex AI regions
  3. Partner models may have limited regional availability

Frequently Asked Questions

Q: Can I use both Vertex AI and direct provider APIs?

A: Yes! Create separate proxies for each. For example, have one Vertex AI proxy for GCP-compliant workloads and a direct Anthropic proxy for other use cases.

Q: Does streaming work for all partner models?

A: Yes, streaming is fully supported for Gemini, Claude, Mistral, and Llama models through Vertex AI.

Q: How do I switch from direct Anthropic to Vertex Claude?

A: Create a new Vertex AI proxy, enable Claude in Model Garden, and update your app's proxy ID. Test in staging before switching production.

Q: What if my Service Account key expires?

A: Service Account keys don't expire by default. However, if you set a key expiration or delete the key, you'll need to generate a new one and update your Bastio credentials.

Q: Are there any feature differences between direct and Vertex?

A: Most features are identical. Some very new features may appear on direct APIs slightly before Vertex. Vertex may have additional compliance/security features.

Q: Can I use temporary credentials or workload identity?

A: Currently, Bastio requires a Service Account JSON key. Workload identity federation is not yet supported.

Q: Which models support vision/images?

A: Gemini models, and Llama 4 Maverick, Llama 4 Scout, and Llama 3.2 90B Vision all support image inputs.

When to Use Vertex AI

Choose Vertex AI if you:

  • Already have Google Cloud infrastructure
  • Need GCP compliance certifications (SOC 2, HIPAA, FedRAMP)
  • Want consolidated billing through GCP
  • Need VPC Service Controls or private endpoints
  • Want to access multiple AI providers with one credential
  • Have Google Cloud enterprise agreements

Choose Direct Providers if you:

  • Want the simplest possible setup
  • Don't have a GCP account
  • Need the absolute latest model features immediately
  • Prefer direct vendor relationships
  • Have existing provider API keys


Need help? Contact support@bastio.ai or visit our support page.