Google Vertex AI Integration
Complete guide to using Google Vertex AI with Bastio.
Access Google Gemini and partner models (Claude, Mistral, Llama) through a single GCP credential with Bastio's full security protection.
Overview
Google Vertex AI provides a unified gateway to multiple AI providers through Google Cloud's infrastructure. With Bastio, you can:
- One credential, four vendors - Access Google, Anthropic, Mistral, and Meta models with a single GCP service account
- Enterprise-grade security - VPC-SC, CMEK encryption, audit logging, compliance certifications
- Unified billing - All usage consolidated in your Google Cloud bill
- Full security coverage - All Bastio security features work across all providers
- Regional control - Deploy in specific regions for data residency requirements
Why Vertex AI?
Unlike direct provider integrations, Vertex AI offers unique advantages:
| Feature | Direct Providers | Vertex AI |
|---|---|---|
| Credentials needed | 4 separate API keys | 1 GCP service account |
| Billing | 4 separate invoices | 1 GCP invoice |
| Compliance | Varies by provider | GCP certifications (SOC, HIPAA, FedRAMP) |
| Network security | Public internet | VPC-SC, private endpoints |
| Model access | Individual agreements | Model Garden marketplace |
Supported Models
Google Gemini Models
Native Google models available directly through Vertex AI:
Gemini 3.x (Preview)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-3-pro-preview` | 1M tokens | 64K tokens | $2.00/1M | $12.00/1M |
Gemini 2.5 (GA)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-2.5-pro` | 1M tokens | 64K tokens | $1.25/1M | $10.00/1M |
| `gemini-2.5-flash` | 1M tokens | 64K tokens | $0.30/1M | $2.50/1M |
| `gemini-2.5-flash-lite` | 1M tokens | 64K tokens | $0.10/1M | $0.40/1M |
Gemini 2.0 (GA)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-2.0-flash` | 1M tokens | 8K tokens | $0.10/1M | $0.40/1M |
| `gemini-2.0-flash-lite` | 1M tokens | 8K tokens | $0.075/1M | $0.30/1M |
Gemini 1.5 (Legacy)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-1.5-pro` | 2M tokens | 8K tokens | $1.25/1M | $5.00/1M |
| `gemini-1.5-flash` | 1M tokens | 8K tokens | $0.075/1M | $0.30/1M |
| `gemini-1.5-flash-8b` | 1M tokens | 8K tokens | $0.0375/1M | $0.15/1M |
Anthropic Claude Models (via Model Garden)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `claude-opus-4-5` | 200K tokens | 32K tokens | $15.00/1M | $75.00/1M |
| `claude-sonnet-4-5` | 200K tokens | 32K tokens | $3.00/1M | $15.00/1M |
| `claude-haiku-3-5` | 200K tokens | 8K tokens | $0.80/1M | $4.00/1M |
| `claude-3-opus` | 200K tokens | 4K tokens | $15.00/1M | $75.00/1M |
| `claude-3-sonnet` | 200K tokens | 4K tokens | $3.00/1M | $15.00/1M |
Mistral AI Models (via Model Garden)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `mistral-large` | 128K tokens | 8K tokens | $2.00/1M | $6.00/1M |
| `mistral-small` | 128K tokens | 8K tokens | $0.20/1M | $0.60/1M |
| `codestral` | 32K tokens | 8K tokens | $0.20/1M | $0.60/1M |
| `mistral-nemo` | 128K tokens | 8K tokens | $0.15/1M | $0.15/1M |
Meta Llama Models (via Model Garden)
| Model | Context | Max Output | Input Price | Output Price | Vision |
|---|---|---|---|---|---|
| `llama-4-maverick` | 128K tokens | 8K tokens | $0.40/1M | $1.20/1M | Yes |
| `llama-4-scout` | 128K tokens | 8K tokens | $0.20/1M | $0.60/1M | Yes |
| `llama-3.3-70b` | 128K tokens | 8K tokens | $0.20/1M | $0.20/1M | No |
| `llama-3.2-90b-vision` | 128K tokens | 8K tokens | $0.30/1M | $0.90/1M | Yes |
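The vision column above can be folded into a quick capability check before a request is built. A small illustrative helper (the set mirrors the Llama table, and all Gemini models accept image inputs per this guide):

```python
# Vision-capable models, per the tables in this guide: every Gemini
# model, plus the Llama entries marked "Yes" in the Vision column.
VISION_LLAMA = {"llama-4-maverick", "llama-4-scout", "llama-3.2-90b-vision"}

def supports_vision(model: str) -> bool:
    """Return True if the model accepts image inputs."""
    return model.startswith("gemini-") or model in VISION_LLAMA
```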
Quick Start
Prerequisites
- Google Cloud Platform (GCP) account
- GCP project with billing enabled
- Vertex AI API enabled
- Service Account with Vertex AI User role
Step 1: Enable Vertex AI API
- Go to the Google Cloud Console
- Select your project (or create a new one)
- Navigate to Vertex AI > Dashboard
- Click Enable All Recommended APIs
Step 2: Create a Service Account
- Go to IAM & Admin > Service Accounts
- Click Create Service Account
- Name it (e.g., `bastio-vertex-ai`)
- Grant it the Vertex AI User role (`roles/aiplatform.user`)
- Click Done
Step 3: Generate a JSON Key
- Click on the newly created service account
- Go to the Keys tab
- Click Add Key > Create new key
- Select JSON and click Create
- Save the downloaded file securely
Step 4: Configure in Bastio
- Go to Dashboard > Proxies > Create New Proxy
- Select Google Vertex AI as provider
- Paste the contents of your JSON key file
- Click Create Proxy
BYOK Mode (Bring Your Own Key)
Use your own GCP credentials with Bastio.
Via Dashboard
- Go to Dashboard > Proxies > Create New Proxy
- Select Google Vertex AI as provider
- Choose Your API Keys (BYOK) mode
- Enter your GCP Service Account JSON:
```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "key-id...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "bastio-vertex-ai@your-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
```
- Click Create Proxy
Via API
```bash
# Create Vertex AI proxy
curl -X POST https://api.bastio.ai/proxy \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Vertex AI",
    "provider": "vertex",
    "llm_mode": "byok",
    "model_behavior": "passthrough"
  }'

# Add GCP credentials
curl -X POST https://api.bastio.ai/keys/provider \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "vertex",
    "key_name": "GCP Production",
    "api_key": "{\"type\":\"service_account\",\"project_id\":\"...\",\"private_key\":\"...\"}"
  }'
```

Partner Model Setup Guides
Partner models (Claude, Mistral, Llama) require additional enablement in your GCP project through Model Garden.
Enabling Anthropic Claude
Step 1: Access Model Garden
- Go to Vertex AI Model Garden
- Search for "Claude"
- Click on Claude 3.5 Sonnet (or your preferred model)
Step 2: Enable the Model
- Click Enable on the model card
- Review and accept Anthropic's terms and conditions
- Wait for provisioning (usually instant)
Step 3: Verify Enablement
Test with a direct API call:
```bash
# Get access token
ACCESS_TOKEN=$(gcloud auth print-access-token)

# Test Claude endpoint
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/anthropic/models/claude-3-5-sonnet:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Common Claude Issues
"Model not found" error
- Verify you enabled the model in Model Garden
- Check the region supports Claude (use `us-central1`)
- Ensure you accepted Anthropic's terms
"Permission denied" error
- Service account needs `roles/aiplatform.user`
- The model must be enabled in your project
Enabling Mistral AI
Step 1: Access Model Garden
- Go to Vertex AI Model Garden
- Search for "Mistral"
- Click on Mistral Large (or your preferred model)
Step 2: Enable the Model
- Click Enable on the model card
- Accept Mistral's terms and conditions
- Wait for provisioning
Step 3: Verify Enablement
```bash
ACCESS_TOKEN=$(gcloud auth print-access-token)

curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/mistralai/models/mistral-large:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-large",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Common Mistral Issues
"Endpoint not found" error
- Use the correct publisher: `mistralai` (not `mistral`)
- Check region availability
Enabling Meta Llama
Step 1: Access Model Garden
- Go to Vertex AI Model Garden
- Search for "Llama"
- Click on Llama 4 Maverick (or your preferred model)
Step 2: Accept Meta License
- Click Enable on the model card
- Important: Review and accept Meta's license agreement
- This is a separate agreement from Google's terms
- Wait for provisioning
Step 3: Verify Enablement
```bash
ACCESS_TOKEN=$(gcloud auth print-access-token)

curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/meta/models/llama-4-maverick:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Vision Models Note
For Llama vision models (llama-4-maverick, llama-4-scout, llama-3.2-90b-vision), you can include images:
```json
{
  "model": "llama-4-maverick",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    ]
  }]
}
```

Making Requests
Python Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key",
)

# Using Gemini
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(response.choices[0].message.content)
```

JavaScript Example
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.bastio.ai/v1/guard/{PROXY_ID}/v1',
  apiKey: process.env.BASTIO_API_KEY,
});

// Using Claude via Vertex
const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [
    { role: 'user', content: 'Write a haiku about AI' }
  ],
});
console.log(response.choices[0].message.content);
```

Streaming Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key",
)

# Streaming with Mistral
stream = client.chat.completions.create(
    model="mistral-large",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Using Different Providers
Same proxy, different models - Bastio routes automatically:
```python
# Google Gemini
client.chat.completions.create(model="gemini-2.5-flash", ...)

# Anthropic Claude
client.chat.completions.create(model="claude-sonnet-4-5", ...)

# Mistral AI
client.chat.completions.create(model="mistral-large", ...)

# Meta Llama
client.chat.completions.create(model="llama-4-maverick", ...)
```

Model Routing
Bastio automatically routes requests to the correct Vertex AI endpoint based on the model name:
- Gemini models (`gemini-*`): routed to Google's native Vertex AI endpoint
- Claude models (`claude-*`): routed to Anthropic's Model Garden endpoint
- Mistral models (`mistral-*`, `codestral`): routed to Mistral's Model Garden endpoint
- Llama models (`llama-*`): routed to Meta's Model Garden endpoint
No configuration needed - just specify the model name and Bastio handles the rest.
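The routing rule above amounts to a prefix match on the model name. A minimal sketch for illustration (the publisher names mirror the Model Garden endpoints used in the verification steps earlier in this guide):

```python
def vertex_publisher(model: str) -> str:
    """Map a model name to its Vertex AI publisher, mirroring the
    prefix-based routing described above."""
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("mistral-") or model == "codestral":
        return "mistralai"
    if model.startswith("llama-"):
        return "meta"
    raise ValueError(f"unrecognized model family: {model}")
```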
Credential Format
Service Account JSON Structure
Your GCP Service Account JSON file contains these fields:
```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "abc123...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account@project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
```

Required Fields
- `type`: Must be `"service_account"`
- `project_id`: Your GCP project ID
- `private_key`: The private key for authentication
- `client_email`: Service account email
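A quick pre-flight check of these fields catches most credential problems before upload. The sketch below also covers the one subtlety of the BYOK API call: the service-account JSON goes into the `api_key` field as an escaped string (function and variable names here are illustrative, not part of any SDK):

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")

def check_service_account(raw: str) -> list[str]:
    """Return a list of problems found in a service-account JSON string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not data.get(f)]
    if data.get("type") not in (None, "service_account"):
        problems.append('type must be "service_account"')
    return problems

def byok_payload(sa_json: str, key_name: str) -> str:
    """Build the BYOK request body; json.dumps escapes the embedded JSON."""
    return json.dumps({"provider": "vertex", "key_name": key_name, "api_key": sa_json})
```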
Supported Regions
Vertex AI is available in multiple regions. Common regions with full model support:
| Region | Location | Gemini | Claude | Mistral | Llama |
|---|---|---|---|---|---|
| `us-central1` | Iowa | Yes | Yes | Yes | Yes |
| `us-east4` | Virginia | Yes | Yes | Yes | Yes |
| `europe-west1` | Belgium | Yes | Yes | Yes | Yes |
| `europe-west4` | Netherlands | Yes | Yes | Yes | Yes |
| `asia-northeast1` | Tokyo | Yes | Limited | Limited | Limited |
Recommendation: Use `us-central1` for the best model availability.
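The region also determines the endpoint host: the rawPredict URLs in the verification steps above all follow one pattern. A small helper sketch (project and model values are placeholders):

```python
def vertex_endpoint(project: str, region: str, publisher: str, model: str) -> str:
    """Build the regional rawPredict URL used in the curl examples above."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1"
        f"/projects/{project}/locations/{region}"
        f"/publishers/{publisher}/models/{model}:rawPredict"
    )
```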
Pricing & Cost Tracking
Pricing Comparison
Vertex AI pricing is generally identical to direct provider pricing:
| Provider | Model | Direct | Vertex AI |
|---|---|---|---|
| Google | Gemini 2.5 Flash | N/A | $0.30/$2.50 |
| Anthropic | Claude Sonnet 4.5 | $3/$15 | $3/$15 |
| Mistral | Mistral Large | $2/$6 | $2/$6 |
| Meta | Llama 4 Maverick | $0.40/$1.20 | $0.40/$1.20 |
Prices per 1M tokens (input/output)
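The per-1M-token prices translate to request cost with simple arithmetic; a worked example using the Claude Sonnet 4.5 rates above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 10K input + 2K output tokens on Claude Sonnet 4.5 ($3.00/$15.00 per 1M):
cost = request_cost(10_000, 2_000, 3.00, 15.00)
print(f"${cost:.2f}")  # -> $0.06
```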
Cost Tracking Features
Bastio automatically tracks costs across all providers:
- Dashboard - Real-time spending across all models
- Analytics - Historical cost analysis by model, user, time
- Billing - Detailed breakdowns by provider and model
- Alerts - Set spending limits and notifications
Troubleshooting
Permission Denied
Error: PERMISSION_DENIED or Access denied
Solutions:
- Verify the Service Account has the `roles/aiplatform.user` role
- For partner models, check they're enabled in Model Garden
- Ensure billing is enabled on your GCP project
- Test credentials with `gcloud auth activate-service-account --key-file=key.json`
Quota Exceeded
Error: RESOURCE_EXHAUSTED or Quota exceeded
Solutions:
- Go to IAM & Admin > Quotas in GCP Console
- Filter by "Vertex AI"
- Request quota increases for specific models
- Consider using different regions for better availability
Partner Model Not Found
Error: Model not found or Endpoint not available
Solutions:
- Verify model is enabled in Model Garden
- Check you've accepted the provider's terms/license
- Confirm the region supports the model
- Use the exact model name (e.g., `claude-sonnet-4-5`, not `claude-4.5`)
Service Account Issues
Error: Invalid credentials or Could not parse service account
Solutions:
- Verify JSON is valid (no trailing commas, proper escaping)
- Check the private key hasn't been truncated
- Ensure the `type` field is `"service_account"`
- Generate a new key if the current one is corrupted
Region Availability
Error: Region not supported or endpoint errors
Solutions:
- Use `us-central1` for best compatibility
- Check Vertex AI regions
- Partner models may have limited regional availability
Frequently Asked Questions
Q: Can I use both Vertex AI and direct provider APIs?
A: Yes! Create separate proxies for each. For example, have one Vertex AI proxy for GCP-compliant workloads and a direct Anthropic proxy for other use cases.
Q: Does streaming work for all partner models?
A: Yes, streaming is fully supported for Gemini, Claude, Mistral, and Llama models through Vertex AI.
Q: How do I switch from direct Anthropic to Vertex Claude?
A: Create a new Vertex AI proxy, enable Claude in Model Garden, and update your app's proxy ID. Test in staging before switching production.
Q: What if my Service Account key expires?
A: Service Account keys don't expire by default. However, if you set a key expiration or delete the key, you'll need to generate a new one and update your Bastio credentials.
Q: Are there any feature differences between direct and Vertex?
A: Most features are identical. Some very new features may appear on direct APIs slightly before Vertex. Vertex may have additional compliance/security features.
Q: Can I use temporary credentials or workload identity?
A: Currently, Bastio requires a Service Account JSON key. Workload identity federation is not yet supported.
Q: Which models support vision/images?
A: Gemini models, and Llama 4 Maverick, Llama 4 Scout, and Llama 3.2 90B Vision all support image inputs.
When to Use Vertex AI
Choose Vertex AI if you:
- Already have Google Cloud infrastructure
- Need GCP compliance certifications (SOC 2, HIPAA, FedRAMP)
- Want consolidated billing through GCP
- Need VPC Service Controls or private endpoints
- Want to access multiple AI providers with one credential
- Have Google Cloud enterprise agreements
Choose Direct Providers if you:
- Want the simplest possible setup
- Don't have a GCP account
- Need the absolute latest model features immediately
- Prefer direct vendor relationships
- Have existing provider API keys
Additional Resources
- Google Vertex AI Documentation
- Vertex AI Model Garden
- Claude on Vertex AI
- Mistral on Vertex AI
- Llama on Vertex AI
- GCP IAM Best Practices
- Bastio Support
Need help? Contact support@bastio.ai or visit our support page.