Google Vertex AI Integration
Complete guide to using Google Vertex AI with Bastio.
Access Google Gemini and partner models (Claude, Mistral, Llama) through a single GCP credential with Bastio's full security protection.
Overview
Google Vertex AI provides a unified gateway to multiple AI providers through Google Cloud's infrastructure. With Bastio, you can:
- One credential, four vendors - Access Google, Anthropic, Mistral, and Meta models with a single GCP service account
- Enterprise-grade security - VPC-SC, CMEK encryption, audit logging, compliance certifications
- Unified billing - All usage consolidated in your Google Cloud bill
- Full security coverage - All Bastio security features work across all providers
- Regional control - Deploy in specific regions for data residency requirements
Why Vertex AI?
Unlike direct provider integrations, Vertex AI offers unique advantages:
| Feature | Direct Providers | Vertex AI |
|---|---|---|
| Credentials needed | 4 separate API keys | 1 GCP service account |
| Billing | 4 separate invoices | 1 GCP invoice |
| Compliance | Varies by provider | GCP certifications (SOC, HIPAA, FedRAMP) |
| Network security | Public internet | VPC-SC, private endpoints |
| Model access | Individual agreements | Model Garden marketplace |
Supported Models
Google Gemini Models
Native Google models available directly through Vertex AI:
Gemini 3.x (Preview)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-3-pro-preview` | 1M tokens | 64K tokens | $2.00/1M | $12.00/1M |
Gemini 2.5 (GA)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-2.5-pro` | 1M tokens | 64K tokens | $1.25/1M | $10.00/1M |
| `gemini-2.5-flash` | 1M tokens | 64K tokens | $0.30/1M | $2.50/1M |
| `gemini-2.5-flash-lite` | 1M tokens | 64K tokens | $0.10/1M | $0.40/1M |
Gemini 2.0 (GA)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-2.0-flash` | 1M tokens | 8K tokens | $0.10/1M | $0.40/1M |
| `gemini-2.0-flash-lite` | 1M tokens | 8K tokens | $0.075/1M | $0.30/1M |
Gemini 1.5 (Legacy)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `gemini-1.5-pro` | 2M tokens | 8K tokens | $1.25/1M | $5.00/1M |
| `gemini-1.5-flash` | 1M tokens | 8K tokens | $0.075/1M | $0.30/1M |
| `gemini-1.5-flash-8b` | 1M tokens | 8K tokens | $0.0375/1M | $0.15/1M |
Anthropic Claude Models (via Model Garden)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `claude-opus-4-5` | 200K tokens | 32K tokens | $15.00/1M | $75.00/1M |
| `claude-sonnet-4-5` | 200K tokens | 32K tokens | $3.00/1M | $15.00/1M |
| `claude-haiku-3-5` | 200K tokens | 8K tokens | $0.80/1M | $4.00/1M |
| `claude-3-opus` | 200K tokens | 4K tokens | $15.00/1M | $75.00/1M |
| `claude-3-sonnet` | 200K tokens | 4K tokens | $3.00/1M | $15.00/1M |
Mistral AI Models (via Model Garden)
| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| `mistral-large` | 128K tokens | 8K tokens | $2.00/1M | $6.00/1M |
| `mistral-small` | 128K tokens | 8K tokens | $0.20/1M | $0.60/1M |
| `codestral` | 32K tokens | 8K tokens | $0.20/1M | $0.60/1M |
| `mistral-nemo` | 128K tokens | 8K tokens | $0.15/1M | $0.15/1M |
Meta Llama Models (via Model Garden)
| Model | Context | Max Output | Input Price | Output Price | Vision |
|---|---|---|---|---|---|
| `llama-4-maverick` | 128K tokens | 8K tokens | $0.40/1M | $1.20/1M | Yes |
| `llama-4-scout` | 128K tokens | 8K tokens | $0.20/1M | $0.60/1M | Yes |
| `llama-3.3-70b` | 128K tokens | 8K tokens | $0.20/1M | $0.20/1M | No |
| `llama-3.2-90b-vision` | 128K tokens | 8K tokens | $0.30/1M | $0.90/1M | Yes |
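The vision column above can be folded into a quick capability check before a request is built. A small illustrative helper (the set mirrors the Llama table, and all Gemini models accept image inputs per this guide):

```python
# Vision-capable models, per the tables in this guide: every Gemini
# model, plus the Llama entries marked "Yes" in the Vision column.
VISION_LLAMA = {"llama-4-maverick", "llama-4-scout", "llama-3.2-90b-vision"}

def supports_vision(model: str) -> bool:
    """Return True if the model accepts image inputs."""
    return model.startswith("gemini-") or model in VISION_LLAMA
```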
Quick Start
Prerequisites
- Google Cloud Platform (GCP) account
- GCP project with billing enabled
- Vertex AI API enabled
- Service Account with Vertex AI User role
Step 1: Enable Vertex AI API
- Go to the Google Cloud Console
- Select your project (or create a new one)
- Navigate to Vertex AI > Dashboard
- Click Enable All Recommended APIs
Step 2: Create a Service Account
- Go to IAM & Admin > Service Accounts
- Click Create Service Account
- Name it (e.g., `bastio-vertex-ai`)
- Grant it the Vertex AI User role (`roles/aiplatform.user`)
- Click Done
Step 3: Generate a JSON Key
- Click on the newly created service account
- Go to the Keys tab
- Click Add Key > Create new key
- Select JSON and click Create
- Save the downloaded file securely
Step 4: Configure in Bastio
- Go to Dashboard > Proxies > Create New Proxy
- Select Google Vertex AI as provider
- Paste the contents of your JSON key file
- Click Create Proxy
BYOK Mode (Bring Your Own Key)
Use your own GCP credentials with Bastio.
Via Dashboard
- Go to Dashboard > Proxies > Create New Proxy
- Select Google Vertex AI as provider
- Choose Your API Keys (BYOK) mode
- Enter your GCP Service Account JSON:
```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "key-id...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "bastio-vertex-ai@your-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
```
- Click Create Proxy
Via API
```bash
# Create Vertex AI proxy
curl -X POST https://api.bastio.ai/proxy \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Vertex AI",
    "provider": "vertex",
    "llm_mode": "byok",
    "model_behavior": "passthrough"
  }'

# Add GCP credentials
curl -X POST https://api.bastio.ai/keys/provider \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "vertex",
    "key_name": "GCP Production",
    "api_key": "{\"type\":\"service_account\",\"project_id\":\"...\",\"private_key\":\"...\"}"
  }'
```

Partner Model Setup Guides
Partner models (Claude, Mistral, Llama) require additional enablement in your GCP project through Model Garden.
Enabling Anthropic Claude
Step 1: Access Model Garden
- Go to Vertex AI Model Garden
- Search for "Claude"
- Click on Claude 3.5 Sonnet (or your preferred model)
Step 2: Enable the Model
- Click Enable on the model card
- Review and accept Anthropic's terms and conditions
- Wait for provisioning (usually instant)
Step 3: Verify Enablement
Test with a direct API call:
```bash
# Get access token
ACCESS_TOKEN=$(gcloud auth print-access-token)

# Test Claude endpoint
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/anthropic/models/claude-3-5-sonnet:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Common Claude Issues
"Model not found" error
- Verify you enabled the model in Model Garden
- Check the region supports Claude (use `us-central1`)
- Ensure you accepted Anthropic's terms
"Permission denied" error
- Service account needs `roles/aiplatform.user`
- The model must be enabled in your project
Enabling Mistral AI
Step 1: Access Model Garden
- Go to Vertex AI Model Garden
- Search for "Mistral"
- Click on Mistral Large (or your preferred model)
Step 2: Enable the Model
- Click Enable on the model card
- Accept Mistral's terms and conditions
- Wait for provisioning
Step 3: Verify Enablement
```bash
ACCESS_TOKEN=$(gcloud auth print-access-token)

curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/mistralai/models/mistral-large:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-large",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Common Mistral Issues
"Endpoint not found" error
- Use the correct publisher: `mistralai` (not `mistral`)
- Check region availability
Enabling Meta Llama
Step 1: Access Model Garden
- Go to Vertex AI Model Garden
- Search for "Llama"
- Click on Llama 4 Maverick (or your preferred model)
Step 2: Accept Meta License
- Click Enable on the model card
- Important: Review and accept Meta's license agreement
- This is a separate agreement from Google's terms
- Wait for provisioning
Step 3: Verify Enablement
```bash
ACCESS_TOKEN=$(gcloud auth print-access-token)

curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT/locations/us-central1/publishers/meta/models/llama-4-maverick:rawPredict" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Vision Models Note
For Llama vision models (llama-4-maverick, llama-4-scout, llama-3.2-90b-vision), you can include images:
```json
{
  "model": "llama-4-maverick",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    ]
  }]
}
```

Making Requests
Python Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key",
)

# Using Gemini
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(response.choices[0].message.content)
```

JavaScript Example
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.bastio.ai/v1/guard/{PROXY_ID}/v1',
  apiKey: process.env.BASTIO_API_KEY,
});

// Using Claude via Vertex
const response = await client.chat.completions.create({
  model: 'claude-sonnet-4-5',
  messages: [
    { role: 'user', content: 'Write a haiku about AI' }
  ],
});
console.log(response.choices[0].message.content);
```

Streaming Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key",
)

# Streaming with Mistral
stream = client.chat.completions.create(
    model="mistral-large",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Using Different Providers
Same proxy, different models - Bastio routes automatically:
```python
# Google Gemini
client.chat.completions.create(model="gemini-2.5-flash", ...)

# Anthropic Claude
client.chat.completions.create(model="claude-sonnet-4-5", ...)

# Mistral AI
client.chat.completions.create(model="mistral-large", ...)

# Meta Llama
client.chat.completions.create(model="llama-4-maverick", ...)
```

Model Routing
Bastio automatically routes requests to the correct Vertex AI endpoint based on the model name:
- Gemini models (`gemini-*`): routed to Google's native Vertex AI endpoint
- Claude models (`claude-*`): routed to Anthropic's Model Garden endpoint
- Mistral models (`mistral-*`, `codestral`): routed to Mistral's Model Garden endpoint
- Llama models (`llama-*`): routed to Meta's Model Garden endpoint
No configuration needed - just specify the model name and Bastio handles the rest.
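The routing rule above amounts to a prefix match on the model name. A minimal sketch for illustration (the publisher names mirror the Model Garden endpoints used in the verification steps earlier in this guide):

```python
def vertex_publisher(model: str) -> str:
    """Map a model name to its Vertex AI publisher, mirroring the
    prefix-based routing described above."""
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("mistral-") or model == "codestral":
        return "mistralai"
    if model.startswith("llama-"):
        return "meta"
    raise ValueError(f"unrecognized model family: {model}")
```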
Credential Format
Service Account JSON Structure
Your GCP Service Account JSON file contains these fields:
```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "abc123...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account@project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
```

Required Fields
- `type`: Must be `"service_account"`
- `project_id`: Your GCP project ID
- `private_key`: The private key for authentication
- `client_email`: Service account email
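A quick pre-flight check of these fields catches most credential problems before upload. The sketch below also covers the one subtlety of the BYOK API call: the service-account JSON goes into the `api_key` field as an escaped string (function and variable names here are illustrative, not part of any SDK):

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")

def check_service_account(raw: str) -> list[str]:
    """Return a list of problems found in a service-account JSON string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not data.get(f)]
    if data.get("type") not in (None, "service_account"):
        problems.append('type must be "service_account"')
    return problems

def byok_payload(sa_json: str, key_name: str) -> str:
    """Build the BYOK request body; json.dumps escapes the embedded JSON."""
    return json.dumps({"provider": "vertex", "key_name": key_name, "api_key": sa_json})
```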
Supported Regions
Vertex AI is available in multiple regions. Common regions with full model support:
| Region | Location | Gemini | Claude | Mistral | Llama |
|---|---|---|---|---|---|
| `us-central1` | Iowa | Yes | Yes | Yes | Yes |
| `us-east4` | Virginia | Yes | Yes | Yes | Yes |
| `europe-west1` | Belgium | Yes | Yes | Yes | Yes |
| `europe-west4` | Netherlands | Yes | Yes | Yes | Yes |
| `asia-northeast1` | Tokyo | Yes | Limited | Limited | Limited |
Recommendation: Use `us-central1` for the best model availability.
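The region also determines the endpoint host: the rawPredict URLs in the verification steps above all follow one pattern. A small helper sketch (project and model values are placeholders):

```python
def vertex_endpoint(project: str, region: str, publisher: str, model: str) -> str:
    """Build the regional rawPredict URL used in the curl examples above."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1"
        f"/projects/{project}/locations/{region}"
        f"/publishers/{publisher}/models/{model}:rawPredict"
    )
```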
Pricing & Cost Tracking
Pricing Comparison
Vertex AI pricing is generally identical to direct provider pricing:
| Provider | Model | Direct | Vertex AI |
|---|---|---|---|
| Google | Gemini 2.5 Flash | N/A | $0.30/$2.50 |
| Anthropic | Claude Sonnet 4.5 | $3/$15 | $3/$15 |
| Mistral | Mistral Large | $2/$6 | $2/$6 |
| Meta | Llama 4 Maverick | $0.40/$1.20 | $0.40/$1.20 |
Prices per 1M tokens (input/output)
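The per-1M-token prices translate to request cost with simple arithmetic; a worked example using the Claude Sonnet 4.5 rates above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 10K input + 2K output tokens on Claude Sonnet 4.5 ($3.00/$15.00 per 1M):
cost = request_cost(10_000, 2_000, 3.00, 15.00)
print(f"${cost:.2f}")  # -> $0.06
```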
Cost Tracking Features
Bastio automatically tracks costs across all providers:
- Dashboard - Real-time spending across all models
- Analytics - Historical cost analysis by model, user, time
- Billing - Detailed breakdowns by provider and model
- Alerts - Set spending limits and notifications
Troubleshooting
Permission Denied
Error: PERMISSION_DENIED or Access denied
Solutions:
- Verify the Service Account has the `roles/aiplatform.user` role
- For partner models, check they're enabled in Model Garden
- Ensure billing is enabled on your GCP project
- Test credentials with `gcloud auth activate-service-account --key-file=key.json`
Quota Exceeded
Error: RESOURCE_EXHAUSTED or Quota exceeded
Solutions:
- Go to IAM & Admin > Quotas in GCP Console
- Filter by "Vertex AI"
- Request quota increases for specific models
- Consider using different regions for better availability
Partner Model Not Found
Error: Model not found or Endpoint not available
Solutions:
- Verify model is enabled in Model Garden
- Check you've accepted the provider's terms/license
- Confirm the region supports the model
- Use the exact model name (e.g., `claude-sonnet-4-5`, not `claude-4.5`)
Service Account Issues
Error: Invalid credentials or Could not parse service account
Solutions:
- Verify JSON is valid (no trailing commas, proper escaping)
- Check the private key hasn't been truncated
- Ensure the `type` field is `"service_account"`
- Generate a new key if the current one is corrupted
Region Availability
Error: Region not supported or endpoint errors
Solutions:
- Use `us-central1` for best compatibility
- Check Vertex AI regions
- Partner models may have limited regional availability
Frequently Asked Questions
Q: Can I use both Vertex AI and direct provider APIs?
A: Yes! Create separate proxies for each. For example, have one Vertex AI proxy for GCP-compliant workloads and a direct Anthropic proxy for other use cases.
Q: Does streaming work for all partner models?
A: Yes, streaming is fully supported for Gemini, Claude, Mistral, and Llama models through Vertex AI.
Q: How do I switch from direct Anthropic to Vertex Claude?
A: Create a new Vertex AI proxy, enable Claude in Model Garden, and update your app's proxy ID. Test in staging before switching production.
Q: What if my Service Account key expires?
A: Service Account keys don't expire by default. However, if you set a key expiration or delete the key, you'll need to generate a new one and update your Bastio credentials.
Q: Are there any feature differences between direct and Vertex?
A: Most features are identical. Some very new features may appear on direct APIs slightly before Vertex. Vertex may have additional compliance/security features.
Q: Can I use temporary credentials or workload identity?
A: Currently, Bastio requires a Service Account JSON key. Workload identity federation is not yet supported.
Q: Which models support vision/images?
A: Gemini models, and Llama 4 Maverick, Llama 4 Scout, and Llama 3.2 90B Vision all support image inputs.
When to Use Vertex AI
Choose Vertex AI if you:
- Already have Google Cloud infrastructure
- Need GCP compliance certifications (SOC 2, HIPAA, FedRAMP)
- Want consolidated billing through GCP
- Need VPC Service Controls or private endpoints
- Want to access multiple AI providers with one credential
- Have Google Cloud enterprise agreements
Choose Direct Providers if you:
- Want the simplest possible setup
- Don't have a GCP account
- Need the absolute latest model features immediately
- Prefer direct vendor relationships
- Have existing provider API keys
Additional Resources
- Google Vertex AI Documentation
- Vertex AI Model Garden
- Claude on Vertex AI
- Mistral on Vertex AI
- Llama on Vertex AI
- GCP IAM Best Practices
- Bastio Support
Need help? Contact support@bastio.ai or visit our support page.