
Azure AI Foundry Integration

Complete guide to using Azure AI Foundry with Bastio for OpenAI, Llama, Mistral, DeepSeek, and Microsoft models.

Azure AI Foundry

Access OpenAI, Meta Llama, Mistral AI, DeepSeek, and Microsoft models through a single Azure credential with Bastio's full security protection.

Overview

Azure AI Foundry (formerly Azure AI Studio) provides a unified gateway to multiple AI providers through Microsoft Azure's infrastructure. With Bastio, you can:

  • One credential, five vendors - Access OpenAI, Meta, Mistral, DeepSeek, and Microsoft models with a single Azure API key
  • Enterprise-grade security - Azure compliance certifications, VNet integration, private endpoints
  • Unified billing - All usage consolidated in your Azure subscription
  • Full security coverage - All Bastio security features work across all providers
  • OpenAI-compatible API - Same API format for all models, no code changes needed

Why Azure AI Foundry?

Unlike direct provider integrations, Azure AI Foundry offers unique advantages:

| Feature | Direct Providers | Azure AI Foundry |
|---|---|---|
| Credentials needed | 5 separate API keys | 1 Azure API key |
| Billing | 5 separate invoices | 1 Azure invoice |
| Compliance | Varies by provider | Azure certifications (SOC, HIPAA, ISO) |
| Network security | Public internet | VNet, private endpoints |
| API format | Different per provider | Unified OpenAI-compatible |
| Model access | Individual agreements | Model Catalog marketplace |

Supported Models

OpenAI Models (via Azure OpenAI)

Native OpenAI models available through Azure OpenAI Service:

GPT-4o Family

| Model | Context | Max Output | Input Price | Output Price | Vision | Tools |
|---|---|---|---|---|---|---|
| gpt-4o | 128K tokens | 16K tokens | $2.50/1M | $10.00/1M | Yes | Yes |
| gpt-4o-2024-11-20 | 128K tokens | 16K tokens | $2.50/1M | $10.00/1M | Yes | Yes |
| gpt-4o-mini | 128K tokens | 16K tokens | $0.15/1M | $0.60/1M | Yes | Yes |

GPT-4 Turbo

| Model | Context | Max Output | Input Price | Output Price | Vision | Tools |
|---|---|---|---|---|---|---|
| gpt-4-turbo | 128K tokens | 4K tokens | $10.00/1M | $30.00/1M | Yes | Yes |

GPT-4 Base

| Model | Context | Max Output | Input Price | Output Price | Vision | Tools |
|---|---|---|---|---|---|---|
| gpt-4 | 8K tokens | 8K tokens | $30.00/1M | $60.00/1M | No | Yes |
| gpt-4-32k | 32K tokens | 32K tokens | $60.00/1M | $120.00/1M | No | Yes |

GPT-3.5 Turbo

| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| gpt-35-turbo | 16K tokens | 4K tokens | $0.50/1M | $1.50/1M |
| gpt-35-turbo-16k | 16K tokens | 16K tokens | $3.00/1M | $4.00/1M |

o1 Reasoning Models

| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| o1-preview | 128K tokens | 32K tokens | $15.00/1M | $60.00/1M |
| o1-mini | 128K tokens | 65K tokens | $3.00/1M | $12.00/1M |

Meta Llama Models (via Model Catalog)

| Model | Context | Max Output | Input Price | Output Price | Vision | Tools |
|---|---|---|---|---|---|---|
| Meta-Llama-3.1-405B-Instruct | 128K tokens | 4K tokens | $5.33/1M | $16.00/1M | No | Yes |
| Meta-Llama-3.1-70B-Instruct | 128K tokens | 4K tokens | $2.68/1M | $3.54/1M | No | Yes |
| Meta-Llama-3.1-8B-Instruct | 128K tokens | 4K tokens | $0.30/1M | $0.61/1M | No | Yes |
| Llama-3.2-90B-Vision-Instruct | 128K tokens | 4K tokens | $2.00/1M | $2.00/1M | Yes | No |
| Llama-3.2-11B-Vision-Instruct | 128K tokens | 4K tokens | $0.37/1M | $0.37/1M | Yes | No |

Mistral AI Models (via Model Catalog)

| Model | Context | Max Output | Input Price | Output Price | Tools |
|---|---|---|---|---|---|
| Mistral-Large-2407 | 128K tokens | 8K tokens | $2.00/1M | $6.00/1M | Yes |
| Mistral-Small | 128K tokens | 8K tokens | $1.00/1M | $3.00/1M | Yes |
| Codestral-2405 | 32K tokens | 8K tokens | $1.00/1M | $3.00/1M | No |
| Mistral-Nemo | 128K tokens | 8K tokens | $0.30/1M | $0.30/1M | Yes |

DeepSeek Models (via Model Catalog)

| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| DeepSeek-R1 | 64K tokens | 8K tokens | $0.55/1M | $2.19/1M |
| DeepSeek-V3-0324 | 64K tokens | 8K tokens | $0.27/1M | $1.10/1M |

Microsoft Models

| Model | Context | Max Output | Input Price | Output Price |
|---|---|---|---|---|
| Phi-4 | 16K tokens | 4K tokens | $0.125/1M | $0.50/1M |
| Phi-3.5-mini-instruct | 128K tokens | 4K tokens | $0.13/1M | $0.52/1M |

Embedding Models

| Model | Dimensions | Input Price |
|---|---|---|
| text-embedding-3-large | 3072 | $0.13/1M |
| text-embedding-3-small | 1536 | $0.02/1M |
| text-embedding-ada-002 | 1536 | $0.10/1M |

Quick Start

Prerequisites

  1. Azure account with active subscription
  2. Azure AI Hub or Azure OpenAI resource
  3. Models deployed in your project
  4. API key from Azure Portal

Step 1: Create Azure AI Resource

  1. Go to the Azure Portal
  2. Click Create a resource > AI + Machine Learning
  3. Choose Azure AI Hub (for all models) or Azure OpenAI (for OpenAI models only)
  4. Select your subscription and resource group
  5. Choose a region (e.g., East US, West Europe)
  6. Click Review + create > Create

Step 2: Deploy Models

For Azure OpenAI Models:

  1. Go to your Azure OpenAI resource
  2. Click Model deployments > Create new deployment
  3. Select the model (e.g., gpt-4o)
  4. Name your deployment (e.g., gpt-4o-production)
  5. Set tokens-per-minute quota
  6. Click Create

For Model Catalog Models (Llama, Mistral, DeepSeek):

  1. Go to Azure AI Studio
  2. Navigate to Model Catalog
  3. Search for your model (e.g., Llama-3.1-70B)
  4. Click Deploy > Serverless API
  5. Accept terms and conditions
  6. Click Deploy

Step 3: Get API Credentials

  1. Go to your Azure AI resource in the Portal
  2. Navigate to Keys and Endpoint
  3. Copy KEY 1 or KEY 2
  4. Copy the Endpoint URL
  5. Note your resource name (the first part of the endpoint URL)
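The resource name is simply the first hostname label of either endpoint style; a minimal sketch for deriving it (the helper name and sample endpoints are illustrative):

```python
from urllib.parse import urlparse

def resource_name_from_endpoint(endpoint: str) -> str:
    """Return the Azure resource name: the first label of the endpoint hostname."""
    host = urlparse(endpoint).hostname or ""
    return host.split(".")[0]

# Both endpoint styles yield the same resource name
print(resource_name_from_endpoint("https://my-azure-ai.openai.azure.com/"))       # my-azure-ai
print(resource_name_from_endpoint("https://my-azure-ai.services.ai.azure.com/"))  # my-azure-ai
```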

Step 4: Configure in Bastio

  1. Go to Dashboard > Proxies > Create New Proxy
  2. Select Azure AI Foundry as provider
  3. Enter your credentials as JSON (see Credential Format below)
  4. Click Create Proxy

BYOK Mode (Bring Your Own Key)

Use your own Azure credentials with Bastio.

Via Dashboard

  1. Go to Dashboard > Proxies > Create New Proxy
  2. Select Azure AI Foundry as provider
  3. Choose Your API Keys (BYOK) mode
  4. Enter your Azure credentials as JSON:
{
  "resource_name": "my-azure-ai",
  "api_key": "your-api-key-here",
  "api_version": "2024-10-21",
  "endpoint_type": "inference"
}
  5. Click Create Proxy

Via API

# Create Azure AI Foundry proxy
curl -X POST https://api.bastio.ai/proxy \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Azure AI",
    "provider": "azure",
    "llm_mode": "byok",
    "model_behavior": "passthrough"
  }'

# Add Azure credentials
curl -X POST https://api.bastio.ai/keys/provider \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "azure",
    "key_name": "Azure Production",
    "api_key": "{\"resource_name\":\"my-azure-ai\",\"api_key\":\"xxx\",\"api_version\":\"2024-10-21\",\"endpoint_type\":\"inference\"}"
  }'

Model Deployment Guides

Deploying OpenAI Models

Step 1: Access Azure OpenAI

  1. Go to Azure Portal
  2. Navigate to your Azure OpenAI resource
  3. Click Go to Azure OpenAI Studio

Step 2: Create Deployment

  1. Click Deployments > Create new deployment
  2. Select model (e.g., gpt-4o)
  3. Enter deployment name (e.g., gpt-4o)
  4. Select model version
  5. Set tokens-per-minute quota
  6. Click Create

Step 3: Configure Deployment Mapping

If your deployment name differs from the model name, add a mapping:

{
  "resource_name": "my-azure-ai",
  "api_key": "xxx",
  "deployment_mappings": {
    "gpt-4o": "my-gpt4o-deployment"
  }
}

Deploying Meta Llama Models

Step 1: Access Model Catalog

  1. Go to Azure AI Studio
  2. Navigate to Model Catalog
  3. Search for "Llama"

Step 2: Deploy Model

  1. Click on your desired model (e.g., Meta-Llama-3.1-70B-Instruct)
  2. Click Deploy > Serverless API
  3. Review pricing information
  4. Accept Meta's license agreement
  5. Click Deploy

Step 3: Verify Deployment

Test with a direct API call:

curl -X POST \
  "https://my-azure-ai.services.ai.azure.com/models/chat/completions" \
  -H "api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Meta-Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'

Deploying Mistral AI Models

Step 1: Access Model Catalog

  1. Go to Azure AI Studio
  2. Navigate to Model Catalog
  3. Search for "Mistral"

Step 2: Deploy Model

  1. Click on your desired model (e.g., Mistral-Large-2407)
  2. Click Deploy > Serverless API
  3. Review pricing information
  4. Accept Mistral's terms
  5. Click Deploy

Deploying DeepSeek Models

Step 1: Access Model Catalog

  1. Go to Azure AI Studio
  2. Navigate to Model Catalog
  3. Search for "DeepSeek"

Step 2: Deploy Model

  1. Click on your desired model (e.g., DeepSeek-R1)
  2. Click Deploy > Serverless API
  3. Review pricing information
  4. Accept DeepSeek's terms
  5. Click Deploy

Making Requests

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key"
)

# Using GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.choices[0].message.content)

JavaScript Example

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.bastio.ai/v1/guard/{PROXY_ID}/v1',
  apiKey: process.env.BASTIO_API_KEY,
});

// Using Llama via Azure
const response = await client.chat.completions.create({
  model: 'Meta-Llama-3.1-70B-Instruct',
  messages: [
    { role: 'user', content: 'Write a haiku about AI' }
  ],
});

console.log(response.choices[0].message.content);

Streaming Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.bastio.ai/v1/guard/{PROXY_ID}/v1",
    api_key="your-bastio-api-key"
)

# Streaming with Mistral
stream = client.chat.completions.create(
    model="Mistral-Large-2407",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Using Different Providers

Same proxy, different models - Bastio routes automatically:

# OpenAI GPT-4o
client.chat.completions.create(model="gpt-4o", ...)

# Meta Llama
client.chat.completions.create(model="Meta-Llama-3.1-70B-Instruct", ...)

# Mistral AI
client.chat.completions.create(model="Mistral-Large-2407", ...)

# DeepSeek
client.chat.completions.create(model="DeepSeek-R1", ...)

# Microsoft Phi
client.chat.completions.create(model="Phi-4", ...)

Model Routing

Bastio automatically routes requests based on the model name. All models use the unified Azure AI Inference API with OpenAI-compatible format.

Automatic Routing

  • OpenAI models (gpt-*, o1-*): Routed to Azure OpenAI endpoint
  • Llama models (Meta-Llama-*, Llama-*): Routed to Model Catalog endpoint
  • Mistral models (Mistral-*, Codestral-*): Routed to Model Catalog endpoint
  • DeepSeek models (DeepSeek-*): Routed to Model Catalog endpoint
  • Microsoft models (Phi-*): Routed to Model Catalog endpoint
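The prefix rules above can be illustrated with a small sketch for the chat models listed. This is illustrative only; Bastio's actual routing is internal, and `route_model` is a hypothetical name:

```python
# Model names beginning with these prefixes go to the Azure OpenAI endpoint;
# everything else in the list above goes to the Model Catalog endpoint.
OPENAI_PREFIXES = ("gpt-", "o1-")

def route_model(model: str) -> str:
    """Pick a backend for a chat model name, mirroring the prefix rules above."""
    if model.lower().startswith(OPENAI_PREFIXES):
        return "azure-openai"
    return "model-catalog"

print(route_model("gpt-4o"))                        # azure-openai
print(route_model("Meta-Llama-3.1-70B-Instruct"))   # model-catalog
```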

Deployment Name Mappings

For Azure OpenAI deployments with custom names, use deployment_mappings:

{
  "resource_name": "my-azure-ai",
  "api_key": "xxx",
  "deployment_mappings": {
    "gpt-4o": "production-gpt4o",
    "gpt-4o-mini": "production-gpt4o-mini"
  }
}

When you request gpt-4o, Bastio will route to the production-gpt4o deployment.

Credential Format

Full Credential Structure

{
  "resource_name": "your-azure-resource",
  "api_key": "your-api-key",
  "api_version": "2024-10-21",
  "endpoint_type": "inference",
  "deployment_mappings": {
    "gpt-4o": "my-gpt4o-deployment",
    "gpt-4o-mini": "my-gpt4o-mini-deployment"
  }
}

Required Fields

  • resource_name: Your Azure AI resource name (the prefix of your endpoint URL)
  • api_key: API key from Azure Portal (Keys and Endpoint section)

Optional Fields

  • api_version: Azure API version (default: 2024-10-21)
  • endpoint_type: inference (unified Model Inference API) or openai (Azure OpenAI API)
  • deployment_mappings: Map model names to deployment names
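A client-side sanity check before saving credentials can mirror these rules; a minimal sketch assuming the required fields and defaults listed above (`parse_credentials` is a hypothetical helper, not part of Bastio):

```python
import json

REQUIRED = {"resource_name", "api_key"}
DEFAULTS = {"api_version": "2024-10-21", "endpoint_type": "inference"}

def parse_credentials(raw: str) -> dict:
    """Parse the credential JSON, enforce required fields, and apply documented defaults."""
    creds = json.loads(raw)  # raises ValueError on invalid JSON (e.g. trailing commas)
    missing = REQUIRED - creds.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return {**DEFAULTS, **creds}

creds = parse_credentials('{"resource_name": "my-azure-ai", "api_key": "xxx"}')
print(creds["api_version"])  # 2024-10-21 (default applied)
```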

Endpoint Types

Inference Endpoint (Recommended)

  • URL: https://{resource}.services.ai.azure.com/models/chat/completions
  • Works with all models (OpenAI + Model Catalog)
  • Model specified in request body
  • Simpler configuration

OpenAI Endpoint

  • URL: https://{resource}.openai.azure.com/openai/deployments/{deployment}/chat/completions
  • Per-deployment URLs
  • Requires deployment mappings
  • Compatible with existing Azure OpenAI customers

Pricing & Cost Tracking

Pricing Comparison

Azure AI Foundry pricing is generally identical to direct provider pricing:

| Provider | Model | Direct | Azure |
|---|---|---|---|
| OpenAI | GPT-4o | $2.50/$10 | $2.50/$10 |
| OpenAI | GPT-4o Mini | $0.15/$0.60 | $0.15/$0.60 |
| Meta | Llama 3.1 70B | $2.68/$3.54 | $2.68/$3.54 |
| Mistral | Mistral Large | $2/$6 | $2/$6 |
| DeepSeek | DeepSeek R1 | $0.55/$2.19 | $0.55/$2.19 |

Prices per 1M tokens (input/output)
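Estimating a single request's cost from these per-1M-token prices is straightforward; a small sketch:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Cost in USD, given per-1M-token input/output prices from the tables above."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# gpt-4o at $2.50 input / $10.00 output per 1M tokens
print(round(request_cost(10_000, 2_000, 2.50, 10.00), 4))  # 0.045
```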

Cost Tracking Features

Bastio automatically tracks costs across all Azure AI models:

  • Dashboard - Real-time spending across all models
  • Analytics - Historical cost analysis by model, user, time
  • Billing - Detailed breakdowns by provider and model
  • Alerts - Set spending limits and notifications

Troubleshooting

Permission Denied

Error: AuthenticationError or Access denied

Solutions:

  1. Verify API key is correct (copy fresh from Azure Portal)
  2. Check API key hasn't been regenerated
  3. Ensure the key has access to all required deployments
  4. Test key directly with Azure endpoint

Quota Exceeded

Error: RateLimitError or Quota exceeded

Solutions:

  1. Go to Quotas in Azure Portal
  2. Request quota increase for the model
  3. Consider deploying in additional regions
  4. Use a different model temporarily
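While waiting for a quota increase, client-side retries with exponential backoff and jitter can absorb transient rate-limit errors; a minimal delay-schedule sketch (illustrative, not a Bastio feature; the function name and constants are assumptions):

```python
import random

def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff with jitter: the seconds to wait before each retry attempt."""
    return [min(cap, base * 2 ** i) * random.uniform(0.5, 1.0) for i in range(retries)]

# Sleep for each delay between retry attempts, e.g.:
#   for delay in backoff_delays(5):
#       ... retry the request, time.sleep(delay) on RateLimitError ...
print(len(backoff_delays(5)))  # 5
```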

Model Not Found

Error: Model not found or Deployment not found

Solutions:

  1. Verify model is deployed in Azure AI Studio
  2. Check deployment name matches model name (or use deployment_mappings)
  3. Confirm deployment is in the same resource as your API key
  4. For Model Catalog models, ensure terms are accepted

Credential Issues

Error: Invalid credentials or Could not parse credentials

Solutions:

  1. Verify JSON is valid (no trailing commas, proper quotes)
  2. Check resource_name matches your Azure resource
  3. Ensure api_key is a valid Azure API key
  4. Test credentials with curl first

Region Availability

Error: Region not available or endpoint errors

Solutions:

  1. Check model availability in your region
  2. Model Catalog models may have limited regional availability
  3. Consider East US or West Europe for best availability
  4. OpenAI models generally have wider availability

Frequently Asked Questions

Q: Can I use both Azure AI and direct provider APIs?

A: Yes! Create separate proxies for each. For example, use an Azure AI proxy for enterprise workloads and a direct OpenAI proxy for development.

Q: Does streaming work for all models?

A: Yes, streaming is fully supported for all Azure AI models including OpenAI, Llama, Mistral, DeepSeek, and Microsoft models.

Q: What's the difference between Azure OpenAI and Azure AI Foundry?

A: Azure OpenAI provides only OpenAI models. Azure AI Foundry (via Model Catalog) provides access to Llama, Mistral, DeepSeek, and Microsoft models in addition to OpenAI. Both can be accessed through a single Bastio proxy.

Q: Do I need separate deployments for each model?

A: For Azure OpenAI models (GPT-4o, etc.), yes - you need to deploy each model. For Model Catalog models (Llama, Mistral), the serverless API handles this automatically.

Q: How do deployment mappings work?

A: If your Azure OpenAI deployment name differs from the model name (e.g., deployment prod-gpt4 for model gpt-4o), add a mapping in your credentials. This tells Bastio which deployment to use for each model name.

Q: Can I use Azure's content filtering?

A: Yes, Azure's built-in content filtering applies on top of Bastio's security features, giving you multiple layers of protection.

Q: Which models support vision/images?

A: GPT-4o, GPT-4o Mini, GPT-4 Turbo, and Llama 3.2 Vision models all support image inputs.

When to Use Azure AI Foundry

Choose Azure AI Foundry if you:

  • Already have Azure infrastructure
  • Need Azure compliance certifications (SOC 2, HIPAA, ISO)
  • Want consolidated billing through Azure
  • Need VNet integration or private endpoints
  • Want to access multiple AI providers with one credential
  • Have Azure enterprise agreements
  • Prefer simple API key authentication

Choose Direct Providers if you:

  • Want the simplest possible setup
  • Don't have an Azure account
  • Need the absolute latest model features immediately
  • Prefer direct vendor relationships
  • Have existing provider API keys

Need help? Contact support@bastio.ai or visit our support page.