Gateway

One endpoint, every provider

Drop-in OpenAI-compatible proxy. Change one line of code to get security scanning, response caching, and multi-provider failover.

< 1ms

Added latency

6

Providers supported

100%

OpenAI SDK compatible

Drop-in Integration

Change one line, keep your SDK.

Before

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

After

const openai = new OpenAI({
  apiKey: process.env.BASTIO_API_KEY,
  baseURL: "https://api.bastio.com/v1/guard/px_..."
});
Supported Endpoints

Every OpenAI endpoint, fully proxied.

/v1/chat/completions

Chat + streaming

/v1/responses

Responses API

/v1/embeddings

Text embeddings

/v1/models

Model listing
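
Because only the base URL changes, every standard endpoint path resolves against your gateway URL unchanged. A minimal sketch of that mapping, where `px_...` stands in for your real proxy ID:

```python
# How OpenAI-style endpoint paths map onto a Bastio gateway base URL.
# "px_..." is a placeholder for your actual proxy ID.
BASE_URL = "https://api.bastio.com/v1/guard/px_..."

ENDPOINTS = {
    "chat": "/chat/completions",   # chat + streaming
    "responses": "/responses",     # Responses API
    "embeddings": "/embeddings",   # text embeddings
    "models": "/models",           # model listing
}

def gateway_url(endpoint: str) -> str:
    """Join the gateway base URL with a standard OpenAI endpoint path."""
    return BASE_URL.rstrip("/") + ENDPOINTS[endpoint]

print(gateway_url("embeddings"))
```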

Provider Routing

Automatic failover across six providers.

Provider  | Status     | Latency | Models
OpenAI    | ✓ healthy  | 92ms    | GPT-4o, o1, o3-mini
Anthropic | ✓ healthy  | 105ms   | Claude Opus, Sonnet, Haiku
Google    | ⚠ degraded | 340ms   | Gemini 2.5 Pro, Flash

What's included

Security, caching, and reliability — built in

Every request through the gateway gets automatic security scanning, intelligent caching, and multi-provider reliability with no extra configuration.

Semantic response caching
SSE streaming support
Automatic provider failover
Circuit breaker protection
Health monitoring
Rate limiting
Request/response logging
Cost tracking
PII redaction in transit
Provider load balancing
Configurable retry logic
Cache TTL policies

Python SDK

Use the standard OpenAI Python SDK with streaming — just point it at Bastio.

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-bastio-key",
    base_url="https://api.bastio.com/v1/guard/px_..."
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

for chunk in stream:
    # The final chunk's delta carries no content, so guard against None
    print(chunk.choices[0].delta.content or "", end="")

REST API

Or call the gateway directly with any HTTP client — no SDK required.

curl https://api.bastio.com/v1/guard/px_.../chat/completions \
  -H "Authorization: Bearer sk-your-bastio-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'

Intelligent Caching

Semantic similarity matching reduces duplicate LLM calls and cuts costs by up to 70%.
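
One way semantic caching can work, shown as a hedged sketch rather than Bastio's actual implementation: embed each prompt, and serve a cached response when a new prompt's embedding is close enough to one already seen. The toy `embed` function below is a stand-in for a real embedding model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, prompt: str):
        """Return a cached response if a similar prompt was seen before."""
        q = embed(prompt)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("what is the capital of France", "Paris")
# A near-duplicate prompt hits the cache instead of the provider
print(cache.get("What is the capital of France?"))  # Paris
```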

Full Streaming

Server-Sent Events with real-time security scanning on every chunk.
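
Streamed responses arrive as Server-Sent Events: `data:` lines carrying JSON chunks, terminated by `data: [DONE]`. A minimal parser, independent of any SDK, might look like this; the sample stream below is illustrative, not captured gateway output.

```python
import json

def parse_sse(stream_lines):
    """Yield the content delta from each SSE data line, stopping at [DONE]."""
    for line in stream_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse(sample)))  # Hello
```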

Auto Failover

Circuit breakers detect provider issues and reroute traffic in under 3 seconds.
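
The circuit-breaker pattern behind this behavior can be sketched as follows; the thresholds, names, and routing logic here are illustrative, not Bastio's actual configuration.

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: open after repeated failures,
    then allow a probe request once a cooldown has elapsed."""
    def __init__(self, failure_threshold: int = 3, cooldown: float = 3.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def available(self) -> bool:
        if self.opened_at is None:
            return True  # circuit closed: provider is usable
        # Half-open: permit a probe after the cooldown elapses
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

def route(breakers: dict) -> str:
    """Pick the first provider whose circuit is closed or half-open."""
    for name, breaker in breakers.items():
        if breaker.available():
            return name
    return None

breakers = {"openai": CircuitBreaker(), "anthropic": CircuitBreaker()}
for _ in range(3):
    breakers["openai"].record_failure()  # primary provider starts erroring
print(route(breakers))  # anthropic
```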

Start routing AI traffic through Bastio

OpenAI-compatible gateway included with every plan. No extra cost.