Gateway & Caching

Drop‑in proxy with caching across providers

OpenAI‑compatible endpoint with intelligent caching, multi‑provider routing, and automatic failover. Zero code changes required.

Up to 70% Faster

Potential response time improvement with intelligent caching and provider optimization

99.9% Uptime Goal

Multi-provider failover built on health checks and circuit breakers

5+ Providers

Unified interface supporting OpenAI, Anthropic, Google, Mistral, and more

100% OpenAI Compatible

Drop-in replacement for OpenAI's API with zero code changes. Simply update your base URL and add your Bastio API key header.

Supported Endpoints

/v1/chat/completions

Chat completions with streaming support

/v1/completions

Legacy text completions

/v1/embeddings

Text embeddings generation

/v1/models

Available models listing
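
For reference, a gateway request looks like any standard OpenAI-style HTTP call. The sketch below assumes the base URL and X-API-Key header shown in the Quick Integration example that follows; the model name and message are placeholders.

// Minimal sketch: calling the gateway's chat completions endpoint over HTTP.
// Assumes the base URL and X-API-Key header from the integration example below.
const response = await fetch("https://api.bastio.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Bastio authenticates via this header; provider keys stay server-side
    "X-API-Key": process.env.BASTIO_API_KEY
  },
  body: JSON.stringify({
    model: "gpt-4",
    messages: [{ role: "user", content: "Hello" }]
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);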

Quick Integration

Before (Direct OpenAI)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.openai.com/v1"
});
After (Via Bastio)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "unused", // Provider keys are managed by Bastio
  baseURL: "https://api.bastio.com/v1",
  defaultHeaders: {
    "X-API-Key": process.env.BASTIO_API_KEY
  }
});

Intelligent Multi-Provider Routing

Automatic failover, load balancing, and cost optimization across multiple AI providers with configurable routing policies.

OpenAI

  • GPT-4, GPT-4 Turbo, GPT-3.5
  • Primary for chat completions
  • Best performance for code generation

Anthropic

  • Claude 3, Claude 2.1, Claude Instant
  • Excellent for long-form content
  • Strong reasoning capabilities

Google

  • Gemini Pro, Gemini Ultra
  • Multimodal capabilities
  • Cost-effective for high volume

Routing Strategies

Cost-Optimized

Route to the lowest-cost provider that meets quality requirements

Performance-First

Prioritize fastest response times and lowest latency

Quality-Based

Route based on model capabilities for specific task types

Load Balancing

Distribute requests evenly to prevent rate limiting
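
As a conceptual illustration (not the gateway's actual internals), a cost-optimized strategy can be thought of as filtering to healthy providers and picking the cheapest one. The provider list and pricing fields below are illustrative assumptions.

// Conceptual sketch of cost-optimized routing; provider names and
// prices are illustrative assumptions, not real gateway configuration
const providers = [
  { name: "openai", costPer1kTokens: 0.03, healthy: true },
  { name: "anthropic", costPer1kTokens: 0.025, healthy: true },
  { name: "google", costPer1kTokens: 0.02, healthy: false }
];

function pickCostOptimized(candidates) {
  // Drop unhealthy providers, then select the lowest-cost remaining one
  const healthy = candidates.filter((p) => p.healthy);
  return healthy.reduce((best, p) =>
    p.costPer1kTokens < best.costPer1kTokens ? p : best
  );
}

console.log(pickCostOptimized(providers).name); // "anthropic"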

Intelligent Response Caching

Advanced caching algorithms reduce costs and improve performance with semantic similarity matching and configurable TTL policies.

Semantic Caching

Match similar queries using embedding-based similarity, not just exact string matching.

  • Vector similarity scoring
  • Configurable similarity thresholds
  • Context-aware matching
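
To make the idea concrete, here is a minimal sketch of a semantic cache lookup: the incoming prompt's embedding is compared against cached entries by cosine similarity, and any entry above the configured threshold (0.85 in the configuration below) counts as a hit. How embeddings are produced is outside this sketch.

// Minimal sketch of a semantic cache lookup; the cache entry shape
// ({ embedding, response }) is an assumption for illustration
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function lookupSemanticCache(queryEmbedding, cache, threshold = 0.85) {
  // Return the best-scoring cached response at or above the threshold
  let best = null;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.response : null; // null means cache miss
}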

Smart TTL Policies

Adaptive cache expiration based on content type, user patterns, and update frequency.

  • Dynamic TTL adjustment
  • Content-type specific policies
  • User behavior analysis
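
A content-type specific policy can be sketched as a simple mapping from content category to expiration. The categories and values below are assumptions for illustration, with 60 minutes matching the ttl_minutes default in the configuration that follows.

// Illustrative content-type TTL policy; categories and values are
// assumptions, not shipped defaults
const ttlPolicies = {
  static_reference: 24 * 60, // stable reference content caches longest
  support_answers: 60,       // matches the ttl_minutes example below
  realtime_data: 1           // fast-changing content expires almost immediately
};

function resolveTtlMinutes(contentType) {
  // Fall back to the 60-minute default for unknown content types
  return ttlPolicies[contentType] ?? 60;
}

console.log(resolveTtlMinutes("realtime_data")); // 1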

Cache Performance Metrics

  • Target hit rate: 80%+
  • Target latency reduction: 60%+
  • Target cost savings: 70%+

Cache Configuration

{
  "cache_policy": "semantic",
  "similarity_threshold": 0.85,
  "ttl_minutes": 60,
  "max_cache_size": "10GB",
  "exclude_patterns": ["password", "token"]
}

Advanced Streaming Support

Full support for Server-Sent Events (SSE) streaming with security analysis, caching, and provider failover for real-time applications.

Streaming Features

  • Real-time security analysis on streaming content
  • Partial response caching for common prefixes
  • Seamless failover during streaming
  • Backpressure handling and flow control

Example Implementation

const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
  stream: true
});

for await (const chunk of stream) {
  // Security analysis happens in real-time
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Reliability & Circuit Breakers

  • 99.9% gateway uptime target (design goal)
  • <3s failover time (automatic provider switch)
  • 30s health check interval (proactive monitoring)
  • 5min circuit recovery (automatic retry logic)

Health Monitoring

Continuous health checks across all providers with intelligent circuit breaker patterns that prevent cascading failures.

  • Healthy: auto-route traffic
  • Degraded: reduced traffic
  • Offline: circuit breaker open
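
The pattern behind those states can be sketched as a per-provider circuit breaker: repeated failures open the circuit, and after the recovery window a trial request is allowed through. The failure threshold below is an assumption; the 5-minute recovery window mirrors the figure above.

// Conceptual per-provider circuit breaker sketch, not the gateway's
// actual implementation
class ProviderCircuit {
  constructor(failureThreshold = 3, recoveryMs = 5 * 60 * 1000) {
    this.failures = 0;
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.openedAt = null;
  }

  get isOpen() {
    if (this.openedAt === null) return false;
    // After the recovery window, allow a trial request (half-open)
    return Date.now() - this.openedAt < this.recoveryMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = Date.now(); // trip the breaker: stop routing traffic
    }
  }
}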

Example Target Scenarios

SaaS Customer Support Use Case

Designed for customer success platforms to reduce AI costs while improving response times through intelligent caching and provider optimization.

  • Target: 80%+ cache hit rate on common queries
  • Goal: sub-3s average response times
  • Multi-provider failover for high availability
  • Potential: $10K+ monthly savings

EdTech Content Generation

Built for educational platforms targeting 99.9% uptime and 40%+ cost reduction through multi-provider routing and semantic caching.

  • Seamless failover during provider outages
  • Target: 70%+ cache hit rate on content
  • Simplified provider integration management
  • Target: 100ms+ latency improvement

Ready to Optimize Your AI Gateway?

Start with a drop-in replacement that reduces costs, improves performance, and eliminates provider downtime risk.