Drop-in proxy with caching across providers
OpenAI-compatible endpoint with intelligent caching, multi-provider routing, and automatic failover. Zero code changes required.
Up to 70% Faster
Potential response time improvement with intelligent caching and provider optimization
99.9% Uptime Goal
Multi-provider failover built on health checks and circuit breakers
5+ Providers
Unified interface supporting OpenAI, Anthropic, Google, Mistral, and more
100% OpenAI Compatible
Drop-in replacement for OpenAI's API with zero code changes. Simply update your base URL and add your Bastio API key header.
Supported Endpoints
Chat completions with streaming support
Legacy text completions
Text embeddings generation
Available models listing
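For example, the embeddings endpoint can be called with a plain HTTP request. This is a minimal sketch assuming the OpenAI-compatible /v1/embeddings path and the X-API-Key header shown in Quick Integration below; the model name is illustrative.

// Minimal sketch (assumption): calling the OpenAI-compatible embeddings path directly.
// The model name is illustrative; any model listed via /v1/models can be used.
const res = await fetch("https://api.bastio.com/v1/embeddings", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": process.env.BASTIO_API_KEY ?? ""
  },
  body: JSON.stringify({ model: "text-embedding-3-small", input: "hello world" })
});
const { data } = await res.json();
console.log(data[0].embedding.length); // dimension of the returned vector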
Quick Integration
Before (direct to OpenAI):

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.openai.com/v1"
});

After (routed through Bastio):

const openai = new OpenAI({
  apiKey: "unused", // Provider keys managed by Bastio
  baseURL: "https://api.bastio.com/v1",
  defaultHeaders: {
    "X-API-Key": process.env.BASTIO_API_KEY
  }
});

Intelligent Multi-Provider Routing
Automatic failover, load balancing, and cost optimization across multiple AI providers with configurable routing policies.
OpenAI
- GPT-4, GPT-4 Turbo, GPT-3.5
- Primary for chat completions
- Best performance for code generation
Anthropic
- Claude 3, Claude 2.1, Claude Instant
- Excellent for long-form content
- Strong reasoning capabilities
Google
- Gemini Pro, Gemini Ultra
- Multimodal capabilities
- Cost-effective for high volume
Routing Strategies
Cost-Optimized
Route to the lowest-cost provider that meets quality requirements
Performance-First
Prioritize fastest response times and lowest latency
Quality-Based
Route based on model capabilities for specific task types
Load Balancing
Distribute requests evenly to prevent rate limiting
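As a rough sketch, the strategies above might be expressed as a routing policy like the following; the field names and values are illustrative assumptions, not a documented Bastio schema.

// Illustrative only: field names and values are assumptions, not a documented schema.
const routingPolicy = {
  strategy: "cost-optimized",                       // or "performance-first", "quality-based", "load-balanced"
  providerOrder: ["openai", "anthropic", "google"], // failover order when the preferred provider is unhealthy
  maxLatencyMs: 3000,                               // fail over if a provider exceeds this response time
  minQualityTier: "gpt-4-class"                     // only route to models that meet the task's quality bar
};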
Intelligent Response Caching
Advanced caching algorithms reduce costs and improve performance with semantic similarity matching and configurable TTL policies.
Semantic Caching
Match similar queries using embedding-based similarity rather than exact string matching (see the lookup sketch after this list).
- Vector similarity scoring
- Configurable similarity thresholds
- Context-aware matching
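A minimal sketch of what an embedding-based lookup can look like, assuming query and response embeddings are computed elsewhere; the threshold mirrors the similarity_threshold in the configuration below.

// Sketch only: assumes query/response embeddings are computed elsewhere.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return a cached response when a prior query is semantically close enough.
function cacheLookup(
  queryEmbedding: number[],
  entries: { embedding: number[]; response: string }[],
  threshold = 0.85
): string | null {
  for (const entry of entries) {
    if (cosineSimilarity(queryEmbedding, entry.embedding) >= threshold) {
      return entry.response; // semantic cache hit
    }
  }
  return null; // miss: forward the request to a provider
}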
Smart TTL Policies
Adaptive cache expiration based on content type, user patterns, and update frequency (see the TTL sketch after this list).
- Dynamic TTL adjustment
- Content-type specific policies
- User behavior analysis
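A simplified sketch of content-type specific TTLs; the categories and durations are illustrative assumptions only.

// Illustrative only: categories and durations are assumptions, not Bastio defaults.
function ttlMinutes(contentType: string): number {
  switch (contentType) {
    case "reference":      return 240; // stable content can live longer in cache
    case "conversational": return 30;
    case "time-sensitive": return 5;   // news-like content expires quickly
    default:               return 60;  // matches the ttl_minutes baseline below
  }
}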
Cache Performance Metrics
Cache Configuration
{
  "cache_policy": "semantic",
  "similarity_threshold": 0.85,
  "ttl_minutes": 60,
  "max_cache_size": "10GB",
  "exclude_patterns": ["password", "token"]
}

Advanced Streaming Support
Full support for Server-Sent Events (SSE) streaming with security analysis, caching, and provider failover for real-time applications.
Streaming Features
- Real-time security analysis on streaming content
- Partial response caching for common prefixes
- Seamless failover during streaming
- Backpressure handling and flow control
Example Implementation
const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
  stream: true
});
for await (const chunk of stream) {
  // Security analysis happens in real-time
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Reliability & Circuit Breakers
Health Monitoring
Continuous health checks across all providers with intelligent circuit breaker patterns that prevent cascading failures.
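To illustrate the pattern, a generic circuit breaker works roughly like this; the thresholds and cooldown are placeholder assumptions, not Bastio's actual parameters.

// Generic circuit breaker sketch; thresholds are placeholders, not Bastio's configuration.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private failureThreshold = 5, private cooldownMs = 30_000) {}

  canRequest(): boolean {
    if (this.failures >= this.failureThreshold) {
      // Circuit is open: skip this provider until the cooldown elapses.
      if (Date.now() - this.openedAt < this.cooldownMs) return false;
      this.failures = 0; // half-open: allow a single trial request
    }
    return true;
  }

  recordSuccess(): void { this.failures = 0; }

  recordFailure(): void {
    this.failures++;
    if (this.failures === this.failureThreshold) this.openedAt = Date.now();
  }
}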
Example Target Scenarios
SaaS Customer Support Use Case
Designed for customer success platforms to reduce AI costs while improving response times through intelligent caching and provider optimization.
- Target: 80%+ cache hit rate on common queries
- Goal: Sub-3s average response times
- Multi-provider failover for high availability
- Potential: $10K+ monthly savings
EdTech Content Generation
Built for educational platforms targeting 99.9% uptime and 40%+ cost reduction through multi-provider routing and semantic caching.
- Seamless failover during provider outages
- Target: 70%+ cache hit rate on content
- Simplified provider integration management
- Target: 100ms+ latency improvement
Ready to Optimize Your AI Gateway?
Start with a drop-in replacement that reduces costs, improves performance, and mitigates provider downtime risk.