Gateway & Caching

Drop‑in proxy with caching across providers

OpenAI‑compatible endpoint with intelligent caching, multi‑provider routing, and automatic failover. Zero code changes required.

Up to 70% Faster

Potential response time improvement with intelligent caching and provider optimization

99.9% Uptime Goal

Multi-provider failover built on health checks and circuit breakers

5+ Providers

Unified interface supporting OpenAI, Anthropic, Google, Mistral, and more

100% OpenAI Compatible

Drop-in replacement for OpenAI's API with zero code changes. Simply update your base URL and add your Bastio API key header.

Supported Endpoints

/v1/chat/completions

Chat completions with streaming support

/v1/completions

Legacy text completions

/v1/embeddings

Text embeddings generation

/v1/models

Available models listing
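
For reference, a gateway request looks like any standard OpenAI-style HTTP call. The sketch below assumes the base URL and X-API-Key header shown in the Quick Integration example that follows; the model name and message are placeholders.

// Minimal sketch: calling the gateway's chat completions endpoint over HTTP.
// Assumes the base URL and X-API-Key header from the integration example below.
const response = await fetch("https://api.bastio.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Bastio authenticates via this header; provider keys stay server-side
    "X-API-Key": process.env.BASTIO_API_KEY
  },
  body: JSON.stringify({
    model: "gpt-4",
    messages: [{ role: "user", content: "Hello" }]
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);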

Quick Integration

Before (Direct OpenAI)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.openai.com/v1"
});
After (Via Bastio)
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "unused", // Provider keys are managed by Bastio
  baseURL: "https://api.bastio.com/v1",
  defaultHeaders: {
    "X-API-Key": process.env.BASTIO_API_KEY
  }
});

Intelligent Multi-Provider Routing

Automatic failover, load balancing, and cost optimization across multiple AI providers with configurable routing policies.

OpenAI

  • GPT-4, GPT-4 Turbo, GPT-3.5
  • Primary for chat completions
  • Best performance for code generation

Anthropic

  • Claude 3, Claude 2.1, Claude Instant
  • Excellent for long-form content
  • Strong reasoning capabilities

Google

  • Gemini Pro, Gemini Ultra
  • Multimodal capabilities
  • Cost-effective for high volume

Routing Strategies

Cost-Optimized

Route to the lowest-cost provider that meets quality requirements

Performance-First

Prioritize fastest response times and lowest latency

Quality-Based

Route based on model capabilities for specific task types

Load Balancing

Distribute requests evenly to prevent rate limiting
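
As a conceptual illustration (not the gateway's actual internals), a cost-optimized strategy can be thought of as filtering to healthy providers and picking the cheapest one. The provider list and pricing fields below are illustrative assumptions.

// Conceptual sketch of cost-optimized routing; provider names and
// prices are illustrative assumptions, not real gateway configuration
const providers = [
  { name: "openai", costPer1kTokens: 0.03, healthy: true },
  { name: "anthropic", costPer1kTokens: 0.025, healthy: true },
  { name: "google", costPer1kTokens: 0.02, healthy: false }
];

function pickCostOptimized(candidates) {
  // Drop unhealthy providers, then select the lowest-cost remaining one
  const healthy = candidates.filter((p) => p.healthy);
  return healthy.reduce((best, p) =>
    p.costPer1kTokens < best.costPer1kTokens ? p : best
  );
}

console.log(pickCostOptimized(providers).name); // "anthropic"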

Intelligent Response Caching

Advanced caching algorithms reduce costs and improve performance with semantic similarity matching and configurable TTL policies.

Semantic Caching

Match similar queries using embedding-based similarity, not just exact string matching.

  • Vector similarity scoring
  • Configurable similarity thresholds
  • Context-aware matching
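
To make the idea concrete, here is a minimal sketch of a semantic cache lookup: the incoming prompt's embedding is compared against cached entries by cosine similarity, and any entry above the configured threshold (0.85 in the configuration below) counts as a hit. How embeddings are produced is outside this sketch.

// Minimal sketch of a semantic cache lookup; the cache entry shape
// ({ embedding, response }) is an assumption for illustration
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function lookupSemanticCache(queryEmbedding, cache, threshold = 0.85) {
  // Return the best-scoring cached response at or above the threshold
  let best = null;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.response : null; // null means cache miss
}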

Smart TTL Policies

Adaptive cache expiration based on content type, user patterns, and update frequency.

  • Dynamic TTL adjustment
  • Content-type specific policies
  • User behavior analysis
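
A content-type specific policy can be sketched as a simple mapping from content category to expiration. The categories and values below are assumptions for illustration, with 60 minutes matching the ttl_minutes default in the configuration that follows.

// Illustrative content-type TTL policy; categories and values are
// assumptions, not shipped defaults
const ttlPolicies = {
  static_reference: 24 * 60, // stable reference content caches longest
  support_answers: 60,       // matches the ttl_minutes example below
  realtime_data: 1           // fast-changing content expires almost immediately
};

function resolveTtlMinutes(contentType) {
  // Fall back to the 60-minute default for unknown content types
  return ttlPolicies[contentType] ?? 60;
}

console.log(resolveTtlMinutes("realtime_data")); // 1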

Cache Performance Metrics

  • Target hit rate: 80%+
  • Target latency reduction: 60%+
  • Target cost savings: 70%+

Cache Configuration

{
  "cache_policy": "semantic",
  "similarity_threshold": 0.85,
  "ttl_minutes": 60,
  "max_cache_size": "10GB",
  "exclude_patterns": ["password", "token"]
}

Advanced Streaming Support

Full support for Server-Sent Events (SSE) streaming with security analysis, caching, and provider failover for real-time applications.

Streaming Features

  • Real-time security analysis on streaming content
  • Partial response caching for common prefixes
  • Seamless failover during streaming
  • Backpressure handling and flow control

Example Implementation

const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
  stream: true
});

for await (const chunk of stream) {
  // Security analysis happens in real-time
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Reliability & Circuit Breakers

  • 99.9% gateway uptime target (design goal)
  • <3s failover time (automatic provider switch)
  • 30s health check interval (proactive monitoring)
  • 5min circuit recovery (automatic retry logic)

Health Monitoring

Continuous health checks across all providers with intelligent circuit breaker patterns that prevent cascading failures.

  • Healthy: auto-route traffic
  • Degraded: reduced traffic
  • Offline: circuit breaker open
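
The pattern behind those states can be sketched as a per-provider circuit breaker: repeated failures open the circuit, and after the recovery window a trial request is allowed through. The failure threshold below is an assumption; the 5-minute recovery window mirrors the figure above.

// Conceptual per-provider circuit breaker sketch, not the gateway's
// actual implementation
class ProviderCircuit {
  constructor(failureThreshold = 3, recoveryMs = 5 * 60 * 1000) {
    this.failures = 0;
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.openedAt = null;
  }

  get isOpen() {
    if (this.openedAt === null) return false;
    // After the recovery window, allow a trial request (half-open)
    return Date.now() - this.openedAt < this.recoveryMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = Date.now(); // trip the breaker: stop routing traffic
    }
  }
}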

Example Target Scenarios

SaaS Customer Support Use Case

Designed for customer success platforms to reduce AI costs while improving response times through intelligent caching and provider optimization.

  • Target: 80%+ cache hit rate on common queries
  • Goal: sub-3s average response times
  • Multi-provider failover for high availability
  • Potential: $10K+ monthly savings

EdTech Content Generation

Built for educational platforms targeting 99.9% uptime and 40%+ cost reduction through multi-provider routing and semantic caching.

  • Seamless failover during provider outages
  • Target: 70%+ cache hit rate on content
  • Simplified provider integration management
  • Target: 100ms+ latency improvement

Ready to Optimize Your AI Gateway?

Start with a drop-in replacement that reduces costs, improves performance, and eliminates provider downtime risk.