Memory

Your AI remembers every conversation

Semantic long-term memory for AI agents. Store context, retrieve it by meaning, and cut token costs by 90% — with zero infrastructure.

< 50ms

Retrieval time

90%

Token cost reduction

0

Infrastructure to manage

Zero-Config Setup

Add a user ID. Memory just works.

Before

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy this"}]
)

After

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy this"}],
    user="user_12345"  # Memory enabled
)

Semantic Retrieval

Find context by meaning, not keywords.

Query: “What deployment do they prefer?”

Result                              Score
Prefers AWS with ECS deployment     0.94
Uses GitHub Actions for CI/CD       0.87
Production runs on PostgreSQL       0.81

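A minimal sketch of how cosine-similarity scoring like the table above can rank memories. The vectors and function here are illustrative only, not Bastio's internals; production systems use model-generated embeddings.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), range [-1, 1].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings; real embeddings have hundreds of dimensions.
query = [0.9, 0.1, 0.3]  # "What deployment do they prefer?"
memories = {
    "Prefers AWS with ECS deployment": [0.85, 0.15, 0.35],
    "Production runs on PostgreSQL": [0.3, 0.8, 0.2],
}

# Rank stored memories by similarity to the query vector.
ranked = sorted(
    memories.items(),
    key=lambda kv: cosine_similarity(query, kv[1]),
    reverse=True,
)
```

Because similarity is computed on vectors rather than words, "What deployment do they prefer?" matches "Prefers AWS with ECS deployment" even though the two share almost no keywords.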
Memory Timeline

Context builds automatically across conversations.

09:14:22

I prefer TypeScript and Next.js

stored

09:31:05

We deploy on AWS with ECS

stored

10:02:18

Help me set up CI/CD

2 memories retrieved

14:45:33

Deploy this project

3 memories retrieved
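One way the timeline above can translate into a request, sketched under the assumption that retrieved memories are prepended as a system message. The helper below is illustrative, not Bastio's API.

```python
def build_messages(retrieved_memories, user_prompt):
    # Prepend retrieved context as a system message so the model
    # can apply prior preferences when answering the new prompt.
    context = "\n".join(f"- {m}" for m in retrieved_memories)
    return [
        {"role": "system", "content": f"Known user context:\n{context}"},
        {"role": "user", "content": user_prompt},
    ]

# "Help me set up CI/CD" arrives with 2 memories retrieved:
messages = build_messages(
    ["I prefer TypeScript and Next.js", "We deploy on AWS with ECS"],
    "Help me set up CI/CD",
)
```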

What's included

Search, security, and storage — handled

Every memory is embedded, indexed, and isolated per user. No vector databases, embedding APIs, or infrastructure to manage.

pgvector semantic search
Per-user memory isolation
Automatic deduplication
Injection protection
Configurable retention
Device fingerprint fallback
Per-proxy configuration
Encrypted at rest
LRU eviction policies
Cosine similarity matching
Automatic context injection
Tier-based storage limits
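To illustrate how an LRU eviction policy can interact with a tier-based storage limit, here is a toy sketch; it is not Bastio's implementation, and the two-memory limit is artificial.

```python
from collections import OrderedDict

class LRUMemoryStore:
    """Evicts the least-recently-used memory once a storage limit is hit."""

    def __init__(self, max_memories):
        self.max_memories = max_memories
        self._store = OrderedDict()

    def add(self, key, text):
        if key in self._store:
            self._store.move_to_end(key)  # refresh recency on update
        self._store[key] = text
        if len(self._store) > self.max_memories:
            self._store.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # reads also count as use
        return self._store[key]

store = LRUMemoryStore(max_memories=2)
store.add("m1", "Prefers AWS")
store.add("m2", "Uses GitHub Actions")
store.get("m1")                      # m1 is now the most recent
store.add("m3", "Runs PostgreSQL")   # over the limit: evicts m2
```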

Python SDK

Pass a user parameter and Bastio stores and retrieves memory automatically.

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-bastio-key",
    base_url="https://api.bastio.com/v1/guard/px_..."
)

# Memory is stored automatically
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "I prefer AWS"}],
    user="user_12345"
)

# Context retrieved on next request
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy this"}],
    user="user_12345"  # Remembers AWS preference
)

REST API

Include a user field in your request body to enable memory for any HTTP client.

curl https://api.bastio.com/v1/guard/px_.../chat/completions \
  -H "Authorization: Bearer sk-your-bastio-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Deploy this"}
    ],
    "user": "user_12345"
  }'

Semantic Search

pgvector with cosine similarity finds context by meaning, not keywords. Sub-50ms retrieval.

Per-User Isolation

Every user's memory is scoped to its proxy. No cross-user leakage; memories are encrypted at rest.
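The isolation guarantee can be pictured as keying every memory by both proxy and user, so a lookup can never cross either boundary. A toy sketch with illustrative names only:

```python
# Memories keyed by (proxy_id, user_id); a lookup for one user can
# never return another user's data, even on the same proxy.
memories = {}

def store_memory(proxy_id, user_id, text):
    memories.setdefault((proxy_id, user_id), []).append(text)

def get_memories(proxy_id, user_id):
    # An unknown (proxy, user) pair yields an empty list, not an error.
    return memories.get((proxy_id, user_id), [])

store_memory("px_1", "user_12345", "Prefers AWS")
```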

Zero Infrastructure

No Pinecone, no embedding APIs. Enable memory with one toggle in your proxy settings.

Give your AI agents memory

Long-term memory included with every plan. Free tier starts with 1,000 memories per proxy.