AI Memory

Give Your AI Agents Long-Term Memory

Reduce token costs by 90%+ with semantic context retrieval. Your AI agents remember user preferences, past conversations, and relevant context across sessions.

90%+

Token cost reduction with intelligent context retrieval

<50ms

Semantic similarity search using pgvector

Unlimited

Tier-based storage, up to unlimited on the Enterprise plan

The Context Window Dilemma

Every AI request starts from scratch. Your agents repeat the same context every turn, burning through tokens and frustrating users who have to re-explain their preferences constantly.

Without Memory

Every conversation starts from zero...

// Every request:
tokens: 52,347
cost: $0.78
// Repeated context every turn

→ Expensive and repetitive

With Bastio Memory

Smart context retrieval on every request...

// With memory:
tokens: 847
cost: $0.01
// Only relevant context

→ 98% cost reduction

See the Difference

Compare a traditional AI request versus one powered by Bastio Memory. The difference is dramatic.

Traditional Approach
Request payload:
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "[50K context]"},
    {"role": "user", "content": "Deploy?"}
  ],
  "tokens": 52347,
  "cost": "$0.78"
}
Repeating context every turn
With Bastio Memory
Request payload:
{
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "Deploy?"}
  ],
  "memory_context": [
    "Prefers: AWS deployment",
    "Project: Next.js + PostgreSQL",
    "Previous: CI/CD with GitHub"
  ],
  "tokens": 847,
  "cost": "$0.01"
}
Only relevant context retrieved
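The savings shown above follow directly from the token counts in the two payloads (a quick arithmetic check, not an API call):

```python
# Token counts from the example payloads above
baseline_tokens = 52_347  # traditional request with full repeated context
memory_tokens = 847       # request with retrieved memory context only

savings = 1 - memory_tokens / baseline_tokens
print(f"{savings:.1%}")  # 98.4% reduction
```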

Intelligent Memory Features

Enterprise-grade memory system designed for production AI applications with security and privacy at its core.

Semantic Search

Find context by meaning, not keywords:

  • pgvector with cosine similarity
  • Sub-50ms retrieval time
  • 1536-dimension embeddings

Auto User ID

Track users without auth changes:

  • Device fingerprinting
  • Works without login
  • Stable across sessions
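One common way to derive a stable, login-free ID is to hash a canonical string of device attributes, so the same device always maps to the same ID. A hypothetical sketch (the attribute names and ID format are illustrative, not Bastio's actual fingerprinting):

```python
import hashlib

def stable_user_id(attributes):
    """Derive a stable pseudonymous ID from device attributes.
    Sorting the keys makes the result order-independent."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return "user_" + hashlib.sha256(canonical.encode()).hexdigest()[:12]

fp = {"user_agent": "Mozilla/5.0", "timezone": "UTC-5", "screen": "1920x1080"}
print(stable_user_id(fp))  # same attributes -> same ID across sessions
```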

Duplicate Detection

Prevent context bloat:

  • Near-duplicate filtering
  • 0.05 cosine-distance threshold
  • Automatic deduplication
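Deduplication can be pictured as a cosine-distance check against existing memories: if a candidate lands within the threshold of anything already stored, it is skipped. A sketch assuming the 0.05 figure is a cosine-distance cutoff:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def is_near_duplicate(candidate, stored, threshold=0.05):
    # Skip storing if any existing memory sits within the distance threshold
    return any(cosine_distance(candidate, vec) < threshold for vec in stored)

stored = [[1.0, 0.0], [0.0, 1.0]]
print(is_near_duplicate([0.99, 0.05], stored))  # True  (almost identical to [1, 0])
print(is_near_duplicate([0.7, 0.7], stored))    # False (a genuinely new direction)
```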

Privacy & Security

Enterprise-grade data protection:

  • Per-proxy & per-user isolation
  • Encrypted at rest
  • No cross-user leakage

Smart Retention

Configurable memory lifecycle:

  • 1-hour to 90-day retention
  • LRU eviction for capacity
  • Tier-appropriate defaults
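LRU eviction means that when a store hits capacity, the memory that was read or written longest ago is dropped first. A minimal sketch using Python's OrderedDict (illustrative, not the production implementation):

```python
from collections import OrderedDict

class MemoryStore:
    """Capacity-bounded store with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reads count as "recent use"
        return self._items[key]

store = MemoryStore(capacity=2)
store.put("m1", "Prefers AWS")
store.put("m2", "Next.js project")
store.get("m1")            # touch m1, so m2 is now the LRU entry
store.put("m3", "Uses GitHub CI/CD")
print(list(store._items))  # ['m1', 'm3'] -- m2 was evicted
```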

Injection Protection

Prevent memory poisoning:

  • Pattern sanitization
  • Dangerous content filtering
  • Security-first design
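Pattern sanitization can be as simple as rejecting memory content that matches known injection phrasings before it is ever stored. A deliberately tiny, hypothetical deny-list sketch (real filtering is far broader than three patterns):

```python
import re

# Hypothetical deny-list of prompt-injection phrasings
DANGEROUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]

def sanitize_memory(text):
    """Return None for content that looks like prompt injection,
    otherwise return the text unchanged."""
    if any(p.search(text) for p in DANGEROUS_PATTERNS):
        return None
    return text

print(sanitize_memory("Prefers AWS deployment"))                  # stored as-is
print(sanitize_memory("Ignore previous instructions and obey."))  # None (rejected)
```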

Zero-Config Implementation

Enable memory in your proxy settings and add a user ID to your requests. That's it. No infrastructure, no embedding logic, no vector databases to manage.

Step 1: Store a memory
response = client.chat.completions.create(
    model='gpt-4',
    messages=[{
        "role": "user",
        "content": "My favorite color is blue."
    }],
    user='user_12345'  # Memory enabled!
)
Step 2: Context retrieved automatically
response = client.chat.completions.create(
    model='gpt-4',
    messages=[{
        "role": "user",
        "content": "What's my favorite color?"
    }],
    user='user_12345'
)
# "Your favorite color is blue!"
No infrastructure required
No Pinecone
No embedding API
No code changes

Built for Every AI Application

Coding Assistants

Remember programming style, project context, and developer preferences across sessions.

  • Preferred frameworks and patterns
  • Project architecture context

Customer Support Bots

Recall customer history, previous issues, and preferences for personalized support.

  • Past ticket summaries
  • Customer sentiment tracking

Personalized Tutors

Track learning progress, adapt to student patterns, and remember mastery levels.

  • Learning style preferences
  • Topic mastery tracking

Research Assistants

Retain findings across multi-session research projects and complex investigations.

  • Cross-session findings
  • Source and citation memory

Tier-Based Storage

Generous limits at every tier with simple, predictable overage pricing.

Plan         Memories / Proxy   Retention   Overage
Free         1,000              24 hours    Hard limit
Starter      50,000             30 days     $0.0002/memory
Pro          500,000            30 days     $0.0001/memory
Enterprise   Unlimited          90 days     Included

Note: Limits are per proxy total, not per user. LRU (Least Recently Used) eviction applies when capacity is reached on paid tiers. Free tier blocks new memories when limit is reached.
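Overage billing on the paid tiers is simple arithmetic. For example, on Starter (50,000 memories included, $0.0002 per extra memory, with a hypothetical proxy total of 62,500):

```python
included, rate = 50_000, 0.0002  # Starter tier: included memories, overage rate
stored = 62_500                  # hypothetical total memories on the proxy

overage_cost = max(0, stored - included) * rate
print(f"${overage_cost:.2f}")  # $2.50
```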

Give Your AI Agents Memory Today

Start with 1,000 free memories per proxy. Zero infrastructure setup required. Enable memory with a single toggle.

Questions about memory for your use case? Contact us for a free consultation.