Memory
Your AI remembers every conversation
Semantic long-term memory for AI agents. Store context, retrieve it by meaning, and cut token costs by 90% — with zero infrastructure.
< 50ms
Retrieval time
90%
Token cost reduction
0
Infrastructure to manage
Add a user ID. Memory just works.
Before
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Deploy this"}]
)
After
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Deploy this"}],
user="user_12345" # Memory enabled
)
Find context by meaning, not keywords.
Query: “What deployment do they prefer?”
| Result | Score |
|---|---|
| Prefers AWS with ECS deployment | 0.94 |
| Uses GitHub Actions for CI/CD | 0.87 |
| Production runs on PostgreSQL | 0.81 |
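A ranking like the one above comes from cosine similarity between embedding vectors, the measure Bastio's pgvector-backed search uses. A minimal sketch of that ranking step (the 3-dimensional vectors here are toy values standing in for real model embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of a and b over the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for stored memories (real ones come from an embedding model).
memories = {
    "Prefers AWS with ECS deployment": [0.9, 0.4, 0.1],
    "Uses GitHub Actions for CI/CD":   [0.7, 0.6, 0.2],
    "Production runs on PostgreSQL":   [0.3, 0.2, 0.9],
}

# Toy embedding of the query "What deployment do they prefer?"
query = [0.95, 0.35, 0.05]

# Rank memories by similarity to the query, highest first.
ranked = sorted(memories, key=lambda m: cosine(query, memories[m]), reverse=True)
print(ranked[0])  # → Prefers AWS with ECS deployment
```

Because similarity is computed on meaning-bearing vectors, "Prefers AWS with ECS deployment" scores highest even though it shares no keywords with the query.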
Context builds automatically across conversations.
09:14:22
“I prefer TypeScript and Next.js”
stored
09:31:05
“We deploy on AWS with ECS”
stored
10:02:18
“Help me set up CI/CD”
2 memories retrieved
14:45:33
“Deploy this project”
3 memories retrieved
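Conceptually, the proxy retrieves the user's relevant memories and injects them as context before forwarding the request to the model. A hedged sketch of that idea (Bastio performs this server-side; the prompt format and helper below are illustrative, not the actual implementation):

```python
def build_messages(user_message, retrieved_memories):
    """Illustrative only: prepend retrieved memories as system context."""
    context = "Known about this user:\n" + "\n".join(
        f"- {m}" for m in retrieved_memories
    )
    return [
        {"role": "system", "content": context},
        {"role": "user", "content": user_message},
    ]

# The "Deploy this project" turn above, with its 3 retrieved memories.
msgs = build_messages(
    "Deploy this project",
    [
        "Prefers AWS with ECS deployment",
        "Uses GitHub Actions for CI/CD",
        "Production runs on PostgreSQL",
    ],
)
print(msgs[0]["content"])
```

Only the retrieved memories travel with each request, which is why replaying full conversation history (and paying for those tokens) becomes unnecessary.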
What's included
Search, security, and storage — handled
Every memory is embedded, indexed, and isolated per user. No vector databases, embedding APIs, or infrastructure to manage.
Python SDK
Pass a user parameter and Bastio stores and retrieves memory automatically.
from openai import OpenAI
client = OpenAI(
api_key="sk-your-bastio-key",
base_url="https://api.bastio.com/v1/guard/px_..."
)
# Memory is stored automatically
client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "I prefer AWS"}],
user="user_12345"
)
# Context retrieved on next request
client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Deploy this"}],
user="user_12345" # Remembers AWS preference
)
REST API
Include a user field in your request body to enable memory for any HTTP client.
curl https://api.bastio.com/v1/guard/px_.../chat/completions \
-H "Authorization: Bearer sk-your-bastio-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Deploy this"}
],
"user": "user_12345"
}'
Semantic Search
pgvector with cosine similarity finds context by meaning, not keywords. Sub-50ms retrieval.
Per-User Isolation
Every user's memory is scoped per proxy, with no cross-user leakage and encryption at rest.
Zero Infrastructure
No Pinecone, no embedding APIs. Enable memory with one toggle in your proxy settings.
Give your AI agents memory
Long-term memory included with every plan. Free tier starts with 1,000 memories per proxy.