Memory System

The Memory System allows your Bastio proxies to store and retrieve past interactions, providing your AI agents with long-term memory. This enables context-aware conversations that persist across different sessions.

Overview

When enabled, the Memory System:

Stores user interactions (prompts and completions) in a vector database.
Retrieves relevant past interactions based on their semantic similarity to the current user prompt.
Injects this context into the system prompt of the current request, allowing the LLM to "remember" previous details.
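To make the injection step concrete, here is a minimal TypeScript sketch of how retrieved memories could be merged into a request's messages before they reach the LLM. It is a conceptual model, not Bastio's code. The function name injectMemories, the ChatMessage shape and the wording of the injected block are all assumptions.

```typescript
// Conceptual sketch only: names and prompt wording are hypothetical,
// not a Bastio API.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Returns the message list to send to the LLM. Retrieved memories are appended
// to the caller's system prompt, or carried in a new system message if the
// request has none.
function injectMemories(messages: ChatMessage[], memories: string[]): ChatMessage[] {
  if (memories.length === 0) return messages; // nothing relevant was retrieved

  const memoryBlock =
    "Relevant context from previous conversations:\n" +
    memories.map((m) => `- ${m}`).join("\n");

  const [first, ...rest] = messages;
  if (first !== undefined && first.role === "system") {
    // Extend the existing system prompt instead of replacing it.
    return [{ role: "system", content: `${first.content}\n\n${memoryBlock}` }, ...rest];
  }
  // No system prompt yet: prepend one that holds the memories.
  return [{ role: "system", content: memoryBlock }, ...messages];
}

const augmented = injectMemories(
  [{ role: "user", content: "What is my favorite color?" }],
  ["User said: My favorite color is blue."],
);
console.log(augmented);
```

Extending an existing system prompt, rather than overwriting it, keeps the caller's own instructions intact; that is a design choice of this sketch.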

Configuration

You can configure memory settings for each proxy individually.

Enabling Memory

Go to your Proxy Configuration in the Bastio Dashboard.
Navigate to the Memory section.
Toggle Enable Memory.
Select the Memory Strategy (currently only "Semantic" is supported).

Auto-Generate User ID

By default, the Memory System requires a user_id, sent as the user field of the API request, to associate memories with a specific user.

To use memory without managing user IDs yourself, turn on Auto-generate User ID:

Open the Memory section of your Proxy Configuration.
Toggle Auto-generate User ID.

When this option is enabled and a request arrives without a user_id, Bastio generates a consistent, anonymous ID based on the request fingerprint (IP address, User-Agent, etc.).
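The idea of a consistent anonymous ID can be sketched as a hash over the fingerprint fields. Bastio does not document its exact inputs or algorithm, so the use of SHA-256, the two fields chosen, the anon_ prefix and the function name anonymousUserId are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical fingerprint: the real inputs are not documented
// ("IP address, User Agent, etc."), so only two fields are used here.
interface RequestFingerprint {
  ipAddress: string;
  userAgent: string;
}

// Deterministic: the same fingerprint always yields the same ID, and the raw
// IP/User-Agent strings do not appear in the result.
function anonymousUserId(fp: RequestFingerprint): string {
  // JSON-encode the fields so that different splits of the input cannot collide.
  const input = JSON.stringify([fp.ipAddress, fp.userAgent]);
  const digest = createHash("sha256").update(input).digest("hex");
  return `anon_${digest.slice(0, 32)}`; // keep 128 bits of the hash as the ID
}

console.log(anonymousUserId({ ipAddress: "203.0.113.7", userAgent: "Mozilla/5.0" }));
```

Because the ID in this sketch is a pure function of the IP address and User-Agent, a client whose IP address or browser changes would receive a new ID and would no longer match its earlier memories.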

API Usage

With Explicit User ID

To use memory with a specific user, pass the user field in your API request:

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'My favorite color is blue.' }],
  user: 'user_12345' // Unique ID for the user
});

Subsequent requests with the same user ID will have access to the context established in previous turns.
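If your application already has its own account identifiers, one option is to derive the user value from them with a keyed hash, so that raw identifiers such as email addresses are not sent in the request. This is a hypothetical helper, not part of Bastio or the OpenAI SDK; the name memoryUserId, the user_ prefix and the MEMORY_ID_SECRET environment variable are placeholders.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: maps your own account identifier to an opaque, stable
// value for the `user` field. Keep the secret constant; changing it changes
// every derived ID, so no user would match their earlier memories.
function memoryUserId(internalAccountId: string, secret: string): string {
  const mac = createHmac("sha256", secret).update(internalAccountId).digest("hex");
  return `user_${mac.slice(0, 24)}`;
}

// MEMORY_ID_SECRET is a placeholder environment variable name.
const secret = process.env.MEMORY_ID_SECRET ?? "dev-only-secret";
console.log(memoryUserId("alice@example.com", secret));
```

The result can then be passed in the request shown above, for example user: memoryUserId(accountId, secret).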

With Auto-Generated User ID

If Auto-generate User ID is enabled in your proxy settings, you can simply omit the user field:

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'My favorite color is blue.' }]
});

Bastio will automatically assign a stable ID to this client based on its request fingerprint.

How it Works

Storage

Embedding: User prompts and assistant responses are converted into vector embeddings by an embedding model.
Secure Storage: The embeddings are stored in a vector database and encrypted at rest.
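A toy in-memory model of the storage layer shows how keeping one bucket per proxy and user yields the isolation described under Privacy & Security. The class and field names are hypothetical, and the embedding vectors are passed in directly instead of coming from an embedding model.

```typescript
// Toy stand-in for the vector database; not Bastio's storage code.
interface MemoryRecord {
  text: string;
  embedding: number[];
}

class MemoryStore {
  // One bucket per (proxyId, userId) pair, mirroring the isolation rule.
  private readonly buckets = new Map<string, MemoryRecord[]>();

  private key(proxyId: string, userId: string): string {
    // JSON encoding avoids collisions such as ("a:b", "c") vs ("a", "b:c").
    return JSON.stringify([proxyId, userId]);
  }

  add(proxyId: string, userId: string, text: string, embedding: number[]): void {
    const k = this.key(proxyId, userId);
    const bucket = this.buckets.get(k) ?? [];
    bucket.push({ text, embedding });
    this.buckets.set(k, bucket);
  }

  // Only records stored under this exact proxy and user are ever returned.
  list(proxyId: string, userId: string): readonly MemoryRecord[] {
    return this.buckets.get(this.key(proxyId, userId)) ?? [];
  }
}
```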

Retrieval

Semantic Search: When a new request arrives, the system embeds the current prompt and compares it with the stored embeddings for the same proxy and user.
Context Injection: The most relevant past interactions are injected into the system prompt of the current request.
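Semantic search of this kind is commonly done by ranking stored embeddings by cosine similarity to the query embedding. The sketch below assumes that approach; the metric Bastio actually uses is not documented, and the top-k limit and minimum score (k = 3 and 0.75 by default here) are placeholder values.

```typescript
interface StoredMemory {
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical retrieval step: score every memory, drop weak matches,
// and return the k best texts in descending order of similarity.
function retrieveRelevant(query: number[], memories: StoredMemory[], k = 3, minScore = 0.75): string[] {
  return memories
    .map((m) => ({ text: m.text, score: cosineSimilarity(query, m.embedding) }))
    .filter((r) => r.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.text);
}
```

The minimum score keeps unrelated memories out of the prompt even when fewer than k good matches exist.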

Privacy & Security

Isolation: Memories are strictly isolated by proxy_id and user_id. One user's memories are never accessible to another.
Encryption: All memory data is stored encrypted at rest.
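Encryption at rest is often implemented with an authenticated cipher such as AES-256-GCM. Which algorithm Bastio uses is not stated, so the sketch below only illustrates the concept with Node's built-in crypto module; the function names are hypothetical, and key management (where the 32-byte key comes from and how it is rotated) is not shown.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface EncryptedBlob {
  iv: Buffer;
  authTag: Buffer;
  ciphertext: Buffer;
}

// Encrypts one memory record with AES-256-GCM.
function encryptMemory(plaintext: string, key: Buffer): EncryptedBlob {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(12); // fresh 96-bit nonce per record; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, authTag: cipher.getAuthTag(), ciphertext };
}

// Decrypts and authenticates; final() throws if the data or tag was modified.
function decryptMemory(blob: EncryptedBlob, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, blob.iv);
  decipher.setAuthTag(blob.authTag);
  return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]).toString("utf8");
}
```

GCM both hides the stored text and detects tampering, which is why an authenticated mode is a common choice for data at rest.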