Concepts
Verbatim Storage
Engram stores every message exactly as it was sent. No summarization, no extraction, no compression. When you retrieve a conversation, you get back the original text — word for word.
This matters because summaries lose detail. An extracted “memory” that says “user prefers dark mode” throws away the conversation where the user explained why they prefer dark mode, what they tried before, and the three other preferences they mentioned in the same breath.
Conversations and Messages
A conversation is a container for an ordered sequence of messages. It can have:
- A title — human-readable name
- An agent_id — links the conversation to the agent that created it
- Tags — string labels for filtering (e.g., `["support", "billing"]`)
- Metadata — arbitrary JSON for your own use
A message belongs to a conversation and has:
- A role — `user`, `assistant`, `system`, or `tool`
- Content — the verbatim text
- A sequence number — integer ordering within the conversation
- Optional tool_call_id and tool_name — for tool-use messages
- Optional metadata — arbitrary JSON
Messages are ordered by sequence number, not timestamps. This gives deterministic ordering even if messages are inserted in batches.
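The message shape and sequence-based ordering can be sketched as follows. The dataclass below is illustrative only — field names mirror the list above, but it is not Engram's actual SDK:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Message:
    role: str                             # "user", "assistant", "system", or "tool"
    content: str                          # verbatim text, stored unmodified
    seq: int                              # integer ordering within the conversation
    tool_call_id: Optional[str] = None    # set only for tool-use messages
    tool_name: Optional[str] = None
    metadata: dict = field(default_factory=dict)

# Messages appended out of order (e.g. in a batch) still sort deterministically,
# because ordering depends on seq, not on arrival time:
batch = [
    Message("assistant", "Sure, looking at the error logs now.", seq=2),
    Message("user", "Can you check the logs?", seq=1),
]
ordered = sorted(batch, key=lambda m: m.seq)
```

Sorting on `seq` rather than a timestamp is what makes batch inserts safe: two messages written in the same millisecond still have an unambiguous order.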
Chunking
When messages are appended, Engram automatically creates chunks — overlapping windows of messages used for semantic search.
How it works:
- Window size: 5 messages
- Stride: 3 messages
- Overlap: 2 messages between consecutive chunks
For a conversation with 10 messages, you’d get chunks covering:
- Messages 1–5
- Messages 4–8
- Messages 7–10
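The windowing above can be reproduced with a few lines. This is a sketch of the sliding-window arithmetic, not Engram's internal code:

```python
def chunk_ranges(n_messages: int, window: int = 5, stride: int = 3):
    """Return (start, end) message ranges (1-based, inclusive) for overlapping chunks."""
    ranges = []
    start = 1
    while start <= n_messages:
        end = min(start + window - 1, n_messages)
        ranges.append((start, end))
        if end == n_messages:
            break
        start += stride
    return ranges

print(chunk_ranges(10))  # → [(1, 5), (4, 8), (7, 10)]
```

Note that the 2-message overlap falls out of the parameters: each chunk starts 3 messages after the previous one but spans 5, so the last 2 messages of one chunk reappear at the start of the next.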
Each chunk is formatted as:

```
[user]: Can you check the logs?
[assistant]: Sure, looking at the error logs now.
[tool]: {"errors": [{"level": "ERROR", "msg": "connection refused"}]}
[assistant]: I see a connection refused error in the logs.
[user]: That's the one — when did it start?
```

The overlap ensures that no message context is lost at chunk boundaries.
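Rendering a chunk in that `[role]: content` format is straightforward; a minimal sketch:

```python
def format_chunk(messages):
    """Render a chunk's messages as '[role]: content' lines."""
    return "\n".join(f"[{m['role']}]: {m['content']}" for m in messages)

chunk = [
    {"role": "user", "content": "Can you check the logs?"},
    {"role": "assistant", "content": "Sure, looking at the error logs now."},
]
print(format_chunk(chunk))
```

Including the role label in the embedded text lets the embedding model distinguish, say, a user asking about an error from an assistant reporting one.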
Embeddings and Vector Search
Each chunk is embedded using the bge-base-en-v1.5 model (768 dimensions). These embeddings capture the semantic meaning of the conversation fragment.
When you search, your query is embedded with the same model and compared against all stored chunks using cosine similarity. The most relevant chunks are returned along with their original messages.
Search flow:
- Your query text is embedded into a 768-dimensional vector
- The vector index finds the most similar chunk vectors (filtered by your organization)
- The matching chunks are fetched from the database
- The original messages from each chunk’s sequence range are loaded
- Results are returned ranked by similarity score (0–1)
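The ranking step boils down to cosine similarity between the query vector and each chunk vector. A toy sketch with 3-dimensional vectors standing in for the real 768-dimensional bge-base-en-v1.5 embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical chunk IDs and vectors, for illustration only.
chunks = {
    "chk_a": [0.9, 0.1, 0.0],
    "chk_b": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]
ranked = sorted(chunks, key=lambda cid: cosine_similarity(query, chunks[cid]),
                reverse=True)
# highest-similarity chunk first
```

In production the index does this comparison approximately over many vectors at once rather than with a linear scan, but the scoring idea is the same.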
You can scope searches to a specific conversation or filter by tags.
Organizations and Tenant Isolation
Every API key belongs to an organization. All data — conversations, messages, chunks, vectors — is scoped to the organization.
- Queries always filter by `organization_id`
- Vector search includes an organization filter
- There is no way to access another organization’s data, even with a valid API key
The organization_id is denormalized onto messages and chunks (stored directly, not via JOINs) for both performance and safety.
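Because `organization_id` lives directly on the chunk rows, tenant scoping is a single `WHERE` clause rather than a chain of JOINs back to the conversation. A hypothetical sketch (the table and column names are assumptions based on the descriptions above, not Engram's actual schema):

```python
def scoped_chunk_query(organization_id: str):
    """Build a parameterized query that is tenant-scoped by construction.

    Denormalization means no JOIN is needed: the filter applies directly
    to the chunks table, so a forgotten JOIN condition can't leak data.
    """
    sql = "SELECT id, content FROM chunks WHERE organization_id = %s"
    return sql, (organization_id,)

sql, params = scoped_chunk_query("org_abc123")
```

This is the safety half of the trade-off: the cost of duplicating `organization_id` onto every row buys queries where the tenant filter cannot be accidentally dropped by a missing JOIN.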
Data Model
```
Organization
└── API Keys (many)
└── Conversations (many)
    └── Messages (many, ordered by sequence)
    └── Chunks (many, with vector embeddings)
```

All IDs use prefixed nanoids for readability:

- `org_` — organizations
- `conv_` — conversations
- `msg_` — messages
- `key_` — API keys
- `chk_` — chunks
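Generating a prefixed nanoid-style ID can be sketched like this. The alphabet and 21-character body length follow nanoid's defaults, but this is an illustration, not Engram's implementation:

```python
import secrets

# URL-safe alphabet (64 symbols), matching nanoid's default character set.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"

def prefixed_id(prefix: str, size: int = 21) -> str:
    """Return an ID like 'conv_<21 random URL-safe characters>'."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(size))
    return f"{prefix}{body}"

conv_id = prefixed_id("conv_")
```

The prefix makes IDs self-describing in logs and error messages: seeing `conv_` versus `msg_` immediately tells you which table a stray ID belongs to.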