A technical walkthrough of why standard vector stores fail as agent memory, what the research says about how biological memory actually works, and how to build a system that doesn't forget who you are.
Every production AI agent I've seen has the same dirty secret: it forgets everything between sessions. You can give it a 1-million-token context window and it still wakes up each morning with no idea who you are, what you've built together, or why you made the decisions you made.
The standard fix is RAG — Retrieval-Augmented Generation. Embed your history, stuff it in a vector store, retrieve on cosine similarity. It works well enough for document search. It is fundamentally broken for agent memory. Here's why.
"Cosine similarity tells you two things sound alike. It cannot tell you that one thing caused the other, came before the other, or contradicts the other. Causal, temporal, and logical relationships are invisible to a vector store."
The research paper that changed how I think about this is HippoRAG (arXiv:2405.14831). The core insight comes from hippocampal indexing theory: the brain doesn't store memories as isolated facts. The neocortex holds the representations themselves, and the hippocampus acts as the index that links related ones together. HippoRAG models that index as a knowledge graph.
This is why you can recall a memory through smell, sound, emotion, or context. Multiple retrieval paths all lead to the same node. The information is richly cross-indexed, not stored in a single bucket labelled "memories".
The implication for AI: if you want an agent to recall information the way a human does (associatively, contextually, causally), you need a graph, not just a vector index. The vector index only holds the embeddings. The graph is the structure that gives those embeddings meaning.
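A small sketch of why similarity alone falls short. The three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions), and `cosineSimilarity` is a helper defined here, not a library call. Because the measure is symmetric, it returns the same score whichever memory you start from, so it cannot record which event caused or preceded the other. A directed edge can.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy embeddings (invented values, for illustration only).
const deployedNewBuild = [0.9, 0.1, 0.3];
const siteWentDown = [0.8, 0.2, 0.4];

const forward = cosineSimilarity(deployedNewBuild, siteWentDown);
const backward = cosineSimilarity(siteWentDown, deployedNewBuild);

// Same score in both directions: the number cannot say which event
// came first or which one caused the other.
console.log(forward === backward); // true

// A directed, typed edge can carry what the score cannot.
const edge = { from: 'deployed-new-build', to: 'site-went-down', type: 'causal' };
```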
The MAGMA framework (arXiv:2601.03236) implements four kinds of relationship (semantic, temporal, causal, and entity) as separate graph layers in SQLite. Each layer is optimised for its own retrieval pattern. Together they form something closer to a complete memory system than any flat vector store can achieve.
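To illustrate what "separate layers" buys you, here is a minimal in-memory sketch rather than MAGMA's actual implementation. The node ids, the edge data, and the `followLayer` helper are all invented for this example. The layer names match the `layer` values used in the schema later in this article. Restricting traversal to one edge type is what lets a "why did this happen?" query ignore links that are merely adjacent in time.

```javascript
// A tiny graph with typed edges. All ids and edges are illustrative.
const edges = [
  { from: 'bug-report', to: 'parser-patch', type: 'causal' },
  { from: 'parser-patch', to: 'release-1.2', type: 'causal' },
  { from: 'bug-report', to: 'standup-notes', type: 'temporal' },
  { from: 'bug-report', to: 'user-alice', type: 'entity' },
];

// Follow edges of a single type outward from a start node, breadth-first.
function followLayer(startId, type) {
  const seen = new Set([startId]);
  const queue = [startId];
  const reached = [];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const e of edges) {
      if (e.from === current && e.type === type && !seen.has(e.to)) {
        seen.add(e.to);
        queue.push(e.to);
        reached.push(e.to);
      }
    }
  }
  return reached;
}

console.log(followLayer('bug-report', 'causal'));
// [ 'parser-patch', 'release-1.2' ]
```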
The single biggest unsolved problem in agent memory isn't retrieval — it's curation. If you store every conversation turn naively, you end up with a graph that's 40% contradictions, 30% duplicates, and 20% irrelevant noise. The 10% of signal is buried.
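The AUDN loop (Add, Update, Delete, No-op) solves this. Every time a new memory arrives, it ends in one of four outcomes: ADD it as a new node, UPDATE an existing node it extends, DELETE (deprecate) an older node it contradicts, or NO_OP when it is a duplicate.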
This is the part most developers skip. They build the storage layer, they build the retrieval layer, and they completely ignore the curation layer. Then they wonder why their agent starts hallucinating contradictory facts after a few weeks of use.
// Incoming memory: "User now prefers TypeScript over JavaScript"
// `graph` and `llm` are the storage and model clients, passed in by the caller.
async function audnDecide(newMemory, graph, llm) {
  // 1. Embed and find nearest neighbours
  const similar = await graph.findSimilar(newMemory.embedding, { k: 5, threshold: 0.85 });
  // 2. Check for exact duplicate
  if (similar.some(n => n.score > 0.97)) return { op: 'NO_OP' };
  // 3. Check for contradiction via causal/semantic analysis.
  //    Resolve every LLM verdict before filtering: filtering directly on
  //    pending promises would treat every neighbour as a contradiction.
  const verdicts = await Promise.all(similar.map(n => llm.contradictsMemory(newMemory, n)));
  const contradicts = similar.filter((_, i) => verdicts[i]);
  if (contradicts.length > 0) {
    await graph.deprecate(contradicts); // DELETE old (soft delete)
    return { op: 'UPDATE', replaces: contradicts };
  }
  // 4. Check for related knowledge to merge into
  //    (contradicting neighbours were already handled in step 3)
  const related = similar.filter(n => n.score > 0.72);
  if (related.length > 0) return { op: 'UPDATE', mergeInto: related[0] };
  // 5. Genuinely new: add as new node
  return { op: 'ADD' };
}
The key insight is step 3: using an LLM to detect semantic contradiction, not just similarity. Two statements can be close neighbours in embedding space and still directly contradict each other. "User prefers light mode" and "User switched to dark mode" will have high similarity scores. Without LLM-based contradiction detection, you store both and your agent becomes confused.
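What the contradiction check itself might look like is sketched below. Everything here is an assumption made for illustration: the prompt wording is untested, and `buildContradictionPrompt`, `parseVerdict`, and the `complete` callback are hypothetical names, not part of any particular LLM client. The design choice worth noting is the conservative parse: anything other than a clear YES counts as "no contradiction", so an ambiguous reply never causes an existing memory to be deprecated.

```javascript
// Build a yes/no prompt asking whether two memories conflict.
// The wording is an illustrative assumption, not a tested prompt.
function buildContradictionPrompt(newText, oldText) {
  return [
    'Do these two statements about the same user contradict each other?',
    `A (newer): ${newText}`,
    `B (older): ${oldText}`,
    'Answer with exactly one word: YES or NO.',
  ].join('\n');
}

// Treat only a reply that starts with YES as a contradiction.
function parseVerdict(reply) {
  return reply.trim().toUpperCase().startsWith('YES');
}

// Hypothetical wiring: `complete` is any async function that sends a
// prompt to a model and resolves to the model's text reply.
async function contradictsMemory(newMemory, oldMemory, complete) {
  const prompt = buildContradictionPrompt(newMemory.content, oldMemory.content);
  return parseVerdict(await complete(prompt));
}
```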
Even with perfect AUDN curation, a production agent accumulates hundreds of memory nodes per week. Most of these are fine-grained operational details that should eventually be abstracted into higher-level insights. This is exactly what human sleep does — consolidate episodic memories into semantic knowledge.
The EverMemOS REM cycle (arXiv:2601.02163) runs as a background process. Here's what each phase actually does:
The result isn't just smaller. The post-REM graph is better. Synthesis nodes contain distilled knowledge that no individual fragment had. The agent that wakes up after a REM cycle is genuinely smarter than the one that went to sleep.
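As a rough illustration of the general idea of consolidation (not the EverMemOS algorithm), the sketch below folds clusters of fragments into synthesis nodes. It assumes each memory already carries a `clusterId` from some earlier grouping step and that `summarise` is a function, in practice probably an LLM call, that turns several fragment texts into one. Both are hypothetical. The `importance` and `deprecated` fields follow the schema used in this article.

```javascript
// Fold clusters of fragment memories into synthesis nodes.
// `clusterId` and `summarise` are assumptions made for this sketch.
function consolidate(memories, summarise, minClusterSize = 3) {
  // Group live (non-deprecated) fragments by cluster.
  const clusters = new Map();
  for (const m of memories) {
    if (m.deprecated) continue;
    if (!clusters.has(m.clusterId)) clusters.set(m.clusterId, []);
    clusters.get(m.clusterId).push(m);
  }

  const synthesisNodes = [];
  for (const [clusterId, fragments] of clusters) {
    if (fragments.length < minClusterSize) continue; // too few to abstract
    synthesisNodes.push({
      id: `synthesis-${clusterId}`,
      content: summarise(fragments.map(f => f.content)),
      // Carry forward the highest importance among the fragments.
      importance: Math.max(...fragments.map(f => f.importance)),
      deprecated: 0,
      sources: fragments.map(f => f.id),
    });
    // Soft-delete the fragments now represented by the synthesis node.
    for (const f of fragments) f.deprecated = 1;
  }
  return synthesisNodes;
}
```

Keeping `sources` on the synthesis node means the original fragments stay traceable even after they are soft-deleted.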
You don't need a hosted vector database, a managed graph service, or any cloud infrastructure. The entire system runs on SQLite with the sqlite-vec extension. Here's the schema that matters:
-- Memory nodes (the vertices)
CREATE TABLE memories (
  id          TEXT PRIMARY KEY,
  agent_id    TEXT NOT NULL,
  content     TEXT NOT NULL,
  summary     TEXT,
  importance  REAL DEFAULT 0.5,   -- decays over time
  layer       TEXT CHECK(layer IN ('semantic','temporal','causal','entity')),
  created_at  INTEGER,
  accessed_at INTEGER,
  deprecated  INTEGER DEFAULT 0   -- soft delete for AUDN
);
-- Relationships (the edges)
CREATE TABLE edges (
  from_id    TEXT REFERENCES memories(id),
  to_id      TEXT REFERENCES memories(id),
  type       TEXT CHECK(type IN ('semantic','causal','temporal','entity','contradicts')),
  weight     REAL DEFAULT 1.0,
  created_at INTEGER
);
-- Vector index (sqlite-vec)
CREATE VIRTUAL TABLE memory_vectors
  USING vec0(embedding FLOAT[384]);
The power is in the combination: fast vector lookup for semantic similarity, graph traversal for causal and temporal chains, and the entity table for persistent world knowledge. Three data structures working together, not three separate services.
async function recall(query, agentId, opts = {}) {
  const { hops = 2, k = 10 } = opts;
  // Step 0: Embed the query. `embed` is assumed to be your embedding helper.
  const queryEmbedding = await embed(query);
  // Step 1: Vector similarity seed (fast)
  // memories.id is TEXT, so join on the implicit integer rowid; this assumes
  // each vector was inserted under its memory's rowid.
  const seeds = await db.all(`
    SELECT m.*, v.distance
    FROM memory_vectors v
    JOIN memories m ON m.rowid = v.rowid
    WHERE v.embedding MATCH ? AND k = ?
      AND m.agent_id = ? AND m.deprecated = 0
    ORDER BY distance
  `, [queryEmbedding, k, agentId]);
  // Step 2: Graph traversal from seeds (associative)
  const visited = new Map(seeds.map(s => [s.id, s]));
  let frontier = seeds;
  for (let hop = 0; hop < hops && frontier.length > 0; hop++) {
    const neighbours = await db.all(`
      SELECT m.* FROM edges e
      JOIN memories m ON m.id = e.to_id
      WHERE e.from_id IN (${frontier.map(() => '?').join(',')})
        AND e.type != 'contradicts'
        AND m.deprecated = 0
    `, frontier.map(f => f.id));
    // Only newly discovered nodes form the next hop's frontier.
    frontier = [];
    for (const n of neighbours) {
      if (!visited.has(n.id)) { visited.set(n.id, n); frontier.push(n); }
    }
  }
  // Step 3: Score and rank by importance × recency
  // (`scoreNode` is assumed to be defined elsewhere)
  return [...visited.values()]
    .sort((a, b) => scoreNode(b) - scoreNode(a))
    .slice(0, k);
}
The key difference from standard RAG is step 2: graph traversal from the seed results. This is how you get causally and temporally related memories that wouldn't score highly on pure cosine similarity. The seed query finds the entry point. The graph traversal finds everything connected to it.
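Step 3 ranks by importance × recency but leaves the scoring function undefined. One plausible shape is sketched below. It assumes `accessed_at` is a Unix timestamp in milliseconds and uses an exponential decay with a 30-day half-life. Both the unit and the half-life are choices made for this example, not values from the source.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Score = importance × recency, where recency halves every
// `halfLifeDays` since the node was last accessed.
// Assumes `accessed_at` is in milliseconds (an assumption of this sketch).
function scoreNode(node, now = Date.now(), halfLifeDays = 30) {
  const ageDays = Math.max(0, now - node.accessed_at) / DAY_MS;
  const recency = Math.pow(0.5, ageDays / halfLifeDays);
  return node.importance * recency;
}

const now = Date.UTC(2025, 0, 31);
const fresh = { importance: 0.5, accessed_at: now };
const stale = { importance: 0.8, accessed_at: now - 60 * DAY_MS };

console.log(scoreNode(fresh, now)); // 0.5
console.log(scoreNode(stale, now)); // 0.2 (0.8 × 0.25 after two half-lives)
```

With these numbers a moderately important memory touched today outranks a more important one that has gone untouched for two months, which is the behaviour the ranking step relies on.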
The architecture described here is fully implemented in VEKTOR (vektormemory.com). The research backing it is in arXiv:2601.03236 (MAGMA), arXiv:2601.02163 (EverMemOS), and arXiv:2405.14831 (HippoRAG). All three are worth reading in full.