AI Engineering  ·  Memory Systems  ·  2025

Why your AI agent
has amnesia
and how to fix it

A technical walkthrough of why standard vector stores fail as agent memory, what the research says about how biological memory actually works, and how to build a system that doesn't forget who you are.

12 MIN READ · INTERMEDIATE → ADVANCED · NODE.JS · SQLITE · GRAPH THEORY
THE_PROBLEM

Vector similarity is not memory.
It's a search bar.

Every production AI agent I've seen has the same dirty secret: it forgets everything between sessions. You can give it a 1-million-token context window and it still wakes up each morning with no idea who you are, what you've built together, or why you made the decisions you made.

The standard fix is RAG — Retrieval-Augmented Generation. Embed your history, stuff it in a vector store, retrieve on cosine similarity. It works well enough for document search. It is fundamentally broken for agent memory. Here's why.

"Cosine similarity tells you two things sound alike. It cannot tell you that one thing caused the other, came before the other, or contradicts the other. Causal, temporal, and logical relationships are invisible to a vector store."

The three things RAG can't do

✗ FLAT_RAG
CAUSAL REASONING: You update a dependency. Three weeks later a bug appears. RAG retrieves both as similar-sounding events. It has no idea one caused the other.
TEMPORAL ORDERING: "User preferred Python" and "User switched to TypeScript" both match a query about coding preferences. RAG might return the stale one.
CONTRADICTION DETECTION: You tell the agent you want dark mode. Later you change your mind. Both facts live in the store. The agent has no way to know which is current.
MEMORY DECAY: A fragment from 3 years ago is scored exactly like one from yesterday. There's no concept of recency, importance, or relevance decay.

✓ GRAPH_MEMORY
CAUSAL REASONING: A causal edge explicitly connects "dependency update" → "bug appeared". The agent can traverse this edge and reason about root causes.
TEMPORAL ORDERING: Temporal edges encode before/after. When you query preferences, the graph returns the most recent valid state, not the most similar-sounding fragment.
CONTRADICTION DETECTION: AUDN (Add/Update/Delete/No-Op) curation detects when a new memory contradicts an existing one and deprecates the stale version automatically.
MEMORY DECAY: Importance scores decay over time. REM compression promotes high-signal memories and prunes low-value noise. The graph gets smarter, not noisier.
[Figure: SESSION_MEMORY_COMPARISON, 4 sessions.
WITHOUT_GRAPH_MEMORY: SESSION_001 ✓, SESSION_002 ✗, SESSION_003 ✗, SESSION_004 ✗. Each session starts from zero. 3 out of 4 are memory wipes.
WITH_GRAPH_MEMORY: SESSION_001 +84, SESSION_002 +31, SESSION_003 +47, SESSION_004 +22. Full context accumulates. Every session builds on the last.]
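
To make the temporal-ordering failure concrete, here is a minimal sketch in plain JavaScript. The fragment records, the `similarity` scores, and the `supersedes` field are all hypothetical and exist only for illustration; they are not taken from any particular vector store or graph library.

```javascript
// Hypothetical fragments. `similarity` stands in for a cosine score returned
// by a vector store; `supersedes` stands in for a temporal edge in a graph.
const fragments = [
  { id: 'f1', text: 'User preferred Python', similarity: 0.91,
    recordedAt: Date.parse('2024-01-10'), supersedes: null },
  { id: 'f2', text: 'User switched to TypeScript', similarity: 0.88,
    recordedAt: Date.parse('2024-06-02'), supersedes: 'f1' },
];

// Flat retrieval: the highest similarity wins, regardless of age.
function topBySimilarity(items) {
  return [...items].sort((a, b) => b.similarity - a.similarity)[0];
}

// Temporal-aware retrieval: drop anything that a later fragment supersedes,
// then prefer the most recently recorded survivor.
function currentState(items) {
  const superseded = new Set(items.map(f => f.supersedes).filter(Boolean));
  return items
    .filter(f => !superseded.has(f.id))
    .sort((a, b) => b.recordedAt - a.recordedAt)[0];
}

console.log(topBySimilarity(fragments).text); // User preferred Python
console.log(currentState(fragments).text);    // User switched to TypeScript
```

The similarity-only path returns the stale preference because nothing in the score knows about time. The second path needs one extra piece of structure, the supersedes link, which is exactly what a flat store does not record.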
NEUROSCIENCE_BACKGROUND

What the hippocampus
can teach us about graph design

The research paper that changed how I think about this is HippoRAG (arXiv:2405.14831). The core insight: the human hippocampus doesn't store memories as isolated facts. It stores them as a knowledge graph — with the hippocampus acting as the index that links related cortical representations together.

This is why you can recall a memory through smell, sound, emotion, or context. Multiple retrieval paths all lead to the same node. The information is richly cross-indexed, not stored in a single bucket labelled "memories".

The implication for AI: if you want an agent to recall information the way a human does — associatively, contextually, causally — you need a graph, not a vector index. The vector index is the embedding. The graph is the structure that gives those embeddings meaning.

The four memory systems that matter

SYSTEM_01 // SEMANTIC
Declarative facts
"User prefers TypeScript." "Project uses Python for data pipelines." These are stable facts. Store them as high-dimensional vectors. Retrieve by similarity. This is the part RAG gets right.
RAG handles this ✓
SYSTEM_02 // EPISODIC
Event sequences
"On Tuesday, the user updated the auth module, then on Thursday reported a login bug." The temporal ordering is the information. Without it, you just have two disconnected facts.
RAG misses this ✗
SYSTEM_03 // CAUSAL
Cause and effect
"The auth update caused the login bug." This is not retrievable by similarity — it requires an explicit directed edge in a graph. This is where most agent memory systems completely fail.
RAG misses this ✗
SYSTEM_04 // ENTITY
Named world model
"Sarah is the project lead. The auth module lives at /src/auth. The deadline is Q3." A persistent entity index that doesn't need to be re-established every session.
RAG partially handles

The MAGMA framework (arXiv:2601.03236) implements all four of these as separate graph layers in SQLite. Each layer is optimised for its own retrieval pattern. Together they form something closer to a complete memory system than any flat vector store can achieve.
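
As an illustration of what the causal layer buys you, the sketch below walks causal edges backwards from an effect to its root causes. It is a plain in-memory example, not MAGMA's implementation: the edge list, the node names, and the `rootCauses` helper are assumptions made for this walkthrough.

```javascript
// Hypothetical edge list. In a SQLite-backed design these would be rows in an
// edges table; here they are plain objects so the example is self-contained.
const edges = [
  { from: 'dependency-update', to: 'auth-update', type: 'causal' },
  { from: 'auth-update', to: 'login-bug', type: 'causal' },
  { from: 'auth-update', to: 'login-bug', type: 'temporal' },
  { from: 'dark-mode-preference', to: 'login-bug', type: 'semantic' },
];

// Follow only 'causal' edges, in reverse, until reaching nodes with no
// recorded cause. Those nodes are reported as root causes.
function rootCauses(effectId, edgeList) {
  const roots = [];
  const seen = new Set([effectId]);
  const stack = [effectId];
  while (stack.length > 0) {
    const current = stack.pop();
    const causes = edgeList
      .filter(e => e.type === 'causal' && e.to === current)
      .map(e => e.from);
    if (causes.length === 0 && current !== effectId) roots.push(current);
    for (const cause of causes) {
      if (!seen.has(cause)) {
        seen.add(cause);
        stack.push(cause);
      }
    }
  }
  return roots;
}

console.log(rootCauses('login-bug', edges)); // [ 'dependency-update' ]
```

The semantically similar but causally unrelated node ('dark-mode-preference') never enters the result, because the traversal filters on edge type rather than on how alike the text sounds.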

TECHNICAL_DEEP_DIVE

The AUDN curation loop:
how memory stays clean

The single biggest unsolved problem in agent memory isn't retrieval — it's curation. If you store every conversation turn naively, you end up with a graph that's 40% contradictions, 30% duplicates, and 20% irrelevant noise. The 10% of signal is buried.

The AUDN loop solves this. Every time a new memory arrives, it resolves to exactly one of four outcomes:

A · ADD: New information. No existing node matches. Create a new node and link it to related entities.
U · UPDATE: Information that modifies existing knowledge. Merge into the existing node, preserving history.
D · DELETE: New information directly contradicts existing. Mark the old node as deprecated. Don't just add both.
N · NO-OP: Already known. Exact or near-duplicate of an existing node. Increment confidence. Don't store twice.

This is the part most developers skip. They build the storage layer, they build the retrieval layer, and they completely ignore the curation layer. Then they wonder why their agent starts hallucinating contradictory facts after a few weeks of use.

audn-decision.js // simplified logic

    // Incoming memory: "User now prefers TypeScript over JavaScript"
    // `llm` is assumed to be a module-level client whose contradictsMemory()
    // returns a Promise<boolean>.
    async function audnDecide(newMemory, graph) {
      // 1. Embed and find nearest neighbours (assumed sorted by score, highest first)
      const similar = await graph.findSimilar(newMemory.embedding, { k: 5, threshold: 0.85 });

      // 2. Check for exact or near-duplicate
      if (similar.some(n => n.score > 0.97)) return { op: 'NO_OP' };

      // 3. Check for contradiction via LLM analysis.
      //    Array.prototype.filter cannot await, so resolve every verdict first.
      const verdicts = await Promise.all(
        similar.map(n => llm.contradictsMemory(newMemory, n))
      );
      const contradicts = similar.filter((_, i) => verdicts[i]);
      if (contradicts.length > 0) {
        await graph.deprecate(contradicts); // soft-delete the stale nodes
        return { op: 'DELETE', replaces: contradicts }; // caller stores newMemory in their place
      }

      // 4. Related knowledge to merge into. Every remaining neighbour already
      //    cleared the 0.85 threshold and was not flagged as a contradiction.
      if (similar.length > 0) return { op: 'UPDATE', mergeInto: similar[0] };

      // 5. Genuinely new: add as new node
      return { op: 'ADD' };
    }

The key insight is step 3: using an LLM to detect semantic contradiction, not just similarity. Two statements can be low-cosine-distance neighbours and direct contradictions. "User prefers light mode" and "User switched to dark mode" will have high similarity scores. Without LLM-based contradiction detection, you store both and your agent becomes confused.
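
Deciding the operation is only half the loop; the decision still has to be applied to the store. The sketch below shows one way the four outcomes described above might be applied to a plain in-memory `Map`. The record fields (`confidence`, `history`, `deprecated`) and the decision shape (`targetId`, `replaces`) are assumptions made for this example, not part of any published AUDN API.

```javascript
// Hypothetical in-memory store: Map of id -> memory record.
function applyAudn(store, decision, memory) {
  const fresh = { ...memory, confidence: 1, deprecated: false, history: [] };
  switch (decision.op) {
    case 'ADD':
      store.set(memory.id, fresh);
      break;
    case 'UPDATE': {
      const target = store.get(decision.targetId);
      target.history.push(target.content); // preserve the previous wording
      target.content = memory.content;
      break;
    }
    case 'DELETE':
      // Soft-delete the contradicted nodes, then store the replacement.
      for (const id of decision.replaces) store.get(id).deprecated = true;
      store.set(memory.id, fresh);
      break;
    case 'NO_OP':
      store.get(decision.targetId).confidence += 1; // seen again, so more trusted
      break;
    default:
      throw new Error(`Unknown AUDN op: ${decision.op}`);
  }
  return store;
}

const store = new Map();
applyAudn(store, { op: 'ADD' }, { id: 'm1', content: 'User prefers light mode' });
applyAudn(store, { op: 'NO_OP', targetId: 'm1' }, { id: 'm2', content: 'User likes light mode' });
applyAudn(store, { op: 'DELETE', replaces: ['m1'] }, { id: 'm3', content: 'User switched to dark mode' });

const current = [...store.values()].filter(m => !m.deprecated).map(m => m.content);
console.log(current); // [ 'User switched to dark mode' ]
```

Keeping the deprecated record rather than removing it means the agent can still answer "what did the user prefer before?", while default retrieval only sees the current state.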

EVERMEMOS_INTERNALS

REM compression: how the
graph gets smarter while idle

Even with perfect AUDN curation, a production agent accumulates hundreds of memory nodes per week. Most of these are fine-grained operational details that should eventually be abstracted into higher-level insights. This mirrors what sleep is thought to do for humans: consolidate episodic memories into semantic knowledge.

The EverMemOS REM cycle (arXiv:2601.02163) runs as a background process. Here's what each phase actually does:

PHASE_01
SCAN
Walk the graph. Flag nodes with importance score below threshold τ. Flag nodes that haven't been accessed in >N days.
PHASE_02
CLUSTER
Union-Find on flagged nodes. Group fragments that share entity connections or high semantic similarity into candidate clusters.
PHASE_03
SYNTHESIZE
For each cluster: send to LLM with prompt "distil these N memories into one high-density insight." The output becomes a new synthesis node.
PHASE_04
PRUNE
Replace the cluster with the synthesis node. Preserve every edge that crosses the cluster boundary, inbound as well as outbound, by re-attaching it to the synthesis node.
PHASE_05
VERIFY
Integrity check. Confirm no dangling edges. Confirm synthesis node correctly inherits all entity relationships. Re-index.
388 raw fragments before REM → 11 insight nodes after REM (a compression ratio of roughly 35:1)

The result isn't just smaller. The post-REM graph is better. Synthesis nodes contain distilled knowledge that no individual fragment had. The agent that wakes up after a REM cycle is genuinely smarter than the one that went to sleep.
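
The CLUSTER phase is the most algorithmic step in the cycle, so it is worth sketching. The example below groups flagged nodes with a Union-Find (disjoint-set) structure, merging any two nodes that mention the same entity. It is an illustrative sketch, not EverMemOS code: the `entities` field, the node ids, and the `clusterByEntity` helper are assumptions, and a real implementation would presumably also merge on semantic similarity.

```javascript
// Minimal Union-Find over string ids, with path compression.
class UnionFind {
  constructor(ids) {
    this.parent = new Map(ids.map(id => [id, id]));
  }
  find(id) {
    let root = id;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the path directly at the root.
    let cur = id;
    while (cur !== root) {
      const next = this.parent.get(cur);
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }
  union(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Group flagged nodes into clusters: two nodes end up together if they are
// linked by a chain of shared entities.
function clusterByEntity(nodes) {
  const uf = new UnionFind(nodes.map(n => n.id));
  const firstSeen = new Map(); // entity -> id of the first node mentioning it
  for (const node of nodes) {
    for (const entity of node.entities) {
      if (firstSeen.has(entity)) uf.union(node.id, firstSeen.get(entity));
      else firstSeen.set(entity, node.id);
    }
  }
  const clusters = new Map(); // root id -> member ids
  for (const node of nodes) {
    const root = uf.find(node.id);
    if (!clusters.has(root)) clusters.set(root, []);
    clusters.get(root).push(node.id);
  }
  return [...clusters.values()];
}

const flagged = [
  { id: 'a', entities: ['auth-module'] },
  { id: 'b', entities: ['auth-module', 'sarah'] },
  { id: 'c', entities: ['sarah'] },
  { id: 'd', entities: ['q3-deadline'] },
];
console.log(clusterByEntity(flagged)); // [ [ 'a', 'b', 'c' ], [ 'd' ] ]
```

Nodes 'a' and 'c' share no entity directly, but both connect through 'b', which is why Union-Find is a natural fit: it handles transitive grouping without an explicit graph search.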

IMPLEMENTATION

Building this yourself:
the minimum viable graph

You don't need a hosted vector database, a managed graph service, or any cloud infrastructure. The entire system runs on SQLite with the sqlite-vec extension. Here's the schema that matters:

schema.sql // core tables

    -- Memory nodes (the vertices)
    CREATE TABLE memories (
      id          TEXT PRIMARY KEY,
      agent_id    TEXT NOT NULL,
      content     TEXT NOT NULL,
      summary     TEXT,
      importance  REAL DEFAULT 0.5,   -- decays over time
      layer       TEXT CHECK(layer IN ('semantic','temporal','causal','entity')),
      created_at  INTEGER,
      accessed_at INTEGER,
      deprecated  INTEGER DEFAULT 0   -- soft delete for AUDN
    );

    -- Relationships (the edges)
    CREATE TABLE edges (
      from_id    TEXT REFERENCES memories(id),
      to_id      TEXT REFERENCES memories(id),
      type       TEXT CHECK(type IN ('semantic','causal','temporal','entity','contradicts')),
      weight     REAL DEFAULT 1.0,
      created_at INTEGER
    );

    -- Vector index (sqlite-vec)
    CREATE VIRTUAL TABLE memory_vectors USING vec0(embedding FLOAT[384]);

The power is in the combination: fast vector lookup for semantic similarity, graph traversal for causal and temporal chains, and the entity layer for persistent world knowledge. Three data structures working together, not three separate services.

The retrieval query that actually works

recall.js // multi-layer retrieval

    // Assumed to exist elsewhere: `db` (a promise-based SQLite handle with
    // sqlite-vec loaded), `embed(text)` (returns the query vector) and
    // `scoreNode(node)` (importance × recency).
    async function recall(query, agentId, opts = {}) {
      const { hops = 2, k = 10 } = opts;
      const queryEmbedding = await embed(query);

      // Step 1: Vector similarity seed (fast).
      // memories.id is TEXT, so vectors are joined on the integer rowid instead;
      // this assumes each vector was inserted with its memory's rowid.
      const seeds = await db.all(`
        SELECT m.*, v.distance
        FROM memory_vectors v
        JOIN memories m ON m.rowid = v.rowid
        WHERE v.embedding MATCH ? AND k = ?
          AND m.agent_id = ? AND m.deprecated = 0
        ORDER BY distance
      `, [queryEmbedding, k, agentId]);

      // Step 2: Graph traversal from seeds (associative)
      const found = new Map(seeds.map(s => [s.id, s]));
      let frontier = seeds;
      for (let hop = 0; hop < hops && frontier.length > 0; hop++) {
        const neighbours = await db.all(`
          SELECT m.*
          FROM edges e
          JOIN memories m ON m.id = e.to_id
          WHERE e.from_id IN (${frontier.map(() => '?').join(',')})
            AND e.type != 'contradicts'
            AND m.agent_id = ? AND m.deprecated = 0
        `, [...frontier.map(f => f.id), agentId]);

        // Only newly discovered nodes form the next frontier
        frontier = [];
        for (const n of neighbours) {
          if (!found.has(n.id)) {
            found.set(n.id, n);
            frontier.push(n);
          }
        }
      }

      // Step 3: Score and rank by importance × recency
      return [...found.values()]
        .sort((a, b) => scoreNode(b) - scoreNode(a))
        .slice(0, k);
    }

The key difference from standard RAG is step 2: graph traversal from the seed results. This is how you get causally and temporally related memories that wouldn't score highly on pure cosine similarity. The seed query finds the entry point. The graph traversal finds everything connected to it.
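
The ranking step relies on a scoring function that combines importance with recency. One plausible form, shown here as a sketch, is importance multiplied by an exponential decay with a configurable half-life. The 30-day half-life, the choice of `accessed_at` as the recency signal, and the assumption that timestamps are stored as milliseconds since the Unix epoch are all illustrative, not taken from the article's codebase.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Score = importance × recency, where recency halves every `halfLifeDays`.
// Assumes node.accessed_at is a millisecond epoch timestamp.
function scoreNode(node, now = Date.now(), halfLifeDays = 30) {
  const ageDays = Math.max(0, now - node.accessed_at) / DAY_MS;
  const recency = Math.pow(0.5, ageDays / halfLifeDays);
  return node.importance * recency;
}

const now = Date.parse('2025-01-31T00:00:00Z');
const fresh = { importance: 0.5, accessed_at: now };
const stale = { importance: 0.8, accessed_at: now - 60 * DAY_MS };

console.log(scoreNode(fresh, now)); // 0.5
console.log(scoreNode(stale, now)); // 0.2
```

A memory touched today with moderate importance outranks a more important one that has sat unused for two half-lives. Updating `accessed_at` on every successful recall gives frequently used memories a way to stay near the top.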

TAKEAWAYS

What to actually build

The architecture described here is fully implemented in VEKTOR (vektormemory.com). The research backing it is in arXiv:2601.03236 (MAGMA), arXiv:2601.02163 (EverMemOS), and arXiv:2405.14831 (HippoRAG). All three are worth reading in full.
