PAPER_01 // GRAPH ARCHITECTURE
ARXIV.ORG / 2601.03236 ↗

MAGMA: Multi-level Attributed Graph Memory Architecture for LLM Agents

GRAPH MEMORY MULTI-LEVEL ATTRIBUTED GRAPHS LLM AGENTS
READ PAPER ↗

THE CORE PROBLEM IT SOLVES

Most AI agent memory systems treat all memories equally — a flat list of text chunks retrieved by similarity. MAGMA argues this is fundamentally wrong. Human memory isn't flat. It's structured across multiple levels with different properties at each level.

A person remembers episodic events differently from conceptual knowledge, differently from learned skills. Each type of memory has different retrieval patterns, different decay rates, and different relationship structures.

THE KEY CONCEPTS

  • Multi-level architecture — memory organised into distinct layers (episodic, semantic, procedural, temporal), each with its own graph structure and query patterns
  • Attributed graphs — each memory node carries metadata (timestamp, confidence, source, recency weight) not just content
  • Cross-level edges — memories at different levels can connect to each other, creating rich associative retrieval paths
  • Level-appropriate queries — different question types route to different graph levels automatically

WHY IT MATTERS FOR AGENTS

An agent asked "what does this user prefer?" should query semantic memory. Asked "what happened in our last session?" it should query episodic memory. Asked "how do I run this task?" it should query procedural memory. A single flat retrieval system gets all of these wrong some of the time. MAGMA gets them right by design.

VEKTOR IMPLEMENTATION

VEKTOR implements all four MAGMA graph levels as separate SQLite-backed graph stores with a unified query interface. When you call mem.recall(query), VEKTOR routes the query across the appropriate graph levels and merges the results by relevance score. The set of active graph levels is configurable at agent initialisation time.
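A minimal sketch of what level-appropriate routing and relevance-score merging could look like. The keyword heuristics here stand in for whatever classifier the real router uses, and `routeQuery` / `mergeByRelevance` are hypothetical names, not VEKTOR's actual API.

```javascript
// Illustrative routing table: which phrasings hint at which graph level.
const LEVEL_HINTS = {
  episodic: ["what happened", "last session", "yesterday", "when did"],
  procedural: ["how do i", "how to", "steps to", "run this"],
  semantic: ["prefer", "what is", "who is", "know about"],
};

// Route a query to a graph level by keyword match; default to semantic.
function routeQuery(query) {
  const q = query.toLowerCase();
  for (const [level, hints] of Object.entries(LEVEL_HINTS)) {
    if (hints.some((h) => q.includes(h))) return level;
  }
  return "semantic";
}

// Merge hits from several levels into one ranked list, highest score first.
function mergeByRelevance(...resultLists) {
  return resultLists.flat().sort((a, b) => b.score - a.score);
}
```

A production router would more likely use embeddings or an LLM classifier, but the shape of the problem (classify, fan out, merge by score) is the same.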

MAGMA LEVEL STRUCTURE
  • EPISODIC_LAYER: time-ordered events
  • SEMANTIC_LAYER: concepts + facts
  • PROCEDURAL_LAYER: skills + workflows
  • TEMPORAL_LAYER: time-weighted decay
PAPER_02 // MEMORY LIFECYCLE
ARXIV.ORG / 2601.02163 ↗

EverMemOS: Persistent Memory Operating System for Large Language Model Agents

MEMORY OS LIFECYCLE PERSISTENCE CONSOLIDATION
READ PAPER ↗

THE CORE PROBLEM IT SOLVES

Even if you store agent memories, how do they get managed over time? Memory can't just grow forever — it needs to be consolidated, pruned, and prioritised. EverMemOS treats agent memory like an operating system treats RAM and disk: active working memory, short-term cache, and long-term persistent storage.

The paper introduces the concept of memory as a managed resource with explicit lifecycle states, not just a growing database.

THE KEY CONCEPTS

  • Working memory — the active context window, managed explicitly rather than implicitly by token limits
  • Memory consolidation — background process that compresses and restructures memories during "idle" periods, similar to sleep-time consolidation in humans
  • Retrieval-augmented injection — relevant long-term memories injected into context automatically at query time, not manually
  • Forgetting curves — memories decay unless reinforced by repeated access, modelling the Ebbinghaus forgetting curve
  • Memory promotion/demotion — frequently accessed short-term memories get promoted to long-term storage automatically
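The forgetting-curve and promotion rules above can be sketched directly. The weekly half-life and the three-access promotion threshold are invented for illustration; EverMemOS does not prescribe these exact numbers.

```javascript
// Illustrative half-life: a memory's retention halves every week unless accessed.
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000;

// Ebbinghaus-style exponential decay, measured from the last access.
function retention(memory, now = Date.now()) {
  const age = now - memory.lastAccessed;
  return Math.exp((-Math.LN2 * age) / HALF_LIFE_MS);
}

// Reinforcement: each access resets the decay clock, and repeated access
// promotes the memory from short-term to long-term storage.
function touch(memory, now = Date.now()) {
  memory.lastAccessed = now;
  memory.accessCount += 1;
  if (memory.accessCount >= 3) memory.tier = "long_term"; // assumed threshold
  return memory;
}
```

With this shape, "forgetting" is just a retrieval-time filter: anything whose retention falls below a floor stops being surfaced, without ever needing a hard delete.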

WHY IT MATTERS FOR AGENTS

Without lifecycle management, agent memory either grows until it's unusable, or gets arbitrarily truncated. EverMemOS gives memory a biology — things that matter get remembered, things that don't fade. This is how you build agents that behave more like humans over time.

VEKTOR IMPLEMENTATION

VEKTOR's memory lifecycle follows the EverMemOS model. Each stored memory has a recency_score and access_count. The consolidation process runs on a configurable interval, merging related memories and decaying unused ones. The AUDN deduplication loop prevents redundant storage before consolidation is needed.
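A consolidation pass over the `recency_score` / `access_count` fields mentioned above might look like the sketch below. The decay factor and pruning floor are assumed knobs, and the merge step for related memories is omitted; this shows only the decay-and-prune half of the process.

```javascript
// Illustrative consolidation pass: decay every memory's recency_score,
// then keep only memories that are still warm or have been accessed.
function consolidate(memories, { decayFactor = 0.9, minScore = 0.05 } = {}) {
  const kept = [];
  for (const m of memories) {
    m.recency_score *= decayFactor; // unused memories cool off each pass
    if (m.recency_score >= minScore || m.access_count > 0) kept.push(m);
  }
  return kept; // the merge-related-memories step would run on this set
}

// Run on a configurable interval, like background garbage collection:
// setInterval(() => { store.memories = consolidate(store.memories); }, intervalMs);
```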

MEMORY LIFECYCLE STATES
  WORKING_MEM → consolidate → SHORT_TERM
  SHORT_TERM → promote → LONG_TERM → recall → inject
  SHORT_TERM → decay → FORGOTTEN
PAPER_03 // MEMORY COMPRESSION
ARXIV.ORG / 2504.19413 ↗

Mem0: The Memory Layer for Personalised AI

COMPRESSION DEDUPLICATION PERSONALISATION TOKEN EFFICIENCY
READ PAPER ↗

THE CORE PROBLEM IT SOLVES

Storing memories is the easy part. The hard part is keeping the memory layer efficient enough to actually use at inference time. If recalling memory takes 2 seconds and injects 4,000 tokens, it's too slow and too expensive to use in production.

Mem0 tackles the compression and deduplication problem — how to extract only the meaningful signal from conversations and store it in a compact, deduplicated form that can be retrieved in milliseconds.

THE KEY CONCEPTS

  • Extraction-before-storage — rather than storing raw conversation turns, Mem0 extracts structured facts, preferences, and entities first
  • Semantic deduplication — before storing a new memory, check if semantically equivalent content already exists. Merge rather than duplicate
  • Conflict resolution — when new information contradicts stored memory, resolve the conflict explicitly rather than storing both
  • Token-efficient retrieval — compressed memories returned as structured JSON inject far fewer tokens than raw text chunks
  • Personalisation graph — user preferences, patterns, and identity facts maintained as a lightweight personal knowledge graph
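The semantic-deduplication step can be sketched as a similarity check over embedding vectors. This assumes embeddings are already computed for each memory; the 0.9 threshold and the `dedupDecision` helper are illustrative, not Mem0's actual implementation.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Merge-or-store decision: if any existing memory is semantically close
// enough, merge into it instead of writing a near-duplicate node.
function dedupDecision(candidate, existing, threshold = 0.9) {
  let best = null, bestSim = -1;
  for (const m of existing) {
    const sim = cosine(candidate.embedding, m.embedding);
    if (sim > bestSim) { bestSim = sim; best = m; }
  }
  return bestSim >= threshold
    ? { action: "merge", target: best } // semantically equivalent content
    : { action: "store" };              // genuinely novel content
}
```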

WHY IT MATTERS FOR AGENTS

A memory system that doubles your token usage per call isn't viable at scale. Mem0's approach proves you can have rich, personalised memory at a fraction of the token cost of naive approaches — if you invest in proper extraction and deduplication upfront.

VEKTOR IMPLEMENTATION

VEKTOR's AUDN (Automatic Update and Deduplication Node) loop is directly inspired by Mem0's deduplication approach. Before any memory is written to the graph, it passes through the AUDN loop: extract structured facts, check semantic similarity against existing nodes, merge if similarity exceeds threshold, store new node only if genuinely novel. This keeps the graph compact and retrieval fast.
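The four AUDN steps described above can be sketched as a single write path. The graph operations are injected as parameters because `extractFacts`, `nearest`, `merge`, and `store` are stand-ins here, not VEKTOR's actual internal functions.

```javascript
// Illustrative AUDN write path: extract, check, merge-or-store.
function audnWrite(rawText, { extractFacts, nearest, merge, store }, threshold = 0.85) {
  const written = [];
  for (const fact of extractFacts(rawText)) { // 1. extract structured facts
    const hit = nearest(fact);                // 2. check semantic similarity
    if (hit && hit.similarity >= threshold) {
      written.push(merge(hit.node, fact));    // 3. merge if above threshold
    } else {
      written.push(store(fact));              // 4. store only if genuinely novel
    }
  }
  return written; // the nodes that were written or updated
}
```

Structuring the write path this way keeps the invariant visible: nothing reaches the graph without passing the extract and dedup gates first.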

AUDN DEDUP PIPELINE
  RAW_CONVERSATION → EXTRACT_FACTS → CHECK_SIMILARITY → MERGE?
  duplicate → MERGE or SKIP (DUP)
  novel → STORE NODE
PAPER_04 // AGENT OS PARADIGM
LETTA.COM ↗

Letta / MemGPT: The LLM as an Operating System

AGENT OS CONTEXT MANAGEMENT MEMORY PAGING SELF-EDITING
VISIT LETTA ↗

THE CORE PROBLEM IT SOLVES

MemGPT (now Letta) introduced a paradigm shift: treat the LLM like a CPU and the context window like RAM. Just as an OS pages memory between RAM and disk, MemGPT pages agent context between the active window and external storage.

This reframing is powerful because it makes the memory problem an engineering problem with known solutions, rather than a novel AI problem requiring novel techniques.

THE KEY CONCEPTS

  • Memory paging — content moves in and out of the active context window based on relevance, just like virtual memory paging
  • Self-editing memory — the agent itself can read and write its own memory storage, creating a feedback loop of self-improvement
  • Tiered storage — in-context (fast, expensive), external recall storage (slower, cheap), archival storage (slowest, cheapest)
  • Function calling as system calls — memory operations exposed as tool calls the LLM can invoke, making memory management explicit
  • Stateful agents — the core product insight: agents should be stateful entities that persist between conversations, not stateless request handlers
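The "function calling as system calls" idea can be sketched as tool definitions in the JSON-schema format most LLM APIs accept. The two tool names follow MemGPT's published memory functions, but the exact schemas here are illustrative.

```javascript
// Illustrative memory "system calls" exposed to the LLM as tools.
const memoryTools = [
  {
    name: "core_memory_append",
    description: "Append a fact to the agent's always-in-context memory block.",
    parameters: {
      type: "object",
      properties: { content: { type: "string" } },
      required: ["content"],
    },
  },
  {
    name: "archival_memory_search",
    description: "Search long-term archival storage and page results into context.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string" },
        top_k: { type: "integer" },
      },
      required: ["query"],
    },
  },
];
```

Because memory operations are ordinary tool calls, the agent decides for itself when to write, search, or edit its memory, which is what makes the memory "self-editing".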

WHY IT MATTERS FOR AGENTS

The OS metaphor is the most useful mental model for building agents with memory. Letta proved that stateful agents are not just possible but production-ready, running at scale at Nokia, Bain, and others. VEKTOR takes the same paradigm and makes it accessible to any Node.js developer without the full Letta platform overhead.

VEKTOR IMPLEMENTATION

VEKTOR adopts the tiered storage model from Letta: active working memory (in-context), the MAGMA graph (external recall), and archival cold storage. The key difference is VEKTOR is self-hosted SQLite — no cloud platform, no API dependency, no per-call cost. The agent OS concept lives in vektor-core.js as a lightweight runtime that wraps your existing agent code.
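The three-tier model can be sketched with in-memory Maps standing in for the SQLite-backed stores; the `TieredMemory` class and its method names are invented for illustration, not VEKTOR's public API.

```javascript
// Illustrative three-tier store: active window, recall storage, archive.
class TieredMemory {
  constructor(contextLimit = 4) {
    this.contextLimit = contextLimit;
    this.inContext = new Map(); // tier 1: active working memory (the "RAM")
    this.recall = new Map();    // tier 2: graph-indexed recall storage
    this.archive = new Map();   // tier 3: cold archival storage
  }

  remember(id, item) {
    this.inContext.set(id, { ...item, lastUsed: Date.now() });
    this.pageOut();
  }

  // Like virtual-memory paging: when the active window is over its limit,
  // evict the least recently used entry down to recall storage.
  pageOut() {
    while (this.inContext.size > this.contextLimit) {
      const [oldestId, oldest] = [...this.inContext.entries()]
        .sort((a, b) => a[1].lastUsed - b[1].lastUsed)[0];
      this.inContext.delete(oldestId);
      this.recall.set(oldestId, oldest);
    }
  }

  // Page a stored memory back into the active window on demand.
  recallIntoContext(id) {
    const item = this.recall.get(id) ?? this.archive.get(id);
    if (item) {
      this.recall.delete(id);
      this.remember(id, item);
    }
    return item;
  }
}
```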

TIERED STORAGE MODEL
  IN-CONTEXT (ACTIVE): fast · expensive · limited
  ↓ page out
  RECALL STORAGE (GRAPH): medium · cheap · indexed
  ARCHIVAL STORAGE: slow · very cheap · unlimited
  VEKTOR implements all 3 tiers

How the four papers combine inside VEKTOR.

FROM // LETTA

Agent OS Model

Agents are stateful processes, not stateless functions. Memory is a managed resource, not an afterthought.

FROM // EVERMEMOS

Lifecycle Management

Memory has states — working, short-term, long-term, forgotten. Consolidation runs in the background like garbage collection.

FROM // MEM0

AUDN Deduplication

Extract before storing. Deduplicate before writing. Merge before accumulating. Keep the graph clean and fast.

FROM // MAGMA

4-Level Graph Types

Different memory types need different graph structures. Route queries to the right level automatically.

Ready to give your agents real memory?

One-time payment. Self-hosted. No subscriptions. Drop into any Node.js agent in minutes.