VEKTOR implements concepts from four peer-reviewed papers and open research projects. These papers shaped how we think about memory, but the implementation is entirely original: no code was copied.
All concepts explained here are derived from publicly available research, and the links go directly to the original papers. VEKTOR's source code is original and proprietary.
Most AI agent memory systems treat all memories equally — a flat list of text chunks retrieved by similarity. MAGMA argues this is fundamentally wrong. Human memory isn't flat. It's structured across multiple levels with different properties at each level.
A person remembers episodic events differently from conceptual knowledge, differently from learned skills. Each type of memory has different retrieval patterns, different decay rates, and different relationship structures.
An agent asked "what does this user prefer?" should query semantic memory. Asked "what happened in our last session?" it should query episodic memory. Asked "how do I run this task?" it should query procedural memory. A single flat retrieval system gets all of these wrong some of the time. MAGMA gets them right by design.
VEKTOR implements all four MAGMA graph types as separate SQLite-backed graph stores behind a unified query interface. When you call mem.recall(query), VEKTOR routes the query across the appropriate graph levels and merges the results by relevance score. The enabled graph types are configurable at agent initialisation time.
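The routing-and-merging step above can be sketched as follows. This is an illustrative sketch only: the keyword heuristic, function names, and result shape are assumptions, not VEKTOR's actual implementation (which would use semantic routing rather than regexes).

```javascript
// Hypothetical MAGMA-style query router. A real system would route by
// embedding similarity or a classifier; keywords keep the sketch readable.
const LEVELS = ["episodic", "semantic", "procedural"];

function routeQuery(query) {
  const q = query.toLowerCase();
  if (/\b(happened|last session|yesterday|when)\b/.test(q)) return ["episodic"];
  if (/\b(how do i|steps|procedure|run)\b/.test(q)) return ["procedural"];
  if (/\b(prefer|likes|favourite)\b/.test(q)) return ["semantic"];
  return LEVELS; // ambiguous query: fan out to every level
}

// Merge per-level hits into a single list ordered by relevance score.
function mergeResults(resultsByLevel) {
  return Object.values(resultsByLevel)
    .flat()
    .sort((a, b) => b.score - a.score);
}
```

The key design point is that routing happens before retrieval, so a question about session history never wastes a lookup in the procedural store.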
Storing agent memories raises a second question: how do they get managed over time? Memory can't just grow forever; it needs to be consolidated, pruned, and prioritised. EverMemOS treats agent memory the way an operating system treats RAM and disk: active working memory, a short-term cache, and long-term persistent storage.
The paper introduces the concept of memory as a managed resource with explicit lifecycle states, not just a growing database.
Without lifecycle management, agent memory either grows until it's unusable or gets arbitrarily truncated. EverMemOS gives memory a biology: things that matter get remembered, and things that don't fade away. This is how you build agents that behave more like humans over time.
VEKTOR's memory lifecycle follows the EverMemOS model. Each stored memory has a recency_score and access_count. The consolidation process runs on a configurable interval, merging related memories and decaying unused ones. The AUDN deduplication loop prevents redundant storage before consolidation is needed.
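A consolidation pass of this kind can be sketched as below. The recency_score and access_count fields come from the text above; the decay multiplier, the pruning threshold, and the rule that accessed memories survive are assumptions for illustration, not VEKTOR's actual values.

```javascript
// Assumed constants — not VEKTOR's real configuration.
const DECAY_RATE = 0.9;     // per-interval multiplier applied to recency
const FORGET_BELOW = 0.05;  // memories below this score are pruned

// One pass of the lifecycle: decay every memory's recency_score,
// then drop memories that are both stale and never accessed.
function consolidationPass(memories) {
  return memories
    .map((m) => ({ ...m, recency_score: m.recency_score * DECAY_RATE }))
    .filter((m) => m.recency_score >= FORGET_BELOW || m.access_count > 0);
}
```

Run on an interval, this behaves like garbage collection: frequently used memories keep their scores refreshed by access, while untouched ones decay exponentially until they cross the threshold and are forgotten.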
Storing memories is the easy part. The hard part is keeping the memory layer efficient enough to actually use at inference time. If recalling memory takes 2 seconds and injects 4,000 tokens, it's too slow and too expensive to use in production.
Mem0 tackles the compression and deduplication problem — how to extract only the meaningful signal from conversations and store it in a compact, deduplicated form that can be retrieved in milliseconds.
A memory system that doubles your token usage per call isn't viable at scale. Mem0's approach proves you can have rich, personalised memory at a fraction of the token cost of naive approaches — if you invest in proper extraction and deduplication upfront.
VEKTOR's AUDN (Automatic Update and Deduplication Node) loop is directly inspired by Mem0's deduplication approach. Before any memory is written to the graph, it passes through the AUDN loop: extract structured facts, check semantic similarity against existing nodes, merge if similarity exceeds threshold, store new node only if genuinely novel. This keeps the graph compact and retrieval fast.
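The AUDN write path can be sketched as below. The similarity threshold, the cosineSimilarity helper, and the node shape are all assumptions made for the example; only the extract-check-merge-store sequence comes from the description above.

```javascript
const MERGE_THRESHOLD = 0.9; // assumed similarity cutoff, not VEKTOR's real value

// Plain cosine similarity over embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// AUDN-style write: merge the candidate's facts into a sufficiently
// similar existing node, or append it as a genuinely novel node.
function audnWrite(graph, candidate) {
  for (const node of graph) {
    if (cosineSimilarity(node.embedding, candidate.embedding) >= MERGE_THRESHOLD) {
      node.facts = [...new Set([...node.facts, ...candidate.facts])];
      return graph; // merged: nothing new is stored
    }
  }
  return [...graph, candidate]; // novel: store as a new node
}
```

Because the check happens before the write, the graph never accumulates near-duplicates that a later cleanup pass would have to find.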
MemGPT (now Letta) introduced a paradigm shift: treat the LLM like a CPU and the context window like RAM. Just as an OS pages memory between RAM and disk, MemGPT pages agent context between the active window and external storage.
This reframing is powerful because it makes the memory problem an engineering problem with known solutions, rather than a novel AI problem requiring novel techniques.
The OS metaphor is the most useful mental model for building agents with memory. Letta proved that stateful agents are not just possible but production-ready: its agents run at Nokia, Bain, and others at scale. VEKTOR takes the same paradigm and makes it accessible to any Node.js developer without the full Letta platform overhead.
VEKTOR adopts the tiered storage model from Letta: active working memory (in-context), the MAGMA graph (external recall), and archival cold storage. The key difference is that VEKTOR is self-hosted SQLite: no cloud platform, no API dependency, no per-call cost. The agent OS concept lives in vektor-core.js as a lightweight runtime that wraps your existing agent code.
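The three tiers and the paging between them can be sketched as follows. The tier names match the text above; the capacity, the oldest-first eviction policy, and the function names are assumptions, not vektor-core.js internals.

```javascript
const WORKING_CAPACITY = 3; // assumed in-context window size (in items)

function createStore() {
  return { working: [], graph: [], archive: [] };
}

// Admit a memory into working context; when the window overflows,
// page the oldest item out to the graph tier (RAM -> disk).
function admit(store, memory) {
  store.working.push(memory);
  while (store.working.length > WORKING_CAPACITY) {
    store.graph.push(store.working.shift());
  }
  return store;
}

// Move cold graph entries to archival storage (disk -> cold storage).
function archiveCold(store, isCold) {
  store.archive.push(...store.graph.filter(isCold));
  store.graph = store.graph.filter((m) => !isCold(m));
  return store;
}
```

The point of the sketch is the shape, not the policy: each tier is cheaper and slower than the one above it, and memories only move downward as they age or get evicted.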
Agents are stateful processes, not stateless functions. Memory is a managed resource, not an afterthought.
Memory has states — working, short-term, long-term, forgotten. Consolidation runs in the background like garbage collection.
Extract before storing. Deduplicate before writing. Merge before accumulating. Keep the graph clean and fast.
Different memory types need different graph structures. Route queries to the right level automatically.
One-time payment. Self-hosted. No subscriptions. Drop into any Node.js agent in minutes.