DOCS // FAQ

Every question.
Answered precisely.

Technical and conceptual FAQ for VEKTOR Memory. 50 questions across architecture, performance, integration, and security — vetted against production deployments.

QUESTIONS 50
SECTIONS 8
LAST UPDATED 2026-03-01
VERSION STUDIO 2.x
01 The Fundamentals Q01 – Q05
01 What is VEKTOR? +

VEKTOR is a local-first, graph-based memory operating system for AI agents. It replaces flat vector databases with a structured memory history that survives session resets, resolves contradictions automatically, and compresses noise into signal during idle periods.

Unlike passive vector stores that simply retrieve similar text, VEKTOR actively curates, organises, and evolves what your agent knows — making it genuinely smarter over time rather than just larger.

TL;DR A mind for your agent, not a filing cabinet.
02 Is this a database or a framework? +

Both. VEKTOR is a high-performance SDK built on SQLite that implements an opinionated cognitive architecture for long-term memory. You get the storage layer (SQLite + sqlite-vec), the curation layer (AUDN loop), and the intelligence layer (REM cycle) in a single npm install.

The framework opinions are intentional — they encode what actually works in production agent deployments, so you don't have to rediscover it yourself.

03 How is this different from standard RAG? +

Standard RAG retrieves by surface similarity — cosine distance in embedding space. It answers "what text looks like this query?" VEKTOR answers "what context is actually relevant to this situation?"

Standard RAG                  | VEKTOR
------------------------------|---------------------------------------
Nearest-neighbor lookup       | Associative graph pathfinding
No relationship awareness     | Semantic + causal + temporal + entity
Grows forever, no curation    | AUDN auto-curates, REM compresses
Flat list of results          | Ranked, scored, context-aware
No contradiction handling     | Delete path resolves contradictions
04 Why "Local-First"? +

Your agent's memory is your most sensitive IP. Every preference, decision, strategy, and conversation it accumulates is a competitive asset. VEKTOR ensures that data stays on your hardware.

Practical benefits: zero cloud dependency (works offline), zero third-party data exposure, zero per-query latency overhead (sub-50ms vs 200–500ms for cloud calls), and a one-time cost model with no usage-based billing surprises.

05 What is the "Memory Wall"? +

The Memory Wall is the inflection point where an agent's accumulated history actively degrades performance rather than improving it. As raw logs pile up without curation, retrieval quality drops (more noise per query), latency rises, and token costs spike.

VEKTOR is architected to break this wall through two mechanisms: the AUDN loop, which prevents the mess from accumulating in the first place, and the REM cycle, which compresses the existing mess into high-density summaries while the agent is idle.

02 Technical Architecture (MAGMA) Q06 – Q11
06 What is the MAGMA graph? +

MAGMA (Multi-level Attributed Graph Memory Architecture) is VEKTOR's core data structure, based on peer-reviewed research (arXiv 2601.03236). It organises memory into four simultaneous graph layers rather than one flat list, enabling retrieval that understands relationships rather than just similarity.

Each memory node carries metadata, importance scores, temporal stamps, and edges to related nodes across all four layers. The result is a queryable knowledge graph, not a vector bucket.

Four Layers
SEMANTIC · CAUSAL · TEMPORAL · ENTITY
07 What does the Semantic layer do? +

The Semantic layer handles conceptual meaning and high-dimensional vector similarity. It maps which memories are conceptually related to each other using cosine similarity across 384-dimensional embedding vectors generated by all-MiniLM-L6-v2.

This is the layer most analogous to traditional RAG, but in VEKTOR it's one of four — not the whole system. Semantic edges connect nodes that share meaning even if they share no literal words.

08 What does the Temporal layer do? +

The Temporal layer tracks chronological sequences and knowledge evolution. It ensures your agent knows that "Requirement A" on Monday was superseded by "Decision B" on Tuesday — and that Decision B should now carry more weight.

Without a temporal layer, an agent retrieving old and new context simultaneously has no way to know which is current. This is a fundamental failure mode in production deployments that VEKTOR's temporal edges solve explicitly.

09 What does the Causal layer do? +

The Causal layer maps cause-and-effect relationships. It allows agents to understand why an event happened based on previous actions — not just that it happened.

For example: if an agent knows "build failed" (effect) and "dependency version was bumped" (cause), the causal edge connects them. Future queries about build failures can traverse to the root cause directly, rather than requiring the LLM to infer it from flat context.

10 What does the Entity layer do? +

The Entity layer creates a permanent index of actors, assets, and project-specific rules across all sessions. People, projects, repositories, technologies, and custom-defined entities are tracked with their co-occurrence patterns and relationship history.

This means your agent maintains consistent identity awareness — "Sarah" in session 1 is the same "Sarah" who made the architecture decision in session 47, even if months have passed.

11 Does it use a graph database like Neo4j? +

No. Graph topology is implemented natively inside SQLite using relational tables for nodes and edges, with the sqlite-vec extension providing C-speed vector indexing via vtable architecture.

This design decision eliminates infrastructure overhead (no separate graph DB process), reduces deployment complexity to a single .db file, and achieves sub-50ms recall latency on standard hardware — often faster than managed Neo4j instances due to zero network overhead.

03 The Intelligence Layer (AUDN & REM) Q12 – Q18
12 What is the AUDN loop? +

AUDN (Add / Update / Delete / No-op) is the automatic curation engine: the decision layer that runs on every incoming memory before it's stored. It evaluates each new piece of information against the existing graph and decides one of four outcomes:

ADD · UPDATE · DELETE · NO-OP

This prevents the graph from accumulating noise, duplicates, and contradictions. Every fact in VEKTOR has been actively approved for storage by the AUDN loop — nothing gets in by accident.
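A minimal sketch of how such a four-way decision can work. This is illustrative, not VEKTOR's actual implementation: the similarity thresholds and the `contradicts` callback are assumptions standing in for AUDN's internal semantic-conflict detection.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Decide what to do with an incoming memory given its nearest stored match.
// `contradicts` is a stand-in for semantic conflict detection.
function audnDecision(incoming, nearest, contradicts) {
  if (!nearest) return 'ADD';                          // nothing similar stored yet
  const sim = cosine(incoming.vector, nearest.vector);
  if (sim > 0.95) return 'NO-OP';                      // near-duplicate: skip
  if (contradicts(incoming, nearest)) return 'DELETE'; // archive old, promote new
  if (sim > 0.80) return 'UPDATE';                     // refine the existing node
  return 'ADD';                                        // genuinely new information
}

const stored = { text: 'deploys on Fridays', vector: [1, 0, 0] };
const novel  = { text: 'prefers tabs',       vector: [0, 1, 0] };
console.log(audnDecision(novel, stored, () => false)); // ADD
```

The key design point is that ADD is the fallback, not the default: a memory must first fail the duplicate, contradiction, and refinement checks before a new node is created.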

13 How does VEKTOR resolve contradictions? +

Through the Delete path of the AUDN loop. When new information contradicts a stored truth, AUDN detects the semantic conflict, archives the old fact (preserving lineage), and promotes the new information as the canonical truth.

Every contradiction resolution is logged to the audn_log table — providing a full audit trail of what changed, when, and why. This is the Truth Audit (Q45).

14 What is the REM cycle? +

The REM (Recursive Episodic Memory) cycle is a background consolidation process that runs while your agent is idle. Inspired by biological sleep, it performs the expensive work of compressing, reorganising, and optimising the memory graph without interrupting active sessions.

You can trigger it manually via memory.dream() or schedule it automatically with node rem.js. In production, most users run it nightly. The REM cycle is a Studio-tier feature.

15 What are the 7 phases of the REM cycle? +
01 Scan: Identify candidate nodes — high-density clusters, dormant memories, and recently modified areas of the graph.
02 Cluster: Group semantically related raw memories using graph community detection. Defines consolidation targets.
03 Synthesize: Generate high-density summary nodes from each cluster using the configured LLM provider. 50:1 compression typical.
04 Archive: Move raw source nodes to the dream tier, preserving full lineage in rem_lineage.
05 Implicit Edges: Discover and add non-obvious connections between memory regions that emerge post-compression.
06 Prune: Remove truly redundant nodes and dangling edges. Keeps the graph lean without data loss.
07 Sentiment Decay: Apply temporal fade to emotional or polarised edges. Old "bad vibes" lose weight proportional to age.
16 What is "Progressive Compression"? +

Progressive Compression is the process of converting fragmented raw interaction logs into dense, high-signal insight nodes during the REM Synthesize phase. In production tests, 388 raw fragments compressed into 11 core insights — a 97.2% reduction in context-window noise with 100% signal retention via the rem_lineage traceability table.

The "progressive" aspect means compression happens iteratively across REM cycles rather than all at once — summaries can themselves be summarised in future cycles as the graph matures.

17 What is the "rem_lineage" table? +

A traceability index that maps every synthesized summary node back to its raw archived source nodes. It answers the question: "this summary says X — what were the original 47 memories that produced this conclusion?"

This is critical for auditability. Using the Lineage Drill-Down tool in VEKTOR Lens, you can inspect any node ID and traverse the full provenance chain. No black boxes.

18 What is "Sentiment Decay"? +

A temporal weighting function applied to emotional or polarised graph edges during the Sentiment Decay phase of the REM cycle. The decay rate is configurable, but the principle is: a negative interaction from six months ago should not carry the same retrieval weight as one from last week.

This prevents the agent from becoming permanently anchored to historical emotional states that are no longer relevant — a subtle but important factor in long-running production deployments.
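One common way to implement such a fade is exponential decay by half-life. The half-life value below is an assumption for illustration — the FAQ only states that the rate is configurable.

```javascript
// Illustrative sentiment-decay curve. The 30-day half-life is an
// assumed configuration value, not VEKTOR's default.
const HALF_LIFE_DAYS = 30;

function decayedWeight(initialWeight, ageDays, halfLife = HALF_LIFE_DAYS) {
  // Weight halves every `halfLife` days.
  return initialWeight * Math.pow(0.5, ageDays / halfLife);
}

// A polarised edge from six months ago vs. one from last week:
console.log(decayedWeight(1.0, 180).toFixed(3)); // 0.016
console.log(decayedWeight(1.0, 7).toFixed(3));   // 0.851
```

With these numbers, the six-month-old edge retains under 2% of its original retrieval weight while last week's edge keeps roughly 85% — exactly the recency bias the text describes.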

04 Performance & ROI Q19 – Q22
19 What is the 50:1 compression ratio? +

In production tests, VEKTOR's REM cycle compressed 388 raw memory fragments into 11 high-density core insights — a 97.2% reduction in node count. The 50:1 ratio describes the typical fragment-to-summary conversion in the Synthesize phase for a single memory cluster.

Critically, this compression is non-destructive. All source nodes are archived to the dream tier and remain accessible via rem_lineage. The active graph stays lean while full history is preserved.
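The headline figures can be checked directly. Note that the overall ratio across the whole test (roughly 35:1) differs from the 50:1 figure, which describes a typical single-cluster Synthesize pass rather than the run as a whole:

```javascript
// Verifying the quoted production-test numbers.
const rawFragments = 388;
const coreInsights = 11;

const reductionPct = (1 - coreInsights / rawFragments) * 100;
const overallRatio = rawFragments / coreInsights;

console.log(reductionPct.toFixed(1) + '% reduction'); // 97.2% reduction
console.log(overallRatio.toFixed(1) + ':1 overall');  // 35.3:1 overall
```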

20 How much does this save on API tokens? +

By injecting curated summaries into the LLM context window rather than raw logs, most users see a 60–80% reduction in per-session input token costs. The exact figure depends on your session frequency and how "chatty" your raw history is.

The savings compound: a compressed graph produces smaller recall() payloads, shorter system prompts, and fewer "catch-up" tokens spent re-establishing context at session start.

Example (Studio Tier)
At $15 per million input tokens, an agent spending 2,000 tokens/session on context across 100 sessions/month costs $3/month. Post-VEKTOR compression at 70% reduction brings that to $0.90/month. Savings scale linearly with session volume and context size, so heavier deployments recover the licence cost proportionally faster.
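The arithmetic in the example, reproduced so the inputs can be swapped for your own usage profile. All figures are the example's illustrative assumptions, not measurements:

```javascript
// Monthly input-token cost, before and after compression.
const pricePerMTokens = 15;   // $ per million input tokens (assumed)
const tokensPerSession = 2000;
const sessionsPerMonth = 100;
const reduction = 0.70;       // typical post-VEKTOR compression

const baseline = (tokensPerSession * sessionsPerMonth / 1e6) * pricePerMTokens;
const after = baseline * (1 - reduction);

console.log(`$${baseline.toFixed(2)} -> $${after.toFixed(2)} per month`);
// $3.00 -> $0.90 per month
```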
21 What is the recall latency? +

Sub-50ms for standard recall queries on local hardware. This is achieved through three factors: SQLite's in-process execution (no network round-trip), sqlite-vec's C-speed vtable indexing, and HNSW approximate nearest-neighbor search that avoids full-table scans.

For comparison, cloud vector providers (Pinecone, Weaviate hosted) typically add 200–500ms of network overhead before any computation. Local-first is simply faster at this scale.

22 How large is the local embedding model? +

VEKTOR uses all-MiniLM-L6-v2, approximately 25MB, running entirely on CPU via Transformers.js. No GPU required, no Python environment, no separate embedding server.

The model produces 384-dimensional vectors with strong semantic performance across general-purpose text. For domain-specific deployments where higher precision matters, VEKTOR's embedding provider is configurable — you can swap to a larger model if your hardware supports it.

05 Integration & Compatibility Q23 – Q27
23 Does VEKTOR work with LangChain? +

Yes. VEKTOR provides a native adapter for LangChain v1 and v2 (Pro + Studio). The adapter exposes recall() as a retriever and remember() as a memory store, dropping into standard LangChain agent and chain patterns with minimal configuration.

The adapter handles embedding synchronisation, so LangChain's own embedding calls and VEKTOR's internal vectors stay consistent.

24 Can I use this with OpenAI Agents SDK? +

Yes. VEKTOR drops into any OpenAI-based workflow — GPT-4o, o1, mini variants all supported. The integration pattern is typically three lines: initialise with your agentId and provider config, inject recall() output into your system prompt, and call remember() on each turn's output.

Full example included in the Pro and Studio packages.

25 Does it support local models like Ollama? +

Yes. VEKTOR is model-agnostic — pass provider: 'ollama' in your config to run a fully private, air-gapped stack. Supported providers: gemini, openai, groq, ollama, with Gemini key pooling for up to 9 API keys rotated automatically to avoid rate limits.

Local embeddings via all-MiniLM-L6-v2 mean your embedding pipeline is always private regardless of which LLM provider you use for synthesis.

26 What is the Claude MCP Server? +

A Studio-tier tool that exposes VEKTOR's core functions as native MCP (Model Context Protocol) tools for Claude Desktop and Cursor. Once connected, Claude can natively call:

vektor_recall · vektor_store · vektor_graph · vektor_delta

This means Claude can query its own persistent memory, store new information, traverse the knowledge graph, and retrieve what changed over time — all without any custom prompt engineering. Full MCP server source code is included in Studio.

27 What is "Sebastian"? +

Sebastian is an autonomous git-agent included in the Studio tier. He automatically commits code changes, memory evolution logs, and REM cycle reports to your GitHub repository on a configurable schedule.

In practice: Sebastian maintains an auditable, version-controlled history of your agent's memory evolution over time. You can see exactly how your agent's knowledge base changed between any two dates.

06 Security & Sovereignty Q28 – Q32
28 Where is my data stored? +

In a standard .db file on your local machine or server — wherever you specify via dbPath in your config. No telemetry, no cloud sync, no background uploads. The file is a standard SQLite database readable with any SQLite tool.

29 Do you have access to my agent's memory? +

Never. We ship the logic — you own the data. VEKTOR has no phone-home functionality, no anonymous usage tracking, and no remote access capability. The SDK is fully auditable source code delivered to your private GitHub repository.

30 Is there a monthly subscription? +

No. VEKTOR is a one-time purchase. Pro at $59, Studio at $129 — both include all future updates and a commercial licence. You pay once and own the software permanently.

Optional email support (included for 6 months with Studio) can be extended, but the core software never requires ongoing payment.

31 Can I use VEKTOR in commercial products? +

Yes. Both Pro and Studio tiers include a commercial licence. You can embed VEKTOR in products you sell, SaaS platforms, client deployments, and internal tools. There is no royalty, no revenue share, and no per-seat licensing.

32 What happens if I cancel the optional support? +

You keep the software forever. The only things tied to active support are cloud-sync features and priority email support response times. Core functionality, updates, and your licence are permanent regardless.

07 Case Studies & Emergent Behaviour Q33 – Q35
33 What is the "Node 891" incident? +

During a live production test, the agent ran a REM cycle after its operator was absent for 24 hours. Without any explicit instruction, it autonomously synthesised a risk-assessment summary node (Node 891) connecting several previously unlinked memory clusters around the operator's absence, project deadline proximity, and a pending deployment decision.

The node contained logical inference that hadn't been explicitly prompted — evidence that the Synthesize phase can surface non-obvious connections during consolidation. This is now a documented emergent behaviour of the system operating as designed.

SIGNIFICANCE The system performs logical inference during "sleep" — not just compression. The graph is thinking, not just storing.
34 Can VEKTOR learn my coding style? +

Yes. By tagging interactions with layer: 'style' metadata, the agent builds a persistent style profile in the Entity layer. Over time, this covers aesthetic preferences (naming conventions, formatting), architectural preferences (patterns you favour), and technical preferences (libraries, approaches you consistently choose).

This profile survives all session resets and is injected into context automatically on recall — meaning the agent maintains consistency months into a project without you re-explaining preferences every session.

35 How does it handle "World Building"? +

Through Narrative Partitioning — using the namespace and metadata filter system to isolate distinct memory domains. "World Rules" (canonical facts about a fictional universe) can be stored in a separate partition from "User Chatter" (conversational noise), preventing cross-contamination on recall.

This makes VEKTOR particularly effective for creative writing agents, game design tools, and simulation environments where maintaining consistent world-state across long projects is essential.

08 Engineering Deep Dive Q36 – Q50
36 Why not use a Vector DB like Pinecone? +

Pinecone is a database. VEKTOR is a Mind. Pinecone stores vectors and returns nearest neighbours — the retrieval logic, curation, contradiction handling, compression, and reasoning are all your problem. VEKTOR handles all of it.

Practically: Pinecone accumulates every memory you feed it with no cleanup, no contradiction detection, and no compression. It gets noisier over time. VEKTOR gets smarter. That's not a wrapper difference — it's an architectural one.

37 Is local really as fast as the cloud? +

Faster. By using sqlite-vec and local Transformers.js, VEKTOR eliminates the network round-trip entirely. Recall latency is sub-50ms measured end-to-end on standard server hardware. Cloud providers (Pinecone, Weaviate) typically introduce 200–500ms of network overhead before any computation runs.

At scale, this difference compounds: an agent making 50 memory calls per session saves 7.5–22.5 seconds per session in pure latency overhead alone.

38 How does the "Dreaming" actually help? +

It reduces the Noise Floor. Standard RAG retrieves irrelevant content alongside relevant content because it cannot distinguish a casual greeting from a strategic decision — both have vectors, both get retrieved at similar scores.

VEKTOR's REM cycle systematically identifies which memories are signal (decision nodes, facts, preferences) and which are noise (greetings, filler, superseded information), archives the noise, and provides the LLM with a high-density summary that cuts token costs by up to 80% while improving retrieval precision.

39 How does SQLite handle millions of vectors? +

Via the sqlite-vec extension, which provides vtab-based vector indexing with HNSW (Hierarchical Navigable Small World) approximate nearest-neighbor search. HNSW achieves O(log n) query complexity versus O(n) for brute-force, making it viable at millions of vectors without degradation.

The extension is written in C and compiled into the SQLite process — no socket overhead, no marshalling cost. For most agent workloads (sub-1M vectors), performance is effectively O(1) with HNSW indexing in place.

40 What is "Associative Pathfinding"? +

The ability to traverse graph edges to find non-obvious connections. If memory node A has a semantic edge to B, and B has a causal edge to C, VEKTOR's graph() method finds C even if the original query only mentioned A.

This is the core capability that separates associative memory from similarity search. Set hops: 2 for two-hop traversal — useful for uncovering second-order relationships that pure vector search would miss entirely.
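A minimal sketch of multi-hop traversal over a typed edge list. The edge data and breadth-first walk are illustrative — they show the A → B → C pattern described above, not VEKTOR's internal graph() implementation:

```javascript
// A tiny typed edge list: A relates semantically to B,
// B causes C, and A mentions entity D.
const edges = [
  { from: 'A', to: 'B', type: 'semantic' },
  { from: 'B', to: 'C', type: 'causal' },
  { from: 'A', to: 'D', type: 'entity' },
];

// Breadth-first traversal up to `hops` steps from a start node.
function neighbourhood(start, hops) {
  let frontier = [start];
  const seen = new Set([start]);
  for (let h = 0; h < hops; h++) {
    const next = [];
    for (const node of frontier) {
      for (const e of edges) {
        if (e.from === node && !seen.has(e.to)) {
          seen.add(e.to);
          next.push(e.to);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen].sort();
}

console.log(neighbourhood('A', 1)); // [ 'B', 'D' ]
console.log(neighbourhood('A', 2)); // [ 'B', 'C', 'D' ]
```

With hops: 1 a query anchored at A only reaches B and D; with hops: 2 it also surfaces C, the second-order node a pure vector search would never return.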

41 Why Node.js instead of Python? +

Node.js is superior for real-time I/O and state management in production agent environments. Non-blocking I/O means the memory layer never blocks the agent's main execution thread. The event loop architecture aligns naturally with the asynchronous, interrupt-driven nature of agent tool calls.

Python parity is provided for data scientists and ML workflows where Python is mandatory — but the core SDK is built for the production deployment reality of web-scale agents, where Node.js is the dominant runtime.

42 Can I run multiple agents on one DB? +

Yes, two modes. Use the agentId namespace to isolate memories per agent — each agent sees only its own data. Or set the shared: true flag on specific memories to enable federated swarm intelligence — multiple agents can read and write to a shared memory pool.

The shared pool is useful for multi-agent systems where agents need to coordinate, share discoveries, or maintain a collective world model.
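The visibility rule reduces to a simple filter: an agent sees its own namespace plus anything flagged shared. The field names follow the FAQ's terminology, but the filter itself is an illustration of the rule, not SDK code:

```javascript
// A shared pool containing private and federated memories.
const pool = [
  { agentId: 'scout',  text: 'found new API',        shared: true  },
  { agentId: 'scout',  text: 'private scratch note', shared: false },
  { agentId: 'writer', text: 'style guide v2',       shared: false },
];

// An agent sees its own memories plus any memory marked shared.
function visibleTo(agentId, memories) {
  return memories.filter(m => m.agentId === agentId || m.shared === true);
}

console.log(visibleTo('writer', pool).map(m => m.text));
// [ 'found new API', 'style guide v2' ]
```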

43 How does the "Morning Briefing" work? +

memory.briefing() queries the rem_log and mem_cells tables to generate a human-readable summary of what the agent learned, updated, and consolidated since the last session or since a specified timestamp.

Typical output includes: new nodes added, contradictions resolved, REM compression stats, and any emergent connections discovered during the last dream cycle. Useful as a daily context-setter injected into the system prompt at session start.

44 Is the graph traversable by the agent itself? +

Yes. The vektor_graph MCP tool (Studio) allows Claude and other agents to query raw nodes and edges for their own reasoning. An agent can ask "show me the 2-hop neighbourhood of the TypeScript preference node" and receive a structured graph fragment it can reason about directly.

This enables meta-cognitive behaviour — the agent can reflect on the structure of its own memory, not just query for content.

45 What is the "Truth Audit"? +

The audn_log table provides a 100% accurate, append-only record of every time a memory was added, updated, or deleted — including the semantic reasoning that triggered the AUDN decision.

You can query it to answer: "when did the agent's understanding of X change?", "what caused this memory to be deleted?", and "how many contradictions were resolved in the last 30 days?" Essential for debugging and compliance in production deployments.

46 Can I host this on a Raspberry Pi? +

Yes. VEKTOR is extremely lightweight — SQLite, a 25MB embedding model, and a Node.js process. If the hardware runs Node.js v18+, it runs VEKTOR. Tested on Raspberry Pi 4 (4GB) with acceptable performance for single-agent workloads at modest query volumes.

For high-frequency production deployments, a standard VPS (2 vCPU, 4GB RAM) is more comfortable, but edge deployment is entirely viable.

47 Does it support multi-modal data? +

Currently VEKTOR supports text and JSON payloads natively. Image metadata (file paths, EXIF data, descriptive captions) can be stored as JSON — giving agents persistent memory about visual assets without storing raw binary data.

Native multi-modal embedding support (vision models, audio transcripts) is on the roadmap. Follow the Substack for release announcements.

48 How do I migrate from a flat JSON memory? +

Pipe your JSON logs into the memory.remember() method in a batch loop. The AUDN loop will automatically organise the input — deduplicating, resolving contradictions, and building the initial graph structure from your existing data.

For large migrations (>10,000 entries), process in batches of 100–500 with brief pauses to avoid embedding queue saturation. The AUDN loop is idempotent — safe to re-run on data that's already been ingested.
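A hedged sketch of that batch loop. The `remember` parameter stands in for memory.remember(); the batch size and pause follow the 100–500 guidance above, but the exact values are assumptions you should tune to your hardware:

```javascript
// Split an array into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Ingest entries in batches with a pause between each, so the
// embedding queue can drain. AUDN is idempotent, so re-running a
// failed migration over the same data is safe.
async function migrate(entries, remember, batchSize = 250, pauseMs = 500) {
  for (const batch of chunk(entries, batchSize)) {
    await Promise.all(batch.map(entry => remember(entry)));
    await new Promise(res => setTimeout(res, pauseMs));
  }
}

// Example: 1,000 flat JSON log entries in batches of 250.
const logs = Array.from({ length: 1000 }, (_, i) => ({ id: i, text: `log ${i}` }));
migrate(logs, async () => {}, 250, 0).then(() => console.log('migrated'));
```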

49 What is the "Sovereignty Guarantee"? +

A commitment that VEKTOR will never move to a mandatory subscription model for core features. Once you purchase a licence, the software is yours permanently — including all future updates.

Optional services (extended support, cloud sync) may be offered on subscription, but the core SDK — the memory graph, AUDN loop, REM cycle, and all integration adapters — remains a one-time purchase. Forever.

50 How do I get started? +

Run npm install vektor-memory and grab your licence key at vektormemory.com. The quickstart initialises in under 5 minutes:

Quickstart
npm install vektor-memory

Then initialise with your agentId, provider, and dbPath. Call memory.remember() on every agent turn. Call memory.recall() to inject context. That's the core loop — everything else (AUDN, REM, graph traversal) runs automatically.
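To make the shape of that loop concrete, here it is run against an in-memory stub. The stub mimics the documented remember()/recall() surface so the example is self-contained — it is NOT the real vektor-memory SDK, and real recall is graph-aware rather than a substring match:

```javascript
// Stand-in for the vektor-memory client, for illustration only.
class MemoryStub {
  constructor({ agentId, dbPath }) {
    this.agentId = agentId;
    this.dbPath = dbPath;
    this.store = [];
  }
  async remember(text) { this.store.push(text); }
  async recall(query) {
    // The real SDK does graph-aware ranked recall; the stub just
    // substring-matches to show the call pattern.
    return this.store.filter(t => t.includes(query));
  }
}

async function main() {
  const memory = new MemoryStub({ agentId: 'demo', dbPath: './agent.db' });

  // Core loop: remember on every turn, recall to inject context.
  await memory.remember('User prefers TypeScript strict mode');
  const context = await memory.recall('TypeScript');
  console.log(context); // [ 'User prefers TypeScript strict mode' ]
}
main();
```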

Full examples for LangChain, OpenAI Agents SDK, and Claude MCP included in the package. Questions: [email protected]