This is a local-first memory system that runs CozoDB embedded in Node.js, giving Claude persistent storage with hybrid search across vector embeddings, full-text, and graph traversals. It exposes entity CRUD, observation tracking, and three search modes: vector similarity via HNSW, multi-hop graph reasoning, and community-based summaries. The agentic retrieval layer auto-routes queries to the right strategy, and temporal validity lets you query any point in history. Embeddings run locally via ONNX (bge-m3 by default, swap to MiniLM if you're RAM-constrained). Includes LangChain and LlamaIndex adapters. Reach for this when you need structured, queryable memory that goes beyond append-only chat history, especially if you want graph relationships and time-travel without managing external databases.
Why Cozo Memory?
LLMs have short-term memory limits. Standard RAG retrieves documents but can't connect facts across time. Cozo Memory gives your AI agent persistent, structured memory – it remembers past conversations, infers relationships, detects contradictions, and explores its knowledge graph – fully on your machine, with optional local LLM integration via Ollama for intelligent actions (cleanup, reflection, summarization, agentic routing).
Most memory stacks combine separate databases: SQLite for facts, Chroma for vector search, NetworkX for graphs. CozoDB replaces all of that with one embedded engine: relational, graph, vector, and full-text search in a single query language, one file, zero sync lag.
Local-first memory for Claude & AI agents with hybrid search, Graph-RAG, and time-travel – runs entirely on your machine. Optional Ollama integration enables LLM-powered actions (cleanup, reflect, summarize, agentic retrieval).
# Install globally
npm install -g cozo-memory
# Or run directly with npx (no installation needed)
npx cozo-memory
git clone https://github.com/tobs-code/cozo-memory
cd cozo-memory
npm install && npm run build
npm run start
Now add the server to your MCP client (e.g. Claude Desktop) – see Integration below.
🔍 Hybrid Search - Combines semantic (HNSW), full-text (FTS), and graph signals via Reciprocal Rank Fusion for intelligent retrieval
🧠 Agentic Retrieval - Auto-routing engine analyzes query intent via local LLM to select optimal search strategy (Vector, Graph, or Community)
⏱️ Time-Travel Queries - Version all changes via CozoDB Validity; query any point in history with full audit trails
🎯 GraphRAG-R1-Inspired Adaptive Retrieval - Intelligent system with Progressive Retrieval Attenuation (PRA) and Cost-Aware F1 (CAF) scoring, conceptually inspired by GraphRAG-R1 (Yu et al., WWW 2026) and adapted for CozoDB, that learns from usage
⏳ Temporal Conflict Resolution - Automatic detection and resolution of contradictory observations with semantic analysis and audit preservation
🏠 100% Local - Embeddings via ONNX/Transformers; data stays on your machine. Some advanced features (cleanup, reflect, summarize, agentic search) require an optional Ollama service for local LLM inference — but the core search, CRUD, and graph operations work without any LLM.
🧠 Multi-Hop Reasoning - Logic-aware graph traversal with vector pivots for deep relational reasoning
🗂️ Hierarchical Memory - Multi-level architecture (L0-L3) with intelligent compression and LLM-backed summarization
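To make the Hybrid Search item above more concrete, here is a minimal sketch of Reciprocal Rank Fusion over three ranked result lists. It is not taken from this project's source: the function name, the sample entity ids, and the constant `k = 60` (a commonly used default) are assumptions for illustration only.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with 1-based ranks. k = 60 is a common default and an assumption here.
type RankedList = string[]; // entity ids, best match first

function reciprocalRankFusion(lists: RankedList[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists from the three signals.
const vectorHits = ["e1", "e2", "e3"];
const fullTextHits = ["e2", "e4"];
const graphHits = ["e2", "e1"];

const fused = reciprocalRankFusion([vectorHits, fullTextHits, graphHits]);
console.log(fused.map(([id]) => id)); // [ 'e2', 'e1', 'e4', 'e3' ]
```

An entity that appears near the top of several lists (here `e2`) outranks one that tops a single list, which is why rank fusion works without having to normalize the raw scores of each signal.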
A common first question is: "Why not just combine existing tools?"
| If you need... | Typical separate stack | CozoDB Memory |
|---|---|---|
| Structured data & relations | SQLite / PostgreSQL | ✅ Built-in relational engine |
| Semantic / vector search | Chroma / Qdrant / Pinecone | ✅ HNSW + FTS + RRF in one engine |
| Graph traversal & reasoning | NetworkX / Neo4j | ✅ Native graph queries + PageRank |
| Time-travel / versioning | Custom audit tables | ✅ Built-in Validity time-travel |
| Unified query language | Multiple APIs + glue code | ✅ Single Datalog query across all dimensions |
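The time-travel row above can be illustrated with a small in-memory sketch of an "as of" lookup. This is plain TypeScript, not CozoDB's API or this project's schema: the `FactVersion` shape and the `asOf` helper are hypothetical, modeled loosely on the idea of pairing each version with a timestamp and an assert/retract flag.

```typescript
// Each version records when it became valid and whether it asserts or
// retracts the value. Field names are illustrative, not CozoDB's schema.
interface FactVersion {
  entityId: string;
  value: string;
  validFrom: number; // epoch milliseconds
  asserted: boolean; // false means the fact was retracted at validFrom
}

// Return the value that was valid for an entity at the given timestamp.
function asOf(history: FactVersion[], entityId: string, timestamp: number): string | undefined {
  let latest: FactVersion | undefined;
  for (const version of history) {
    if (version.entityId !== entityId || version.validFrom > timestamp) continue;
    if (latest === undefined || version.validFrom >= latest.validFrom) latest = version;
  }
  return latest !== undefined && latest.asserted ? latest.value : undefined;
}

const history: FactVersion[] = [
  { entityId: "alice", value: "works at Acme", validFrom: 100, asserted: true },
  { entityId: "alice", value: "works at Globex", validFrom: 200, asserted: true },
  { entityId: "alice", value: "works at Globex", validFrom: 300, asserted: false },
];

console.log(asOf(history, "alice", 150)); // works at Acme
console.log(asOf(history, "alice", 250)); // works at Globex
console.log(asOf(history, "alice", 350)); // undefined (retracted)
```

Because old versions are never overwritten, the same history answers both "what is true now" and "what was true then", which is the property a custom audit table otherwise has to reimplement.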
The core insight: Most memory stacks bolt vector search onto a graph DB, or graph search onto a vector DB. CozoDB is different: it is a single engine that natively combines relational, graph, vector, and full-text search. That means:
Most "Memory" MCP servers fall into two categories:
This server fills the gap in between ("Sweet Spot"): A local, database-backed memory engine combining vector, graph, and keyword signals — powered by CozoDB's unified engine rather than a patchwork of separate databases.
| Feature | CozoDB Memory (This Project) | Official Reference (@modelcontextprotocol/server-memory) | mcp-memory-service (Community) | Database Adapters (Qdrant/Neo4j) |
|---|---|---|---|---|
| Backend | CozoDB (Graph + Vector + Relational + FTS in one engine) | JSON file (memory.jsonl) | SQLite / Cloudflare | Specialized DB (only Vector or Graph) |
| Search Logic | Agentic (Auto-Route): Hybrid + Graph + Summaries | Keyword only / Exact Graph Match | Vector + Keyword | Mostly only one dimension |
| Inference | Yes: Built-in engine for implicit knowledge | No | No ("Dreaming" is consolidation) | No (Retrieval only) |
| Community | Yes: Hierarchical Community Summaries | No | No | Only clustering (no summary) |
| Time-Travel | Yes: Queries at any point in time (Validity) | No (current state only) | History available, no native DB feature | No |
| Maintenance | Janitor: LLM-backed cleanup | Manual | Automatic consolidation | Mostly manual |
| Deployment | Local (Node.js + Embedded DB) | Local (Docker/NPX) | Local or Cloud | Often requires external DB server |
The core advantage is Intelligence and Traceability: By combining an Agentic Retrieval Layer with Hierarchical GraphRAG, the system can answer both specific factual questions and broad thematic queries with much higher accuracy than pure vector stores.
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 – only ~400 MB RAM needed (see Embedding Model Options)
cozo-node
Some advanced actions use a local LLM via Ollama for intelligent processing. The core server works without Ollama (CRUD, search, graph operations), but the following actions require it:
| Action | Purpose |
|---|---|
| cleanup | LLM-backed observation consolidation |
| reflect | Generate insights, detect contradictions |
| summarize_communities | LLM-generated community summaries |
| compact | Session / entity compaction with LLM summarization |
| agentic_search | Query intent classification for auto-routing |
Setup (if you need these features):
# 1. Install Ollama from https://ollama.ai
# 2. Pull a model (e.g. small + fast for dev):
ollama pull demyagent-4b-i1:Q6_K
# 3. Ollama runs automatically on http://localhost:11434
If Ollama is not running, the affected actions gracefully fall back to non-LLM behavior (where possible) or return a clear error message.
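The fallback behavior described above could be sketched roughly as follows. This is an illustration, not the project's implementation: `isOllamaAvailable` and `withLlmFallback` are hypothetical names, and the check assumes Ollama's HTTP API answers `GET /api/tags` on the default port mentioned in the setup steps.

```typescript
// Minimal fetch-like signature so the check can be exercised without a network.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<{ ok: boolean }>;

// Returns true if an Ollama server answers within the timeout.
async function isOllamaAvailable(
  baseUrl = "http://localhost:11434",
  fetchImpl: FetchLike = (url, init) => fetch(url, init),
  timeoutMs = 1000,
): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // GET /api/tags lists locally pulled models in Ollama's HTTP API.
    const response = await fetchImpl(`${baseUrl}/api/tags`, { signal: controller.signal });
    return response.ok;
  } catch {
    return false; // connection refused, timeout, DNS failure, ...
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical wrapper: use the LLM path when Ollama responds, else the fallback.
async function withLlmFallback<T>(
  llmAction: () => Promise<T>,
  fallback: () => Promise<T>,
  available: () => Promise<boolean> = () => isOllamaAvailable(),
): Promise<T> {
  return (await available()) ? llmAction() : fallback();
}
```

A caller might wrap an action such as `cleanup` as `withLlmFallback(runLlmCleanup, runRuleBasedCleanup)`, where both function names are placeholders. Actions with no sensible non-LLM path would instead surface the error to the client.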
# Install globally
npm install -g cozo-memory
# Or use npx without installation
npx cozo-memory
git clone https://github.com/tobs-code/cozo-memory
cd cozo-memory
npm install
npm run build
npm run start
Notes:
@xenova/transformers downloads the embedding model (may take time)
CozoDB Memory supports multiple embedding models via the EMBEDDING_MODEL environment variable:
| Model | Size | RAM | Dimensions | Best For |
|---|---|---|---|---|
| Xenova/bge-m3 (default) | ~600 MB | ~1.7 GB | 1024 | High accuracy, production use |
| Xenova/all-MiniLM-L6-v2 | ~80 MB | ~400 MB | 384 | Low-spec machines, development |
| Xenova/bge-small-en-v1.5 | ~130 MB | ~600 MB | 384 | Balanced performance |
Configuration Options:
Option 1: Using .env file (Easiest for beginners)
# Copy the example file
cp .env.example .env
# Edit .env and set your preferred model
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
Option 2: MCP Server Config (For Claude Desktop / Kiro)
{
"mcpServers": {
"cozo-memory": {
"command": "npx",
"args": ["cozo-memory"],
"env": {
"EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
}
}
}
}
Option 3: Command Line
# Use lightweight model for development
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run start
Download Model First (Recommended):
# Set model in .env or via command line, then:
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run download-model
Note: Changing models requires re-embedding existing data. The model is downloaded once on first use.
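As a rough illustration of why a model change forces re-embedding, the sketch below compares two models using the dimensions from the table above. The `planModelSwitch` helper and its return shape are hypothetical and not part of this project. The underlying point is that vectors from different models are not comparable even when their dimensions match, and a dimension change additionally means the stored vectors no longer fit an index built for the old size.

```typescript
// Dimensions copied from the embedding model table in this README.
const EMBEDDING_DIMENSIONS: Record<string, number> = {
  "Xenova/bge-m3": 1024,
  "Xenova/all-MiniLM-L6-v2": 384,
  "Xenova/bge-small-en-v1.5": 384,
};

interface ModelSwitchPlan {
  reembed: boolean;          // stored vectors must be recomputed
  dimensionChanged: boolean; // vector size differs between the two models
}

// Hypothetical helper: decide what a switch between two known models implies.
function planModelSwitch(currentModel: string, nextModel: string): ModelSwitchPlan {
  const currentDim = EMBEDDING_DIMENSIONS[currentModel];
  const nextDim = EMBEDDING_DIMENSIONS[nextModel];
  if (currentDim === undefined) throw new Error(`Unknown embedding model: ${currentModel}`);
  if (nextDim === undefined) throw new Error(`Unknown embedding model: ${nextModel}`);
  const changed = currentModel !== nextModel;
  return { reembed: changed, dimensionChanged: changed && currentDim !== nextDim };
}

console.log(planModelSwitch("Xenova/bge-m3", "Xenova/all-MiniLM-L6-v2"));
// { reembed: true, dimensionChanged: true }
console.log(planModelSwitch("Xenova/all-MiniLM-L6-v2", "Xenova/bge-small-en-v1.5"));
// { reembed: true, dimensionChanged: false }
```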
{
"mcpServers": {
"cozo-memory": {
"command": "npx",
"args": ["cozo-memory"]
}
}
}
{
"mcpServers": {
"cozo-memory": {
"command": "cozo-memory"
}
}
}
{
"mcpServers": {
"cozo-memory": {
"command": "node",
"args": ["C:/Path/to/cozo-memory/dist/index.js"]
}
}
}
Official adapters for seamless integration with popular AI frameworks:
🦜 LangChain Adapter
npm install @cozo-memory/langchain @cozo-memory/adapters-core
import { CozoMemoryChatHistory, CozoMemoryRetriever } from '@cozo-memory/langchain';
const chatHistory = new CozoMemoryChatHistory({ sessionName: 'user-123' });
const retriever = new CozoMemoryRetriever({ useGraphRAG: true, graphRAGDepth: 2 });
🦙 LlamaIndex Adapter
npm install @cozo-memory/llamaindex @cozo-memory/adapters-core
import { CozoVectorStore } from '@cozo-memory/llamaindex';
const vectorStore = new CozoVectorStore({ useGraphRAG: true });
Documentation: See adapters/README.md for complete examples and API reference.
Full-featured CLI for all operations:
# System operations
cozo-memory system health
cozo-memory system metrics
# Entity operations
cozo-memory entity create -n "MyEntity" -t "person"
cozo-memory entity get -i <entity-id>
# Search
cozo-memory search query -q "search term" -l 10
cozo-memory search agentic -q "agentic query"
# Graph operations
cozo-memory graph pagerank
cozo-memory graph communities
# Export/Import
cozo-memory export json -o backup.json
cozo-memory import file -i data.json -f cozo
# All commands support -f json or -f pretty for output formatting
See CLI help for complete command reference:
cozo-memory --help
Interactive TUI with mouse support powered by Python Textual:
# Install Python dependencies (one-time)
pip install textual
# Launch TUI
npm run tui
# or directly:
cozo-memory-tui
TUI Features:
graph TB
Client[MCP Client<br/>Claude Desktop, etc.]
Server[MCP Server<br/>FastMCP + Zod Schemas]
Services[Memory Services]
Embeddings[Embeddings<br/>ONNX Runtime]
Search[Hybrid Search<br/>RRF Fusion]
Cache[Semantic Cache<br/>L1 + L2]
Inference[Inference Engine<br/>Multi-Strategy]
DB[(CozoDB SQLite<br/>Relations + Validity<br/>HNSW Indices<br/>Datalog/Graph)]
Client -->|stdio| Server
Server --> Services
Services --> Embeddings
Services --> Search
Services --> Cache
Services --> Inference
Services --> DB
style Client fill:#e1f5ff,color:#000
style Server fill:#fff4e1,color:#000
style Services fill:#f0e1ff,color:#000
style DB fill:#e1ffe1,color:#000
See docs/ARCHITECTURE.md for detailed architecture documentation
The interface is reduced to 5 consolidated tools:
| Tool | Purpose | Key Actions |
|---|---|---|
| mutate_memory | Write operations | create_entity, update_entity, delete_entity, add_observation, create_relation, transactions, sessions, tasks, update_observation, batch_delete, manage_tags, batch |
| query_memory | Read operations | search, advancedSearch, context, graph_rag, graph_walking, agentic_search, adaptive_retrieval, list_entities, get_entity_detail, get_session_context, list_sessions |
| analyze_graph | Graph analysis | explore, communities, pagerank, betweenness, hits, shortest_path, semantic_walk |
| manage_system | Maintenance | health, metrics, stats, export, import, cleanup, defrag, reflect, snapshots |
| edit_user_profile | User preferences | Edit global user profile with preferences and work style |
See docs/API.md for complete API reference with all parameters and examples
First Start Takes Long
LLM-powered actions require Ollama
cleanup, reflect, summarize_communities, compact, agentic_search
ollama pull demyagent-4b-i1:Q6_K (or your preferred model)
Windows-Specific
Performance Issues
health action to check cache hit rates
See docs/BENCHMARKS.md for performance optimization tips
src/index.ts: MCP Server + Tool Registration
src/memory-service.ts: Core business logic
src/db-service.ts: Database operations
src/embedding-service.ts: Embedding Pipeline + Cache
src/hybrid-search.ts: Search Strategies + RRF
src/inference-engine.ts: Inference Strategies
src/api_bridge.ts: Express API Bridge (optional)
npm run build # TypeScript Build
npm run dev # ts-node Start of MCP Server
npm run start # Starts dist/index.js (stdio)
npm run bridge # Build + Start of API Bridge
npm run benchmark # Runs performance tests
npm run eval # Runs evaluation suite
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Apache 2.0 - See LICENSE for details.
Built with:
Research foundations: