Cognitive AI Memory

Summary

Gives Claude a memory layer that behaves like human memory, based on the Ebbinghaus forgetting curve. Memories strengthen when recalled and decay over time, with higher importance scores slowing the decay. The system deduplicates facts by subject, links related memories through an entity graph, and prunes stale context automatically. It's stdio-based, runs on DuckDB by default, and plugs into Claude Desktop, Cursor, Cline, or Windsurf with a short command-line setup. It exposes MCP tools for storing, searching, and updating memories, plus a browser dashboard at localhost:3033 for viewing strength scores and recall history. The project's benchmarks report 89.4% Recall@5 on LongMemEval-S and roughly twice the Recall@5 of Zep Cloud on LoCoMo-10 (59% vs 28%). Reach for this when you want Claude to remember preferences and context across conversations without manual prompting.

YourMemory

Persistent memory for AI agents — built on the science of how humans remember.

What Is YourMemory?

Every session, your AI assistant starts from zero. It asks the same questions, forgets your preferences, re-learns your stack. There is no memory between conversations.

YourMemory fixes that with a one-command install that plugs into Claude, Cursor, Cline, Windsurf, or any MCP client. It gives your AI a persistent memory layer modelled on human cognition:

Things that matter stick — importance score controls how quickly a memory decays
Outdated facts get replaced — subject-aware deduplication merges or supersedes memories automatically
Related context surfaces together — entity graph links memories that share people, places, or concepts
Old memories fade naturally — Ebbinghaus forgetting curve prunes stale context every 24 hours

Zero infrastructure required. DuckDB by default, Postgres for teams.

Benchmarks
Quick Start
Memory Dashboard
Ask Without Calling the API
API Proxy — Guaranteed Memory
MCP Tools
How It Works
Multi-Agent Memory
Stack
Architecture
Troubleshooting
Contributing

Benchmarks

Three external datasets, all scripts open source and reproducible. Full methodology in BENCHMARKS.md.

LongMemEval-S — 500 questions, ~53 distractor sessions each

The hardest standard benchmark for long-term memory systems. Each question is backed by ~53 conversation sessions; the model must retrieve the right one(s) from the haystack.

Metric	Score
Recall@5 (any gold session in top-5)	89.4%
Recall-all@5 (all gold sessions in top-5)	84.8%
nDCG@5 (ranking quality)	87.4%
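
The two recall variants in the table differ only in how a question with several gold sessions is counted. A minimal sketch, assuming each question comes with a ranked list of retrieved session IDs and a set of gold session IDs; the function names are illustrative and are not taken from the benchmark scripts:

```python
from typing import Iterable, Sequence


def recall_any_at_k(ranked: Sequence[str], gold: Iterable[str], k: int = 5) -> bool:
    """True if at least one gold session appears in the top-k results."""
    top_k = set(ranked[:k])
    return any(g in top_k for g in gold)


def recall_all_at_k(ranked: Sequence[str], gold: Iterable[str], k: int = 5) -> bool:
    """True only if every gold session appears in the top-k results."""
    top_k = set(ranked[:k])
    return all(g in top_k for g in gold)


# Hypothetical retrieval result for one question with two gold sessions.
ranked = ["s7", "s2", "s9", "s4", "s1", "s3"]
gold = {"s2", "s3"}

print(recall_any_at_k(ranked, gold))  # s2 is inside the top 5
print(recall_all_at_k(ranked, gold))  # s3 is ranked sixth, so not all gold sessions are found
```

Averaging each function over all questions gives the corresponding percentage, which is why Recall-all@5 can never exceed Recall@5.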

By question type (Recall@5):

Question Type	Recall@5	n
single-session-assistant	98.2%	56
knowledge-update	96.2%	78
multi-session	95.5%	133
single-session-preference	90.0%	30
temporal-reasoning	84.2%	133
single-session-user	72.9%	70

LoCoMo-10 — 1,534 QA pairs across 10 multi-session conversations

Conversations spanning weeks to months. Every system ingests the same session summaries in the same order.

System	Recall@5	95% CI
YourMemory (BM25 + vector + graph + decay)	59%	56–61%
Zep Cloud	28%	26–30%
Supermemory	31%*	28–33%
Mem0	18%*	16–20%

Roughly 2× the recall of Zep Cloud (59% vs 28%) across all 10 samples. * Supermemory and Mem0 exhausted free-tier quotas mid-benchmark; their scores are computed over the full 1,534 pairs with unfinished samples counted as 0, so they are lower bounds rather than completed-run results.

HotpotQA — 200 multi-hop questions requiring two facts from different articles

System	BOTH_FOUND@5
YourMemory (vector + BM25 + entity graph)	71.5%
YourMemory (no entity edges)	59.5%

Entity graph edges add +12 pp — they traverse from Fact 1 to Fact 2 even when Fact 2 has low embedding similarity to the query.

Writeup: I built memory decay for AI agents using the Ebbinghaus forgetting curve

Quick Start

Supports Python 3.11–3.14. No Docker, no database setup. All memory stored locally in ~/.yourmemory/.

Before you install — what this does

Behavior	Detail
Activation	Requires a one-time token. Visit yourmemoryai.xyz, enter your email, verify with a 6-digit code, and copy your token.
Global rule injection	`yourmemory-setup` writes memory instructions into `~/.cursor/rules/memory.mdc` and other detected AI client config files (Claude, VS Code, etc.) so the assistant can call memory tools automatically. You can remove these files at any time.
MCP tool behavior	The `recall_memory` tool can be called by your AI assistant when persistent context would help. The assistant decides when to call it based on the request.
Telemetry	A UUID (no personal data) is sent on first setup only. Opt out: `YOURMEMORY_TELEMETRY=off`

Activation steps:

Visit yourmemoryai.xyz and enter your email
Check your inbox for a 6-digit verification code
Enter the code on the website — your token is shown instantly
Run the three commands below:

pip install yourmemory
yourmemory-register <your-token>
yourmemory-setup

Requirement — local model: YourMemory extracts memories with a local model via Ollama. Install Ollama and start it — yourmemory-setup then pulls the default model (qwen2.5:7b, ~4.7 GB) automatically. To use a lighter model you already have, set YOURMEMORY_OLLAMA_MODEL (e.g. llama3.2:3b) before setup.

Backend: yourmemory-setup asks whether to use DuckDB (zero setup, default) or Postgres (shared/production — you provide a DATABASE_URL; needs the pgvector extension).

Memory Dashboard

Two built-in browser UIs need no extra setup and start automatically with the MCP server.

Memory Browser — `http://localhost:3033/ui`

A full read/write view of everything stored in memory.

What you see	Details
Stats bar	Total · Strong ≥50% · Fading 5–50% · Near prune <10%
Agent tabs	All / User / per-agent views
Memory cards	Content · strength bar · category · recall count · last accessed
Filters	Category (fact / strategy / assumption / failure) · Sort by strength, recency, recall

Pass ?user=<id> to pre-load a specific user: http://localhost:3033/ui?user=sachit

Graph Visualiser — `http://localhost:3033/graph`

An interactive force-directed map of how memories connect.

http://localhost:3033/graph?memoryId=42&userId=sachit&depth=2

Root memory as a larger cyan node; neighbours color-coded by category
Edge thickness = connection strength
Click any node for full content; drag, zoom, reposition freely
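
If you generate dashboard links from a script, building the query string with the standard library keeps user IDs with spaces or special characters correctly escaped. A small sketch that reuses the `memoryId`, `userId`, and `depth` parameters from the URL above; the helper name `graph_url` is made up for illustration:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3033"


def graph_url(memory_id: int, user_id: str, depth: int = 2) -> str:
    """Build a Graph Visualiser link for one root memory."""
    query = urlencode({"memoryId": memory_id, "userId": user_id, "depth": depth})
    return f"{BASE_URL}/graph?{query}"


print(graph_url(42, "sachit"))
# http://localhost:3033/graph?memoryId=42&userId=sachit&depth=2
```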

Ask Without Calling the API

The only memory system that can answer questions without making any LLM API call.

yourmemory ask "what database does this project use"
# → YourMemory uses DuckDB locally and Postgres in production.

yourmemory ask "what port does the dashboard run on"
# → 3033

yourmemory ask "how do I fix a kubernetes deployment"
# → Not enough memory context to answer without Claude.

When memory is strong enough, it answers instantly — zero tokens, zero cloud cost, zero latency. When it isn't, it declines cleanly rather than hallucinating.

Query	Mem0 / Zep / LangMem	YourMemory
"What port does the server run on?"	Full LLM API call	Instant, $0
"What database does this project use?"	Full LLM API call	Instant, $0
"How do I fix a k8s deployment?"	Full LLM API call	Declines → Claude
Privacy	Query sent to cloud	Never leaves your machine

API Proxy — Guaranteed Memory

MCP tools are called at the AI's discretion. The API proxy removes that uncertainty — it intercepts every LLM call, injects relevant memories automatically, and handles store_memory / update_memory without any model configuration.

Start the YourMemory server (yourmemory), then point your LLM client at localhost:3033:

OpenAI

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:3033/proxy/openai"
)

# Memory is injected automatically — no other changes needed
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What database do I use?"}]
)

Anthropic

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",
    base_url="http://localhost:3033/proxy/anthropic"
)

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What database do I use?"}]
)

Per-user memory

Pass X-YourMemory-User to isolate memory per person:

client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:3033/proxy/openai",
    default_headers={"X-YourMemory-User": "sachit"}
)

How it works

On every request the proxy:

Recalls relevant memories and injects them into the system prompt — guaranteed, no tool call needed
Adds store_memory and update_memory as tools — the model calls them when it learns something new
Executes those tool calls locally and returns the final response transparently

Streaming note: recall injection works for all requests. Tool call interception (store/update) works for non-streaming requests only — streaming passes through and tools execute on the next turn.
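
The recall-injection step can be pictured as a pure function over the request's message list. This is only a sketch of the idea, assuming OpenAI-style messages with string `content`; the function name and the wording of the injected block are assumptions, and the proxy's actual prompt format is not documented here:

```python
from copy import deepcopy


def inject_memories(messages: list[dict], memories: list[str]) -> list[dict]:
    """Return a copy of the messages with recalled memories added to the system prompt."""
    out = deepcopy(messages)  # never mutate the caller's request
    if not memories:
        return out
    block = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    if out and out[0].get("role") == "system":
        # Extend an existing system prompt instead of adding a second one.
        out[0]["content"] = f"{out[0]['content']}\n\n{block}"
    else:
        out.insert(0, {"role": "system", "content": block})
    return out


request = [{"role": "user", "content": "What database do I use?"}]
recalled = ["YourMemory uses DuckDB locally and Postgres in production."]
for message in inject_memories(request, recalled):
    print(message["role"], "->", message["content"])
```

Because the injection happens before the request is forwarded, it does not depend on the model choosing to call a tool, which is the guarantee the proxy is described as providing.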

MCP Tools

Three tools, called by your AI automatically.

Tool	When your AI calls it	What it does
`recall_memory(query, current_path?)`	Start of every task	Surfaces memories ranked by similarity × decay strength; spatial boost for path-matched memories
`store_memory(content, importance, category?, context_paths?)`	After learning something new	Embeds, deduplicates, stores with decay; tags optional file/dir paths
`update_memory(id, new_content, importance)`	When a stored fact is outdated	Re-embeds and replaces; logs old content to audit trail

# Store with spatial context
store_memory(
    "Sachit prefers tabs over spaces in Python",
    importance=0.9,
    category="fact",
    context_paths=["/projects/backend"]
)

# Next session — spatial boost fires when working in that directory
recall_memory("Python formatting", current_path="/projects/backend")
# → {"content": "Sachit prefers tabs over spaces in Python", "strength": 0.87}

Memory categories control decay rate

Category	Half-life	Best for
`strategy`	~38 days	Patterns that worked, architectural decisions
`fact`	~24 days	Preferences, identity, stable knowledge
`assumption`	~19 days	Inferred context, uncertain beliefs
`failure`	~11 days	Errors, wrong approaches, environment-specific issues
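
To get a feel for what these half-lives mean, they can be converted to exponential decay rates with the standard relation λ = ln 2 / half-life. The sketch below assumes plain exponential decay and ignores the importance and recall adjustments; the table does not say which importance value the half-lives correspond to, so treat the numbers as indicative only:

```python
import math

# Half-lives from the table above, in active days.
HALF_LIFE_DAYS = {"strategy": 38, "fact": 24, "assumption": 19, "failure": 11}


def decay_rate(half_life_days: float) -> float:
    """Decay constant of an exponential with the given half-life."""
    return math.log(2) / half_life_days


def remaining_fraction(category: str, active_days: float) -> float:
    """Fraction of the initial strength left after `active_days`."""
    return math.exp(-decay_rate(HALF_LIFE_DAYS[category]) * active_days)


for category in HALF_LIFE_DAYS:
    rate = decay_rate(HALF_LIFE_DAYS[category])
    print(f"{category:<10} rate/day={rate:.4f} left after 30 days={remaining_fraction(category, 30):.2f}")
```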

How It Works

Ebbinghaus Forgetting Curve

Memory strength decays exponentially. Importance and recall frequency slow that decay:

effective_λ  = base_λ × (1 − importance × 0.8)
strength     = clamp(importance × e^(−effective_λ × active_days) × (1 + recall_count × 0.2), 0, 1)
hybrid_score = 0.4 × bm25_norm + 0.6 × cosine_similarity

active_days counts only days the user was active — vacations don't cause memory loss. Memories below strength 0.05 are pruned automatically every 24 hours.
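
The formulas above translate directly into a few lines of Python. This is a sketch for experimenting with the numbers, not the project's source; the `base_lambda` value used in the example is an assumption, since the per-category base rates are not listed here:

```python
import math

PRUNE_THRESHOLD = 0.05  # memories below this strength are pruned


def effective_lambda(base_lambda: float, importance: float) -> float:
    """Higher importance shrinks the decay rate by up to 80%."""
    return base_lambda * (1 - importance * 0.8)


def strength(importance: float, base_lambda: float, active_days: float, recall_count: int = 0) -> float:
    """Decayed strength, boosted by recalls and clamped to [0, 1]."""
    decay = math.exp(-effective_lambda(base_lambda, importance) * active_days)
    raw = importance * decay * (1 + recall_count * 0.2)
    return max(0.0, min(1.0, raw))


def should_prune(value: float) -> bool:
    return value < PRUNE_THRESHOLD


BASE_LAMBDA = 0.05  # assumed value, for illustration only

for days in (0, 10, 30, 60):
    important = strength(0.9, BASE_LAMBDA, days)
    minor = strength(0.2, BASE_LAMBDA, days)
    print(f"day {days:>2}: important={important:.3f} minor={minor:.3f} prune minor? {should_prune(minor)}")
```

Importance enters twice: it scales the starting strength and it slows the decay rate, so a low-importance memory both starts lower and falls faster.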

Session wrap-up: recalled memory IDs are tracked per session. When a session goes idle (30 min default), those memories get a recall_count boost. Set YOURMEMORY_SESSION_IDLE to change the window.

Recall throttling: identical (user, query) pairs are cached within a configurable window. Set YOURMEMORY_RECALL_COOLDOWN (seconds, default 0 = off).
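
Recall throttling amounts to a small time-bounded cache keyed by the (user, query) pair. The class below is a hypothetical illustration of that idea; the class name, method names, and the injectable clock are assumptions made so the behaviour is easy to test, and a cooldown of 0 disables caching as described above:

```python
import time
from typing import Callable, Optional


class RecallThrottle:
    """Cache recall results per (user, query) for `cooldown_seconds`."""

    def __init__(self, cooldown_seconds: float = 0.0, clock: Callable[[], float] = time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._cache: dict[tuple[str, str], tuple[float, list]] = {}

    def get(self, user: str, query: str) -> Optional[list]:
        """Return cached results, or None if absent, expired, or throttling is off."""
        if self.cooldown <= 0:
            return None
        entry = self._cache.get((user, query))
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at >= self.cooldown:
            del self._cache[(user, query)]  # expired: force a fresh recall
            return None
        return results

    def put(self, user: str, query: str, results: list) -> None:
        if self.cooldown > 0:
            self._cache[(user, query)] = (self.clock(), results)


throttle = RecallThrottle(cooldown_seconds=30)
throttle.put("sachit", "Python formatting", ["Sachit prefers tabs over spaces in Python"])
print(throttle.get("sachit", "Python formatting"))  # served from the cache
print(throttle.get("sachit", "database choice"))    # different query: cache miss
```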

Hybrid Retrieval: Vector + BM25 + Entity Graph

Retrieval runs in two rounds:

Round 1 — Hybrid search: cosine similarity + BM25 keyword scoring, returns top-k candidates above threshold.

Round 2 — Graph expansion: BFS traversal from Round 1 seeds surfaces memories that share context but not vocabulary — connected via semantic or entity edges.

recall("Python backend")
  Round 1 → [1] Python/MongoDB    (sim=0.61)
             [2] DuckDB/spaCy     (sim=0.19)
  Round 2 → [5] Docker/Kubernetes (sim=0.29 — below cut-off, surfaced via shared entity "backend")
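
The two rounds can be sketched with plain dictionaries. The weights come from the `hybrid_score` formula above; the threshold, `top_k`, BFS depth, and all scores in the example are made-up values for illustration, and the real implementation may rank and expand differently:

```python
from collections import deque


def hybrid_score(bm25_norm: float, cosine: float) -> float:
    return 0.4 * bm25_norm + 0.6 * cosine


def retrieve(scores, graph, threshold=0.3, top_k=5, depth=1):
    """scores: memory_id -> (bm25_norm, cosine); graph: memory_id -> neighbour ids."""
    # Round 1: rank by hybrid score and keep the top-k candidates above the threshold.
    ranked = sorted(((hybrid_score(b, c), mid) for mid, (b, c) in scores.items()), reverse=True)
    seeds = [mid for score, mid in ranked if score >= threshold][:top_k]

    # Round 2: breadth-first expansion from the seeds, up to `depth` hops.
    seen = set(seeds)
    expanded = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                expanded.append(neighbour)
                queue.append((neighbour, dist + 1))
    return seeds, expanded


# Hypothetical scores: memory 5 misses the cut-off but is linked to memory 1.
scores = {1: (0.7, 0.61), 2: (0.6, 0.19), 5: (0.0, 0.29)}
graph = {1: [5], 5: [1]}
print(retrieve(scores, graph))
```

With these numbers a memory with low cosine similarity can still pass Round 1 on keyword overlap, while a memory with no keyword overlap only surfaces through its graph edge.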

Chain-aware pruning: a decayed memory is kept alive if any graph neighbour is above the prune threshold. Related memories age together.
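
A sketch of how chain-aware pruning could be decided, assuming strengths and graph neighbours are available as dictionaries. The function name is hypothetical, and a neighbour is treated as "above the threshold" when it would not itself be pruned; the project may draw that line differently:

```python
PRUNE_THRESHOLD = 0.05


def prunable(strengths: dict[int, float], graph: dict[int, set[int]],
             threshold: float = PRUNE_THRESHOLD) -> set[int]:
    """Return IDs of memories that are weak and have no strong neighbour."""
    doomed = set()
    for memory_id, value in strengths.items():
        if value >= threshold:
            continue  # still strong enough on its own
        neighbours = graph.get(memory_id, set())
        if any(strengths.get(n, 0.0) >= threshold for n in neighbours):
            continue  # kept alive by a stronger neighbour
        doomed.add(memory_id)
    return doomed


strengths = {1: 0.02, 2: 0.60, 3: 0.01, 4: 0.03}
graph = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
print(sorted(prunable(strengths, graph)))  # memory 1 survives through memory 2
```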

Subject-Aware Deduplication

Before storing, YourMemory checks whether the new memory is about the same entity as the nearest existing one:

"Sachit uses DuckDB"      vs  "YourMemory uses DuckDB"
 subject: Sachit               subject: YourMemory
 → different entities → stored separately ✓

"YourMemory uses DuckDB"  vs  "YourMemory stores data in DuckDB"
 subject: YourMemory           subject: YourMemory
 → same entity → merged ✓

Subject comparison embeds the first two tokens of each sentence — no hardcoded word lists, generalises to any language.

Multi-Agent Memory

Multiple agents can share one YourMemory instance — each with isolated private memories and controlled access to shared context.

from src.services.api_keys import register_agent

result = register_agent(
    agent_id="coding-agent",
    user_id="sachit",
    can_read=["shared", "private"],
    can_write=["shared", "private"],
)
# → result["api_key"]  — ym_xxxx (shown once only)

# Agent stores a private failure memory
store_memory(
    "Staging uses self-signed cert — skip SSL verify",
    importance=0.7, category="failure",
    api_key="ym_xxxx", visibility="private"
)

# Recalls shared + its own private memories; other agents see shared only
recall_memory("staging SSL", api_key="ym_xxxx")
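
The visibility rule in the last comment can be written as a simple filter. This sketch models only the shared/private distinction; the `Memory` dataclass and `visible_to` helper are illustrative names, and the `can_read` / `can_write` permissions from `register_agent` are not modelled:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    content: str
    agent_id: str
    visibility: str  # "shared" or "private"


def visible_to(agent_id: str, memories: list[Memory]) -> list[Memory]:
    """Shared memories are visible to everyone; private ones only to their owner."""
    return [m for m in memories if m.visibility == "shared" or m.agent_id == agent_id]


memories = [
    Memory("Staging uses self-signed cert", "coding-agent", "private"),
    Memory("Project uses DuckDB locally", "coding-agent", "shared"),
    Memory("Review checklist lives in the wiki", "review-agent", "private"),
]

print([m.content for m in visible_to("coding-agent", memories)])
print([m.content for m in visible_to("review-agent", memories)])
```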

Stack

Component	Role
DuckDB	Default vector DB — zero setup, native cosine similarity
NetworkX	Default graph backend — persists at `~/.yourmemory/graph.pkl`
sentence-transformers	Local embeddings (`multi-qa-mpnet-base-dot-v1`, 768 dims)
spaCy	Local NLP for deduplication and entity extraction
APScheduler	Automatic 24h decay and pruning job
PostgreSQL + pgvector	Optional — for teams or large datasets
Neo4j	Optional graph backend

Architecture

Claude / Cline / Cursor / Any MCP client
    │
    ├── recall_memory(query, current_path?, api_key?)
    │       └── throttle check → embed → hybrid search (Round 1)
    │               → graph BFS expansion (Round 2)
    │               → score = sim × strength
    │               → spatial boost (+0.08) if current_path matches context_paths
    │               → temporal boost (+0.25) if query has time window expression
    │               → session tracking → recall_count bump on session end
    │
    ├── store_memory(content, importance, category?, context_paths?, api_key?)
    │       └── question? → reject
    │               subject-aware dedup → same entity? merge/reinforce : new
    │               embed() → INSERT → index_memory() → graph node + edges
    │               record_activity(user_id) → active days log
    │
    └── update_memory(id, new_content, importance)
            └── log old content → memory_history (audit trail)
                    embed(new_content) → UPDATE → refresh graph node

  Vector DB (Round 1)              Graph DB (Round 2)
  DuckDB (default)                 NetworkX (default)
    memories.duckdb                  graph.pkl
    ├── embedding FLOAT[768]         ├── nodes: memory_id, strength
    ├── importance FLOAT             └── edges: sim × verb_weight ≥ 0.4
    ├── recall_count INTEGER
    ├── context_paths JSON         Neo4j (opt-in)
    ├── created_at TIMESTAMP         └── bolt://localhost:7687
    ├── visibility VARCHAR
    ├── agent_id VARCHAR
    user_activity  (active days log)
    memory_history (supersession audit)
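
The scoring steps in the `recall_memory` branch of the diagram can be combined as follows. The boost sizes (+0.08 and +0.25) come from the diagram; treating them as additive, treating a path "match" as the current path lying inside a tagged directory, and passing time-window detection in as a flag are all assumptions of this sketch:

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

SPATIAL_BOOST = 0.08
TEMPORAL_BOOST = 0.25


def path_matches(current_path: str, context_path: str) -> bool:
    """True if current_path is the tagged directory or lies inside it."""
    return PurePosixPath(current_path).is_relative_to(PurePosixPath(context_path))


def final_score(sim: float, strength: float, *, current_path: Optional[str] = None,
                context_paths: Iterable[str] = (), has_time_window: bool = False) -> float:
    score = sim * strength
    if current_path and any(path_matches(current_path, p) for p in context_paths):
        score += SPATIAL_BOOST
    if has_time_window:
        score += TEMPORAL_BOOST
    return score


base = final_score(0.5, 0.8)
boosted = final_score(0.5, 0.8, current_path="/projects/backend/api",
                      context_paths=["/projects/backend"])
print(round(base, 2), round(boosted, 2))
```

Matching on path components rather than string prefixes means `/projects/backend2` does not count as being inside `/projects/backend`.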

Troubleshooting

Writes hang / time out in Claude Desktop

Symptom: store_memory or update_memory never returns; the MCP server appears frozen.

Cause: DuckDB allows only one process at a time to open a database file for writing. If the YourMemory HTTP server is also running (e.g. for Claude Code hooks), both processes compete for the same write lock and one hangs indefinitely.

Fix — kill the lock holder and restart:

# Kill any lingering YourMemory process holding the DuckDB write lock
pkill -f yourmemory 2>/dev/null || true

# Remove stale DuckDB WAL/lock files if the process exited uncleanly
rm -f ~/.yourmemory/memories.duckdb.wal \
      ~/.yourmemory/memories.duckdb.lock 2>/dev/null || true

# Restart Claude Desktop

From v1.4.57 onward, DuckDB connections time out after 8 seconds and report an error that includes the fix above instead of hanging forever.

If you run both Claude Desktop (MCP) and Claude Code (hooks) at the same time, set the environment variable DATABASE_URL=sqlite:///~/.yourmemory/memories.db in your MCP server config. In WAL mode, SQLite lets several processes open the same database file: readers are not blocked by a writer, and writes from different processes take turns instead of one process holding the file exclusively.
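
To illustrate the SQLite behaviour referred to above, the standard-library `sqlite3` module can switch a database file to WAL mode and then read from a second connection while the first stays open. This is a generic SQLite demonstration, not YourMemory's own connection code; the table layout, file name, and timeout value are arbitrary choices for the example:

```python
import os
import sqlite3
import tempfile


def open_wal(path: str) -> sqlite3.Connection:
    """Open a SQLite database and switch it to write-ahead logging."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5 s for a busy writer
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode, got {mode!r}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "memories.db")

    writer = open_wal(db_path)
    writer.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
    writer.execute("INSERT INTO memories (content) VALUES (?)", ("uses WAL",))
    writer.commit()

    # A second connection reads committed data while the writer is still open.
    reader = sqlite3.connect(db_path)
    rows = reader.execute("SELECT content FROM memories").fetchall()
    print(rows)

    reader.close()
    writer.close()
```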

Contributing

PRs are welcome. See CONTRIBUTORS.md for contributors who have already improved YourMemory.

Dataset References

LoCoMo: Maharana et al. (2024). Evaluating Very Long-Term Conversational Memory of LLM Agents. Snap Research.
LongMemEval: Wu et al. (2024). LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory.
HotpotQA: Yang et al. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering.

License

Free for: personal use, education, academic research, open-source projects. Not permitted: commercial use without a separate written agreement.

Commercial licensing: mishrasachit1@gmail.com

Configuration

DATABASE_URL*

PostgreSQL connection string with pgvector. Example: postgresql://localhost:5432/yourmemory

OLLAMA_URL

Ollama server URL for local embeddings (nomic-embed-text) and classification (llama3.2:3b). Default: http://localhost:11434

EXTRACT_MODEL

Ollama model used for fact/assumption classification. Default: llama3.2:3b
