A memory layer for AI agents that runs entirely offline and skips the usual LLM-based fact extraction step. Instead of sending every conversation turn through an API to summarize it, Midas uses local embeddings to decide what matters, retrieve relevant context by meaning, and trace every recalled memory back to its source turn. The MCP server exposes tools for storing facts, querying by semantic similarity, and managing belief revision without per-message API costs. Designed for long-running coding agents and assistants where you want durable memory across sessions but don't want to pay for or trust a third party to process your entire conversation history. Everything lives in a local SQLite file with a local embedding model, so there's no network dependency after the initial model download.
The local memory layer for long-horizon AI agents — remembers across sessions, keeps what's current, and won't act on stale memory.
No LLM at ingest · $0 per message · fully local · every recall traces to its source.
Your AI assistant forgets everything between sessions. Midas is the memory that lives next to it, on your machine. Your coding agent remembers the decisions, conventions, and bugs from three sessions ago — without piping every message through an LLM to "extract" facts. It costs nothing per message, nothing leaves your computer, every memory traces back to the exact turn it came from, and it won't let an agent act on memory that's stale or never confirmed.
uv tool install "midas-memory[mcp,local]" # the midas-mcp command, for any MCP client
# or, no Python: npx -y midas-memory-mcp # TypeScript port
# or, as a library: pip install "midas-memory[local]"
Install in your agent · See the benchmarks · Team / Enterprise
Most memory tools call an LLM to summarize every session — so you pay in tokens forever, add latency, ship every turn to a provider, and get back rewritten facts you can't audit. Midas makes the opposite bet, and that bet is what makes it cheap, private, and trustworthy:
Finding a buried fact is table stakes. A long-horizon coding agent needs memory it can act on safely and resume from cleanly — which is where similarity search alone falls short:
| You ask… | Midas answers with | Why top-k recall can't |
|---|---|---|
| "Can I run this destructive migration?" | Guard: allowed only if you confirmed it, and only if that confirmation is still current | provenance + currency aren't a similarity match |
| "What's the current state of project Apollo?" | memory_state: the live, non-superseded decisions / constraints / facts | a broad "current state" query matches no single turn |
| "What changed since our last session?" | memory_diff: beliefs added, and beliefs revised (old → new) | "what's new" isn't a content query at all |
| "How do I speed up the transactions list?" | the prior fix resurfaces, so the agent doesn't re-diagnose it | — |
These properties are measured, not asserted — the Agent Continuity Bench scores action-safety, decision-adherence, and repeated-mistake avoidance across a scripted multi-session project (deterministic, no LLM).
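To make the memory_state and memory_diff rows in the table above concrete, here is a minimal, self-contained sketch of how a "live state" view and a "what changed" view can be derived from records that carry a supersession link. The `Belief` class, its field names, and the sample data are all hypothetical and invented for illustration; this is not Midas's schema or implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Belief:
    """Hypothetical record shape; not Midas's actual schema."""
    id: int
    content: str
    created_at: float                      # epoch seconds
    superseded_by: Optional[int] = None    # id of the belief that replaced this one


def live_state(beliefs: list[Belief]) -> list[Belief]:
    """Current state: every belief that has not been superseded."""
    return [b for b in beliefs if b.superseded_by is None]


def diff_since(beliefs: list[Belief], since: float) -> dict:
    """Split beliefs created after `since` into revisions (old, new) and plain additions."""
    by_id = {b.id: b for b in beliefs}
    revised = [
        (old.content, by_id[old.superseded_by].content)
        for old in beliefs
        if old.superseded_by is not None
        and by_id[old.superseded_by].created_at > since
    ]
    replacement_ids = {b.superseded_by for b in beliefs if b.superseded_by is not None}
    added = [
        b.content
        for b in beliefs
        if b.created_at > since and b.id not in replacement_ids
    ]
    return {"added": added, "revised": revised}


# Made-up sample data: belief 3 replaced belief 1.
beliefs = [
    Belief(1, "Launch date is August 30.", created_at=100.0, superseded_by=3),
    Belief(2, "Primary database is PostgreSQL.", created_at=110.0),
    Belief(3, "Launch date moved to September 14.", created_at=200.0),
    Belief(4, "API responses are paginated at 50 items.", created_at=210.0),
]

state = [b.content for b in live_state(beliefs)]
changes = diff_since(beliefs, since=150.0)
print(state)
print(changes)
```

Neither view is a similarity query: both are filters over provenance metadata, which is why a top-k search alone cannot answer them.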
Deterministic, reader-independent retrieval (recall@k — fraction of the gold supporting turns
pulled into context) on the full public sets, vs a recency-window baseline:
| Benchmark (full set) | baseline | Midas |
|---|---|---|
| LongMemEval-s — 500 questions, 246,750 turns | 0.01 | 0.92 |
| LoCoMo — 10 conversations, n=1,540 | 0.05 | 0.73 |
| BEAM — frontier benchmark, 100K → 10M tokens | 0.00 | 0.56 → 0.32 |
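For readers unfamiliar with the metric, recall@k as defined above (the fraction of gold supporting turns pulled into context) can be computed as in the short sketch below. The turn ids are hypothetical, and the function is an illustration of the definition, not the eval harness's code.

```python
def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold supporting turns that appear in the top-k retrieved turns."""
    if not gold:
        raise ValueError("gold set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & gold) / len(gold)


# Hypothetical turn ids, ranked best-first by a retriever.
retrieved = ["t17", "t03", "t42", "t08", "t21"]
gold = {"t03", "t21", "t99"}

score_at_3 = recall_at_k(retrieved, gold, k=3)   # only t03 is in the top 3
score_at_5 = recall_at_k(retrieved, gold, k=5)   # t03 and t21 are in the top 5
print(f"recall@3={score_at_3:.2f} recall@5={score_at_5:.2f}")
```

Because the score depends only on which turns were retrieved, it is deterministic and independent of whichever model later reads the context.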
And the cross-system metric, judged answer-rate (same gpt-4o judge the leaderboards use):
| Judged answer | baseline | Midas |
|---|---|---|
| LongMemEval-s (gpt-4o reader, ties LLM-ingest SOTA at $0 ingest) | — | 0.84 |
| BEAM-100K (gpt-4o judge, raw-turn floor, $0 ingest) | 0.05 | 0.40 |
All of it at 0 LLM calls, $0, and 0 data egress at ingest. Full numbers, per-category breakdowns, reproduce commands, and the head-to-head vs Mem0/Zep/Mastra are in BENCHMARKS.md.
Eval-first means we publish the misses too. Hybrid retrieval, reranking, thread-diversification, dual-granularity indexing, and naive distillation were all measured to not help (or to hurt) and are documented as such. That honesty is the point — see BENCHMARKS.md and
docs/frontier-2026.md.
Midas is a standard MCP server: every client launches the same midas-mcp command with a few env
vars — only where you put the config differs. The universal block:
{
  "mcpServers": {
    "midas": {
      "command": "midas-mcp",
      "env": {
        "MIDAS_MCP_EMBEDDER": "local",
        "MIDAS_MCP_DB": "/home/you/.midas/memory.sqlite3",
        "MIDAS_MCP_MAX_RECORDS": "50000",
        "MIDAS_MCP_MIN_IMPORTANCE": "2"
      }
    }
  }
}
Claude Code (CLI, no file editing):
claude mcp add midas -s user \
-e MIDAS_MCP_EMBEDDER=local -e MIDAS_MCP_DB="$HOME/.midas/memory.sqlite3" \
-e MIDAS_MCP_MAX_RECORDS=50000 -e MIDAS_MCP_MIN_IMPORTANCE=2 -- midas-mcp
| Client | Where the config goes |
|---|---|
| Cursor | ~/.cursor/mcp.json (all projects) or .cursor/mcp.json — paste the JSON block |
| Claude Desktop | Settings → Developer → Edit Config (claude_desktop_config.json) — paste the block, restart |
| Codex CLI | codex mcp add midas -- midas-mcp, or a [mcp_servers.midas] block in ~/.codex/config.toml (TOML) |
| Windsurf | ~/.codeium/windsurf/mcp_config.json — paste the block, refresh |
| Anything else | point it at command midas-mcp with those env vars |
| No Python | npx -y midas-memory-mcp — the TypeScript port, same tools/schema (experimental: no semantic embeddings yet) |
⚠️ #1 gotcha: GUI apps don't share your shell PATH. If a client says "command not found", use the absolute path reported by which midas-mcp (macOS/Linux) or where midas-mcp (Windows). On Windows, use forward slashes in JSON paths.
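One way to sidestep the PATH problem is to generate the config block with the absolute path already filled in. The sketch below is a hypothetical helper (not part of Midas) that uses the standard library's `shutil.which` to resolve the command and falls back to the bare name when it is not found; the env values are copied from the universal block above.

```python
import json
import shutil
from pathlib import Path


def build_mcp_config(command: str = "midas-mcp") -> dict:
    """Build the MCP config block, preferring an absolute path to the command."""
    resolved = shutil.which(command)          # None if the command is not on PATH
    db_path = Path.home() / ".midas" / "memory.sqlite3"
    return {
        "mcpServers": {
            "midas": {
                # as_posix() yields forward slashes, which need no escaping in JSON.
                "command": Path(resolved).as_posix() if resolved else command,
                "env": {
                    "MIDAS_MCP_EMBEDDER": "local",
                    "MIDAS_MCP_DB": db_path.as_posix(),
                    "MIDAS_MCP_MAX_RECORDS": "50000",
                    "MIDAS_MCP_MIN_IMPORTANCE": "2",
                },
            }
        }
    }


config = build_mcp_config()
print(json.dumps(config, indent=2))
```

Run it from the same shell where midas-mcp works, then paste the printed JSON into the client's config file.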
Once connected, Midas injects a short policy into the agent (recall first, then capture durable
facts/decisions/preferences/constraints/corrections). The agent captures freely; Midas decides what's
kept — it scores importance (no LLM), drops trivia, skips duplicates, revises stale beliefs, and forgets
the low-value tail to stay bounded. Before any memory-justified external or destructive action, the agent
calls check_memory_use and is blocked unless you confirmed it (and that confirmation is still
current).
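The guard rule described above (blocked unless confirmed, and unless that confirmation is still current) reduces to two checks on a memory's provenance. The following is a minimal sketch under stated assumptions: the `Record` class, its fields, and `check_use` are hypothetical names invented here, and the real check_memory_use tool may work differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical memory record; field names are assumptions, not Midas's schema."""
    content: str
    source_turn: str                       # where the memory came from
    confirmed_by_user: bool = False        # did the user explicitly confirm it?
    superseded_by: Optional[str] = None    # set when a newer belief replaced it


def check_use(record: Record) -> tuple[bool, str]:
    """Allow a memory-justified action only for confirmed, still-current memory."""
    if record.superseded_by is not None:
        return False, f"stale: superseded by {record.superseded_by}"
    if not record.confirmed_by_user:
        return False, "unconfirmed: no user confirmation on record"
    return True, f"ok: confirmed at {record.source_turn}"


# Made-up examples covering the three outcomes.
confirmed = Record("Drop the legacy_orders table.", "session-3/turn-12", confirmed_by_user=True)
inferred = Record("The staging DB can be wiped.", "session-2/turn-4")
stale = Record("Run migration 0042.", "session-1/turn-9",
               confirmed_by_user=True, superseded_by="session-3/turn-2")

for rec in (confirmed, inferred, stale):
    allowed, reason = check_use(rec)
    print(f"{'ALLOW' if allowed else 'BLOCK'}  {rec.content}  ({reason})")
```

The stale case is the important one: the memory was confirmed once, but a later turn replaced it, so acting on it would follow an outdated instruction.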
Point Claude Code, Claude Desktop, Cursor… at the same MIDAS_MCP_DB file and they share one live
memory — each detects the others' writes (SQLite data_version) and refreshes, so a fact captured in your
IDE is recallable from your chat app seconds later, no restarts. Scope it per project/agent/user with
MIDAS_MCP_NAMESPACE.
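The cross-client refresh relies on SQLite's `PRAGMA data_version`, whose value changes for a connection when some other connection commits to the same file. The sketch below demonstrates only that mechanism with the standard library; the table name and the polling logic are assumptions for illustration, and two connections in one process stand in for two separate clients.

```python
import os
import sqlite3
import tempfile


def data_version(conn: sqlite3.Connection) -> int:
    """Return SQLite's data_version counter as seen by this connection."""
    return conn.execute("PRAGMA data_version").fetchall()[0][0]


db_path = os.path.join(tempfile.mkdtemp(), "memory.sqlite3")

# Two connections stand in for two separate client processes.
writer = sqlite3.connect(db_path)
reader = sqlite3.connect(db_path)

writer.execute("CREATE TABLE memories (content TEXT)")   # hypothetical table
writer.commit()

seen = data_version(reader)                 # reader's baseline

writer.execute("INSERT INTO memories VALUES ('primary database is PostgreSQL')")
writer.commit()

changed = data_version(reader) != seen      # True once another connection has committed
rows = reader.execute("SELECT content FROM memories").fetchall() if changed else []
print(changed, rows)

writer.close()
reader.close()
```

Checking the counter is much cheaper than re-reading the store, so a client can poll it before each recall and reload only when it has moved.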
Real run, reconstructed chrome — the capture/recall lines are verbatim output of two separate processes sharing one file.
Tools: remember, capture (policy-gated auto-store), recall (source-traceable), build_context
(compact, dated, today-anchored prompt block), memory_state (current project state), memory_diff
(what changed since), check_memory_use (guard), memory_policy, maintain (dedup + forgetting, returns
a deletion audit), stats, forget (chain-safe), forget_matching (topic-level erasure, dry-run by
default), forget_all. Prompts: memory_session, distill.
Env: MIDAS_MCP_DB · MIDAS_MCP_EMBEDDER (local / hashing / multilingual / any fastembed id) ·
MIDAS_MCP_MAX_RECORDS · MIDAS_MCP_MIN_IMPORTANCE · MIDAS_MCP_NAMESPACE · MIDAS_MCP_ANN=1 (sub-linear
IVF for huge stores) · MIDAS_MCP_SUPERSEDE · MIDAS_MCP_NLI=1 (NLI-gated revision) ·
MIDAS_MCP_AUTO_MAINTAIN=<min> (idle-time upkeep) · MIDAS_MCP_PINNED (pin standing directives).
from midas import Memory, LocalEmbedder
mem = Memory(embedder=LocalEmbedder()) # fully local. (Or Memory() for a zero-setup offline embedder.)
mem.remember("Decision: the primary database is PostgreSQL.", kind="constraint", importance=5)
mem.remember("The launch date moved to September 14.", kind="fact", importance=5)
mem.capture("lol ok cool") # filler — auto-scored below the floor, skipped (no LLM)
mem.assemble("when do we launch?", token_budget=128) # prompt-ready, dated, source-traceable
for hit in mem.recall("which database did we pick?", limit=3):
    print(f"{hit.score:.2f} {hit.record.content}") # each hit traces to its source
from midas import Memory, LocalEmbedder
from midas.nli import LocalNLI
from midas.sqlite_store import SQLiteStore
from midas.state import memory_state, memory_diff # the control-plane views
# Durable, shareable, no native extension. Safe across threads & processes (live data_version refresh).
mem = Memory(store=SQLiteStore("memory.db"), embedder=LocalEmbedder(),
supersede=True, nli=LocalNLI()) # a turn that CONTRADICTS an old belief supersedes it
# Control-plane: the current state of a project, and what changed since a point in time (no LLM):
memory_state(mem, scope={"project": "apollo"}) # live, non-superseded decisions/constraints/facts
memory_diff(mem, since=last_session_epoch) # {added: [...], revised: [(old, new), ...]}
mem.forget_decayed(max_records=50_000) # evict lowest value (importance × recency); protects facts
mem.recall("when is the launch?", as_of=1_700_000_000) # bitemporal: "what did we believe on date X"
# Right-to-be-forgotten — preview, then erase, with an audit trail:
mem.forget_matching("the user's home address", dry_run=True)
mem.forget_matching("the user's home address")
# Back LangGraph's long-term memory with Midas:
from midas.integrations.langgraph_store import MidasStore
store = MidasStore(); store.put(("user", "123"), "pref", {"text": "prefers dark mode"})
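The forget_decayed comment above describes eviction by lowest value (importance × recency) with facts protected. As a rough illustration of that idea, the standalone sketch below scores each item by importance times an exponential recency decay and drops the lowest-scoring non-fact items. The `Item` class, the 7-day half-life, and the sample data are assumptions made for this example; Midas's actual scoring function is not specified in this document.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stored memory; not Midas's record type."""
    content: str
    kind: str           # e.g. "fact" or "note"
    importance: int     # 1-5
    created_at: float   # epoch seconds


def value(item: Item, now: float, half_life_s: float = 7 * 86_400) -> float:
    """Importance weighted by an exponential recency decay (assumed 7-day half-life)."""
    age = max(0.0, now - item.created_at)
    return item.importance * math.pow(0.5, age / half_life_s)


def evict(items: list[Item], max_records: int, now: float) -> tuple[list[Item], list[Item]]:
    """Drop the lowest-value unprotected items until at most max_records remain."""
    excess = len(items) - max_records
    if excess <= 0:
        return items, []
    candidates = sorted(
        (i for i in items if i.kind != "fact"),       # facts are never evicted here
        key=lambda i: value(i, now),
    )
    dropped = candidates[:excess]
    dropped_ids = {id(i) for i in dropped}
    kept = [i for i in items if id(i) not in dropped_ids]
    return kept, dropped                              # dropped doubles as an audit list


DAY = 86_400
now = 30 * DAY
items = [
    Item("Launch date is September 14.", "fact", 5, created_at=0),
    Item("Tried a Redis cache, reverted it.", "note", 3, created_at=now - 1 * DAY),
    Item("Standup moved to 10am.", "note", 2, created_at=now - 21 * DAY),
    Item("Prefers tabs over spaces.", "note", 4, created_at=now - 14 * DAY),
]

kept, dropped = evict(items, max_records=2, now=now)
print([i.content for i in kept])
print([i.content for i in dropped])
```

In this example the 30-day-old fact has one of the lowest scores, yet it survives because its kind is protected; the two oldest notes are dropped instead.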
The core stays open source under Apache-2.0: local SQLite memory, MCP tools, SDKs, and the eval harness are free to use, fork, and embed. Paid work is focused on what teams need around that core:
| Option | For | Status |
|---|---|---|
| OSS | Local agent memory, SDK/MCP integration, reproducible benchmarks | Available now |
| Pro | Encrypted sync, backups, profiles, easier multi-machine setup | Planned |
| Team | Shared namespaces, hosted MCP, admin controls, audit trails, support | Founding customers |
| Enterprise / VPC | On-prem or VPC deployment, SSO/SAML, RBAC, SLA, DPA, custom integration | By arrangement |
| Eval Pack | Benchmark an agent-memory stack against BEAM / LongMemEval with raw outputs and failure traces | By arrangement |
The commercial line is deliberate: Midas does not monetize by closing the memory core. It monetizes operational trust, deployment support, team controls, and benchmark-grade evaluation.
Midas is early but built narrow and measured-first. Where it stands, plainly:
[…] docs/frontier-2026.md §2b.)
[…] memory_state / memory_diff,
the provenance guard that won't act on stale or unconfirmed memory, and the
Agent Continuity Bench that measures those properties. Local, auditable, and
honest about what's proven.
eval/ (dev-only) runs Midas and competitors through synthetic / LoCoMo / LongMemEval / multiday /
conflicts-v1 / BEAM with deterministic recall@k + precision@k, cost/latency instrumentation, a
dumb-reader ablation (proves the numbers aren't reader-inflated), and an optional local-or-hosted LLM
judge. The anti-cheating checklist (no query rewriting, no LLM at ingest, no gold leakage, seeded sampling),
conflict handling, failure traces, and the verbatim MCP policy are in
docs/methodology.md.
python -m eval.runner --dataset longmemeval --variant s --local --midas-no-rerank --max-questions 40
python -m eval.runner --dataset beam --beam-tier 100K --local --dumb-reader # frontier benchmark
python -m eval.continuity # Agent Continuity Bench
Local-first: every memory lives in a SQLite file on your machine, recall returns the exact stored text,
and capture/recall/forget make no network calls. No account, API key, or telemetry. The only outbound
traffic is a one-time embedding-model download (for the local backend) and the package install. Full
details in PRIVACY.md · Apache-2.0.
| Variable | Default | Description |
|---|---|---|
| MIDAS_MCP_DB | in-memory | Path to a SQLite file to persist memory across restarts |
| MIDAS_MCP_EMBEDDER | local | Embedding backend: 'local' (bge ONNX, offline) or 'hashing' |
| MIDAS_MCP_MAX_RECORDS | none stated | Cap the store; above it the lowest-value memories are auto-forgotten (no LLM) |
| MIDAS_MCP_MIN_IMPORTANCE | 2 | Relevance floor 1-5 for auto-capture; turns scoring below it are skipped |
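For anyone wrapping the server in their own launcher, the four variables above can be read into a typed settings object as sketched below. The `Settings` class and `load_settings` function are hypothetical helpers, not part of Midas, and treating an unset MIDAS_MCP_MAX_RECORDS as "no cap" is an assumption, since no default is stated for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db: Optional[str]            # None means in-memory (the stated default)
    embedder: str
    max_records: Optional[int]   # None means no cap (an assumption)
    min_importance: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables, applying the defaults listed above."""
    min_importance = int(env.get("MIDAS_MCP_MIN_IMPORTANCE", "2"))
    if not 1 <= min_importance <= 5:
        raise ValueError("MIDAS_MCP_MIN_IMPORTANCE must be between 1 and 5")
    max_records = env.get("MIDAS_MCP_MAX_RECORDS")
    return Settings(
        db=env.get("MIDAS_MCP_DB"),
        embedder=env.get("MIDAS_MCP_EMBEDDER", "local"),
        max_records=int(max_records) if max_records else None,
        min_importance=min_importance,
    )


defaults = load_settings({})
custom = load_settings({
    "MIDAS_MCP_DB": "/home/you/.midas/memory.sqlite3",
    "MIDAS_MCP_MAX_RECORDS": "50000",
    "MIDAS_MCP_MIN_IMPORTANCE": "3",
})
print(defaults)
print(custom)
```

Validating the importance floor up front turns a typo in the config into an immediate error rather than silently changing what gets captured.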