A persistent knowledge graph for AI coding sessions that solves the cold start problem. Exposes six MCP tools, including suma_ingest, which writes architectural decisions and bug fixes to a weighted graph; suma_search, which retrieves context by natural language query; suma_talk, which combines search and learning in one call; and suma_correct, which supersedes wrong information without deleting it. Runs as a hosted service on Cloud Run, so there is no local server to set up. The K-WIL gravity algorithm ranks facts by recency, density, semantic similarity, and emotional weight instead of returning flat chunks. Reach for this when you're tired of re-explaining your auth flow or database schema every time you open a new Claude chat, or when multiple agents need to share context across sessions without explicit handoffs.
Stop re-explaining your project to Claude every time you start a new chat.
Your repos now have permanent memory. SUMA gives any MCP-compatible AI client (Claude Code, Cursor, Devin) a persistent knowledge graph that remembers architectural decisions, bug root causes, and project rules — across sessions, across machines, across your entire team.
Get an API key at sumapro.quadframe.work — free tier available.
Add to your .mcp.json:
{
  "mcpServers": {
    "suma-memory": {
      "url": "https://sumapro.quadframe.work/mcp",
      "headers": {
        "Authorization": "Bearer sk_live_your_key_here"
      }
    }
  }
}
That's it. No local server. No Docker. No npm install. SUMA runs on Cloud Run — stateless, auto-scaled, always available.
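If you would rather not paste the key into the file by hand, the config above can be generated from an environment variable. The following is a minimal sketch, not an official SUMA tool: the helper names `build_mcp_config` and `write_mcp_config` are hypothetical, and it assumes the key is available as `SUMA_API_KEY`.

```python
import json
from pathlib import Path

# URL taken from the config shown above.
SUMA_MCP_URL = "https://sumapro.quadframe.work/mcp"


def build_mcp_config(api_key: str) -> dict:
    """Return the .mcp.json structure shown above with the key filled in."""
    if not api_key:
        raise ValueError("SUMA_API_KEY is empty")
    return {
        "mcpServers": {
            "suma-memory": {
                "url": SUMA_MCP_URL,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


def write_mcp_config(path: Path, api_key: str) -> None:
    """Write the config as pretty-printed JSON (overwrites the file)."""
    path.write_text(json.dumps(build_mcp_config(api_key), indent=2) + "\n")
```

Calling `write_mcp_config(Path(".mcp.json"), os.environ["SUMA_API_KEY"])` would produce the file. Note that this sketch overwrites any existing `.mcp.json`, so merge by hand if you already have other servers configured.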
After installing, run this once per repo to seed your permanent context:
suma_ingest(text=(
    "Project: [name]. Framework: [Next.js / Flask / etc]. "
    "Auth lives in: [path/to/auth.py]. Database: [PostgreSQL / SQLite / etc]. "
    "Rules never to break: [e.g. never store plaintext keys, all routes require org_id filter]. "
    "Deployment target: [Cloud Run / Vercel / etc]."
))
From this point forward, every new session inherits this context. You never explain it again.
SUMA stores knowledge in a weighted graph. Every node has a gravity score across four dimensions: recency, density, semantic similarity, and emotional weight.
When you call suma_search, the K-WIL gravity algorithm traverses the graph and returns the highest-relevance context — not a flat list of chunks, not a raw embedding match, but the facts that actually matter for what you're doing right now.
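To make the idea of a multi-dimensional gravity score concrete, here is an illustrative sketch only. The actual K-WIL algorithm is not documented here, so everything below is an assumption: the `Node` fields (named after the four dimensions SUMA lists: recency, density, semantic similarity, emotional weight), the equal weights, the plain weighted sum, and the `top_k` helper. It also ranks a flat list and does not attempt the graph traversal.

```python
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    recency: float           # assumed normalized to 0..1
    density: float           # assumed normalized to 0..1
    similarity: float        # assumed similarity to the query, 0..1
    emotional_weight: float  # assumed normalized to 0..1


# Hypothetical equal weighting; the real weights are unknown.
WEIGHTS = {
    "recency": 0.25,
    "density": 0.25,
    "similarity": 0.25,
    "emotional_weight": 0.25,
}


def gravity(node: Node) -> float:
    """Combine the four dimensions into one score (simple weighted sum)."""
    return sum(weight * getattr(node, name) for name, weight in WEIGHTS.items())


def top_k(nodes: list[Node], k: int = 3) -> list[Node]:
    """Return the k nodes with the highest combined score."""
    return sorted(nodes, key=gravity, reverse=True)[:k]
```

The point of the sketch is the contrast with a raw embedding match: a node with middling similarity can still outrank a closer match if it is recent and heavily connected.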
| Tool | What it does |
|---|---|
| suma_ping | Health check — verify connection and API key |
| suma_ingest | Add knowledge to the graph (architecture decisions, bug fixes, rules) |
| suma_search | Retrieve relevant context by natural language query |
| suma_talk | Search + learn in one call — retrieves context and updates graph |
| suma_correct | Fix wrong information — supersedes original, queues replacement |
| suma_clean | Remove noise nodes that pollute search results |
# After finalizing a decision:
suma_ingest(text=(
    "We chose REST over GraphQL. Root cause: GraphQL N+1 queries "
    "caused 3x latency on /search. Architect ruling Apr 10 2026."
))
# Next session, cold start — full context in one call:
suma_search(query="why did we switch to REST?")
# → Returns ruling with full context. No re-explaining.
# After fixing a hard bug:
suma_ingest(text=(
    "Cloud Run WebSocket bug: asyncio.run() in daemon thread killed "
    "by Cloud Run recycling. Fix: use asyncio.get_event_loop() instead. "
    "Never use asyncio.run() in long-lived Cloud Run services."
))
# Six months later, same error:
suma_search(query="asyncio cloud run daemon thread crash")
# → Root cause retrieved instantly. Hours saved.
Architect, developer, and QA agents each write to SUMA using their own sessions. Their knowledge merges into one shared org graph. When QA asks "what did the architect decide about auth?", it retrieves the architect's ruling — zero explicit handoff required.
Anti-flood protection: Each source machine is rate-limited to 5 ingests per 60 seconds. Runaway agent loops are broken gracefully — the 6th request returns {"status": "throttled"} without crashing or corrupting the graph.
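The 5-per-60-seconds limit described above can be pictured as a sliding-window throttle. This is a minimal sketch under stated assumptions, not SUMA's implementation: the `IngestThrottle` class, the in-memory storage, and the `{"status": "ok"}` success response are all hypothetical. Only the limit, the window, and the `{"status": "throttled"}` response come from the description above.

```python
import time
from collections import defaultdict, deque

MAX_INGESTS = 5        # from the documented limit
WINDOW_SECONDS = 60.0  # from the documented window


class IngestThrottle:
    """Per-source sliding-window limiter (illustrative only)."""

    def __init__(self, max_calls=MAX_INGESTS, window=WINDOW_SECONDS,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self._calls = defaultdict(deque)  # source_id -> accepted timestamps

    def check(self, source_id: str) -> dict:
        now = self.clock()
        calls = self._calls[source_id]
        # Forget accepted calls that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            # Reject without raising, so a runaway loop is not crashed.
            return {"status": "throttled"}
        calls.append(now)
        return {"status": "ok"}  # assumed success shape
```

On the client side, an agent that receives `{"status": "throttled"}` should back off and retry later, not resend immediately.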
Multi-tenant isolation: Every node is scoped to org_id at the database layer. Two organizations on the same Cloud Run instance cannot access each other's data — enforced by SQL, not application logic.
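As a rough picture of what scoping at the SQL layer means, the sketch below puts the `org_id` filter inside every query so that no caller can omit it. The schema, the SQLite backend, and the function names are assumptions made for illustration; SUMA's actual database and enforcement mechanism are not described here.

```python
import sqlite3


def make_tenant_db() -> sqlite3.Connection:
    """Create an in-memory table where every node carries an org_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        " id INTEGER PRIMARY KEY,"
        " org_id TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn


def add_node(conn: sqlite3.Connection, org_id: str, text: str) -> None:
    conn.execute("INSERT INTO nodes (org_id, text) VALUES (?, ?)", (org_id, text))


def search_nodes(conn: sqlite3.Connection, org_id: str, term: str) -> list[str]:
    """Substring search that is always restricted to one organization."""
    rows = conn.execute(
        "SELECT text FROM nodes WHERE org_id = ? AND text LIKE ? ORDER BY id",
        (org_id, f"%{term}%"),
    )
    return [row[0] for row in rows]
```

Because the filter lives in the query itself, a search issued for one organization never sees another organization's rows, even when both share the same table.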
Immutable audit trail: suma_correct and suma_clean never delete data. Nodes are superseded and invisible to the API while preserved in storage for compliance.
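Supersede-without-delete can be sketched with a single nullable column that points at the replacement node. This is an assumed design for illustration: the `superseded_by` column, the SQLite schema, and the function names are hypothetical, and the replacement is written immediately here, whereas suma_correct is described as queuing it.

```python
import sqlite3


def make_audit_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        " id INTEGER PRIMARY KEY,"
        " text TEXT NOT NULL,"
        " superseded_by INTEGER REFERENCES nodes(id))"
    )
    return conn


def ingest(conn: sqlite3.Connection, text: str) -> int:
    """Insert a node and return its id."""
    cur = conn.execute("INSERT INTO nodes (text) VALUES (?)", (text,))
    return cur.lastrowid


def correct(conn: sqlite3.Connection, old_id: int, new_text: str) -> int:
    """Add the corrected node and mark the old one superseded; nothing is deleted."""
    new_id = ingest(conn, new_text)
    conn.execute("UPDATE nodes SET superseded_by = ? WHERE id = ?", (new_id, old_id))
    return new_id


def visible(conn: sqlite3.Connection) -> list[str]:
    """What an API search would see: only nodes that were never superseded."""
    rows = conn.execute(
        "SELECT text FROM nodes WHERE superseded_by IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]


def audit_trail(conn: sqlite3.Connection) -> list[str]:
    """Everything ever stored, including superseded nodes."""
    return [row[0] for row in conn.execute("SELECT text FROM nodes ORDER BY id")]
```

The old node stays in storage for compliance review, while normal reads filter it out.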
| Metric | Value |
|---|---|
| Compression ratio | 94.7% — 801 nodes replace 15.2M tokens |
| Cost saved per org | $14.47 across 538 queries |
| K-WIL fidelity | 96.3% — 26/27 facts recoverable from 5-node graph |
| Automated tests | 118 (102 Playwright E2E + 16 pytest) |
| Plan | Queries/month | Price |
|---|---|---|
| Starter | 20,000 | Free |
| Developer | 100,000 | $4.99/mo |
| Team | 500,000 | $29/mo |
| Enterprise | Unlimited | Contact |
Get your key: sumapro.quadframe.work
© 2025–2026 Suman Addanke / A2 Vibe Creators LLC
US Patent applications pending — 6 filed (2025–2026). Unauthorized commercial use prohibited.
SUMA_API_KEY (required, secret): Your SUMA Pro API key. Get one free at sumapro.quadframe.work