
Sophon

lacausecrypto/mcp-sophon
5 stars · STDIO · registry active
Summary

A deterministic context compression layer that runs as a single Rust binary with no ML dependencies at query time. It exposes 11 MCP tools, including compress_prompt for structured input, compress_history for conversation memory, compress_output with 21 domain-aware filters for shell commands, and read_file_delta/write_file_delta for incremental edits. It is built to stack in front of provider caching and memory systems like mem0, targeting the dynamic conversation blocks those layers don't touch. The repo's real-data benchmarks estimate 84.7% token savings from a weighted blend of history, shell output, file reads, and search, measured on the repo's own development cycle. Reach for this when you're burning tokens on repeated file reads, verbose tool output, or growing conversation windows in long-running agent sessions.


Sophon

Deterministic context compression for MCP agents. One Rust binary. Zero ML at query time. Reproducible benchmarks, real-data measurements.


Sophon is a deterministic context layer for agents speaking the Model Context Protocol. It compresses prompts, conversation memory, code digests, file deltas, and shell output — without an embedding model at query time, without a GPU, and without API keys.

Single 5.2 MB Rust binary. MCP-native. cl100k_base-accurate. Default build pulls no Python, no ML weights, no network.


What it does, in 30 seconds

| Tool | What it solves |
|---|---|
| compress_prompt | Long structured prompt → keep only sections relevant to the query |
| compress_history | Growing conversation → summary + facts + recent window + optional retrieval |
| compress_output | Shell stdout/stderr → 21 domain-aware filters (git, cargo, docker, kubectl, JSON, …) |
| read_file_delta / write_file_delta | Re-reads + edits → diffs only, never the whole file |
| encode_fragments | Repeated boilerplate → single token reference |
| update_memory | Append turn → JSONL persist + incremental rolling summary |
| navigate_codebase | Repo digest with tree-sitter / regex + PageRank, ranked by query |

11 MCP tools total (full table below).


Real numbers — measured on this repo's own dev cycle

We built four independent benches that each capture a different chunk of an agent's tool traffic. All four run against this repo's actual git history + working tree on the operator's machine. Reproducible byte-for-byte by anyone with cargo build --release.

| Dimension | What it measures | Saved | Bench |
|---|---|---|---|
| history | compress_history over real commits | 94.6 % | real_session_capture.py |
| shell | compress_output on real git/cargo/gh/ls stdout | 84.4 % | real_session_shell.py |
| filereads | compress_prompt on real Rust / Python / Markdown / TOML files | 71.7 % | real_session_filereads.py |
| search | compress_output on real grep/find patterns | 79.5 % | real_session_search.py |
| 🎯 Weighted blend (35/30/20/15) | typical agent session estimate | 84.7 % | real_session_holistic.py |

real_session_holistic.py runs all four sub-benches with --json, parses them, and produces the weighted blend. Default weights reflect this repo's observed shape; pass --weights "history=0.4,..." to model your own workload.
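The blend is a plain weighted average, so it is easy to recompute for your own weights. A minimal Python sketch, assuming only the four per-dimension figures from the table above; parse_weights merely mimics the name=value shape of --weights and is not the bench's own parser:

```python
# Per-dimension savings (%) and default weights, copied from the table above.
SAVINGS = {"history": 94.6, "shell": 84.4, "filereads": 71.7, "search": 79.5}
DEFAULT_WEIGHTS = {"history": 0.35, "shell": 0.30, "filereads": 0.20, "search": 0.15}


def parse_weights(spec):
    """Parse 'history=0.4,shell=0.3,...' into {'history': 0.4, 'shell': 0.3, ...}."""
    weights = {}
    for part in spec.split(","):
        name, _, value = part.partition("=")
        weights[name.strip()] = float(value)
    return weights


def weighted_blend(savings, weights):
    """Linear weighted average of per-dimension savings, normalised by the total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(savings[name] * w for name, w in weights.items()) / total


print(round(weighted_blend(SAVINGS, DEFAULT_WEIGHTS), 1))  # 84.7
```

With the default 35/30/20/15 weights this reproduces the 84.7 % row. Dividing by the weight total means a custom mix that doesn't sum to 1 still yields a percentage.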

USD economy on Claude Opus 4.7

| Scenario | Saved per session |
|---|---|
| Naive input pricing ($15/M tokens) | $2.03 |
| With prompt caching (25-turn reads at $1.50/M tokens) | $3.24 |

Pass --model sonnet or --model haiku to real_session_deep_dive.py if you're re-pricing for a cheaper tier.

Where each dimension falls short (we say it ourselves)

  • history measures only what git captures (commits + diffs) — typically ~5-10 % of a real session's tool traffic. The 94.6 % is the upper bound, not the typical case.
  • shell mixes commands that compress well (git diff 95 %) with commands that don't (gh repo view --json adds tokens, −9 %). 84.4 % is a real-world average, not a curated highlight.
  • filereads showed that compress_prompt compresses raw source files by budget cap, not by query routing: the same file with 3 different queries → identical output. Section detection only fires on structured input (Markdown headers, XML tags). Documented inline in the bench.
  • search depends entirely on YOUR repo's state. A repo with no TODOs gets 0 % on grep TODO.

The blended 84.7 % is napkin math: a linear weighted average of four real measurements, not a cherry-picked synthetic number. Run the benches yourself to verify.

Other reproducible benchmarks (synthetic, on-thesis)

| Test | Result | Bench |
|---|---|---|
| compress_output across 18 command families | 90.1 % weighted aggregate | compress_output_per_command.py |
| 25-turn synthetic Claude Code session | 68.1 % session tokens saved | session_token_economics.py |
| compress_prompt across 22 prompt shapes | 70.2 % mean, 36 ms mean latency | prompt_compression_extended.py |
| Code retrieval on "where is X?" questions | recall@3 = 70 % (vs grep 10 %, FULL 20 %) | repo_qa.py |
| vs LLMLingua-2 on structured prompts | +8.9 pt accuracy at 35× lower latency | llmlingua_compare.py |
| Sophon + Anthropic prompt caching | +24 % tokens / +49 % $ on top of caching | sophon_plus_prompt_caching.py |
| Sophon + mem0 | Additional savings on retrieved memories | sophon_plus_mem0.py |

Why Sophon — "in front of X"

Sophon is not a memory platform, a recall system, an OCR stack, or a replacement for provider-side caching. It's a deterministic compressor that slots in front of whatever memory / cache / code-nav layer you already use, and attacks the tokens those layers can't.

In front of Anthropic / OpenAI prompt caching

Provider caching handles the static half of a request — system prompt, tool definitions, reused documents. It doesn't touch the dynamic half (growing conversation history, tool outputs). Sophon compresses exactly that half. The two stack cleanly.

+24 % tokens / +49 % $ saved on top of prompt caching on a 25-turn Claude session — because the uncached dynamic block is billed at 10× the cached rate. See sophon_plus_prompt_caching.py.
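Why $ savings outrun token savings is plain arithmetic: every token Sophon removes comes from the block billed at the full rate. A back-of-the-envelope Python sketch, assuming a fully cached static block billed at 0.1× the uncached rate and ignoring cache writes and output tokens; the session sizes are hypothetical, not taken from the bench:

```python
def caching_savings(static_tokens, dynamic_tokens, dynamic_reduction, cached_ratio=0.1):
    """Return (token_saving_pct, cost_saving_pct) when only the dynamic block is compressed.

    Costs are in units of the uncached input rate. The static block is assumed to be
    a cache hit billed at `cached_ratio` of that rate; cache writes and output tokens
    are ignored.
    """
    saved_tokens = dynamic_tokens * dynamic_reduction
    token_pct = 100 * saved_tokens / (static_tokens + dynamic_tokens)
    cost_before = static_tokens * cached_ratio + dynamic_tokens
    cost_pct = 100 * saved_tokens / cost_before  # removed tokens were billed at the full rate
    return token_pct, cost_pct


# Hypothetical session: 80k cached static tokens, 20k dynamic tokens compressed by half.
tokens_pct, cost_pct = caching_savings(80_000, 20_000, 0.5)
print(f"{tokens_pct:.1f} % tokens, {cost_pct:.1f} % cost")  # 10.0 % tokens, 35.7 % cost
```

The larger the cached share of a request, the wider the gap between the two percentages.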

In front of mem0 / Letta / Zep / Graphiti

Memory systems retrieve the right memories. Sophon shrinks what gets sent to the LLM after retrieval. If mem0 returns 2 kB of raw memories, compress_prompt keeps only the sections the query actually references.

Honest caveat: on very short retrieved blocks (< ~200 tokens) Sophon's wrapper adds overhead and you should pass through. The bench reports this directly.

In front of Claude Code / Cursor / Cline

Primary use case. Every repeat file read becomes a read_file_delta; every shell command output goes through compress_output; every repeated boilerplate block gets a fragment_cache token. Install transparently with sophon hook install --agent claude --global.
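To make the fragment idea concrete, here is a toy encode/decode pair in Python. It assumes paragraph-level (blank-line separated) repeats and a hypothetical [[FRAGn]] token syntax; Sophon's fragment-cache crate may detect and name fragments differently:

```python
from collections import Counter


def encode_fragments(text, min_chars=40):
    """Swap every paragraph that occurs more than once for a short reference token.

    Returns the encoded text plus the token -> paragraph table needed to decode it.
    Assumes the token syntax [[FRAGn]] never appears in the input itself.
    """
    paragraphs = text.split("\n\n")
    counts = Counter(p for p in paragraphs if len(p) >= min_chars)
    table, token_for = {}, {}
    encoded = []
    for p in paragraphs:
        if counts[p] > 1:
            if p not in token_for:
                token = f"[[FRAG{len(token_for) + 1}]]"
                token_for[p] = token
                table[token] = p
            encoded.append(token_for[p])
        else:
            encoded.append(p)
    return "\n\n".join(encoded), table


def decode_fragments(encoded, table):
    """Restore the original text by expanding each token from the table."""
    return "\n\n".join(table.get(p, p) for p in encoded.split("\n\n"))
```

The min_chars threshold mirrors the README's caveat about tiny inputs: swapping a short block for a token can cost more than it saves.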

In front of a RAG pipeline

navigate_codebase produces a PageRanked repo digest that a RAG retriever would otherwise spend expensive embedding calls to build. Tree-sitter / regex symbol extraction over 11 languages, sub-second.
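A query-ranked digest can be built with personalized PageRank: random walks restart at nodes that match the query, so files close to those matches score highest. The Python sketch below assumes a file-level reference graph and caller-chosen seed nodes; it illustrates the general technique, not navigate_codebase's actual graph or scoring:

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Rank graph nodes with random walks that restart at the query's seed nodes.

    edges: {node: [nodes it references]}, e.g. file -> files whose symbols it uses.
    seeds: nodes judged relevant to the query (how to pick them is up to the caller).
    Returns (nodes sorted by score, {node: score}).
    """
    nodes = set(edges) | {dst for targets in edges.values() for dst in targets}
    seeds = {s for s in seeds if s in nodes}
    if seeds:
        restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    else:
        restart = {n: 1.0 / len(nodes) for n in nodes}  # no match: plain PageRank
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * restart[n] for n in nodes}
        for src in nodes:
            targets = edges.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: send its mass back along the restart distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[src] * restart[n]
        rank = new_rank
    return sorted(nodes, key=lambda n: (-rank[n], n)), rank
```

Sorting on (score, name) keeps the output deterministic when scores tie.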

When NOT to use Sophon

  • Long-form conversational recall above 80 % — Sophon caps at ~40 % on LOCOMO and we don't chase it. Run mem0 / Letta / Zep for recall, then optionally pipe their output through Sophon.
  • Multi-hop reasoning on massive documents — that's HippoRAG or GraphRAG.
  • OCR / PDF layout — out of scope. Use Docling / Marker / Unstructured upstream.
  • Very small inputs (< ~200 tokens) — Sophon's section scaffolding can cost more than it saves.

Quick start

Install via npm (recommended)

npm install -g mcp-sophon
sophon doctor          # verify install + show config

The postinstall script downloads the right prebuilt binary for your platform from the GitHub Releases page. Supported: macOS arm64/x64, Linux arm64/x64, Windows x64.

Build from source

git clone https://github.com/lacausecrypto/mcp-sophon
cd mcp-sophon/sophon
cargo build --release -p mcp-integration       # ~5.2 MB binary

Optional features:

# 11-language tree-sitter AST extraction (~25 MB):
cargo build --release -p mcp-integration --features codebase-navigator/tree-sitter

# BGE-small semantic embedder (~34 MB), activate with SOPHON_EMBEDDER=bge:
cargo build --release -p mcp-integration --features bge

# All features (~42 MB):
cargo build --release -p mcp-integration --features "codebase-navigator/tree-sitter,bge"

Requires Rust 1.75+.

Wire it into an MCP client

Most clients accept this snippet (Claude Desktop, Claude Code, Cursor, Cline, Continue):

{
  "mcpServers": {
    "sophon": {
      "command": "sophon",
      "args": ["serve"]
    }
  }
}

Run sophon doctor to print the right config path for your client.
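If the client config already lists other servers, merge rather than overwrite. A small Python sketch, assuming the file is the plain JSON shown above and that you take its path from sophon doctor; with_sophon and install_into are illustrative helpers, not part of Sophon:

```python
import copy
import json
from pathlib import Path

SOPHON_ENTRY = {"command": "sophon", "args": ["serve"]}


def with_sophon(config):
    """Return a copy of an MCP client config with the 'sophon' server added or updated."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["sophon"] = copy.deepcopy(SOPHON_ENTRY)
    return merged


def install_into(config_path):
    """Merge Sophon into the JSON config at config_path, creating the file if needed."""
    path = Path(config_path).expanduser()
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_sophon(config), indent=2) + "\n", encoding="utf-8")
```

install_into rewrites the file in place, so keep a backup if the config carries settings you care about.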

Recommended runtime setup

# Persistent memory + on-disk retriever store + BM25+Hash hybrid
export SOPHON_MEMORY_PATH=~/.sophon/memory.jsonl
export SOPHON_RETRIEVER_PATH=~/.sophon/retriever
export SOPHON_HYBRID=1

sophon serve

Quick CLI

sophon exec -- cargo test                       # run + compress combined output
sophon compress-prompt --prompt ./system.txt --query "rust errors" --max-tokens 500
sophon hook install --agent claude --global     # transparent Claude Code integration
sophon stats --period session                   # token savings rollup

Programmatic (one-shot JSON-RPC)

echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"compress_prompt","arguments":{"prompt":"<rust>?: operator</rust><web>fetch()</web>","query":"rust errors","max_tokens":500}}}' \
  | sophon serve
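Hand-escaping the echo one-liner gets fragile once the prompt contains quotes or newlines. A Python sketch that builds the same tools/call request with json.dumps, assuming sophon is on PATH, that sophon serve exits at end of input as the pipeline above implies, and that each response arrives as one JSON line; like the one-liner, it sends no initialize request:

```python
import json
import subprocess


def compress_prompt_request(prompt, query, max_tokens, request_id=1):
    """Build one JSON-RPC tools/call line for compress_prompt (json.dumps handles escaping)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "compress_prompt",
            "arguments": {"prompt": prompt, "query": query, "max_tokens": max_tokens},
        },
    })


def call_sophon(request_line):
    """Pipe one request into `sophon serve` and parse each non-empty stdout line as JSON."""
    result = subprocess.run(
        ["sophon", "serve"], input=request_line + "\n",
        capture_output=True, text=True, timeout=30, check=True,
    )
    return [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
```

For example, call_sophon(compress_prompt_request(prompt_text, "rust errors", 500)) sends a whole prompt file's contents without any shell quoting.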

What the binary ships

11 MCP tools, all stdio:

| Tool | What it does |
|---|---|
| compress_prompt | Keep query-relevant sections of a long prompt |
| compress_history | Summary + facts + recent + optional retrieval over the conversation |
| compress_output | Strip noise from command stdout/stderr (21 domain filters + JsonStructural) |
| navigate_codebase | tree-sitter / regex digest of a repo, PageRanked by query |
| update_memory | Append messages, JSONL persist, optional rolling summary |
| read_file_delta | Version/hash-aware file read, unchanged → minimal payload |
| write_file_delta | Send edits as diffs, not full files |
| encode_fragments / decode_fragments | Detect repeated boilerplate, swap with tokens |
| count_tokens | cl100k_base-accurate token count |
| get_token_stats | Session-level savings rollup |
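read_file_delta's "unchanged → minimal payload" behaviour rests on remembering what was already sent. A minimal Python sketch of that idea, assuming a SHA-256 content hash and a unified diff; the DeltaReader class and its payload keys are hypothetical and not Sophon's wire format:

```python
import difflib
import hashlib


class DeltaReader:
    """Remember the last version of each file sent; afterwards send only what changed."""

    def __init__(self):
        self._seen = {}  # path -> (sha256 hex digest, text last sent)

    def read(self, path, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self._seen.get(path)
        self._seen[path] = (digest, text)
        if previous is None:
            return {"kind": "full", "hash": digest, "content": text}
        if previous[0] == digest:
            return {"kind": "unchanged", "hash": digest}  # minimal payload
        diff = "".join(difflib.unified_diff(
            previous[1].splitlines(keepends=True),
            text.splitlines(keepends=True),
            fromfile=path, tofile=path,
        ))
        return {"kind": "diff", "hash": digest, "diff": diff}
```

The caller reads the file and passes its text in, which keeps the sketch free of disk I/O.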

Binary sizes by feature set:

| Build | Size |
|---|---|
| Default (regex extractors, HashEmbedder) | 5.2 MB |
| + tree-sitter (11 languages) | ~25 MB |
| + BGE semantic embedder | ~34 MB |
| All features | ~42 MB |

MCP protocol: 2025-06-18. notifications/cancelled actually drops the response (since v0.5.4). Structured JSON-RPC error codes (-32000..-32099 reserved for Sophon). Infallible dispatcher — a malformed request can't kill the stdio loop.


Configuration

Run sophon doctor to see every SOPHON_* env var currently set with validation warnings. Full catalogue (24 flags) lives in runtime_flags.rs. The flags worth knowing:

| Flag | Effect | Cost |
|---|---|---|
| SOPHON_RETRIEVER_PATH=/dir | Activate the semantic retriever (chunk store on disk) | ~0 |
| SOPHON_MEMORY_PATH=/file.jsonl | Persistent conversation memory across sophon serve runs | ~0 |
| SOPHON_HYBRID=1 | BM25 sparse-lexical + HashEmbedder fused via RRF | ~1 ms |
| SOPHON_ROLLING_SUMMARY=1 | Build rolling summary at update_memory time, not at query time | LLM call moved to ingest |
| SOPHON_CHUNK_TARGET=500 | Bigger chunks preserve cross-sentence context | ~0 |
| SOPHON_EMBEDDER=bge | Swap HashEmbedder for BGE-small (needs --features bge) | model load at startup |
| SOPHON_LLM_CMD="claude -p --model haiku" | LLM shell-out command (used by summarizer when configured) | per-call subprocess |
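SOPHON_HYBRID=1 fuses two rankings with reciprocal rank fusion (RRF). A short Python sketch of standard RRF, assuming the k = 60 constant from the original RRF paper (Sophon's own constant is not stated here) and plain string chunk IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) to every chunk it contains."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for a deterministic order.
    return sorted(scores, key=lambda c: (-scores[c], c))


bm25_hits = ["c1", "c2", "c3"]
hash_hits = ["c3", "c1", "c4"]
print(reciprocal_rank_fusion([bm25_hits, hash_hits]))  # ['c1', 'c3', 'c2', 'c4']
```

RRF only needs ranks, not comparable scores, which is why a lexical and a hashed-embedding retriever can be fused without calibrating one against the other. The chunk-ID tie-break keeps the result in line with Sophon's deterministic design.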

Deprecated v0.4.0 recall-chasing flags — SOPHON_HYDE, SOPHON_FACT_CARDS, SOPHON_ENTITY_GRAPH, SOPHON_ADAPTIVE, SOPHON_LLM_RERANK, SOPHON_TAIL_SUMMARY, SOPHON_REACT, SOPHON_GRAPH_MEMORY, SOPHON_MULTIHOP_LLM — chase LOCOMO recall, an axis we no longer optimise. Still functional but sophon doctor flags them. Removed in a future major.
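Before upgrading, you can check a shell environment for these flags yourself. A minimal Python sketch, assuming only the flag names listed above; it complements rather than replaces sophon doctor, which also validates the values:

```python
import os

# The recall-chasing flags listed as deprecated in v0.4.0.
DEPRECATED_FLAGS = frozenset({
    "SOPHON_HYDE", "SOPHON_FACT_CARDS", "SOPHON_ENTITY_GRAPH", "SOPHON_ADAPTIVE",
    "SOPHON_LLM_RERANK", "SOPHON_TAIL_SUMMARY", "SOPHON_REACT",
    "SOPHON_GRAPH_MEMORY", "SOPHON_MULTIHOP_LLM",
})


def audit_sophon_env(environ=None):
    """Return (all SOPHON_* variables set, the deprecated subset), both sorted."""
    env = os.environ if environ is None else environ
    sophon_vars = sorted(name for name in env if name.startswith("SOPHON_"))
    deprecated = [name for name in sophon_vars if name in DEPRECATED_FLAGS]
    return sophon_vars, deprecated


if __name__ == "__main__":
    _, stale = audit_sophon_env()
    for name in stale:
        print(f"warning: {name} is deprecated and will be removed in a future major")
```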


Honest limitations

The full list lives in BENCHMARK.md § 8. Headlines:

  • LOCOMO conversational recall caps at ~40 %. mem0 / HippoRAG hit 80-90 % with neural retrieval at query time — we chose determinism + sub-100 ms p99 instead. Pipe mem0 in front of Sophon if you need that recall.
  • HashEmbedder is keyword-bound. "favorite food" ↔ "weakness for ginger snaps" doesn't match. Activate BGE (SOPHON_EMBEDDER=bge) for semantic recall — costs +25 MB binary + model load.
  • No multimodal ingestion. Images / PDFs / audio out of scope. Run Docling / Marker / Unstructured upstream.
  • Rolling summary doesn't help on small sessions. When the un-summarised tail fits the budget, the rolling cache is a no-op. Useful for long-running sessions with SOPHON_LLM_CMD set.
  • Some commands don't compress. gh repo view --json adds tokens, git log --oneline saves 0.4 %. Sophon's job isn't to compress already-compact output — it's to compress redundant verbose output. The benches name the gaps explicitly.
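Why verbose output compresses and compact output doesn't is easy to see with a toy command-aware filter. The Python sketch below assumes a hypothetical set of cargo progress-line prefixes; it is not one of Sophon's 21 filters:

```python
import re

# Per-crate progress lines; this prefix set is an assumption for the sketch.
CARGO_NOISE = re.compile(
    r"^\s*(Compiling|Checking|Downloaded|Downloading|Fresh|Locking|Updating)\s"
)


def filter_cargo_output(output):
    """Drop cargo progress lines, keep everything else, and note how many were dropped."""
    kept, dropped = [], 0
    for line in output.splitlines():
        if CARGO_NOISE.match(line):
            dropped += 1
        else:
            kept.append(line)
    if dropped:
        kept.insert(0, f"[{dropped} cargo progress lines omitted]")
    return "\n".join(kept)
```

Already-compact input such as a single error line passes through unchanged, so there is nothing to save.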

Project layout

.
├── README.md           ← you are here
├── BENCHMARK.md        ← full per-section benchmark detail
├── CHANGELOG.md        ← version history + deprecated numbers
├── benchmarks/         ← reproducible scripts for every number above
├── npm/                ← npm wrapper package
└── sophon/crates/      ← 11-crate Rust workspace
    ├── prompt-compressor/    compress_prompt
    ├── memory-manager/       compress_history, update_memory, rolling summary
    ├── delta-streamer/       read/write_file_delta
    ├── fragment-cache/       encode/decode_fragments
    ├── semantic-retriever/   chunker + HashEmbedder + BM25 + entity graph
    ├── output-compressor/    21 command-aware filters + JsonStructural
    ├── codebase-navigator/   tree-sitter / regex + PageRank
    ├── cli-hooks/            transparent agent installer
    └── mcp-integration/      stdio server, async dispatch, cancellation

Contributing

PRs welcome. Run the test suite:

cd sophon && cargo test --workspace --lib --tests --exclude prompt-compressor   # 405 tests
cd sophon && cargo test --features codebase-navigator/tree-sitter               # +AST tests
cd sophon-py && .venv/bin/pytest tests/                                         # 4 Python tests

Every benchmark claim is reproducible — pointers to the scripts live in BENCHMARK.md. If a number doesn't reproduce on your machine, open an issue.

Particularly welcome:

  • TypeScript bindings (Python bindings ship in sophon-py/)
  • gh family filter (gh run list, gh pr list, gh repo view --json) — the bench shows this is currently a gap
  • SOPHON_EMBEDDER_CMD shell-out plugin pattern (mirror of SOPHON_LLM_CMD) for Voyage / OpenAI / Cohere
  • Multi-repo real_session_holistic.py runs against popular open-source repos

License

MIT. See LICENSE.

Category: Reverse Engineering
Registry: active
Package: mcp-sophon
Transport: STDIO
Updated: Apr 30, 2026
