Code Context Engine

elara-labs/code-context-engine
160 · STDIO · registry active
Summary

Indexes your codebase locally and exposes search operations to AI coding agents through MCP, cutting input tokens by 94% in benchmarked sessions. Instead of re-reading entire files on every Claude Code or Cursor session, your agent queries a local vector index built with tree-sitter parsing and embeddings (bundled local model or Ollama). Run `cce init` in a project directory and it writes the right config for Claude Code, VS Code, Cursor, Gemini CLI, or Codex CLI, then installs git hooks to keep the index fresh. The context_search tool returns relevant code chunks, and session stats estimate dollar savings from the configured model's pricing. Everything stays on your machine; no cloud sync is involved.



Index your codebase. AI searches instead of re-reading files.
94% token savings, reproducibly benchmarked.


Python 3.11+ · macOS · Linux · Windows


Claude Code  VS Code  Cursor  Gemini CLI  Codex CLI  OpenCode  Tabnine

One command. Auto-detects your editor. Zero cloud, zero config.




Use cases

| Use case | How CCE helps |
| --- | --- |
| 💰 Reduce Claude Code costs | 94% fewer input tokens per session |
| 🔒 Keep code private | Everything local, no cloud indexing |
| 🔄 Multi-editor teams | One index across Claude Code, Cursor, VS Code, Gemini CLI |
| 🧠 Cross-session memory | Decisions and context survive restarts |
| ⚡ Faster responses | Less context = faster Claude replies |
| 📊 Track actual savings | Dollar amounts, not estimates |

Quick start

One command. 30 seconds.

uvx --from "code-context-engine[local]" cce init    # install + index + configure, one shot

Or if you prefer a persistent install:

uv tool install "code-context-engine[local]"    # or: pipx install "code-context-engine[local]"
cd /path/to/your/project
cce init

Restart your editor. Done. Every question now hits the index instead of re-reading files.

Already have Ollama? Skip [local] and use uv tool install code-context-engine instead. CCE auto-detects Ollama at localhost:11434 and uses nomic-embed-text.

System requirements

Python 3.11+ and a C compiler (for tree-sitter grammars).

| Platform | Setup |
| --- | --- |
| macOS | xcode-select --install |
| Ubuntu/Debian | sudo apt install build-essential cmake |
| Fedora/RHEL | sudo dnf install gcc gcc-c++ cmake |
| Windows | Visual Studio Build Tools (C++ workload) + CMake |

Tested on macOS, Linux, Windows with Python 3.11/3.12/3.13.

cce init auto-detects your editor and writes the right config. To target a specific agent, use --agent claude, --agent codex, --agent copilot, or --agent all.

| Editor | Config written | Instructions |
| --- | --- | --- |
| Claude Code | .mcp.json | CLAUDE.md |
| VS Code / Copilot | .vscode/mcp.json | .github/copilot-instructions.md |
| Cursor | .cursor/mcp.json | .cursorrules |
| Gemini CLI | .gemini/settings.json | GEMINI.md |
| OpenAI Codex | ~/.codex/config.toml (user-global, per-project section) | AGENTS.md |
| OpenCode | opencode.json | |
| Tabnine | .tabnine/agent/settings.json | TABNINE.md |

Multiple editors in the same project? All get configured in one command.

Codex note: Codex CLI reads MCP servers from ~/.codex/config.toml only — it has no per-project config. cce init adds one [mcp_servers.cce-<project>-<hash>] section per project so multiple projects coexist; cce uninstall removes only the section for the current project.

  my-project · 38 queries · last query 5m ago

  ⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶  88% tokens saved

  Input savings   1.9M  tokens   $27.78
  Output savings  4.8k  tokens   $0.36
  ──────────────────────────────────────────
  Total saved   1.9M  tokens   $28.15

  Breakdown:
    retrieval              84%  ▰▰▰▰▰▰▰▰▰▰    1.8M   $26.76 · 12 calls
    chunk compression       3%  ▰▱▱▱▱▱▱▱▱▱   68.5k    $1.03 · 12 calls
    output compression*    <1%  ▰▱▱▱▱▱▱▱▱▱    4.8k    $0.36 · 12 calls

  Cost estimate based on Opus pricing (input $15.0/1M, output $75.0/1M)

Supports Anthropic, OpenAI, and Google model pricing. Configure via pricing.model in ~/.cce/config.yaml.
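The dollar column in the sample output is token count times a per-million rate. A minimal sketch of that arithmetic, assuming the Opus rates printed above; `estimate_savings_usd` is a hypothetical helper name, not part of the CCE API:

```python
def estimate_savings_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float = 15.0,   # $/1M input tokens (Opus rate shown above)
    output_per_million: float = 75.0,  # $/1M output tokens (Opus rate shown above)
) -> float:
    """Dollar value of saved tokens at the given per-million rates."""
    total = input_tokens * input_per_million + output_tokens * output_per_million
    return total / 1_000_000


# The "Output savings" row: 4.8k tokens at $75/1M.
print(round(estimate_savings_usd(0, 4_800), 2))  # 0.36
```

Overriding `pricing.input` / `pricing.output` in the config corresponds to passing different rates here.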


Why this matters

Input tokens are 85-95% of your Claude Code bill. CCE cuts them by 94% against a full-file-read baseline (benchmarked on FastAPI).

Without CCE:    Claude reads payments.py + shipping.py   = 45,000 tokens
With CCE:       context_search "payment flow"            =    800 tokens

| | Without CCE | With CCE |
| --- | --- | --- |
| Session startup | Re-reads files every time | Queries the index |
| Finding a function | Read entire 800-line file | Get the 40-line function |
| Cross-session memory | None | Decisions + code areas persisted |
| Token cost (Sonnet, medium project) | ~$0.14/session | ~$0.04/session |

Benchmark: FastAPI (reproducible)

We benchmarked CCE against FastAPI (53 source files, 180K tokens) with 20 real coding questions. No cherry-picking, no synthetic queries.

Methodology: For each query, "without CCE" means reading the full content of every file the query touches. "With CCE" means the relevant chunks after compression.

Important baseline note: The 94% number is measured against full-file reads, not against what Claude Code actually does. In practice, Claude Code already uses grep, partial file reads, and targeted tools, so the real-world savings compared to normal Claude Code behavior will be lower than 94%. We use full-file as the baseline because it's reproducible and deterministic (no agent behavior variability). The benchmark measures CCE's retrieval efficiency, not a head-to-head comparison with Claude Code's built-in exploration.

| Metric | Result |
| --- | --- |
| Retrieval savings | 94% (83,681 → 4,927 tokens/query) |
| Compression (additional, on retrieved chunks) | 89% (4,927 → 523 tokens/query) |
| Recall@10 (found the right files) | 0.90 |
| Latency p50 | 0.4 ms |
| Queries tested | 20 |
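The two percentages follow directly from the per-query token counts in the table. A short check of that arithmetic; the combined figure is derived here from the two rows and is not a number the benchmark itself reports:

```python
def savings_pct(before: float, after: float) -> float:
    """Percentage of tokens removed going from `before` to `after`."""
    return 100.0 * (1.0 - after / before)


retrieval = savings_pct(83_681, 4_927)   # full files -> retrieved chunks
compression = savings_pct(4_927, 523)    # retrieved chunks -> compressed chunks
combined = savings_pct(83_681, 523)      # both layers stacked

print(f"{retrieval:.0f}% {compression:.0f}% {combined:.1f}%")  # 94% 89% 99.4%
```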

Per-Layer Savings (each measured independently)

| Layer | What it does | Savings | Method |
| --- | --- | --- | --- |
| Retrieval | Full files → relevant code chunks | 94% | measured |
| Chunk Compression | Raw chunks → signatures + docstrings | 89% | measured |
| Grammar | Drops articles/fillers from memory text | 13% | measured |

Output compression (reducing Claude's reply length) provides additional savings (~65% estimated) but is not included in the headline number above.

Multi-language benchmarks

| Repo | Language | Files | Retrieval savings | Recall@10 |
| --- | --- | --- | --- | --- |
| FastAPI | Python | 53 | 94% | 0.90 |
| chi | Go | 94 | 76% | 0.67 |
| fiber | Go (monorepo) | 396 | 93% | 0.07 |

Go's shorter files reduce the retrieval headroom (smaller baseline). Monorepos dilute recall at top-10 (fiber). Middleware queries with one-feature-per-file hit R=1.00 consistently.
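For readers unfamiliar with the metric, one common definition of Recall@k is the fraction of relevant files that appear in the top k results. The sketch below assumes that definition; the benchmark scripts may compute it differently, and the file names are made up for illustration:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of `relevant` items found among the first `k` retrieved items."""
    if not relevant:
        return 0.0
    hits = relevant & set(retrieved[:k])
    return len(hits) / len(relevant)


# Hypothetical query: two files are relevant, one shows up in the top 10.
ranked = ["middleware/logger.go", "mux.go", "tree.go"]
print(recall_at_k(ranked, {"mux.go", "context.go"}))  # 0.5
```

Under this definition, a large monorepo with many near-duplicate candidates can push the right file past position 10, which lowers recall without changing token savings.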

Reproduce it yourself:

pip install code-context-engine
python benchmarks/run_benchmark.py --repo https://github.com/fastapi/fastapi.git --source-dir fastapi
python benchmarks/run_benchmark.py --repo https://github.com/go-chi/chi.git --source-dir .

Full results in benchmarks/results/. Queries and methodology in benchmarks/.


What you get

9 MCP tools that Claude uses automatically:

| Tool | What it does |
| --- | --- |
| context_search | Hybrid vector + BM25 search with graph expansion |
| expand_chunk | Full source for a compressed result |
| related_context | Find code via graph edges (calls, imports) |
| session_recall | Recall decisions from past sessions |
| record_decision | Save a decision for future sessions |
| record_code_area | Record which files were worked in |
| index_status | Check index freshness |
| reindex | Re-index a file or the full project |
| set_output_compression | Adjust response verbosity (off / lite / standard / max) |

Live dashboard with donut charts, file health, and session history:

cce dashboard


Dollar estimates with multi-provider pricing (Anthropic, OpenAI, Google):

cce savings --all    # see savings across all projects

How it works

  1. Index: Tree-sitter parses your code into semantic chunks (functions, classes, modules). Stored as vector embeddings locally.
  2. Search: Claude calls context_search. Hybrid vector + BM25 retrieval finds the right chunks. Code graph adds related files automatically.
  3. Compress: Chunks are truncated to signatures + docstrings (or LLM-summarized if Ollama is running).
  4. Remember: Decisions and code areas persist across sessions via session_recall.
  5. Track: Every query is logged. cce savings shows exactly how much you saved.
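The indexing step can be illustrated with a small stand-in. CCE uses tree-sitter; the sketch below swaps in Python's standard-library `ast` module so it runs without extra dependencies, and `chunk_python` is a hypothetical name. It shows the idea of one chunk per top-level function or class, not CCE's actual chunker:

```python
import ast


def chunk_python(source: str) -> list[dict]:
    """Split a module into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start": node.lineno,          # 1-based, inclusive
                "end": node.end_lineno,        # 1-based, inclusive
                "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
            })
    return chunks


sample = '''
def charge(amount):
    """Charge a card."""
    return amount * 100

class Shipping:
    def rate(self):
        return 5
'''

for chunk in chunk_python(sample):
    print(chunk["kind"], chunk["name"], chunk["start"], chunk["end"])
# FunctionDef charge 2 4
# ClassDef Shipping 6 8
```

Each chunk's `text` is what would be embedded and stored; the line range lets a later lookup return the full source for a compressed result.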

Re-indexing after edits takes under 1 second (96% embedding cache hit rate). Git hooks keep the index current automatically.


What makes CCE different

It saves where the money is

Output compression tools (like Caveman) save 20-75% on output tokens, but output is only 5-15% of your bill, so net savings top out around 11% (75% of 15%).

CCE saves on input tokens (94% retrieval savings on FastAPI, reproducibly benchmarked). Input is 85-95% of your bill.

It actually understands your code

Not a text search. Tree-sitter AST parsing creates semantic chunks. Hybrid retrieval merges vector similarity with BM25 keyword matching via Reciprocal Rank Fusion. A confidence scorer blends similarity (50%), keyword match (30%), and recency (20%). Graph expansion walks CALLS/IMPORTS edges to pull in related code.
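A minimal sketch of the two scoring ideas named above. Reciprocal Rank Fusion is shown in its standard form with the conventional constant k = 60; whether CCE uses that constant is an assumption, and the function names and chunk IDs are hypothetical. The confidence blend uses the 50/30/20 weights quoted in the text:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def confidence(similarity: float, keyword: float, recency: float) -> float:
    """Weighted blend (50% / 30% / 20%); each input is expected in [0, 1]."""
    return 0.5 * similarity + 0.3 * keyword + 0.2 * recency


vector_hits = ["auth.login", "auth.refresh", "db.session"]
bm25_hits = ["auth.refresh", "payments.charge", "auth.login"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['auth.refresh', 'auth.login', 'payments.charge', 'db.session']
```

Chunks that rank well in both lists rise to the top, which is why a result found by both vector and keyword search beats one found by only a single retriever.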

It remembers

record_decision("use JWT for auth", reason="session tokens flagged by legal") is stored in SQLite and surfaces via session_recall in the next session. No re-explaining your architecture.
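To make the persistence idea concrete, here is a minimal sketch of a decision log backed by SQLite. The table schema and the Python functions are assumptions for illustration; they borrow the MCP tool names but are not CCE's implementation:

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the decision store. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions ("
        "id INTEGER PRIMARY KEY, "
        "decision TEXT NOT NULL, "
        "reason TEXT NOT NULL DEFAULT '')"
    )
    return conn


def record_decision(conn: sqlite3.Connection, decision: str, reason: str = "") -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO decisions (decision, reason) VALUES (?, ?)",
            (decision, reason),
        )


def session_recall(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, str]]:
    """Most recent decisions first."""
    rows = conn.execute(
        "SELECT decision, reason FROM decisions ORDER BY id DESC LIMIT ?", (limit,)
    )
    return rows.fetchall()


conn = open_memory()
record_decision(conn, "use JWT for auth", reason="session tokens flagged by legal")
print(session_recall(conn))
# [('use JWT for auth', 'session tokens flagged by legal')]
```

With a file path instead of `:memory:`, the rows outlive the process, which is what lets a later session recall them.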

It tracks real savings

Not estimates. Actual tokens served vs full-file baseline, broken down by buckets (retrieval, compression, output, memory, grammar). Dollar costs fetched from Anthropic's pricing page. Savings summary shown at every session start.

It is secure by default

Secret files (.env, *.pem, credentials.json) are never indexed. Content is scanned for AWS keys, GitHub tokens, Slack tokens, Stripe keys, JWTs, and generic credentials. PII (emails, IPs, SSNs, credit cards) is scrubbed from memory writes. All MCP file paths are validated against path traversal.
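Two of these checks are easy to sketch. The patterns below cover only AWS access key IDs and GitHub tokens, using their publicly documented prefixes; CCE's actual rule set is not shown in this document, and `find_secrets` and `resolve_inside` are hypothetical names:

```python
import re
from pathlib import Path

# Illustrative subset of credential patterns, keyed by a label.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
}


def find_secrets(text: str) -> list[str]:
    """Return the labels of every pattern that matches somewhere in `text`."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def resolve_inside(root: str, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting paths that escape it."""
    base = Path(root).resolve()
    target = (base / requested).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes project root: {requested}")
    return target


# AWS's documented example key ID, not a real credential.
print(find_secrets('AWS_KEY = "AKIAIOSFODNN7EXAMPLE"'))  # ['aws_access_key_id']
```

Resolving both paths before comparing them is what defeats `../` sequences and symlink tricks; a plain string-prefix check would not.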


Under the hood

Content-Hash Embedding Cache

SHA-256 fingerprint per chunk, salted with model name. Re-index skips unchanged code. Binary float32 storage (10x smaller than JSON). Typical re-index: 96% cache hit, under 1 second.
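A minimal sketch of this cache, assuming an in-memory dict as the store and a caller-supplied `embed` function; all names here are hypothetical. It shows the two ideas from the paragraph: a model-salted SHA-256 key and packed float32 storage:

```python
import hashlib
import struct
from typing import Callable


def chunk_fingerprint(chunk_text: str, model_name: str) -> str:
    """SHA-256 of the chunk text, salted with the embedding model name."""
    digest = hashlib.sha256()
    digest.update(model_name.encode("utf-8"))
    digest.update(b"\x00")  # separator so model name and text cannot run together
    digest.update(chunk_text.encode("utf-8"))
    return digest.hexdigest()


def pack_vector(vector: list[float]) -> bytes:
    """Little-endian float32, 4 bytes per dimension."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_vector(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


cache: dict[str, bytes] = {}


def embed_cached(
    chunk_text: str, model_name: str, embed: Callable[[str], list[float]]
) -> list[float]:
    """Call `embed` only when this (model, text) pair has not been seen before."""
    key = chunk_fingerprint(chunk_text, model_name)
    if key not in cache:
        cache[key] = pack_vector(embed(chunk_text))
    return unpack_vector(cache[key])
```

Salting with the model name means switching embedding models invalidates every entry automatically, since vectors from different models are not comparable.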

sqlite-vec: 2 MB instead of 217 MB

Replaced LanceDB with sqlite-vec. Same cosine-distance quality, 99% smaller install. WAL mode + PRAGMA synchronous=NORMAL for 80% write speedup. Vectors, FTS5, code graph, and compression cache all live in three SQLite files.
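The write-speed settings mentioned above are standard SQLite pragmas. A minimal sketch using the standard-library `sqlite3` module, assuming the second pragma is `synchronous=NORMAL`; `open_index` and the file name are hypothetical, and loading the sqlite-vec extension is left out:

```python
import sqlite3
import tempfile
from pathlib import Path


def open_index(path: str) -> sqlite3.Connection:
    """Open a SQLite file with write-ahead logging and relaxed fsync behavior."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")     # readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL")   # fewer fsyncs than the FULL default
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_index(str(Path(tmp) / "index.db"))
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
    conn.close()
```

WAL only applies to on-disk databases, so the example uses a temporary file rather than `:memory:`.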

Deterministic Grammar Compression

Memory entries compressed without LLM calls. Drops articles, fillers, pronouns. Three levels (lite/full/ultra, 20-60% savings). Code, paths, URLs preserved byte-for-byte. Same input always yields same output.
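A minimal sketch of the deterministic approach: drop whole tokens from a fixed word list and copy everything else unchanged. The word list is hypothetical (the actual lite/full/ultra lists are not given here), and this version also normalizes whitespace, which the real implementation may not do:

```python
# Hypothetical drop list mixing articles and fillers.
DROP_WORDS = frozenset(
    {"a", "an", "the", "just", "really", "basically", "actually", "simply"}
)


def compress_memory(text: str) -> str:
    """Drop whole filler tokens; every surviving token is copied unchanged."""
    kept = [
        token
        for token in text.split()
        if token.lower().strip(".,;:!?") not in DROP_WORDS
    ]
    return " ".join(kept)


note = "We decided to use the JWT flow in src/auth/jwt.py because it is basically simpler"
print(compress_memory(note))
# We decided to use JWT flow in src/auth/jwt.py because it is simpler
```

Because tokens are only ever removed, never rewritten, a path such as `src/auth/jwt.py` passes through untouched, and the same input always produces the same output with no model call involved.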

Fail-Open Hook Design

5 Claude Code lifecycle hooks capture session context. Every hook runs curl ... || true, so a crashed server never blocks the user. SessionStart injects bootstrap context; others capture silently.

Multi-Provider Pricing

Dollar estimates in cce savings support 15+ models across Anthropic, OpenAI, and Google. Static pricing ships with CCE, live Anthropic pricing is fetched and cached 7 days. Configure pricing.model (e.g. gpt-4o, gemini-2.5-pro, sonnet) or override with pricing.input / pricing.output for custom rates.

Append-Only Savings Ledger

7 buckets track every token saved: retrieval, chunk compression, output compression, memory recall, grammar, turn summarization, progressive disclosure. Survives restarts. Powers CLI and dashboard analytics.
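An append-only ledger can be sketched as one JSON line per event, with totals computed by replaying the file. The JSONL format, field names, and function names below are assumptions for illustration; the document does not say how CCE stores its ledger:

```python
import json
import time
from collections import Counter
from pathlib import Path


def append_saving(ledger: Path, bucket: str, tokens: int) -> None:
    """Append one event; existing lines are never rewritten."""
    entry = {"ts": time.time(), "bucket": bucket, "tokens": tokens}
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def totals(ledger: Path) -> Counter:
    """Replay the ledger and sum saved tokens per bucket."""
    counts: Counter = Counter()
    if not ledger.exists():
        return counts
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            counts[entry["bucket"]] += entry["tokens"]
    return counts
```

Since history is only ever appended, per-session and per-project breakdowns can be recomputed at any time from the same file.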


CLI at a glance

cce init                    # Index + install hooks + register MCP
cce                         # Status banner
cce savings                 # Token savings with dollar estimates
cce savings --all           # All projects
cce dashboard               # Web dashboard with live charts
cce search "auth flow"      # Test a query
cce status                  # Index health + config
cce services                # Ollama + dashboard + MCP status
cce commands add-rule '...' # Project rules for Claude
cce uninstall               # Clean removal of all CCE artifacts

Run cce list for the full command reference.


Configuration

Zero-config by default. Override what you need in ~/.cce/config.yaml or .context-engine.yaml:

compression:
  level: standard          # minimal | standard | full
  output: standard         # off | lite | standard | max
  ollama_url: http://localhost:11434   # point at a remote Ollama if desired

retrieval:
  top_k: 20
  confidence_threshold: 0.5

pricing:
  model: opus              # opus | sonnet | haiku | gpt-4o | gemini-2.5-pro | ...
  # input: 15.0            # override $/1M input tokens
  # output: 75.0           # override $/1M output tokens

Remote Ollama: If you run Ollama on another machine in your network, set compression.ollama_url (e.g. http://nas.local:11434) or export CCE_OLLAMA_URL — the env var wins. CCE probes the endpoint and falls back to truncation-only compression when it's unreachable, so a flaky link won't break indexing.
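The precedence and fallback described above can be sketched as follows. The function names, the `is_reachable` callback, and the two mode labels are hypothetical; only the `CCE_OLLAMA_URL` variable, the `compression.ollama_url` key, and the default address come from this document:

```python
import os
from typing import Callable, Mapping

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def resolve_ollama_url(config: Mapping, env: Mapping[str, str] = os.environ) -> str:
    """Environment variable wins, then the config file, then the local default."""
    from_env = env.get("CCE_OLLAMA_URL")
    if from_env:
        return from_env
    return config.get("compression", {}).get("ollama_url", DEFAULT_OLLAMA_URL)


def choose_compression(url: str, is_reachable: Callable[[str], bool]) -> str:
    """Use LLM summaries only when the endpoint answers; otherwise truncate."""
    try:
        return "llm-summary" if is_reachable(url) else "truncation"
    except Exception:
        return "truncation"


config = {"compression": {"ollama_url": "http://nas.local:11434"}}
print(resolve_ollama_url(config, env={}))  # http://nas.local:11434
```

Treating a probe error the same as an unreachable host keeps indexing on the truncation path instead of failing.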


Output Compression

CCE also compresses Claude's responses (same concept as Caveman):

| Level | Style | Savings |
| --- | --- | --- |
| off | Full output | 0% |
| lite | No filler or hedging | ~30% |
| standard | Fragments, drop articles | ~65% |
| max | Telegraphic | ~75% |

Tell Claude: "switch to max compression" or "turn off compression". Code blocks and commands are never compressed.


Disk Footprint

| Component | Size |
| --- | --- |
| Core install (Ollama backend) | ~17 MB |
| With [local] extra (fastembed + ONNX) | ~189 MB |
| Embedding model (one-time download) | ~60 MB (fastembed) or managed by Ollama |
| Index per project (small/medium/large) | 5-60 MB |

No GPU required. With Ollama, embeddings are handled by the Ollama server. With the [local] extra, the embedding model runs on CPU via ONNX Runtime.


Supported Languages

AST-aware chunking (tree-sitter parsed, 10 extensions):

| Language | Extensions |
| --- | --- |
| Python | .py |
| JavaScript | .js, .jsx |
| TypeScript | .ts, .tsx |
| PHP | .php |
| Go | .go |
| Rust | .rs |
| Java | .java |

Language-aware fallback chunking (40+ extensions):

| Category | Languages |
| --- | --- |
| Web | HTML, CSS, SCSS, LESS, Vue, Svelte |
| Systems | C, C++, C#, Zig, Nim |
| Mobile | Swift, Kotlin, Dart |
| Functional | Haskell, Scala, Clojure, Elixir, Erlang, F# |
| Scripting | Ruby, Perl, Lua, R, Bash/Zsh |
| Data/Config | JSON, YAML, TOML, XML, SQL, GraphQL, Protobuf |
| DevOps | Terraform, HCL, Dockerfile |
| Docs | Markdown |

All other text files are chunked by line range. Binary files are skipped.


Documentation

| Page | Content |
| --- | --- |
| How Much Are You Spending on AI Coding Tokens? | The math on input vs output tokens |
| What is CCE? (Complete Guide) | Setup, tools, how it works, FAQ |
| How to Save Claude Code Tokens | Cost breakdown and savings guide |
| Benchmark Deep Dive | Full FastAPI benchmark methodology |
| Comparison with Alternatives | CCE vs Cursor, Aider, Continue, Greptile |
| Examples | Real conversations with Claude |
| How It Works | Full 9-stage pipeline |
| CLI Reference | Every command with output |
| Configuration | All config options |

FAQ

Does CCE affect response quality?

No. Quality stays the same or slightly improves.

CCE replaces "dump the entire file" with "search for the relevant function." The model still gets the code it needs (0.90 Recall@10 in benchmarks). Less irrelevant context means less noise competing for attention, which can improve the model's focus on your actual question.

How does output token savings work?

CCE writes output compression rules directly into your agent's instruction files (CLAUDE.md, AGENTS.md, .cursorrules, etc.) during cce init. These rules apply to the entire session, not just CCE tool responses, so every reply from the agent follows them.

Set the level in ~/.cce/config.yaml or .context-engine.yaml:

compression:
  output: max       # off | lite | standard | max

Then re-run cce init to update instruction files. Or change at runtime:

set_output_level output_level=max

| Level | Savings | What it does |
| --- | --- | --- |
| off | 0% | No compression |
| lite | ~25% | Removes filler/hedging/pleasantries + diff-only for code changes |
| standard | ~70% | Drops articles, fragments, short synonyms + diff-only for code |
| max | ~80% | Telegraphic style + diff-only for code |

Default is standard. All levels include code output rules that tell the model to show only changed lines (not full file rewrites), which is where most output tokens go in coding sessions. The max level produces very terse prose (similar to "caveman mode"). Code blocks, paths, and commands are never compressed regardless of level.

Where do the savings come from?

Most savings are input tokens (what goes into the model):

| Layer | Type | Typical savings |
| --- | --- | --- |
| Retrieval | Input | 94% (full files → relevant chunks) |
| Chunk compression | Input | 89% (chunks → signatures) |
| Grammar compression | Input | 13% (article/filler removal) |
| Turn summarization | Input | varies (session history) |
| Progressive disclosure | Input | varies (tool payloads) |
| Output compression | Output | 25-80% (depends on level) |

Output tokens cost 5x more per token (e.g. Opus: $15/1M input vs $75/1M output), so even a small output reduction has outsized cost impact.


Roadmap

  • Multi-repo benchmarks (FastAPI, chi, fiber)
  • More benchmarks (Django, Express)
  • Tree-sitter support for C, C++, Ruby, Swift, Kotlin
  • Docker support for remote mode

See CHANGELOG.md for shipped features.


Contributing

Contributions welcome. See https://github.com/elara-labs/code-context-engine/blob/main/CONTRIBUTING.md for setup.


License

MIT. See LICENSE.

Authors

  • Fazle Elahee
  • Raj

Acknowledgments

Claude Code · MCP · sqlite-vec · Tree-sitter · fastembed · Ollama


If CCE saves you tokens, give it a star.

Categories: Search & Web Crawling
Registry: active
Package: code-context-engine
Transport: STDIO
Updated: May 3, 2026
