If you're tired of watching Claude burn 40+ tool calls and 100k tokens grep-crawling for architectural context it can't retain between sessions, ContextAtlas pre-indexes your codebase into single-call bundles. It fuses LSP symbol data with extracted ADR claims, git history, and test associations, so when Claude asks about OrderProcessor it gets the structure, the "must be idempotent" constraint from your design docs, recent commits, and test coverage in one shot instead of fifteen. The developer benchmarked this on hono, httpx, and cobra at 45-72% token reduction on architectural prompts with no quality regression. You get value even without ADRs since LSP plus git plus tests in one call beats the baseline, but the real win is surfacing design intent before Claude proposes changes that violate it. Ships as both CLI and MCP server generating the same atlas.json artifact.
Stop watching Claude burn tokens grepping for context it can't possibly find.
ContextAtlas turns your codebase into a single-call context bundle for Claude Code, fusing LSP-grade structure, architectural intent from your Architectural Decision Records (ADRs), git history, and test associations. Measured on the hono / httpx / cobra benchmark suite: 45-72% token reduction on architectural prompts, with no quality regression on any measured axis.
ContextAtlas ships two equivalent paths — CLI and Claude Code Skills —
both producing the same atlas.json. See Quick Start
for setup.
Claude Code currently learns your codebase by brute force. Every session starts fresh. Every "where is X?" triggers multiple grep calls. Every "what depends on Y?" is another flurry of file reads. On a mid-sized codebase, answering a single architectural question can consume 40+ tool calls and 100,000+ tokens before Claude has enough context to reason well.
Worse: the architectural intent that governs your code — the ADRs, the
design decisions, the "we did it this way because" — is invisible to
Claude. The rule that OrderProcessor must be idempotent lives in
docs/adr/. When Claude proposes a change, it has no way to know that
constraint exists.
Yesterday's understanding doesn't carry to today. Every conversation starts from zero. Your ADRs, your commit history, your test coverage — none of it is on the agent's table.
What if expensive understanding happened once, at index time, and every query became a dictionary lookup?
That's ContextAtlas.
ContextAtlas is an MCP server that gives Claude Code a curated atlas of your codebase — fusing LSP-grade structural precision with architectural intent extracted from your ADRs, docs, and git history, delivered to Claude in single-call context bundles.
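Every bundle Claude receives combines four independent signals about a symbol: LSP-grade structure, architectural intent extracted from ADRs and docs, git history, and test associations.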
One MCP call returns all four, fused. No ADRs in your repo yet? You still get LSP + git + tests in one call instead of fifteen — a meaningful baseline improvement. Add ADRs and the bundles get richer. The architecture is designed so any subset of signals produces value.
Given an ADR stating that OrderProcessor must be idempotent, a call
to get_symbol_context("OrderProcessor") returns:
```text
SYM OrderProcessor@src/orders/processor.ts:42 class
SIG class OrderProcessor extends BaseProcessor<Order>
INTENT ADR-07 hard "must be idempotent"
RATIONALE "All order processing must be safely retryable."
INTENT ADR-12 soft "prefer async base class for new processors"
REFS 23 [billing:14 admin:9]
TOP ref:ts:src/billing/charges.ts:88
TOP ref:ts:src/admin/orders.ts:12
GIT hot last=2026-03-14
RECENT "Fix idempotency bug in retry path" a3f2c1d
TESTS src/orders/processor.test.ts (+11)
```
When Claude is asked to modify OrderProcessor, it sees the
idempotency constraint before proposing changes — not after a user
review catches the violation.
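The bundle shown above is line-oriented: each line opens with an upper-case tag followed by its payload. As a rough illustration of how a client could consume such text, the sketch below groups lines by tag and picks out the hard constraints. `parseBundle`, `hardConstraints`, and `ParsedBundle` are hypothetical names, not part of ContextAtlas, and the code assumes every line follows the `TAG payload` pattern of the example.

```typescript
// Hypothetical helper: groups the lines of a bundle by their leading tag.
// Assumes every non-empty line looks like "TAG payload", as in the example.
type ParsedBundle = Map<string, string[]>;

function parseBundle(text: string): ParsedBundle {
  const byTag: ParsedBundle = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    const space = line.indexOf(" ");
    // A line with no space is treated as a tag with an empty payload.
    const tag = space === -1 ? line : line.slice(0, space);
    const payload = space === -1 ? "" : line.slice(space + 1).trim();
    const existing = byTag.get(tag);
    if (existing) existing.push(payload);
    else byTag.set(tag, [payload]);
  }
  return byTag;
}

// Hard constraints are INTENT payloads whose second field is "hard".
function hardConstraints(bundle: ParsedBundle): string[] {
  return (bundle.get("INTENT") ?? []).filter((p) => p.split(" ")[1] === "hard");
}

const sample = [
  "SYM OrderProcessor@src/orders/processor.ts:42 class",
  'INTENT ADR-07 hard "must be idempotent"',
  'INTENT ADR-12 soft "prefer async base class for new processors"',
  "TESTS src/orders/processor.test.ts (+11)",
].join("\n");

const parsed = parseBundle(sample);
const hard = hardConstraints(parsed);
```

Here `hard` holds only the ADR-07 line, which is the constraint a caller would want to check before proposing a change.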
Who this is for. ContextAtlas is built for the average developer using Claude Code on real codebases — not just engineers at large orgs working on 500,000-file monorepos. Token-burn reduction scales with codebase size — dramatic on a 200-file framework, modest on a 30-file library. But architectural intent capture is size-invariant. A 30-file library can have meaningful architectural decisions worth surfacing, and Claude respecting them matters just as much as on a larger codebase.
Efficiency and quality are necessary but not sufficient. The substantive value of context-grounding shows up in design choices on non-trivial code-change tasks.
A/B trial during v0.3 development. Identical 3-paragraph prompt across two ContextAtlas clones: implement a known bug fix — locate the bug, design and implement the fix, write tests, document via ADR. The only setup difference: MCP availability.
| Arm | MCP | Approach selected |
|---|---|---|
| A (vanilla) | none | Recall-first approach (broader matching; fought the project's precision-thesis with noise) |
| B (CA-aided) | ContextAtlas | Precision-optimization approach (aligned with the project's pre-extracted-claims-with-structural-attribution thesis) |
Arm B's approach landed in main. Both arms functionally fixed the bug at similar wall-clock and token cost. The substantive difference was alignment with project design thesis — the CA-aided arm could read the relevant ADR + prior architectural work from the atlas, and made a choice that fit. The vanilla arm couldn't see that context and chose an approach that worked but fought the architecture.
Arm A's consideration wasn't lost: it was captured as a future-work investigation trigger, so the recall-vs-precision tradeoff is preserved.
Full synthesis at v0.3 Round 3 dogfood evidence.
N=1 trial; this is anecdote, not benchmark. The systematic benchmark suite (hono / httpx / cobra) measures efficiency and quality (see §The Numbers below). This A/B trial measures a distinct axis, design alignment, which doesn't fit benchmark-suite methodology (every code-change task is repo-specific) but is the core value proposition for cohort developers building on real codebases.
We benchmark ContextAtlas against baseline Claude Code on three repositories chosen to reflect realistic developer workloads:
| Repo | Language | Source files | Role |
|---|---|---|---|
| honojs/hono | TypeScript | 186 | Mid-sized framework |
| encode/httpx | Python | 23 | Focused production library |
| spf13/cobra | Go | 19 | CLI framework |
Methodology. 24 prompts per repo, 6 task buckets, blind manual grading, pre-registered rubric, no cherry-picking. Full methodology in RUBRIC.md.
Phase 5 reference run on hono, six pre-registered prompts:
| Prompt | Bucket | Alpha calls | CA calls | Δ | Alpha $ | CA $ |
|---|---|---|---|---|---|---|
| h1-context-runtime | win | 18 | 9 | −50% | $2.36 | $1.52 |
| h2-router-contract | win | 11 | 5 | −55% | $0.60 | $0.53 |
| h3-middleware-onion | win | 5 | 5 | 0% | $0.38 | $0.47 |
| h4-validator-typeflow | win | 21 | 6 | −71% | $2.95 | $0.52 |
| h5-hono-generics | tie | 11 | 13 | +18% | $0.79 | $1.17 |
| h6-fetch-signature | trick | 3 | 4 | +33% | $0.17 | $0.29 |
| aggregate | | 69 | 42 | −39% | $7.25 | $4.50 (−38%) |
The headline case: h4-validator-typeflow ran about 5.7× cheaper ($2.95 → $0.52) at equivalent answer depth. CA opens with the governing ADR by number; the baseline reconstructs the architecture from source. Tie and trick buckets (h5, h6) show CA net-negative, as the rubric predicted: CA over-engineers on questions where architectural intent doesn't carry load. Bucket-aware methodology surfaces these expected cases rather than burying them.
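The aggregate row can be re-derived from the per-prompt rows. The sketch below copies the figures from the Phase 5 table and recomputes the call and cost deltas; `Row`, `percentChange`, and `sum` are illustrative names, not part of the benchmark harness.

```typescript
// Per-prompt figures copied from the Phase 5 hono table.
interface Row {
  prompt: string;
  alphaCalls: number;
  caCalls: number;
  alphaUsd: number;
  caUsd: number;
}

const rows: Row[] = [
  { prompt: "h1-context-runtime", alphaCalls: 18, caCalls: 9, alphaUsd: 2.36, caUsd: 1.52 },
  { prompt: "h2-router-contract", alphaCalls: 11, caCalls: 5, alphaUsd: 0.6, caUsd: 0.53 },
  { prompt: "h3-middleware-onion", alphaCalls: 5, caCalls: 5, alphaUsd: 0.38, caUsd: 0.47 },
  { prompt: "h4-validator-typeflow", alphaCalls: 21, caCalls: 6, alphaUsd: 2.95, caUsd: 0.52 },
  { prompt: "h5-hono-generics", alphaCalls: 11, caCalls: 13, alphaUsd: 0.79, caUsd: 1.17 },
  { prompt: "h6-fetch-signature", alphaCalls: 3, caCalls: 4, alphaUsd: 0.17, caUsd: 0.29 },
];

function sum(values: number[]): number {
  return values.reduce((a, b) => a + b, 0);
}

// Relative change from baseline to treatment, as a rounded percentage.
function percentChange(baseline: number, treatment: number): number {
  return Math.round(((treatment - baseline) / baseline) * 100);
}

const totalAlphaCalls = sum(rows.map((r) => r.alphaCalls)); // 69
const totalCaCalls = sum(rows.map((r) => r.caCalls)); // 42
const callsDelta = percentChange(totalAlphaCalls, totalCaCalls); // -39
const costDelta = percentChange(
  sum(rows.map((r) => r.alphaUsd)),
  sum(rows.map((r) => r.caUsd)),
); // -38
```

Summing first and dividing afterwards matters: averaging the six per-prompt percentages would weight the cheap h6 prompt as heavily as the expensive h4 prompt.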
Cross-language replication: the same architectural-intent win mechanism holds on Python (Phase 6 — httpx) and Go (Phase 7 — cobra). Phase 8 re-ran the locked prompt sets against v0.3-sharpened atlases at the same pinned target SHAs: 45-72% token reduction on architectural-intent prompts across all three target languages. Full synthesis at phase-8-v0.3-reference-run.md.
v0.5 shipped the LLM-judge methodology under paired-mode anonymization (per ADR-19). Cross-cell rollup paired-t at N=27 differences per axis (5 anchor cells × n=5 trials × 2 conditions; hono h1 auto-stretch to n=8):
| Quality axis | Mean Δ (0-3 scale) | 95% CI | Tier |
|---|---|---|---|
| Factual correctness | +0.370 | [0.176, 0.565] | CLEAN |
| Hallucination | +0.296 | [0.032, 0.561] | Borderline |
| Actionability | +0.148 | [0.005, 0.291] | Borderline |
| Completeness | +0.037 | [-0.039, 0.113] | Not distinguishable |
Threshold pre-registration: the three-tier framing (≥0.05 CLEAN; 0.001-0.05 BORDERLINE; ≤0 NOT distinguishable) was locked before precision values were computed. No goalpost-shifting after data. 76% tie rate confirms anonymization worked — the judge couldn't tell which condition was which on three-quarters of comparisons.
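For readers who want to see the mechanics, here is a minimal sketch of a paired-difference 95% confidence interval followed by the three-tier label. It rests on two assumptions that the text does not state outright: the tier thresholds are compared against the lower bound of the interval (which matches all four rows of the table), and values between 0 and 0.001 fall into the lowest tier. The function names are hypothetical, and the Student-t critical value is passed in rather than computed.

```typescript
// Assumption: tiers are decided by the lower bound of the 95% CI.
type Tier = "CLEAN" | "BORDERLINE" | "NOT distinguishable";

interface Interval {
  mean: number;
  lower: number;
  upper: number;
}

// tCritical is the two-sided 95% Student-t quantile for (n - 1) degrees of
// freedom; the caller looks it up (about 2.056 for n = 27).
function pairedInterval(diffs: number[], tCritical: number): Interval {
  const n = diffs.length;
  if (n < 2) throw new Error("need at least two paired differences");
  const mean = diffs.reduce((a, b) => a + b, 0) / n;
  const sumSq = diffs.reduce((a, d) => a + (d - mean) * (d - mean), 0);
  // Sample standard deviation divided by sqrt(n) gives the standard error.
  const standardError = Math.sqrt(sumSq / (n - 1)) / Math.sqrt(n);
  const half = tCritical * standardError;
  return { mean, lower: mean - half, upper: mean + half };
}

function tierFor(lower: number): Tier {
  if (lower >= 0.05) return "CLEAN";
  if (lower >= 0.001) return "BORDERLINE";
  return "NOT distinguishable";
}

// Lower bounds taken from the quality-axis table.
const factual = tierFor(0.176);
const hallucination = tierFor(0.032);
const actionability = tierFor(0.005);
const completeness = tierFor(-0.039);

// Toy data, only to exercise the interval arithmetic.
const toy = pairedInterval([1, 0, 1, 0], 3.182);
```

Applied to the published lower bounds, `tierFor` reproduces the Tier column of the table.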
Full per-axis numerics + 9 named findings at the Phase-9 reference doc. Honest methodology limits documented in §Methodology and Honest Limits below.
A few deliberate framings — what ContextAtlas is and isn't relative to neighboring tools:
vs. graph-based code intelligence (Graphify and similar). We're in the same category — both build pre-computed indexes over codebases for LLM agents via MCP. That's genuine category overlap, and we want to be straight about it. Where we differ:
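Graph-based tools expose primitives (graph_query, get_neighbors, shortest_path) that callers compose at query time; ContextAtlas composes the bundle at index time and returns it in one call.
Whether our bets produce better results for a given workload is an empirical question. See the numbers above.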
vs. session-memory tools (claude-mem, engram, anamnesis). Those capture accumulated session history — what Claude learned or did in past conversations. ContextAtlas provides static architectural ground truth extracted from your code, ADRs, and docs. Different information sources with occasional overlap (when session discussions become ADRs or commits), but fundamentally different problems. Session-memory tools also can't really be committed to a repo; ContextAtlas's atlas can.
vs. LSP-in-MCP (LSP-AI and similar). ContextAtlas uses LSP as its source of structural truth. If you just want LSP-in-MCP, those projects solve that well. ContextAtlas layers architectural intent and git history on top.
vs. embedding-based search. We evaluated this and chose symbol-keyed claims instead. Embeddings are fuzzy; LSP symbols are exact. Embedding-based ranking is a post-MVP enhancement contingent on benchmark evidence that it helps — see ADR-09 for the full rationale.
ContextAtlas produces a committable team artifact — atlas.json —
that lives in the repo alongside your code and ADRs. This is the piece
that turns ContextAtlas from a personal productivity tool into a team
asset.
- New teammates clone the repo and get atlas.json with everything else. On first run, ContextAtlas imports the committed atlas directly into their local cache: no extraction API calls, no 10-minute wait. Productive from the moment they open Claude Code.
- Contributors who change code or ADRs update atlas.json as part of their commit. Reviewers see both the code change and the atlas diff in the PR.

For teams that cannot commit the atlas, set atlas.committed: false in the config. Every developer runs their own extraction. The team artifact benefit is lost, but ContextAtlas still works as a personal tool.
This model — committed team artifact with a local cache for query performance — is a categorical difference from both session-memory tools and knowledge-graph tools. Detailed in ADR-06.
```text
INDEX TIME (once per source change)
──────────────────────────────────────
ADRs ──────────┐
Docstrings ────┤
Git commits ───┼──► Opus 4.7 extraction
LSP symbols ───┘          │
                          ▼
            atlas.json (committed to repo)
                          │
                          ▼
            SQLite + FTS5 BM25 (local cache)

QUERY TIME (every Claude call, zero API)
──────────────────────────────────────
Claude Code: get_symbol_context("X")
              │
              ▼
One fused bundle, sub-100ms
(LSP refs + intent + git + tests)
```
Five layers, each with one job, ending with the get_symbol_context, find_by_intent, and impact_of_change tools exposed to Claude.
Signal fusion at query time is a cheap lookup: when Claude calls get_symbol_context("OrderProcessor"), the MCP handler queries the LSP for live structural facts (definition, references, types), reads the symbol's pre-extracted intent claims from SQLite, and folds in git heat and tests. The bundle returned to Claude is composed, not computed, because the expensive joins happened at index time. This is the key distinction from graph-based alternatives, which expose primitives (get_neighbors, shortest_path) that callers compose at query time.
The architectural promise: expensive understanding happens once at index time; queries are local dictionary lookups, zero API calls. This bounds cost, latency, and unpredictability — and it's a hard invariant, not an optimization.
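To make "composed, not computed" concrete, the sketch below assembles a bundle from a few keyed lookups. Plain `Map` objects stand in for the SQLite cache and the LSP client, and every name here (`composeBundle`, `claimsBySymbol`, `lspIndex`, and the interfaces) is hypothetical; the real handler is described in DESIGN.md.

```typescript
// Hypothetical stand-ins: Maps play the role of the SQLite cache and the LSP.
interface IntentClaim {
  adr: string;
  severity: "hard" | "soft";
  text: string;
}
interface Structure {
  signature: string;
  referenceCount: number;
}
interface Bundle {
  symbol: string;
  structure: Structure | undefined;
  intent: IntentClaim[];
  tests: string[];
}

// Filled once at index time; reading them at query time costs no API call.
const claimsBySymbol = new Map<string, IntentClaim[]>([
  ["OrderProcessor", [{ adr: "ADR-07", severity: "hard", text: "must be idempotent" }]],
]);
const testsBySymbol = new Map<string, string[]>([
  ["OrderProcessor", ["src/orders/processor.test.ts"]],
]);
// Stand-in for a live LSP query.
const lspIndex = new Map<string, Structure>([
  ["OrderProcessor", { signature: "class OrderProcessor extends BaseProcessor<Order>", referenceCount: 23 }],
]);

// Query time is a handful of keyed lookups joined into one object.
function composeBundle(symbol: string): Bundle {
  return {
    symbol,
    structure: lspIndex.get(symbol),
    intent: claimsBySymbol.get(symbol) ?? [],
    tests: testsBySymbol.get(symbol) ?? [],
  };
}

const known = composeBundle("OrderProcessor");
const unknown = composeBundle("NoSuchSymbol");
```

A symbol with no claims still yields a bundle, just with empty `intent` and `tests`, which mirrors the point that any subset of signals produces value.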
Full design in DESIGN.md.
What ContextAtlas does and doesn't send off your machine:
Sent to Anthropic's API (at index time only):
- Content from the sources configured in .contextatlas.yml
Never sent anywhere:
Stored locally only:
- The index cache (.contextatlas/index.db by default)

At query time, on every get_symbol_context call Claude makes during your work, ContextAtlas performs a local SQLite lookup plus local LSP calls. No network traffic. No model calls. Your code never leaves your machine during normal use.
Index-time extraction uses the Anthropic API per standard API terms. If your ADRs contain sensitive architectural decisions, they'll be processed under those terms like any other API-submitted content.
The three MCP tools are not three parallel features — they're one fused context substrate with three access patterns.
get_symbol_context — the primitive. "I know the symbol; give me
everything." Returns the full fused bundle (signature, ADR claims,
references, git heat, tests, types) in a single call. Multi-symbol
mode handles up to 10 symbols per request (per
ADR-15).
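A caller with more than 10 symbols has to split them across requests. The helper below is a hypothetical client-side sketch of that batching. The limit comes from the text above; how the server responds to an oversized request is not specified here, so the code simply stays under the limit.

```typescript
// Limit taken from the text above; the helper name is hypothetical.
const MAX_SYMBOLS_PER_REQUEST = 10;

function batchSymbols(symbols: string[], size: number = MAX_SYMBOLS_PER_REQUEST): string[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  // De-duplicate, keeping first-seen order, so no symbol is requested twice.
  const unique = symbols.filter((s, i) => symbols.indexOf(s) === i);
  const batches: string[][] = [];
  for (let i = 0; i < unique.length; i += size) {
    batches.push(unique.slice(i, i + size));
  }
  return batches;
}

// 23 placeholder names plus one duplicate.
const names: string[] = [];
for (let i = 1; i <= 23; i++) names.push("Symbol" + i);
names.push("Symbol1");

const batches = batchSymbols(names);
const batchSizes = batches.map((b) => b.length);
```

With 23 distinct names this produces three batches of 10, 10, and 3 symbols.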
find_by_intent — the semantic-query composite. "I don't know
the symbol; find it by what it does." Ranks by BM25 against indexed
claim text in local SQLite FTS5 — no embedding service, no external
calls, deterministic results (per
ADR-09).
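As background on the ranking idea, here is a textbook Okapi BM25 scorer over a tiny in-memory set of claim texts. It is an illustration only: SQLite FTS5 provides its own built-in `bm25()` ranking, and this sketch neither reproduces that code nor ContextAtlas's query. The claims, symbol names other than OrderProcessor, and the `rankByIntent` function are made up for the example.

```typescript
// Textbook Okapi BM25; illustrative, not the FTS5 implementation.
interface Claim {
  symbol: string;
  text: string;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function rankByIntent(
  query: string,
  claims: Claim[],
  k1: number = 1.2,
  b: number = 0.75,
): { symbol: string; score: number }[] {
  const docs = claims.map((c) => tokenize(c.text));
  const avgLen = docs.reduce((a, d) => a + d.length, 0) / Math.max(docs.length, 1);
  const terms = tokenize(query);
  const scored = claims.map((claim, i) => {
    const doc = docs[i];
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const docsWithTerm = docs.filter((d) => d.indexOf(term) !== -1).length;
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (docs.length - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
      // Term frequency saturates via k1 and is normalized by document length via b.
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * (1 - b + b * (doc.length / avgLen)));
    }
    return { symbol: claim.symbol, score };
  });
  // Drop non-matches; sort by score, breaking ties by name for determinism.
  return scored
    .filter((s) => s.score > 0)
    .sort((x, y) => y.score - x.score || (x.symbol < y.symbol ? -1 : x.symbol > y.symbol ? 1 : 0));
}

const claims: Claim[] = [
  { symbol: "OrderProcessor", text: "must be idempotent; all order processing must be safely retryable" },
  { symbol: "ChargeClient", text: "retries use exponential backoff" },
  { symbol: "AdminRouter", text: "admin routes require an authenticated session" },
];

const results = rankByIntent("idempotent retryable processing", claims);
const noMatch = rankByIntent("websocket", claims);
```

Because the scorer is pure lexical matching over local text, the same query against the same claims always returns the same order, which is the determinism property the paragraph above describes.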
impact_of_change — the blast-radius composite. "I'm about to
change this; what breaks?" Adds git co-change patterns and test impact
on top of the primitive.
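One simple way to think about git co-change patterns is to count how often other files appear in the same commits as the file being changed. The sketch below does exactly that over made-up commit data; `coChangeCounts` and the `Commit` shape are hypothetical, and the actual signal ContextAtlas computes may be weighted or filtered differently.

```typescript
// Hypothetical co-change count over illustrative (made-up) commits.
interface Commit {
  sha: string;
  files: string[];
}

function coChangeCounts(target: string, commits: Commit[]): [string, number][] {
  const counts = new Map<string, number>();
  for (const commit of commits) {
    // Only commits that touched the target contribute.
    if (commit.files.indexOf(target) === -1) continue;
    for (const file of commit.files) {
      if (file === target) continue;
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }
  // Most frequent first; ties broken by path for a stable order.
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1),
  );
}

const commits: Commit[] = [
  { sha: "c1", files: ["src/orders/processor.ts", "src/billing/charges.ts"] },
  {
    sha: "c2",
    files: ["src/orders/processor.ts", "src/billing/charges.ts", "src/orders/processor.test.ts"],
  },
  { sha: "c3", files: ["src/admin/orders.ts"] },
];

const coChanged = coChangeCounts("src/orders/processor.ts", commits);
```

In this toy history `src/billing/charges.ts` co-changes twice and the test file once, so a change to the processor would flag the billing file first.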
The ContextAtlas atlas is a substrate you build once and refresh after code or ADR changes. There is one canonical entry point per cohort path, and its behavior adapts to the state of the atlas:
| | CLI | Skills |
|---|---|---|
| Cold-start | contextatlas index (full extraction) | /index-atlas (full extraction) |
| Refresh | contextatlas index (Phase 4 SHA-diff incremental) | /index-atlas (refresh-aware workflow) |
SHA-diff incremental refresh (per ADR-12) is substantially cheaper than a cold-start run: unchanged ADR and docstring sources are skipped, and only changed sources are re-extracted.
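The core of a SHA-diff refresh is a comparison between the hashes recorded at the last index run and the current ones. The sketch below shows that comparison in isolation, taking the hashes as given; `planRefresh` and `RefreshPlan` are hypothetical names, and the paths and hash strings are placeholders.

```typescript
// Hypothetical sketch: decide which sources need re-extraction by comparing
// content hashes from the last index run with the current ones.
interface RefreshPlan {
  reextract: string[];
  skip: string[];
  removed: string[];
}

function planRefresh(previous: Map<string, string>, current: Map<string, string>): RefreshPlan {
  const plan: RefreshPlan = { reextract: [], skip: [], removed: [] };
  current.forEach((sha, path) => {
    // A new file has no previous hash, so it also lands in reextract.
    if (previous.get(path) === sha) plan.skip.push(path);
    else plan.reextract.push(path);
  });
  previous.forEach((_sha, path) => {
    if (!current.has(path)) plan.removed.push(path);
  });
  return plan;
}

const lastRun = new Map<string, string>([
  ["docs/adr/ADR-07.md", "aaa"],
  ["docs/adr/ADR-12.md", "bbb"],
  ["docs/old-notes.md", "ccc"],
]);
const now = new Map<string, string>([
  ["docs/adr/ADR-07.md", "aaa"], // unchanged
  ["docs/adr/ADR-12.md", "bb2"], // edited
  ["docs/adr/ADR-13.md", "ddd"], // new
]);

const plan = planRefresh(lastRun, now);
const coldStart = planRefresh(new Map<string, string>(), now);
```

An empty `previous` map behaves like a cold start: every source lands in `reextract`, which is why one entry point can serve both rows of the table.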
Status: v0.9.0 shipped 2026-05-16. v1.0 public launch substrate complete. Package not yet published to npm; install instructions below describe the intended shape.
Runtime requirements:
- TypeScript: typescript-language-server (declared as a peer dependency rather than a direct one, so you control the version). Install alongside ContextAtlas (e.g. npm i -D typescript-language-server typescript).
- Go: gopls on the PATH (install via go install golang.org/x/tools/gopls@latest).
- Ruby: ruby-lsp 0.26.x. Recommended install via Bundler in your project's Gemfile (gem 'ruby-lsp', '~> 0.26.0', require: false under group :development). Rails projects additionally benefit from ruby-lsp-rails 0.4.x. Ruby 3.3+ required (4.0+ recommended).
- C# / .NET: csharp-ls 0.24.x on the PATH (Roslyn LSP wrapper). Install via dotnet tool install --global csharp-ls. .NET SDK 8 minimum (10+ recommended; matches cohort backend pin). On Windows, the %USERPROFILE%\.dotnet\tools directory must be on PATH; the adapter enriches PATH automatically for Bash/Git-Bash, where the SDK installer only configures PowerShell.

Skills path:
```shell
npm install -g contextatlas
contextatlas init
```
Then in Claude Code:
```text
/generate-adrs   # Skip if you already have ADRs (any path; see Using existing ADRs below)
/index-atlas     # Build the atlas
/prime-atlas     # Verify connection (once per session)
```
CLI path:
```shell
npm install -g contextatlas
export ANTHROPIC_API_KEY=sk-...
contextatlas init
contextatlas generate-adrs   # Skip if you already have ADRs (any path; see Using existing ADRs below)
contextatlas index
contextatlas doctor          # Verify health
```
ContextAtlas extracts architectural intent from whatever ADRs and documentation you already have; generate-adrs is for repos without existing ADRs.
- ADRs already in docs/adr/? Skip generate-adrs; ContextAtlas extracts your existing ADRs automatically.
- ADRs somewhere else? Set adrs.path in .contextatlas.yml (default: docs/adr/).
- Other documentation? Adjust docs.include (default: README.md + docs/**/*.md). Add custom paths to extract additional documentation surfaces.

The extraction pipeline produces structured claims from any prose source the config points at, so existing documentation doesn't go unused. See Configuration below for full schema.
Configure ContextAtlas as an MCP server in your Claude Code settings.
Choose based on whether contextatlas is on your PATH:
Option A — global binary on PATH (e.g., installed via
npm install -g or npm link):
```json
{
  "mcpServers": {
    "contextatlas": {
      "command": "contextatlas"
    }
  }
}
```
Option B — direct dist invocation (no global install needed):
```json
{
  "mcpServers": {
    "contextatlas": {
      "command": "node",
      "args": ["/absolute/path/to/contextatlas/dist/index.js"]
    }
  }
}
```
- If atlas.json is already committed (a teammate ran it first, or it came with the repo), ContextAtlas imports it instantly. No API calls. You're ready in seconds.
- Otherwise, the first index run performs a full extraction, and the resulting atlas.json can be committed so future contributors skip this step.
- … EXTRACTION_PROMPT prefix and typically run ~3x lower. v0.4 reference measurements: cobra $5.44 → $1.82, httpx $5.53 → $1.85, hono $10.89 → $3.65 (3.0x ratio consistent across targets). Treat projected costs as conservative upper bounds.

Create .contextatlas.yml in your repo root:
```yaml
version: 1
languages:
  - typescript
  - python
  - go
  - ruby
  - csharp
adrs:
  path: docs/adr/
  format: markdown-frontmatter
docs:
  include:
    - README.md
    - docs/**/*.md
git:
  recent_commits: 5
atlas:
  committed: true  # default; commits atlas.json to your repo
```
Full reference at docs/config.md.
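For tooling that reads this file, the fields in the example can be described with a small type plus a defaults merge. This is a hypothetical sketch: the interface covers only the keys shown in the example, and only the defaults for adrs.path, docs.include, and atlas.committed are stated in this README. The other fallback values simply echo the example and are assumptions; docs/config.md remains the authority.

```typescript
// Hypothetical typing of the fields shown in the example config.
// The real schema may contain more options.
interface ContextAtlasConfig {
  version: number;
  languages: string[];
  adrs: { path: string; format: string };
  docs: { include: string[] };
  git: { recent_commits: number };
  atlas: { committed: boolean };
}

// adrs.path, docs.include and atlas.committed follow the documented defaults;
// the remaining values echo the example and are assumptions.
const EXAMPLE_DEFAULTS: ContextAtlasConfig = {
  version: 1,
  languages: [],
  adrs: { path: "docs/adr/", format: "markdown-frontmatter" },
  docs: { include: ["README.md", "docs/**/*.md"] },
  git: { recent_commits: 5 },
  atlas: { committed: true },
};

// Fills in any top-level section the caller left out.
function withDefaults(partial: Partial<ContextAtlasConfig>): ContextAtlasConfig {
  return {
    version: partial.version ?? EXAMPLE_DEFAULTS.version,
    languages: partial.languages ?? EXAMPLE_DEFAULTS.languages,
    adrs: { ...EXAMPLE_DEFAULTS.adrs, ...partial.adrs },
    docs: { ...EXAMPLE_DEFAULTS.docs, ...partial.docs },
    git: { ...EXAMPLE_DEFAULTS.git, ...partial.git },
    atlas: { ...EXAMPLE_DEFAULTS.atlas, ...partial.atlas },
  };
}

// A team that cannot commit the atlas overrides a single flag.
const personalOnly = withDefaults({
  languages: ["typescript"],
  atlas: { committed: false },
});
```

Here `personalOnly` keeps the default ADR path and doc globs while turning off the committed atlas.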
Credibility is built by stating what we don't claim.
Statistical methodology. All quality measurements are paired-t with 95% confidence intervals — no p-values. NHST at n=5 is statistically void; CIs preserve effect-size visibility. Threshold pre-registration honored verbatim (Option α strict three-tier framing locked before precision values computed).
Single-judge model. v0.5 quality measurements use Sonnet 4.6 as the judge with within-judge consistency ≥80% per axis (pass-1 vs pass-2). Cross-vendor judge-panel graduation is post-v1.0 work.
Three benchmark repos. All quantitative claims are bounded to hono (TypeScript, 186 files), httpx (Python, 23 files), and cobra (Go, 19 files), plus our own dogfood. Generalization beyond these is post-launch cohort work.
v0.5 substrate scope. Quality-axis measurements are 5 anchor cells × n≥5 trials × 2 conditions (hono h1 auto-stretch to n=8); not full-matrix replication. Matrix-completion graduation is post-v1.0.
v0.6 cross-cycle replication caveat. A targeted matrix-replication subset at v0.6 (8 cells × n=5) showed attenuation on 2 of 4 quality axes vs the v0.5 anchor cells (factual_correctness CLEAN→BORDERLINE; actionability BORDERLINE→NOT distinguishable). Root cause: the v0.5 measurements were against an earlier atlas version, and the cross-cycle methodology didn't control for atlas-substrate-version. Full causal investigation deferred to post-launch. Detail at Phase-10 reference doc.
v0.3 single-run methodology. Phase 8 reports n=1 per cell; blind-graded quality-axis measurement was added at v0.5. The Beta-vs-Beta+CA reporting at Phase 8 carries the atlas-file-visibility caveat (the bias direction is conservative; the actual CA contribution is likely larger than the published numbers indicate).
Dogfooding is not a measured benchmark. Throughout development, ContextAtlas indexes its own ADRs and is used by Claude Code during work on ContextAtlas itself. This is a development practice, not part of the four-condition matrix — which runs only against the three external targets.
Favorable and unfavorable results both published. Phase 7's cross-harness asymmetry hypothesis was FALSIFIED on v0.3 substrate. v0.6's atlas-substrate-version confound was surfaced and disclosed. Tie and trick buckets routinely show CA net-negative; we report them inline rather than burying them.
Current: v0.9.0 (shipped 2026-05-16). v1.0 public launch substrate complete; launch execution work folds into v1.0.0 without a separate v0.9.1 tag.
Recent cycle highlights (full per-cycle scope at docs/release-history.md):
- … docs/cycles/v0_X/). Launch positioning work in progress.
- … generate-adrs feature with canonical depth-floor enforcement via validate-adrs.
- … contextatlas_commit_sha + Phase 8 cross-target validation (45-72% range).

Roadmap (post-v1.0):
- … find_by_intent (evidence-gated)
- … why_does_this_fail, onboard_to_feature, audit_change

For detailed milestone arc and per-cycle scope: ROADMAP.md, docs/cycles/, docs/release-history.md, and research/v1.1-candidates.md.
The language adapter interface is a stable plugin surface — each new
language is an additive contribution, not a core change. See
docs/language-adapter-guide.md for
the contributor onboarding walkthrough.
| Language | Adapter | LSP Server | Shipped |
|---|---|---|---|
| TypeScript | TypeScriptAdapter | typescript-language-server | v0.1 |
| Python | PyrightAdapter | Pyright | v0.1 |
| Go | GoAdapter | gopls | v0.2 |
| Ruby | RubyAdapter | ruby-lsp (+ ruby-lsp-rails) | v0.9 |
| C# / .NET | CsharpAdapter | csharp-ls (Roslyn) | v1.1 |
ContextAtlas is MIT licensed and welcomes contributions. Areas where contribution will be especially valuable:
- Language adapters. The LanguageAdapter interface is small and stable. Adding Java, .NET, Rust, Kotlin, or other language support is a self-contained project. See docs/language-adapter-guide.md.

Benchmarks and methodology live in a separate repository: github.com/traviswye/ContextAtlas-benchmarks.
That repo contains the harness code, locked prompt sets, published
measurement results, and the full methodology document (RUBRIC.md).
Keeping the harness out of this repo means the benchmarks measure the
published contextatlas package's actual behavior rather than an
internal monorepo build.
Built during the "Build anything with Opus 4.7" hackathon.
ContextAtlas uses:
MIT. See LICENSE.