Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Big Indexer

ahmedxuhri/bigindexer
2 · STDIO · registry active
Summary

BGI analyzes large codebases by grouping code units by behavioral role rather than just syntax references. It exposes MCP tools like task_fingerprint, behavioral_twins, and twin_context that help ground AI prompts in actual repository behavior patterns. The server uses COV tokens to classify behavior, DRS clustering with hard size caps to prevent giant merged clusters, and emits structured artifacts like bgi-graph.json and fuse-graph.json that show architectural boundaries and coupling seams. Reach for this when you need architecture-aware context for refactoring decisions or want to reduce hallucination risk by anchoring AI changes to proven patterns in your codebase. Supports Python, TypeScript, Go, Rust, Java and others with varying parser quality tiers.

BGI - Big Indexer

BGI is a static architecture analysis tool for large codebases. It groups code units by behavioral role and emits explicit architectural boundaries. Project domain: bigindexer.com

Use via MCP Registry

Big Indexer is published in the MCP Registry as io.github.ahmedxuhri/bigindexer.

pip install bigindexer==0.1.3
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json

Validation: https://bigindexer.com/validation

What problem this solves

Most architecture graphs fail at scale in two ways:

  • too many noisy edges
  • giant clusters that collapse unrelated components together

BGI is built to keep both under control, so the output remains usable on large repos.

What you can do with it

  1. "Where should this boundary be before we refactor?"
    BGI groups units by behavioral role (COV tokens + DRS clustering) so likely component boundaries are visible.
  2. "Which subsystem coupling is risky?"
    BGI surfaces high-coupling seams and fuse-boundary signals between clusters so integration risk is easier to spot.
  3. "How do we plug architecture data into automation?"
    BGI emits machine-readable artifacts (bgi-graph.json, fuse-graph.json) plus optional human context (bigindexer.md).
  4. "How do we make AI changes less random?"
    MCP tools (task_fingerprint, behavioral_twins, twin_context) ground prompts in in-repo behavior patterns.
  5. "Can I run this automatically on PRs as a live example?"
    Yes — use the dedicated action repo ahmedxuhri/bigindexer-pr-risk-bot to auto-comment PRs with blast radius, seams, and risk hints.

30-second demo

Run BGI on the included fixture repo:

git clone https://github.com/ahmedxuhri/bigindexer
cd bigindexer
pip install -e .
bgi scan tests/fixtures --lang python --out /tmp/bgi-example.json
head -50 /tmp/bgi-example.json

Observed result on this repository:

  • units: 12
  • edges: 14
  • clusters: 2
  • max cluster in sample: 6 units

One produced edge looks like:

{
  "source": "auth_module.py::AuthService::__init__",
  "target": "auth_module.py::AuthService::__del__",
  "key": "COV.INIT",
  "lock": "COV.TEARDOWN",
  "type": "HARD"
}

Why this matters: instead of raw syntax references only, you get behavioral relationships plus cluster structure that can drive architecture decisions.
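As a rough illustration of consuming such edges, the sketch below tallies key-lock combinations from a list of edge objects. It relies only on the five fields visible in the sample edge; how the edge list is nested inside the scan output is not shown here, so the code works on a plain list and the helper name count_key_lock_pairs is hypothetical.

```python
import json
from collections import Counter

# The single edge shown in the demo output; a real scan would yield many more.
SAMPLE_EDGES = json.loads("""
[
  {
    "source": "auth_module.py::AuthService::__init__",
    "target": "auth_module.py::AuthService::__del__",
    "key": "COV.INIT",
    "lock": "COV.TEARDOWN",
    "type": "HARD"
  }
]
""")


def count_key_lock_pairs(edges):
    """Count how often each (key, lock, type) combination occurs."""
    return Counter((e["key"], e["lock"], e["type"]) for e in edges)


pairs = count_key_lock_pairs(SAMPLE_EDGES)
for (key, lock, kind), n in pairs.items():
    print(f"{key} -> {lock} [{kind}]: {n}")
```

A tally like this gives a quick view of which behavioral pairings dominate a repository before looking at cluster structure.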


Plain-English glossary

| BGI term | Plain meaning |
|---|---|
| COV token | A behavior label for a unit (for example: FETCH, PERSIST, AUTHENTICATE) |
| Key-Lock edge | A behavioral connection between two units with complementary roles |
| DRS cluster | A unit-level grouping by behavioral role. Mostly intra-file in practice. File-level architectural components are better expressed via the BGI edge graph or the fuse-graph boundary signal (see external benchmark) |
| Fuse edge / fuse event | A refused merge because cluster growth hit the cap; treated as boundary signal |
| Spectral masks | Scope rules that limit where matching is allowed (global, directory, file) |
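To make the spectral-mask idea concrete, here is a minimal sketch of a scope check over the three scopes named in the glossary. It assumes unit ids follow the file::Class::member shape seen in the demo edge and that "directory" means "same parent directory"; the function names are hypothetical and this is not BGI's actual matching code.

```python
from pathlib import PurePosixPath


def unit_file(unit_id: str) -> str:
    """Return the file part of a 'file::Class::member' unit id."""
    return unit_id.split("::", 1)[0]


def match_allowed(scope: str, unit_a: str, unit_b: str) -> bool:
    """Decide whether two units may be matched under a given scope."""
    file_a, file_b = unit_file(unit_a), unit_file(unit_b)
    if scope == "global":
        return True  # any two units in the repo may pair up
    if scope == "directory":
        # Assumption: both files must share the same parent directory.
        return PurePosixPath(file_a).parent == PurePosixPath(file_b).parent
    if scope == "file":
        return file_a == file_b
    raise ValueError(f"unknown scope: {scope!r}")


print(match_allowed("file", "pkg/a.py::load", "pkg/b.py::save"))
print(match_allowed("directory", "pkg/a.py::load", "pkg/b.py::save"))
```

Narrower scopes trade recall for fewer spurious edges, which is the trade-off the glossary entry describes.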

Architecture in one view

Source files
   ->
Gate 1: fingerprint unit behavior (COV tokens)
   ->
Gate 2: create behavioral edges with scoped matching
   ->
Gate 3: cluster with hard size cap + boundary emission
   ->
Artifacts: bgi-graph.json, fuse-graph.json, bigindexer.md, optional routes/graphml/html

Core approach:

  1. TOKEN-CENSUS - classify token frequency per repo.
  2. SPECTRAL-MASKS - restrict match scope by token frequency.
  3. FUSE-MAP - cap cluster growth and record refused merges.
  4. MASK-4-GATE-3 - use import proximity as clustering signal.
  5. WATER-CLOCK + .scm - single-pass query extraction path in Gate 1.
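The FUSE-MAP step (cap cluster growth, record refused merges) can be pictured with a size-capped union-find. This is a sketch of the general technique under that assumption, not BGI's implementation; the class name CappedClusters and the cap value are hypothetical.

```python
class CappedClusters:
    """Union-find that refuses merges which would exceed a hard size cap."""

    def __init__(self, units, max_size):
        self.parent = {u: u for u in units}
        self.size = {u: 1 for u in units}
        self.max_size = max_size
        self.fuse_events = []  # refused merges, kept as boundary signals

    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u

    def union(self, a, b):
        """Merge the clusters of a and b; return False if the cap refuses it."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if self.size[ra] + self.size[rb] > self.max_size:
            self.fuse_events.append((a, b))
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller cluster under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


clusters = CappedClusters(["a", "b", "c", "d"], max_size=2)
for edge in [("a", "b"), ("c", "d"), ("b", "c")]:
    clusters.union(*edge)
print(clusters.fuse_events)
```

In this toy run the third edge would create a four-unit cluster, so it is refused and recorded instead, which is the kind of event a fuse graph would expose.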

Why BGI is different from common alternatives

| Capability | LSP / SCIP index | Call-graph + generic community detection | BGI |
|---|---|---|---|
| Fast symbol lookup | Strong | Medium | Available (Phase 6 index) |
| Behavioral token model | No | Usually no | Yes |
| Hard-bounded clustering | No | Usually no | Yes (unit-level) |
| First-class boundary artifact | No | Usually no | Yes (fuse-graph.json) |
| Scope-constrained edge generation | Limited | Rare | Yes (spectral masks) |

External head-to-head benchmark (Louvain on BGI's edges vs Louvain on raw imports, scored against package layout): BGI's edges win on Python (django F1 0.38 vs 0.29, MoJoFM 0.45 vs 0.34) and currently tie/lose on Go due to lower cross-file edge density on tier-2 scanners. Full results and methodology in docs/VALIDATION_EVIDENCE.md.
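For readers unfamiliar with scoring a clustering against a package layout, one common formulation is pairwise F1: treat every pair of units placed in the same cluster as a prediction and compare against pairs that share a package. Whether the benchmark above uses exactly this definition is not stated here, so treat the sketch as illustrative; the unit and package names are made up.

```python
from itertools import combinations


def same_cluster_pairs(assignment):
    """Set of unordered unit pairs that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }


def pairwise_f1(predicted, reference):
    """Pairwise F1 between two unit -> label mappings over the same units."""
    pred = same_cluster_pairs(predicted)
    ref = same_cluster_pairs(reference)
    if not pred or not ref:
        return 0.0
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four units, two packages, one over-merged cluster.
reference = {"a": "pkg1", "b": "pkg1", "c": "pkg2", "d": "pkg2"}
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
score = pairwise_f1(predicted, reference)
print(f"pairwise F1: {score:.2f}")
```

Here the over-merge costs precision (one of three predicted pairs is correct) and the split costs recall (one of two reference pairs is found).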


Evidence (current, verifiable)

Large-repo scale evidence

Comparable kubernetes sample (go comparable mode, 162,917 units):

  • Gate 1: 141.964s
  • Gate 2: 67.261s (historical comparable baseline: 138.869s)
  • Gate 3: 9.359s
  • Total: 218.584s
  • Max cluster: 1.113%
  • Fuse events: 0

Artifact: output/validation/kubernetes-optionb-controlled-median-v21.json
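The figures above can be cross-checked with a few lines of arithmetic. The sketch uses only the numbers listed in this section; the derived throughput is a convenience figure, not a published metric.

```python
# Timings and unit count as listed for the kubernetes sample.
gate_seconds = {"gate1": 141.964, "gate2": 67.261, "gate3": 9.359}
gate2_baseline = 138.869  # historical comparable baseline for Gate 2
units = 162_917

total = sum(gate_seconds.values())
speedup = gate2_baseline / gate_seconds["gate2"]
throughput = units / total

print(f"total: {total:.3f}s")
print(f"gate 2 speedup vs baseline: {speedup:.2f}x")
print(f"throughput: {throughput:.0f} units/s")
```

The gate times sum to the reported total, and Gate 2 runs at roughly half its historical baseline time.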

Quality guard evidence (beyond raw speed)

  • Gate 2 scope safety tests block invalid cross-scope merges (see tests/test_gate2.py).
  • Gate 3 tests verify no legacy namespace over-merge without import evidence (see tests/test_gate3.py).
  • Full suite check: run python3 -m pytest tests/ -x -q; the project baseline target is that this suite stays passing.

Evidence summary

  • Current published validation set: 100 scored runs across 5 repos and 3 models.
  • Full 20-run post-shipment benchmark refresh for BGI-TWIN context (task → COV → top-3 twins + seam + rubric) is complete: actionability 4.75/5 (p04 slice: 4.8/5), boundary 1.0, hallucinations 0.
  • Independent-model replication is now complete on azure/gpt-4o (20 runs) and gemini/auto (20 runs): GPT-4o actionability 4.85/5, Gemini actionability 4.25/5, both with zero hallucinations; Gemini boundary 0.95 reflects one genuine django/p02 miss.
  • Still missing: labeled precision/recall benchmark on an external corpus and head-to-head quantitative benchmark vs external tools on the same labeled dataset.

Language support tiers (explicit)

BGI does not treat all languages equally; support is tiered:

  1. Query-backed (.scm): python, typescript, tsx, javascript, go, rust, java, csharp, php, ruby, kotlin, scala
  2. Tree-sitter scanner + rule path: c, lua, elixir
  3. Generic regex fallback by extension: swift, r, dart, bash, nim, zig, haskell, ocaml, fsharp, clojure, erlang, matlab, vb, crystal, cobol, groovy

Use this as a reliability signal: query-backed and dedicated scanner tiers are stronger than generic fallback.
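If you want to gate automation on support tier, the three lists above translate directly into a lookup table. The helper below is a hypothetical convenience, not part of the BGI CLI; the language names are copied from the tiers as listed.

```python
TIER_LANGUAGES = {
    1: ["python", "typescript", "tsx", "javascript", "go", "rust", "java",
        "csharp", "php", "ruby", "kotlin", "scala"],          # query-backed (.scm)
    2: ["c", "lua", "elixir"],                                # tree-sitter scanner + rules
    3: ["swift", "r", "dart", "bash", "nim", "zig", "haskell", "ocaml",
        "fsharp", "clojure", "erlang", "matlab", "vb", "crystal", "cobol",
        "groovy"],                                            # generic regex fallback
}

# Invert the table once so lookups are a single dict access.
LANGUAGE_TIER = {
    lang: tier for tier, langs in TIER_LANGUAGES.items() for lang in langs
}


def support_tier(language):
    """Return 1, 2, or 3 for a listed language, or None if it is not listed."""
    return LANGUAGE_TIER.get(language.strip().lower())


for lang in ("python", "lua", "zig", "fortran"):
    print(lang, support_tier(lang))
```

A caller might, for example, treat tier 3 results as advisory only and require tier 1 or 2 before acting on boundary suggestions.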

Cross-file edge density caveat: the language tiers above describe parser quality. A separate axis is cross-file behavioral edge density — how many key-lock pairs the scanner produces that link units in different files. Tier-1 (.scm-backed) languages produce dense cross-file edges. Tier-2 scanner-backed languages currently produce sparser cross-file edges because their token mix is dominated by structural tokens (INTAKE/OUTPUT/CONDITIONAL/LOOP) that gate-2 deliberately scopes to same-file to prevent O(N²) noise. The user-visible MCP product (boundary detection, twin retrieval, AI-assistant context) still works on tier-2 languages — see the validation evidence — but cluster-recovery benchmarks against import-graph baselines reflect this density gap. Concrete numbers in docs/VALIDATION_EVIDENCE.md.
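One simple proxy for the density gap described above is the share of edges whose endpoints live in different files. The sketch assumes unit ids of the form file::Class::member, as in the demo edge; the second edge is invented for illustration, and the benchmark's own density measure may be defined differently.

```python
def file_of(unit_id):
    """Return the file part of a 'file::Class::member' unit id."""
    return unit_id.split("::", 1)[0]


def cross_file_fraction(edges):
    """Fraction of edges that link units in different files (0.0 if no edges)."""
    if not edges:
        return 0.0
    cross = sum(1 for e in edges if file_of(e["source"]) != file_of(e["target"]))
    return cross / len(edges)


edges = [
    # Same-file edge taken from the demo output.
    {"source": "auth_module.py::AuthService::__init__",
     "target": "auth_module.py::AuthService::__del__"},
    # Hypothetical cross-file edge, for illustration only.
    {"source": "api.py::create_user", "target": "store.py::save_user"},
]
fraction = cross_file_fraction(edges)
print(f"cross-file edges: {fraction:.0%}")
```

Comparing this fraction between a tier-1 and a tier-2 scan of similar repos is a quick way to see the gap for yourself.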


Limitations and non-goals

  1. BGI is static analysis; it does not ingest runtime traces.
  2. Cross-file semantic resolution is heuristic and language-dependent.
  3. Cluster-size health is measured; full external precision/recall is not yet published.
  4. Shared-host benchmarking introduces variance; decisions should use controlled medians.

Install

# from a clone of the repository (editable install)
pip install -e .

Quickstart commands

# scan
bgi scan /path/to/repo --lang auto --out bgi-graph.json

# optional outputs
bgi scan /path/to/repo --lang auto \
  --fuse-graph fuse-graph.json \
  --routes routes.json \
  --graphml graph.graphml \
  --html

# incremental
bgi scan /path/to/repo --lang auto --incremental --cache .bgi-cache.json

# diff
bgi diff /path/before /path/after --lang auto --out diff.json

# run MCP server over generated artifacts
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
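When driving these commands from a script, it can help to build the argument lists in one place. The sketch below only assembles argv lists using flags that appear in the commands above; the function names are hypothetical, and combining --out with --fuse-graph in a single scan is an assumption not confirmed by the examples.

```python
import shlex


def scan_command(repo, lang="auto", out=None, fuse_graph=None, cache=None):
    """Build a 'bgi scan' argv list from flags shown in the quickstart."""
    argv = ["bgi", "scan", repo, "--lang", lang]
    if out:
        argv += ["--out", out]
    if fuse_graph:
        argv += ["--fuse-graph", fuse_graph]  # assumption: may be combined with --out
    if cache:
        argv += ["--incremental", "--cache", cache]
    return argv


def mcp_command(graph, fuse_graph):
    """Build the 'bgi mcp' argv list for serving generated artifacts."""
    return ["bgi", "mcp", "--graph", graph, "--fuse-graph", fuse_graph]


basic = scan_command("/path/to/repo", out="bgi-graph.json")
incremental = scan_command("/path/to/repo", cache=".bgi-cache.json")
serve = mcp_command("bgi-graph.json", "fuse-graph.json")
for argv in (basic, incremental, serve):
    print(shlex.join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) once bgi is installed; keeping arguments as a list avoids shell quoting issues with unusual repo paths.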

Example MCP usage pattern (from your client prompt):

Use MCP tool twin_context for:
"Add endpoint that validates input and persists data."
Return top twin candidate, seam suggestion, and rubric checklist.
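Under the hood, an MCP client turns a prompt like this into a tools/call request. The sketch below builds that JSON-RPC message by hand to show its shape. The tool name comes from this README, but the argument key "task" is a guess; check the tool's published input schema (for instance via tools/list) before relying on it.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build an MCP 'tools/call' JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(
    1,
    "twin_context",
    # Hypothetical argument name; the real schema is defined by the server.
    {"task": "Add endpoint that validates input and persists data."},
)
print(json.dumps(request, indent=2))
```

In practice your MCP client (Claude Code, Continue, and so on) constructs and sends this message for you over STDIO.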

Telemetry

BGI ships with opt-in, off-by-default anonymous telemetry. To enable:

export BGI_TELEMETRY=1
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json

What's collected when enabled: BGI version, OS, repo size bucket, and a 12-char hash of your repo's git remote (so we can deduplicate "same repo seen twice" without ever knowing which repo). What's never collected: file paths, source code, repo names, user identity, or IP addresses. Full schema and disable instructions in docs/TELEMETRY.md.
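To illustrate how a truncated hash can deduplicate repos without revealing them, here is a minimal sketch. The hash algorithm (SHA-256), the hex truncation, and the exact rule for reading BGI_TELEMETRY are all assumptions; docs/TELEMETRY.md is the authority on what BGI actually does.

```python
import hashlib
import os


def telemetry_enabled(env=None):
    """Opt-in check: assumed to be on only when BGI_TELEMETRY is exactly '1'."""
    env = os.environ if env is None else env
    return env.get("BGI_TELEMETRY") == "1"


def remote_fingerprint(remote_url):
    """12-character hex fingerprint of a git remote (SHA-256 is an assumption)."""
    digest = hashlib.sha256(remote_url.strip().encode("utf-8")).hexdigest()
    return digest[:12]


# Hypothetical remote, used only to show that the output is stable and short.
fp = remote_fingerprint("https://example.com/org/repo.git")
print(fp, telemetry_enabled({"BGI_TELEMETRY": "1"}))
```

The same remote always maps to the same 12 characters, so repeat sightings collapse into one entry, while the original URL is not stored in the fingerprint itself.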


Documentation map

  • MEMORANDUM.md - design contracts and invariants
  • docs/LANGUAGE_SUPPORT.md - language implementation details
  • docs/CONTRIBUTING_LANGUAGES.md - language contribution guide
  • docs/INDEX_SCHEMA.md - interactive index schema
  • docs/QUERY_PLANNER.md - query planner scoring
  • docs/MCP_SETUP.md - MCP server setup and usage
  • docs/MCP_WITH_CONTINUE.md - 5-minute Continue + BGI walkthrough
  • docs/TELEMETRY.md - opt-in telemetry: what we collect and how to disable
  • https://bigindexer.com/validation - public validation evidence
  • docs/MCP_QUICKSTART_DEMO.md - 5-minute demo walkthrough
  • docs/MCP_EXAMPLE_TRANSCRIPTS.md - real-world MCP tool invocation examples
  • docs/MCP_REAL_TRANSCRIPT.md - unedited transcript from FastAPI analysis
  • scripts/mcp-demo.sh - automated demo script for multiple CLIs and repositories

License and Copyright

  • License: Apache License 2.0 (LICENSE)
  • Contributor terms: Developer Certificate of Origin (DCO) enforced on pull requests
Registry: active
Package: bigindexer
Transport: STDIO
Updated: May 13, 2026