Doctree Mcp

Summary

Gives Claude a navigable tree over your markdown, CSV, and JSONL docs instead of dumping everything into context. Exposes BM25 search, table-of-contents traversal, and node-by-node content retrieval so the agent can drill down breadcrumb-style rather than guess from flat chunks. No vector DB, no embeddings, no LLM calls at index time. The bundled skills teach procedural knowledge: search first, inspect the outline, navigate to specific sections, retrieve what you need. Also includes wiki write tools with duplicate detection and schema validation if you want the agent maintaining runbooks. Works over stdio for local dev or deploys as Streamable HTTP for team use. Think research librarian behavior, not keyword lottery.

doctree-mcp

Agentic document retrieval over markdown, CSV, and JSONL. BM25 + tree navigation via MCP — no vector DB, no embeddings, no LLM calls at index time.

The pitch: MCP provides the structural primitives (a navigable tree, BM25, glossary, row lookup). The bundled skills provide the procedural knowledge (how to walk that tree). Together the agent behaves like a trained research librarian — not a one-shot searcher. See The Skill + MCP Pattern.

Quick Start

Have docs already? Point a client at them:

# In your AI tool's MCP config — see docs/CLIENTS.md for per-tool snippets
{ "mcpServers": { "doctree": {
    "command": "bunx", "args": ["doctree-mcp"],
    "env": { "DOCS_ROOT": "./docs", "WIKI_WRITE": "1" }
} } }

Restart the tool → ask "search the docs for X" or invoke the doc-read prompt.

Starting fresh? Scaffold a Karpathy-style LLM wiki:

bunx doctree-mcp init          # configure current tool
bunx doctree-mcp init --all    # configure every supported client
bunx doctree-mcp init --dry-run

Creates docs/wiki/ (LLM-maintained) + docs/raw-sources/ (your inputs), writes the MCP config, installs a post-write lint hook, appends wiki conventions to CLAUDE.md / AGENTS.md / .cursor/rules/.

Operation Modes

Mode	Use when	Guide
stdio (default)	Local dev, agent on your machine	Client setup
HTTP (Streamable HTTP)	Teams, CI, hosted agents	Deployment — Railway · Fly · Render · Cloudflare Containers · Docker
CLI	`init`, `lint`, debug-index	Operation modes

Full decision tree: Operation Modes.

How It Works — Retrieve · Curate · Add

Agent: "How does token refresh work?"

→ search_documents("token refresh")
  #1  auth/middleware.md § Token Refresh Flow       score: 12.4
  #2  auth/oauth.md       § Refresh Token Lifecycle  score: 8.7

→ get_tree("docs:auth:middleware")
  [n1] # Auth Middleware
    [n4] ## Token Refresh Flow
      [n5] ### Automatic Refresh

→ navigate_tree("docs:auth:middleware", "n4")   ← n4 + descendants
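The outline-then-navigate step in the trace above can be pictured with a toy heading tree. This is a minimal sketch, not doctree-mcp's implementation: `Node`, `build_tree`, `outline`, and `subtree_text` are hypothetical names, and the sequential node IDs are an assumption (the real server assigns its own IDs, as the `n4`/`n5` example shows).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    level: int
    title: str
    body: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def build_tree(markdown: str) -> dict[str, Node]:
    """Index headings as nodes; each node owns the text up to the next heading."""
    nodes: dict[str, Node] = {}
    stack: list[Node] = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)$", line)
        if m:
            node = Node(f"n{len(nodes) + 1}", len(m.group(1)), m.group(2).strip())
            # Pop until the top of the stack is a shallower heading (the parent).
            while stack and stack[-1].level >= node.level:
                stack.pop()
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
            nodes[node.node_id] = node
        elif stack and line.strip():
            stack[-1].body.append(line)
    return nodes


def outline(node: Node, depth: int = 0) -> list[str]:
    """Table-of-contents view: IDs and headings only, no body text."""
    lines = [f"{'  ' * depth}[{node.node_id}] {'#' * node.level} {node.title}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def subtree_text(node: Node) -> str:
    """A section plus all of its descendants, in document order."""
    parts = [f"{'#' * node.level} {node.title}", *node.body]
    parts.extend(subtree_text(child) for child in node.children)
    return "\n".join(parts)


DOC = """# Auth Middleware
Intro.
## Token Refresh Flow
Tokens are refreshed before expiry.
### Automatic Refresh
A background timer triggers the refresh.
## Logging
Unrelated section.
"""

nodes = build_tree(DOC)
print("\n".join(outline(nodes["n1"])))   # cheap outline first
print(subtree_text(nodes["n2"]))          # then only the section that matters
```

The point of the split is that the outline costs a few tokens per heading, while the body text is fetched only for the node the agent decides to open.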

Core read tools (always on):

Tool	Purpose
`search_documents`	BM25 keyword search + facet filters + glossary expansion (markdown · CSV · JSONL)
`get_tree`	Table of contents — headings, word counts, summaries
`get_node_content`	Full text of a specific section by node ID
`navigate_tree`	A section plus all descendants in one call
`lookup_row`	O(1) exact-key lookup for structured data rows (e.g. `PROJ-44`)
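For readers unfamiliar with BM25, the following is a sketch of the textbook Okapi BM25 formula applied to a handful of sections. It assumes the common defaults `k1=1.2` and `b=0.75` and a naive tokenizer; doctree-mcp's actual parameters, tokenization, and field boosts are not documented here and may differ. `tokenize`, `bm25_rank`, and the sample section keys are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
    """Return (doc_id, score) pairs for docs matching any query term, best first."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: number of sections containing each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long sections need more hits to score as high.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sections = {
    "auth/middleware.md#token-refresh-flow":
        "Token refresh flow: the middleware refreshes the access token before expiry.",
    "auth/oauth.md#refresh-token-lifecycle":
        "Refresh token lifecycle and rotation rules.",
    "ops/deploy.md#rollout":
        "Rolling deploys and health checks.",
}
ranked = bm25_rank("token refresh", sections)
for doc_id, score in ranked:
    print(f"{doc_id}  score: {score:.2f}")
```

Because scoring uses only term counts and section lengths, indexing needs no model calls, which is consistent with the "no embeddings, no LLM calls at index time" claim.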

Wiki write tools (opt-in with WIKI_WRITE=1):

Tool	Purpose
`find_similar`	Duplicate detection with overlap ratios
`draft_wiki_entry`	Scaffold: suggested path, inferred frontmatter, glossary hits
`write_wiki_entry`	Validated write: path containment, schema, duplicate guards, dry-run
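One plausible reading of "overlap ratios" is the share of a draft's distinct tokens that already appear in an existing entry. The sketch below illustrates that idea only; the metric, the `0.6` threshold, and the names `token_set`, `overlap_ratio`, and `find_similar_entries` are assumptions, not the behaviour of the real `find_similar` tool.

```python
import re


def token_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_ratio(draft: str, existing: str) -> float:
    """Share of the draft's distinct tokens that also appear in the existing entry."""
    a, b = token_set(draft), token_set(existing)
    return len(a & b) / len(a) if a else 0.0


def find_similar_entries(draft: str, entries: dict[str, str], threshold: float = 0.6):
    """Return (path, ratio) pairs at or above the threshold, most similar first."""
    hits = [(path, overlap_ratio(draft, text)) for path, text in entries.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


wiki = {
    "runbooks/rotate-token.md": "Rotate the API token every week",
    "guides/onboarding.md": "Welcome to the team handbook",
}
print(find_similar_entries("Rotate the API token weekly", wiki))
```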

Safety: path containment · frontmatter validation · duplicate detection · dry-run · overwrite protection.
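Path containment, dry-run, and overwrite protection can be sketched in a few lines. This is an illustration of the general technique, assuming Python 3.9+ for `Path.is_relative_to`; `resolve_inside` and `write_entry` are hypothetical helpers, not the server's own code.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Return the resolved target, or raise if it would land outside root."""
    root = root.resolve()
    target = (root / relative).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"{relative!r} escapes {root}")
    return target


def write_entry(root: Path, relative: str, text: str,
                dry_run: bool = True, overwrite: bool = False) -> str:
    target = resolve_inside(root, relative)
    if target.exists() and not overwrite:
        return f"refused: {relative} already exists"
    if dry_run:
        # Report what would happen without touching the filesystem.
        return f"dry-run: would write {len(text)} chars to {relative}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return f"wrote {relative}"


root = Path(tempfile.mkdtemp())
print(write_entry(root, "auth/refresh.md", "# Token Refresh"))
print(write_entry(root, "auth/refresh.md", "# Token Refresh", dry_run=False))
print(write_entry(root, "auth/refresh.md", "# Token Refresh", dry_run=False))
```

Resolving both paths before comparing them is what defeats `../` segments and absolute paths passed in as the "relative" argument.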

Deprecated aliases (list_documents, find_files, find_symbol) are superseded by search_documents — still functional, no longer recommended.

The Skill + MCP Pattern

Most retrieval tools hand the agent a search box and hope for the best. doctree-mcp hands it a tree, and the bundled skills teach it how to walk one.

MCP = structural primitives. search_documents, get_tree, navigate_tree, get_node_content, lookup_row return tree positions the agent reasons over — not finished answers.
Skills = procedural knowledge. /doc-read, /doc-write, /doc-lint encode breadcrumb drill-down: search → outline → navigate → retrieve. The agent learns the policy, not just the API.

That pairing doesn't exist cleanly elsewhere:

Approach	Primitive	Skill teaches	Gap
Managed hybrid RAG (Cloudflare AI Search, Nia)	Flat chunks + similarity	—	Black-box score, no audit trail
Tool-returns-answer (Context7)	2 tools returning answers	Query shape	Agent can't reason about skipped content
Skill-over-CLI (QMD)	CLI over flat search	Query expansion	No tree to navigate
doctree-mcp + `/doc-read`	Navigable tree	Breadcrumbs, multi-instance routing, wiki compilation	—

Why iterative retrieval wins:

Context rot. Stuffing a 1M-token window with chunks degrades output. Breadcrumb navigation keeps working memory small.
Auditability. search_documents → get_tree → navigate_tree → get_node_content is a replayable trail. A cosine score is not. Regulated domains can ship the former.
Progressive disclosure. Fewer navigable primitives beat tool sprawl (cf. Cloudflare Code Mode).

Multi-instance = client-side federation. Register several doctree servers under different names; the /doc-read skill encodes the routing policy. Add or remove instances without touching the skill. See Client setup → Multi-instance routing.

The LLM Wiki Pattern

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Raw Sources    │     │  The Wiki        │     │  The Schema     │
│  (immutable)    │ ──→ │  (LLM-maintained)│ ←── │  (you define)   │
│  notes · logs   │     │  runbooks · refs │     │  CLAUDE.md rules │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Inspired by Karpathy's LLM Wiki. Full walkthrough: docs/LLM-WIKI-GUIDE.md.

Configuration (summary)

Frontmatter, placed at the top of a markdown file:

---
title: "Descriptive Title"
description: "One-line summary — boosts ranking"
tags: [relevant, terms]
type: runbook          # runbook | guide | reference | tutorial | architecture | adr
category: auth
---

All non-reserved frontmatter fields become filter facets:

search_documents("auth", filters: { type: "runbook", tags: ["production"] })
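A facet filter like the one above amounts to matching frontmatter fields against requested values. The sketch below assumes that a list-valued filter matches when any requested value is present and that all filter keys must match; the real matching semantics are not stated here. `matches` and the sample documents are hypothetical.

```python
def matches(frontmatter: dict, filters: dict) -> bool:
    """True if every filter key is satisfied by the document's frontmatter."""
    for key, wanted in filters.items():
        have = frontmatter.get(key)
        have_set = set(have) if isinstance(have, list) else {have}
        wanted_set = set(wanted) if isinstance(wanted, list) else {wanted}
        # Assumption: a list filter matches if any wanted value is present.
        if not (have_set & wanted_set):
            return False
    return True


docs = [
    {"path": "auth/rotate-keys.md", "type": "runbook", "tags": ["production", "auth"]},
    {"path": "auth/overview.md", "type": "guide", "tags": ["auth"]},
    {"path": "ops/failover.md", "type": "runbook", "tags": ["staging"]},
]
filters = {"type": "runbook", "tags": ["production"]}
hits = [d["path"] for d in docs if matches(d, filters)]
print(hits)
```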

Common env vars:

Variable	Default	Description
`DOCS_ROOT`	`./docs`	Docs folder
`DOCS_GLOB`	`**/*.md`	Comma-separated globs (`**/*.md,**/*.csv,**/*.jsonl`)
`DOCS_ROOTS`	—	Weighted multi-collection (`./wiki:1.0,./rfcs:0.5`)
`PORT`	`3100`	HTTP mode port
`WIKI_WRITE`	(unset)	`1` enables write tools
`GLOSSARY_PATH`	`$DOCS_ROOT/glossary.json`	Query-expansion glossary

Full reference: docs/CONFIGURATION.md.
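The `DOCS_ROOTS` format in the table (`path:weight` pairs separated by commas) can be parsed as sketched below. How the server uses the weights is not described here; the parser also assumes a missing weight means `1.0` and does not handle Windows drive letters. `parse_docs_roots` is a hypothetical helper.

```python
def parse_docs_roots(value: str) -> list[tuple[str, float]]:
    """Split './wiki:1.0,./rfcs:0.5' into [('./wiki', 1.0), ('./rfcs', 0.5)]."""
    roots = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        path, sep, weight = item.rpartition(":")
        if not sep:
            # Assumption: an entry without ':weight' gets weight 1.0.
            roots.append((item, 1.0))
        else:
            roots.append((path, float(weight)))
    return roots


print(parse_docs_roots("./wiki:1.0,./rfcs:0.5"))
```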

Glossary — place glossary.json in docs root for bidirectional query expansion:

{ "CLI": ["command line interface"], "K8s": ["kubernetes"] }

Acronym definitions like "TLS (Transport Layer Security)" are also auto-extracted.
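Bidirectional expansion means a query for the short form also searches the long form, and the reverse. A minimal sketch of that idea, plus a regex heuristic for the "ACRONYM (Long Form)" pattern, follows. The function names and the regex are assumptions for illustration; the real extractor may be stricter or looser.

```python
import re


def build_expansions(glossary: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every term to its alternatives, in both directions, lowercased."""
    table: dict[str, set[str]] = {}
    for short, longs in glossary.items():
        for long in longs:
            table.setdefault(short.lower(), set()).add(long.lower())
            table.setdefault(long.lower(), set()).add(short.lower())
    return table


def expand_query(query: str, table: dict[str, set[str]]) -> set[str]:
    """Return the query plus variants with glossary terms swapped in."""
    q = query.lower()
    variants = {q}
    for key, alts in table.items():
        pattern = re.compile(rf"\b{re.escape(key)}\b")
        if pattern.search(q):
            for alt in alts:
                variants.add(pattern.sub(lambda _m, alt=alt: alt, q))
    return variants


# Heuristic: an all-caps token followed by a parenthesised expansion.
ACRONYM = re.compile(r"\b([A-Z][A-Z0-9]{1,9})\s+\(([^)]+)\)")


def extract_acronyms(text: str) -> dict[str, list[str]]:
    return {m.group(1): [m.group(2)] for m in ACRONYM.finditer(text)}


table = build_expansions({"CLI": ["command line interface"], "K8s": ["kubernetes"]})
print(expand_query("k8s ingress", table))
print(extract_acronyms("Use TLS (Transport Layer Security) everywhere."))
```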

Structured data — CSV/JSONL files become documents where each row is a tree node. Column roles (id, title, description, facets, URL) are auto-detected from headers. See docs/STRUCTURED-DATA.md.
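The row-as-node idea and the O(1) `lookup_row` behaviour can be illustrated with a dictionary keyed by the detected ID column. The header hints in `ROLE_HINTS` are invented for this sketch, as are `detect_roles` and `index_csv`; the project's actual detection rules are in docs/STRUCTURED-DATA.md.

```python
import csv
import io

# Hypothetical header hints; the real auto-detection rules may differ.
ROLE_HINTS = {
    "id": ("id", "key", "ticket"),
    "title": ("title", "name", "summary"),
    "url": ("url", "link"),
}


def detect_roles(headers: list[str]) -> dict[str, str]:
    """Pick the first header matching each role's hints."""
    roles = {}
    for role, hints in ROLE_HINTS.items():
        for header in headers:
            if header.lower() in hints:
                roles[role] = header
                break
    return roles


def index_csv(text: str):
    """Return detected column roles and a dict giving constant-time row lookup by ID."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    roles = detect_roles(list(reader.fieldnames or []))
    by_id = {row[roles["id"]]: row for row in rows}
    return roles, by_id


TICKETS = """key,summary,status,url
PROJ-43,Rotate signing keys,done,https://example.com/PROJ-43
PROJ-44,Fix token refresh race,open,https://example.com/PROJ-44
"""

roles, by_id = index_csv(TICKETS)
print(roles)
print(by_id["PROJ-44"][roles["title"]])
```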

Running from Source

git clone https://github.com/joesaby/doctree-mcp.git
cd doctree-mcp && bun install

DOCS_ROOT=./docs bun run serve          # stdio
DOCS_ROOT=./docs bun run serve:http     # HTTP (port 3100)
DOCS_ROOT=./docs bun run index          # CLI: inspect indexed output
bun test

Performance

Operation	Time	Token cost
Full index (900 docs)	2–5s	0
Incremental re-index	~50ms	0
Search	5–30ms	~300–1K tokens
Tree outline	<1ms	~200–800 tokens

Docs

Setup & operation

Operation Modes — stdio · HTTP · CLI
Client Setup — Claude Code · Cursor · Windsurf · Codex · OpenCode · Claude Desktop
Deployment — Railway · Fly.io · Render · Cloudflare Containers · Docker
Configuration — env vars, frontmatter, ranking tuning

Patterns & concepts

LLM Wiki Guide — agent-maintained knowledge base walkthrough
Structured Data — CSV / JSONL indexing
Architecture & Design — BM25 internals, tree navigation
Competitive Analysis — PageIndex, QMD, GitMCP, Context7, managed RAG

Source

Prompts — MCP prompt templates
Skills: /doc-read · /doc-write · /doc-lint

Standing on Shoulders

PageIndex — hierarchical tree navigation
Pagefind by CloudCannon — BM25 scoring, positional index, facets
Bun.markdown by Oven — native CommonMark parser
Karpathy's LLM Wiki — the LLM-maintained wiki pattern

License

MIT

Configuration

DOCS_ROOT*

Path to your markdown repository root

DOCS_GLOB

Glob pattern for finding markdown files (default: **/*.md)

DOCS_ROOTS

Multiple weighted collections: ./docs:1.0,./api:0.8 (alternative to DOCS_ROOT)

GLOSSARY_PATH

Path to glossary.json for query expansion (default: $DOCS_ROOT/glossary.json)
