Rag

Summary

A production-grade RAG implementation that handles the messy parts: multi-format ingestion (DOCX, PDF with OCR, audio via Whisper), Korean-optimized chunking, and hybrid BM25 plus vector search with RRF. The indexing pipeline uses AI-powered chunk contextualization to generate bilingual keywords and preserve document context across chunk boundaries. A two-commit model persists expensive OCR and transcription output before any embedding API call, so a rate limit or crash doesn't force you to re-process a 600-page scan. Supports Ollama and OpenAI-compatible APIs for both embedding and contextualization, Gemini for embedding, and Anthropic for contextualization. Exposes search tools over MCP stdio and runs headless indexing via CLI. A single server process handles multiple knowledge bases, and SQLite WAL mode allows searches to read while indexing writes.

FieldCure MCP RAG Server

A Model Context Protocol (MCP) server for indexing and searching local document collections. Supports DOCX, HWPX, PDF (with OCR), Excel, PowerPoint, and audio (Whisper transcription, Windows-only), with hybrid keyword + semantic search optimized for Korean and English.

Built with C# and the official MCP C# SDK.

Commands

fieldcure-mcp-rag
├── serve         --base-path <path>                         # Multi-KB MCP search server (stdio)
├── exec          --path <kb-path> [--force] [--partial ...]  # Headless indexing for a single KB
├── exec-queue    --queue-file <path> [--sweep-all]           # Process deferred indexing queue
├── prune-orphans --base-path <path>                         # Delete orphan KB folders
└── smoke-ocr     --pdf <scanned.pdf>                        # Self-test: OCR a scanned PDF (Windows)

serve — read-only MCP server serving all knowledge bases under the base path. Single process handles multiple KBs via kb_id parameter. Can run while exec is indexing (SQLite WAL).
exec — scans source folders, chunks documents, contextualizes with AI, embeds, stores in SQLite. --partial re-runs only downstream stages when models change, preserving OCR output.
exec-queue — sequential orchestrator consuming a deferred indexing queue. One entry at a time, no GPU contention. --sweep-all processes deferred entries too (used at app shutdown).
prune-orphans — deletes orphan KB folders (GUID-named, no config.json). Protected folders (., _ prefix, -backup-) are never touched.
smoke-ocr — diagnostic mode. Loads a scanned PDF through the OCR fallback parser, prints recognized text to stdout, and exits 0 on a non-empty result. Surfaces DllNotFoundException / BadImageFormatException distinctly so a missing or arch-mismatched native is immediately visible. Useful for verifying that the OCR native path is wired correctly on a given host (notably win-arm64 dnx installs).
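The prune-orphans rule (GUID-named, no config.json, protected prefixes and infix) reduces to a small predicate. The Python sketch below only illustrates that rule and is not the tool's code. The function names are hypothetical. It treats any string that `uuid.UUID` accepts as "GUID-named", and it only lists candidates without deleting them.

```python
import uuid
from pathlib import Path


def is_protected(name: str) -> bool:
    """'.' or '_' prefix, or a '-backup-' infix: the folder is never touched."""
    return name.startswith((".", "_")) or "-backup-" in name


def is_guid(name: str) -> bool:
    """True when the folder name parses as a GUID/UUID."""
    try:
        uuid.UUID(name)
    except ValueError:
        return False
    return True


def find_orphan_candidates(base_path: Path) -> list[Path]:
    """List (without deleting) folders that look like orphaned KBs."""
    return sorted(
        d
        for d in base_path.iterdir()
        if d.is_dir()
        and not is_protected(d.name)
        and is_guid(d.name)
        and not (d / "config.json").exists()
    )
```

Human-named KB folders such as `demo` or `my-kb-001` are never candidates under this rule, even when they lack a config.json.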

Features

Search

Hybrid BM25 + vector search with Reciprocal Rank Fusion (RRF)
BM25-only fallback when no embedding provider is configured
Korean-optimized chunking (sentence boundary, decimal protection, parenthesis-aware)
SIMD-accelerated cosine similarity via System.Numerics.Vector
FTS5 trigram index for substring and CJK-friendly keyword matching
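Reciprocal Rank Fusion merges the BM25 and vector result lists using only each hit's rank, so the two incompatible score scales never need to be normalized against each other. Below is a minimal Python sketch of the technique. The real implementation lives in `HybridSearcher.cs` / `RrfFusion.cs`. The constant `k = 60` is the value commonly used with RRF and is an assumption here, because this README does not state the server's value. `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60,
             top_n: int = 10) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Every list adds 1 / (k + rank) for each chunk it contains (rank starts
    at 1); a chunk found by both retrievers collects two contributions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


bm25_hits = ["c7", "c2", "c9"]
vector_hits = ["c2", "c4", "c7"]
print(rrf_fuse([bm25_hits, vector_hits]))
```

Here `c2` ends up first because it ranks high in both lists, even though it is only second in the BM25 list.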

Indexing

Incremental indexing with SHA256 change detection
AI-powered chunk contextualization with bilingual keyword enrichment (see Chunk Contextualization)
2-commit pipeline preserves expensive upstream work across embedding failures (see How Indexing Works)
Math equation extraction from DOCX/HWPX as [math: LaTeX] blocks
PDF with OCR fallback (Tesseract eng+kor) for scanned pages
Audio transcription (.mp3, .wav, .m4a, .ogg, .flac, .webm) via Whisper.net — Windows-only. Model size (Tiny→Large) is auto-selected from detected GPU/RAM/cores at startup; each transcript chunk records audio.model_size and audio.transcribed_at for future reindex auditing
Cross-process indexing lock with stale PID auto-cleanup
Orphan cleanup for deleted files
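Incremental indexing and orphan cleanup both come down to comparing the hashes stored from the last run with a fresh scan. The Python sketch below assumes the stored hashes are available as a `{path: sha256}` dict; the real engine keeps them in rag.db. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so a large scan never loads fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def classify_changes(stored: dict[str, str],
                     current: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored {path: hash} with the current scan.

    'unchanged' files are hash-skipped; 'deleted' files are the candidates
    for orphan cleanup.
    """
    return {
        "added": sorted(p for p in current if p not in stored),
        "modified": sorted(p for p in current if p in stored and stored[p] != current[p]),
        "unchanged": sorted(p for p in current if stored.get(p) == current[p]),
        "deleted": sorted(p for p in stored if p not in current),
    }
```

A scan can build `current` as `{str(p): sha256_of(p) for p in Path(src).rglob("*") if p.is_file()}`.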

Queue Orchestrator

All indexing requests flow through start_reindex MCP tool — no direct exec spawn
Scope merge rules: full ⊃ contextualization ⊃ embedding (a duplicate request upgrades the queued entry's scope instead of adding a second entry)
PID-based orchestrator lock with reuse defense (orchestrator.lock)
Logical KB deletion (config.json removal) + prune-orphans physical cleanup
Deferred indexing for app-shutdown batch processing (--sweep-all)
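The scope merge rule amounts to an ordering over scopes. In this Python sketch, `QueueEntry`, `enqueue` and the rank values are hypothetical. The sketch assumes the queue holds only not-yet-started entries, and it ignores the force/deferred flags that `start_reindex` also accepts.

```python
from dataclasses import dataclass

# Broader scopes re-run more stages: full re-extracts everything,
# contextualization re-enriches and re-embeds, embedding only re-embeds.
SCOPE_RANK = {"embedding": 0, "contextualization": 1, "full": 2}


@dataclass
class QueueEntry:
    kb_id: str
    scope: str


def enqueue(queue: list[QueueEntry], kb_id: str, scope: str) -> QueueEntry:
    """Add a request, or merge it into the pending entry for the same KB.

    A duplicate never creates a second entry; it only widens the existing
    entry's scope when the new scope is broader.
    """
    if scope not in SCOPE_RANK:
        raise ValueError(f"unknown scope: {scope}")
    for entry in queue:
        if entry.kb_id == kb_id:
            if SCOPE_RANK[scope] > SCOPE_RANK[entry.scope]:
                entry.scope = scope
            return entry
    entry = QueueEntry(kb_id, scope)
    queue.append(entry)
    return entry
```

A narrower request arriving after a broader one is absorbed, because the broader scope already re-runs the narrower scope's stages.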

Operations

Multi-KB serve: single process serves all knowledge bases under a base path, lazy-loaded per KB
SQLite WAL mode allows search during indexing
Graceful shutdown via cancel file
Per-KB config.json with provider configuration

Integration

Ollama native — embedding via /api/embed, contextualization via /api/chat with keep_alive and num_ctx support. Requires Ollama 0.4.0+.
OpenAI-compatible — embedding via /v1/embeddings, contextualization via /v1/chat/completions. Works with OpenAI, Azure OpenAI, Groq, LM Studio, Together AI.
Gemini native — embedding via /v1beta/models/{model}:embedContent with task_type asymmetric retrieval (RETRIEVAL_DOCUMENT / RETRIEVAL_QUERY) and Matryoshka dimension truncation (768 / 1536 / 3072). gemini-embedding-2, multilingual, 8k token input.
Anthropic — contextualization via /v1/messages.
API keys via environment variables — OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. Batch indexing commands (exec, exec-queue) are env-var-only. Interactive MCP search can fall back to MCP elicitation when the client supports it.
Standard MCP stdio transport (JSON-RPC over stdin/stdout)

Chunk Contextualization

Standard RAG chunking loses context — a sentence about "the protocol" becomes ambiguous when ripped from its surrounding paragraphs. This server addresses that with Unified Chunk Contextualization: a single LLM call per chunk that produces both contextual framing and bilingual (Korean + English) keywords in one pass.

The result is stored alongside the original chunk text:

Original text is preserved for accurate retrieval display
Contextualized text is what gets embedded and indexed in BM25
Bilingual keywords enable cross-lingual search — a Korean query can retrieve English documents and vice versa

This is enabled by setting contextualizer in config.json. It can be disabled (set provider/model to empty) if you prefer raw chunk indexing.
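This README does not say how the context, original text and keywords are combined into the indexed text. The Python sketch below illustrates the idea of keeping two texts per chunk. The layout (context, then original, then a `Keywords:` line), the class name and the sample content are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextualizedChunk:
    original: str        # shown to the user in search results
    context: str         # LLM-written framing of where the chunk sits
    keywords: list[str]  # Korean + English terms from the same LLM call

    def indexed_text(self) -> str:
        """Text that is embedded and fed to BM25 (layout is an assumption)."""
        parts = [self.context, self.original]
        if self.keywords:
            parts.append("Keywords: " + ", ".join(self.keywords))
        return "\n\n".join(p for p in parts if p)


chunk = ContextualizedChunk(
    original="The protocol requires a 30-second handshake timeout.",
    context="From section 4 of a gateway design spec, describing connection setup.",
    keywords=["handshake timeout", "핸드셰이크 타임아웃", "gateway", "연결 설정"],
)
print(chunk.indexed_text())
```

Because the Korean keywords sit in the indexed text, a Korean BM25 query such as 타임아웃 can match this English chunk, while search results still display `original`.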

How Indexing Works

The exec command runs a 5-stage pipeline per file:

Extract — text from document (DOCX, PDF OCR, audio transcription, etc.)
Chunk — split into ~1000 char windows
Contextualize — LLM enrichment (optional, see above)
Embed — vector embedding via API
Persist — save to SQLite

For large files, Stage 1 alone can take 20+ minutes — OCR on a 596-page scanned PDF, or Whisper transcription of a multi-hour audio recording. The first audio file in any KB also pays a one-time ggml model download (cached under {UserProfile}/.fieldcure/whisper-models/). To prevent expensive upstream work from being lost when later stages fail, the pipeline uses a 2-commit model:

Stages 1-3 (Extract → Chunk → Contextualize)
        ↓
[Commit 1] chunks saved as PendingEmbedding
        ↓
Stage 4 (Embed)
   ├─ success → [Commit 2a] promote chunks to Indexed
   └─ failure → chunks remain PendingEmbedding (retry next exec)

Why this matters: A 25-minute OCR result is persisted on disk before any embedding API call. If Stage 4 fails (network error, rate limit, token limit, process crash, even power loss), the chunks survive. The next exec hash-skips the file (no OCR re-run) and the deferred retry pass attempts only Stage 4.
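The two commits can be sketched with Python's built-in `sqlite3`. The single `chunks` table and the `index_file` function are hypothetical and much simpler than the real rag.db schema. The `embed` callable stands in for any embedding provider. `with db:` commits on success and rolls back on an exception.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    text TEXT NOT NULL,
    embedding BLOB,
    status TEXT NOT NULL  -- 'PendingEmbedding' or 'Indexed'
)
"""


def index_file(db: sqlite3.Connection, file_path: str, chunks: list[str],
               embed: Callable[[list[str]], list[bytes]]) -> str:
    # Commit 1: persist the expensive upstream output before any API call.
    with db:
        db.execute("DELETE FROM chunks WHERE file_path = ?", (file_path,))
        db.executemany(
            "INSERT INTO chunks (file_path, text, status) VALUES (?, ?, 'PendingEmbedding')",
            [(file_path, c) for c in chunks],
        )
    rows = db.execute(
        "SELECT id, text FROM chunks WHERE file_path = ? ORDER BY id", (file_path,)
    ).fetchall()
    try:
        vectors = embed([text for _, text in rows])
    except Exception:
        return "PendingEmbedding"  # chunks survive; a later pass retries Stage 4 only
    # Commit 2: promote the chunks together with their vectors.
    with db:
        db.executemany(
            "UPDATE chunks SET embedding = ?, status = 'Indexed' WHERE id = ?",
            [(vec, chunk_id) for (chunk_id, _), vec in zip(rows, vectors)],
        )
    return "Indexed"
```

The key property is the ordering. Nothing that depends on the embedding API runs until Commit 1 has already made the chunks durable.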

Per-Chunk Failure Isolation (Binary Split)

If a single chunk in a file exceeds the embedding model's token limit (e.g., a math-dense page in a textbook), the binary split algorithm isolates that one chunk:

EmbedBatch([0..1249])         → 400 "input[846] too long"
  ├─ EmbedBatch([0..624])     → OK (promote 625)
  └─ EmbedBatch([625..1249])  → 400
      ├─ EmbedBatch([625..937])  → 400
      │   ... (binary search narrows toward chunk 846)
      │   └─ EmbedBatch([846..846]) → 400 (mark chunk 846 Failed)
      └─ EmbedBatch([938..1249]) → OK (promote 312)

Result: 1249 chunks indexed, only chunk 846 marked Failed. The file's status becomes Degraded — partially searchable instead of completely missing.
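The split can be sketched as a short recursion. This Python version is a hypothetical illustration, not `EmbeddingBatchSplitter.cs`. `EmbedError` stands in for a 400 such as "input too long". Splitting at `ceil(n / 2)` reproduces the ranges in the trace above: 1250 chunks split into 625 + 625, and the second half splits into 313 + 312.

```python
from typing import Callable, Sequence


class EmbedError(Exception):
    """Stands in for a non-retryable 400 such as 'input too long'."""


def embed_with_split(chunk_ids: Sequence[int],
                     embed_batch: Callable[[Sequence[int]], None],
                     indexed: list[int], failed: list[int]) -> None:
    """Embed a batch; on a 400, halve it and recurse until the bad chunk is isolated."""
    if not chunk_ids:
        return
    try:
        embed_batch(chunk_ids)
        indexed.extend(chunk_ids)  # promote the whole sub-batch
    except EmbedError:
        if len(chunk_ids) == 1:
            failed.append(chunk_ids[0])  # isolated: mark this chunk Failed
            return
        mid = (len(chunk_ids) + 1) // 2  # ceil(n / 2)
        embed_with_split(chunk_ids[:mid], embed_batch, indexed, failed)
        embed_with_split(chunk_ids[mid:], embed_batch, indexed, failed)
```

With one bad chunk, the number of extra API calls grows with log2 of the batch size rather than with the batch size itself.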

Deferred Retry Pass

Each exec ends with a retry pass over any chunks left in PendingEmbedding state from previous runs:

Reads enriched text from DB — no OCR or contextualization re-run
Calls the embedding API only — typically seconds, not minutes
Up to 3 retries per chunk; on exhaustion, the chunk is marked Failed
Auth errors (401/403) flag the provider as unavailable and skip the rest of the pass
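The rules of the retry pass map onto a single loop. In this Python sketch the retry counter is an in-memory field. Presumably the real engine persists it in the DB between exec runs, but that is an assumption. `PendingChunk`, `AuthError` and `deferred_retry_pass` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_RETRIES = 3


class AuthError(Exception):
    """Stands in for a 401/403 response from the embedding provider."""


@dataclass
class PendingChunk:
    chunk_id: int
    enriched_text: str  # read back from the DB: no OCR or contextualization re-run
    retries: int = 0
    status: str = "PendingEmbedding"
    vector: Optional[list[float]] = None


def deferred_retry_pass(chunks: list[PendingChunk],
                        embed: Callable[[str], list[float]]) -> bool:
    """Run one retry pass; return False if the provider was flagged unavailable."""
    for chunk in chunks:
        if chunk.status != "PendingEmbedding":
            continue
        try:
            chunk.vector = embed(chunk.enriched_text)
            chunk.status = "Indexed"
        except AuthError:
            return False  # auth will not fix itself: skip the rest of the pass
        except Exception:
            chunk.retries += 1
            if chunk.retries >= MAX_RETRIES:
                chunk.status = "Failed"
    return True
```

Auth errors stop the pass without consuming any chunk's retry budget. A fixed API key therefore does not leave chunks one attempt closer to Failed.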

File States

Status	Meaning	Hash-skip behavior
`Ready`	Fully indexed	Skip if hash matches
`Degraded`	Some chunks failed (binary-split isolated)	Skip if hash matches
`PartiallyDeferred`	Chunks pending embedding retry	Main loop skips; deferred pass picks up
`Failed`	Extraction or repeated embedding failure	Skip; requires `--force` to retry
`NeedsAction`	User intervention required	Skip with separate counter
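The table can be read as one decision per file in the main loop. This Python sketch follows the table row by row. Two behaviors go beyond the table and are assumptions: treating `--force` as "reprocess regardless of state", and processing a file with no stored state. The function name and return labels are hypothetical.

```python
from typing import Optional


def main_loop_action(status: Optional[str], hash_matches: bool,
                     force: bool = False) -> str:
    """Decide what the main indexing loop does with one file."""
    if force:
        return "process"  # assumption: --force bypasses every state
    if status is None:
        return "process"  # never indexed before
    if status in ("Ready", "Degraded"):
        return "skip" if hash_matches else "process"
    if status == "PartiallyDeferred":
        return "defer"  # left to the deferred retry pass
    if status == "Failed":
        return "skip"  # only --force retries it
    if status == "NeedsAction":
        return "needs_action"  # reported with its own counter
    raise ValueError(f"unknown status: {status}")
```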

Schema Versioning

Each KB DB carries a PRAGMA user_version tag. The exec command migrates older schemas automatically as part of InitializeSchema(). The serve command opens DBs read-only and never triggers migration — older-schema KBs continue to serve search queries correctly while their new-feature columns remain unused.
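The `user_version` pattern can be sketched with Python's `sqlite3`. The migration statements and function names below are hypothetical and are not the real rag.db schema. Each step runs in its own transaction, and `user_version` is bumped inside it, so a crash never leaves a half-applied step marked as done. The reader opens the file with SQLite's `mode=ro` URI parameter and never reaches the migration code.

```python
import sqlite3

# Ordered migrations: index i upgrades a DB from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE files (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)",
    "ALTER TABLE files ADD COLUMN status TEXT NOT NULL DEFAULT 'Ready'",
]
SCHEMA_VERSION = len(MIGRATIONS)


def initialize_schema(db: sqlite3.Connection) -> int:
    """Writer path (exec): bring the DB up to SCHEMA_VERSION."""
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for target in range(version, SCHEMA_VERSION):
        with db:  # commit on success, roll back on error
            db.execute("BEGIN")
            db.execute(MIGRATIONS[target])
            # PRAGMA takes no bound parameters; target + 1 is a trusted int.
            db.execute(f"PRAGMA user_version = {target + 1}")
    return db.execute("PRAGMA user_version").fetchone()[0]


def open_read_only(path: str) -> sqlite3.Connection:
    """Reader path (serve): never migrates; older schemas stay as they are."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```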

Installation

dotnet tool (recommended)

dotnet tool install -g FieldCure.Mcp.Rag

From source

git clone https://github.com/fieldcure/fieldcure-mcp-rag.git
cd fieldcure-mcp-rag
dotnet build

Requirements

.NET 8.0 Runtime or later
OCR: Windows x64 only. Tesseract OCR for scanned PDFs loads lazily on first use. On other platforms, PDFs with embedded text work normally; scanned pages without a text layer are silently skipped.
An embedding provider (Ollama, OpenAI, etc.) — optional, BM25 search works without it
Ollama 0.4.0 or later (if using Ollama for embedding or contextualization)

Quick Start

Index a folder and search it without any embedding setup (BM25 only):

# 1. Install
dotnet tool install -g FieldCure.Mcp.Rag

# 2. Create a minimal config
$kbPath = "$env:LOCALAPPDATA\FieldCure\Mcp.Rag\demo"
New-Item -ItemType Directory -Force -Path $kbPath
@'
{
  "id": "demo",
  "name": "Demo KB",
  "sourcePaths": ["C:\\my-docs"]
}
'@ | Set-Content "$kbPath\config.json"

# 3. Index
fieldcure-mcp-rag exec --path $kbPath

# 4. Start the search server
fieldcure-mcp-rag serve --base-path "$env:LOCALAPPDATA\FieldCure\Mcp.Rag"

For full retrieval quality with semantic search and contextualization, add embedding and contextualizer blocks to config.json — see Usage below.

Usage

1. Create a knowledge base folder

%LOCALAPPDATA%\FieldCure\Mcp.Rag\{kb-id}\config.json

{
  "id": "my-kb-001",
  "name": "Project Docs",
  "created": "2026-04-03T00:00:00Z",
  "sourcePaths": ["C:\\Users\\me\\Documents\\project-docs"],
  "contextualizer": {
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001",
    "apiKeyPreset": "Claude"
  },
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "apiKeyPreset": "OpenAI"
  }
}

API keys are resolved from environment variables: apiKeyPreset: "OpenAI" → OPENAI_API_KEY, "Claude" → ANTHROPIC_API_KEY, "Gemini" (or "Google") → GEMINI_API_KEY.
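The preset lookup is a plain table from preset name to environment variable. This Python sketch mirrors the mapping listed above. Case-insensitive matching and the function name are assumptions.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Preset → environment variable, as listed in this README.
PRESET_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "google": "GEMINI_API_KEY",
}


def resolve_api_key(preset: Optional[str],
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key for an apiKeyPreset, or None when unset or unknown."""
    if not preset:
        return None
    env = os.environ if env is None else env
    var = PRESET_ENV_VARS.get(preset.lower())
    return env.get(var) if var else None
```

A `None` result is where a batch command has to stop.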

Gemini embedding example: asymmetric retrieval with 1536-dim Matryoshka truncation (half the storage of the full 3072 dimensions, with the same MTEB score):

"embedding": {
  "provider": "gemini",
  "model": "gemini-embedding-2",
  "apiKeyPreset": "Gemini",
  "dimension": 1536
}

Dimension	MTEB	Storage	Use case
768	67.99	25%	Storage-constrained
1536	68.17	50%	Recommended default
3072	68.17	100%	Maximum quality (pre-normalized)

In `serve` mode, `search_documents` can also prompt for a missing API key via MCP elicitation when the client supports it. In `exec` and `exec-queue`, missing keys must be provided via environment variables.
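The dimension table depends on Matryoshka-trained embeddings keeping most of their information in the leading components. A prefix of the vector is therefore still usable, but it is no longer unit length. The "pre-normalized" note on the 3072 row suggests the smaller sizes need re-normalizing before cosine or dot-product scoring. This Python sketch shows what truncation means numerically. It does not claim to show whether the server truncates locally or asks the API for fewer dimensions. The function name is hypothetical.

```python
import math


def truncate_embedding(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and L2-normalize the result."""
    if dim <= 0 or dim > len(vector):
        raise ValueError(f"dim must be in 1..{len(vector)}")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```

For example, `truncate_embedding([3.0, 4.0, 12.0], 2)` keeps `[3.0, 4.0]` and rescales it to `[0.6, 0.8]`.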

2. Index documents

fieldcure-mcp-rag exec --path "C:\Users\me\AppData\Local\FieldCure\Mcp.Rag\my-kb-001"

3. Start MCP search server

fieldcure-mcp-rag serve --base-path "C:\Users\me\AppData\Local\FieldCure\Mcp.Rag"

A single serve process handles all knowledge bases under the base path. Tools accept a kb_id parameter to target a specific KB.

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "rag": {
      "command": "fieldcure-mcp-rag",
      "args": ["serve", "--base-path", "C:\\Users\\me\\AppData\\Local\\FieldCure\\Mcp.Rag"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}

config.json Reference

Field	Description
`id`	Knowledge base identifier
`name`	Display name
`sourcePaths`	List of folders to index (multiple supported)
`contextualizer.provider`	`"anthropic"`, `"openai"`, `"ollama"`, or empty to disable
`contextualizer.model`	Model ID, or empty to disable contextualization
`contextualizer.apiKeyPreset`	Maps to env var: `"OpenAI"` → `OPENAI_API_KEY`, `"Claude"` → `ANTHROPIC_API_KEY`, `"Gemini"` (or `"Google"`) → `GEMINI_API_KEY`
`contextualizer.baseUrl`	API base URL override (null = provider default)
`contextualizer.numCtx`	Ollama only: context window in tokens (default: 8192). Used by contextualization only.
`embedding.*`	Same structure as contextualizer
`embedding.provider`	`"openai"`, `"ollama"`, `"gemini"`, or empty to disable
`embedding.dimension`	Output dimension. `0` = provider default. Gemini supports Matryoshka (MRL) truncation: 768 / 1536 / 3072.
`embedding.maxChunkChars`	Max chars per chunk before pre-split (default: 4000)
`embedding.batchSize`	Max chunks per embedding API call (default: auto from provider table)
`embedding.keepAlive`	Ollama only: VRAM retention duration (default: `"5m"`)
`systemPrompt`	Custom system prompt for contextualization (null = built-in default)

Tools

All tools (except list_knowledge_bases) require a kb_id parameter to specify the target knowledge base.

Tool	Description
`list_knowledge_bases`	List all available KBs with status (file/chunk counts, indexing status)
`search_documents`	Hybrid BM25 + vector search with RRF. Supports `search_mode`: `auto`, `bm25`, `vector`
`get_document_chunk`	Retrieve full content of a specific chunk by ID
`start_reindex`	Queue an indexing request. Scope merge, force/deferred flags, orchestrator auto-spawn
`cancel_reindex`	Remove a pending (not-yet-started) queue entry
`get_index_info`	Index metadata, queue state (status/position/deferred/last_error), contextualization health
`check_changes`	Dry-run filesystem scan. Lightweight, no API calls

Search Modes

`search_mode`	Behavior
`auto`	Hybrid when embedding available, else BM25. Recommended
`bm25`	Keyword-only (FTS5). No embedding call
`vector`	Semantic-only. Errors if no embedding provider
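The three modes differ mainly in how they behave when no embedding provider is configured. This Python sketch of that fallback logic follows the table. The function name and the returned labels are hypothetical.

```python
def resolve_search_mode(requested: str, has_embedding: bool) -> str:
    """Map search_mode to the retrieval path actually executed.

    'auto' degrades to BM25 without an embedding provider;
    'vector' has no fallback and fails instead.
    """
    if requested == "auto":
        return "hybrid" if has_embedding else "bm25"
    if requested == "bm25":
        return "bm25"  # FTS5 only, no embedding API call
    if requested == "vector":
        if not has_embedding:
            raise ValueError("search_mode 'vector' requires an embedding provider")
        return "vector"
    raise ValueError(f"unknown search_mode: {requested}")
```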

Supported Formats

Document formats are provided by FieldCure.DocumentParsers:

DOCX — Microsoft Word (with math equation extraction)
HWPX — Korean standard document (OWPML, with math equation extraction)
XLSX — Excel spreadsheets
PPTX — PowerPoint presentations
PDF — PDF text extraction with ## Page N headers; OCR fallback for scanned pages (Tesseract, eng+kor)
TXT, MD — Plain text / Markdown

Project Structure

src/FieldCure.Mcp.Rag/
├── Program.cs                     # CLI entry (exec | exec-queue | serve | prune-orphans)
├── MultiKbContext.cs              # Multi-KB manager (lazy load, Classify, lazy unload)
├── ExecQueueRunner.cs             # Deferred queue orchestrator
├── OrphanCleanupRunner.cs         # prune-orphans CLI
├── Configuration/
│   ├── RagConfig.cs               # config.json model (KeepAlive, NumCtx fields)
│   └── OllamaDefaults.cs          # Shared defaults (KeepAlive="5m", NumCtx=8192)
├── Indexing/
│   ├── IndexingEngine.cs          # 5-stage pipeline (2-commit model)
│   └── EmbeddingBatchSplitter.cs  # Binary-split per-chunk failure isolation
├── Contextualization/
│   ├── IChunkContextualizer.cs
│   ├── OpenAiChunkContextualizer.cs   # /v1/chat/completions
│   ├── OllamaChunkContextualizer.cs   # /api/chat (keep_alive + num_ctx)
│   ├── AnthropicChunkContextualizer.cs
│   └── NullChunkContextualizer.cs
├── Embedding/
│   ├── IEmbeddingProvider.cs
│   ├── OpenAiCompatibleEmbeddingProvider.cs  # /v1/embeddings
│   ├── OllamaEmbeddingProvider.cs            # /api/embed (keep_alive)
│   ├── NullEmbeddingProvider.cs
│   └── EmbeddingBatchSizes.cs
├── Storage/
│   └── SqliteVectorStore.cs       # SQLite + FTS5 + SIMD cosine similarity
├── Search/
│   ├── HybridSearcher.cs          # BM25 + Vector → RRF
│   └── RrfFusion.cs
├── Chunking/
│   ├── TextChunker.cs
│   └── ChunkLimits.cs
└── Tools/
    ├── ListKnowledgeBasesTool.cs
    ├── SearchDocumentsTool.cs
    ├── GetDocumentChunkTool.cs
    ├── StartReindexTool.cs        # Queue entry point + orchestrator spawn
    ├── CancelReindexTool.cs       # Remove pending queue entry
    ├── GetIndexInfoTool.cs        # Includes queue state
    └── CheckChangesTool.cs

Data Storage

Knowledge base data is stored at %LOCALAPPDATA%\FieldCure\Mcp.Rag\{kb-id}\:

config.json — knowledge base configuration
rag.db — SQLite database (chunks, embeddings, FTS5 index, file hashes, indexing lock)

Queue and lock files at %LOCALAPPDATA%\FieldCure\Mcp.Rag\:

.deferred-queue.json — pending indexing requests
orchestrator.lock — PID lock for the queue orchestrator

Development

# Build
dotnet build

# Test
dotnet test

# Pack as dotnet tool
dotnet pack src/FieldCure.Mcp.Rag -c Release

License

MIT

Configuration

OPENAI_API_KEY

ANTHROPIC_API_KEY

GEMINI_API_KEY

VOYAGE_API_KEY

GROQ_API_KEY
