A production-grade RAG implementation that handles the messy parts: multi-format ingestion (DOCX, PDF with OCR, audio via Whisper), Korean-optimized chunking, and hybrid BM25 plus vector search with RRF. The indexing pipeline uses AI-powered chunk contextualization to generate bilingual keywords and preserve document context across chunking boundaries. A two-commit model saves expensive OCR and transcription work before hitting embedding APIs, so a rate limit or crash doesn't force you to re-process a 600-page scan. Supports Ollama and OpenAI-compatible APIs for both embedding and contextualization, Gemini for embedding, and Anthropic for contextualization. Exposes search tools over MCP stdio and runs headless indexing via CLI. Handles multiple knowledge bases in a single server process with SQLite WAL for concurrent read access during writes.
A Model Context Protocol (MCP) server for indexing and searching local document collections. Supports DOCX, HWPX, PDF (with OCR), Excel, PowerPoint, and audio (Whisper transcription, Windows-only), with hybrid keyword + semantic search optimized for Korean and English.
Built with C# and the official MCP C# SDK.
fieldcure-mcp-rag
├── serve --base-path <path> # Multi-KB MCP search server (stdio)
├── exec --path <kb-path> [--force] [--partial ...] # Headless indexing for a single KB
├── exec-queue --queue-file <path> [--sweep-all] # Process deferred indexing queue
├── prune-orphans --base-path <path> # Delete orphan KB folders
└── smoke-ocr --pdf <scanned.pdf> # Self-test: OCR a scanned PDF (Windows)
- serve: … kb_id parameter. Can run while exec is indexing (SQLite WAL).
- exec: --partial re-runs only downstream stages when models change, preserving OCR output.
- exec-queue: --sweep-all processes deferred entries too (used at app shutdown).
- prune-orphans: … (., _ prefix, -backup-) are never touched.
- smoke-ocr: … 0 on a non-empty result. Surfaces DllNotFoundException / BadImageFormatException distinctly so a missing or arch-mismatched native library is immediately visible. Useful for verifying that the OCR native path is wired correctly on a given host (notably win-arm64 dnx installs).

- … System.Numerics.Vector
- … [math: LaTeX] blocks
- Audio (.mp3, .wav, .m4a, .ogg, .flac, .webm) via Whisper.net, Windows-only. Model size (Tiny→Large) is auto-selected from detected GPU/RAM/cores at startup; each transcript chunk records audio.model_size and audio.transcribed_at for future reindex auditing
- … start_reindex MCP tool; no direct exec spawn
- … (orchestrator.lock)
- … prune-orphans physical cleanup
- … (--sweep-all)
- … cancel file
- … config.json with provider configuration
- Ollama: embedding via /api/embed, contextualization via /api/chat with keep_alive and num_ctx support. Requires Ollama 0.4.0+.
- OpenAI-compatible: embedding via /v1/embeddings, contextualization via /v1/chat/completions. Works with OpenAI, Azure OpenAI, Groq, LM Studio, Together AI.
- Gemini: embedding via /v1beta/models/{model}:embedContent with task_type asymmetric retrieval (RETRIEVAL_DOCUMENT / RETRIEVAL_QUERY) and Matryoshka dimension truncation (768 / 1536 / 3072). gemini-embedding-2, multilingual, 8k token input.
- Anthropic: contextualization via /v1/messages.
- API keys: … OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. Batch indexing commands (exec, exec-queue) are env-var-only. Interactive MCP search can fall back to MCP elicitation when the client supports it.

Standard RAG chunking loses context: a sentence about "the protocol" becomes ambiguous when ripped from its surrounding paragraphs. This server addresses that with Unified Chunk Contextualization: a single LLM call per chunk that produces both contextual framing and bilingual (Korean + English) keywords in one pass.
The result is stored alongside the original chunk text:
This is enabled by setting contextualizer in config.json. It can be disabled (set provider/model to empty) if you prefer raw chunk indexing.
The exec command runs a 5-stage pipeline per file:
For large files, Stage 1 alone can take 20+ minutes — OCR on a 596-page scanned PDF, or Whisper transcription of a multi-hour audio recording. The first audio file in any KB also pays a one-time ggml model download (cached under {UserProfile}/.fieldcure/whisper-models/). To prevent expensive upstream work from being lost when later stages fail, the pipeline uses a 2-commit model:
Stages 1-3 (Extract → Chunk → Contextualize)
↓
[Commit 1] chunks saved as PendingEmbedding
↓
Stage 4 (Embed)
├─ success → [Commit 2a] promote chunks to Indexed
└─ failure → chunks remain PendingEmbedding (retry next exec)
Why this matters: A 25-minute OCR result is persisted on disk before any embedding API call. If Stage 4 fails (network error, rate limit, token limit, process crash, even power loss), the chunks survive. The next exec hash-skips the file (no OCR re-run) and the deferred retry pass attempts only Stage 4.
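The ordering of the two commits can be sketched in a few lines. This is a minimal illustration in Python rather than the server's C# code; the table layout, the `index_file` and `open_db` helpers, and the `embed` callback are all hypothetical names chosen for the example. Only the status names (PendingEmbedding, Indexed, PartiallyDeferred, Ready) come from this README.

```python
import sqlite3


def open_db():
    """Create an in-memory DB with a hypothetical chunks table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chunks(file TEXT, idx INTEGER, text TEXT, status TEXT, embedding TEXT)"
    )
    return db


def index_file(db, path, chunks, embed):
    """Persist chunks first (commit 1), then embed and promote (commit 2)."""
    # Commit 1: the expensive upstream output is durable before any API call.
    with db:
        db.executemany(
            "INSERT INTO chunks(file, idx, text, status) "
            "VALUES (?, ?, ?, 'PendingEmbedding')",
            [(path, i, text) for i, text in enumerate(chunks)],
        )
    try:
        vectors = embed(chunks)  # may raise on rate limit, network error, ...
    except Exception:
        # Nothing is rolled back: chunks stay PendingEmbedding for a later retry.
        return "PartiallyDeferred"
    # Commit 2: store the vectors and promote the chunks.
    with db:
        db.executemany(
            "UPDATE chunks SET embedding = ?, status = 'Indexed' "
            "WHERE file = ? AND idx = ?",
            [(repr(vec), path, i) for i, vec in enumerate(vectors)],
        )
    return "Ready"
```

The point of the design is that the embedding call sits between two independent transactions, so a failure there can never undo the first one.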
If a single chunk in a file exceeds the embedding model's token limit (e.g., a math-dense page in a textbook), the binary split algorithm isolates that one chunk:
EmbedBatch([0..1249]) → 400 "input[846] too long"
├─ EmbedBatch([0..624]) → OK (promote 625)
└─ EmbedBatch([625..1249]) → 400
├─ EmbedBatch([625..937]) → 400
│ ... (binary search narrows toward chunk 846)
│ └─ EmbedBatch([846..846]) → 400 (mark chunk 846 Failed)
└─ EmbedBatch([938..1249]) → OK (promote 312)
Result: 1249 chunks indexed, only chunk 846 marked Failed. The file's status becomes Degraded — partially searchable instead of completely missing.
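The trace above can be reproduced with a short recursive sketch. It is written in Python for illustration and is not the code in EmbeddingBatchSplitter.cs; `embed_with_binary_split` and the `embed_batch` callback are hypothetical, and `ValueError` stands in for the provider's "input too long" response. A real implementation would presumably split only on that class of error, not on rate limits or network failures.

```python
def embed_with_binary_split(chunks, embed_batch):
    """Embed chunks, isolating any that the provider rejects.

    Returns (vectors_by_index, failed_indices).
    """
    vectors, failed = {}, []

    def attempt(lo, hi):
        # lo and hi are inclusive, matching the ranges in the trace.
        try:
            result = embed_batch(chunks[lo:hi + 1])
        except ValueError:  # stand-in for a 400 "input too long" response
            if lo == hi:
                failed.append(lo)  # a single chunk that cannot be embedded
                return
            mid = (lo + hi) // 2
            attempt(lo, mid)
            attempt(mid + 1, hi)
            return
        for offset, vec in enumerate(result):
            vectors[lo + offset] = vec

    if chunks:
        attempt(0, len(chunks) - 1)
    return vectors, failed
```

With 1250 chunks the first midpoint is 624 and the second is 937, which gives the same ranges as the trace. One bad chunk costs roughly 2·log2(n) extra API calls instead of failing the whole file.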
Each exec ends with a retry pass over any chunks left in PendingEmbedding state from previous runs:
- … Failed

| Status | Meaning | Hash-skip behavior |
|---|---|---|
| Ready | Fully indexed | Skip if hash matches |
| Degraded | Some chunks failed (binary-split isolated) | Skip if hash matches |
| PartiallyDeferred | Chunks pending embedding retry | Main loop skips; deferred pass picks up |
| Failed | Extraction or repeated embedding failure | Skip; requires --force to retry |
| NeedsAction | User intervention required | Skip with separate counter |
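Read as a decision function, the table might look like the following Python sketch. The function name and return labels are hypothetical, and two behaviours are assumptions the table does not state: that a changed hash always triggers re-indexing regardless of status, and that --force overrides every status.

```python
def main_loop_action(status, stored_hash, current_hash, force=False):
    """Decide what the main exec loop does with one file.

    status is None for a file that has never been indexed.
    """
    # Assumption: new files, changed files, and --force always re-index.
    if force or status is None or stored_hash != current_hash:
        return "index"
    if status == "PartiallyDeferred":
        return "deferred-pass"      # main loop skips; retry pass embeds later
    if status == "NeedsAction":
        return "skip-needs-action"  # counted separately from ordinary skips
    return "skip"                   # Ready, Degraded, Failed
```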
Each KB DB carries a PRAGMA user_version tag. The exec command migrates older schemas automatically as part of InitializeSchema(). The serve command opens DBs read-only and never triggers migration — older-schema KBs continue to serve search queries correctly while their new-feature columns remain unused.
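The split between a migrating writer and a read-only server can be illustrated with SQLite's own primitives. This is a Python sketch, not the project's InitializeSchema(); the `MIGRATIONS` table, its SQL, and both function names are hypothetical. `PRAGMA user_version` and the `mode=ro` URI parameter are standard SQLite features.

```python
import pathlib
import sqlite3

# Hypothetical migrations, keyed by the schema version they produce.
MIGRATIONS = {
    1: "CREATE TABLE chunks(id INTEGER PRIMARY KEY, text TEXT)",
    2: "ALTER TABLE chunks ADD COLUMN status TEXT",
}


def initialize_schema(db):
    """Apply every migration newer than the DB's user_version (exec side)."""
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for target in sorted(v for v in MIGRATIONS if v > version):
        db.execute(MIGRATIONS[target])
        # PRAGMA values cannot be bound as parameters; target is a trusted int.
        db.execute(f"PRAGMA user_version = {target}")
    db.commit()
    return db.execute("PRAGMA user_version").fetchone()[0]


def open_read_only(path):
    """Open a DB the way a search server might: no writes, no migration."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Because the read-only connection cannot write, it can never run a migration by accident; it simply has to tolerate columns that are absent in older schemas.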
dotnet tool install -g FieldCure.Mcp.Rag
git clone https://github.com/fieldcure/fieldcure-mcp-rag.git
cd fieldcure-mcp-rag
dotnet build
Index a folder and search it without any embedding setup (BM25 only):
# 1. Install
dotnet tool install -g FieldCure.Mcp.Rag
# 2. Create a minimal config
$kbPath = "$env:LOCALAPPDATA\FieldCure\Mcp.Rag\demo"
New-Item -ItemType Directory -Force -Path $kbPath
@'
{
"id": "demo",
"name": "Demo KB",
"sourcePaths": ["C:\\my-docs"]
}
'@ | Set-Content "$kbPath\config.json"
# 3. Index
fieldcure-mcp-rag exec --path $kbPath
# 4. Start the search server
fieldcure-mcp-rag serve --base-path "$env:LOCALAPPDATA\FieldCure\Mcp.Rag"
For full retrieval quality with semantic search and contextualization, add embedding and contextualizer blocks to config.json — see Usage below.
%LOCALAPPDATA%\FieldCure\Mcp.Rag\{kb-id}\config.json
{
"id": "my-kb-001",
"name": "Project Docs",
"created": "2026-04-03T00:00:00Z",
"sourcePaths": ["C:\\Users\\me\\Documents\\project-docs"],
"contextualizer": {
"provider": "anthropic",
"model": "claude-haiku-4-5-20251001",
"apiKeyPreset": "Claude"
},
"embedding": {
"provider": "openai",
"model": "text-embedding-3-small",
"apiKeyPreset": "OpenAI"
}
}
API keys are resolved from environment variables: apiKeyPreset: "OpenAI" → OPENAI_API_KEY, "Claude" → ANTHROPIC_API_KEY, "Gemini" (or "Google") → GEMINI_API_KEY.
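The preset lookup amounts to a small table. The Python sketch below mirrors the mappings listed above; the function name is hypothetical, and raising on an unknown preset is an assumption about behaviour this README does not describe.

```python
import os

# Mappings taken from the sentence above.
PRESET_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Claude": "ANTHROPIC_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "Google": "GEMINI_API_KEY",
}


def resolve_api_key(preset, environ=os.environ):
    """Return the key for a preset, or None if the variable is unset or empty."""
    name = PRESET_ENV_VARS.get(preset)
    if name is None:
        # Assumption: unknown presets are treated as a configuration error.
        raise KeyError(f"unknown apiKeyPreset: {preset!r}")
    return environ.get(name) or None
```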
Gemini embedding example — asymmetric retrieval with 1536-dim Matryoshka truncation (50% storage of full 3072 with identical MTEB score):
"embedding": {
"provider": "gemini",
"model": "gemini-embedding-2",
"apiKeyPreset": "Gemini",
"dimension": 1536
}
| Dimension | MTEB | Storage | Use case |
|---|---|---|---|
| 768 | 67.99 | 25% | Storage-constrained |
| 1536 | 68.17 | 50% | Recommended default |
| 3072 | 68.17 | 100% | Maximum quality (pre-normalized) |
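Matryoshka truncation keeps the leading components of a vector. The "pre-normalized" note on the 3072 row suggests that shorter vectors may need to be re-normalized before cosine similarity is computed; whether that happens in the provider or in this server is not stated here, so the Python sketch below simply shows the operation. Both function names are hypothetical.

```python
import math


def truncate_and_normalize(vector, dimension):
    """Keep the first `dimension` components and rescale to unit length."""
    head = list(vector[:dimension])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # avoid dividing by zero on an all-zero vector
    return [x / norm for x in head]


def storage_fraction(dimension, full=3072):
    """Storage relative to the full-size vector, as in the table's Storage column."""
    return dimension / full
```

For example, `storage_fraction(1536)` is 0.5 and `storage_fraction(768)` is 0.25, matching the 50% and 25% figures in the table.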
In serve mode, search_documents can also prompt via MCP elicitation when the client supports it. In exec and exec-queue, missing keys must be provided via environment variables.
fieldcure-mcp-rag exec --path "C:\Users\me\AppData\Local\FieldCure\Mcp.Rag\my-kb-001"
fieldcure-mcp-rag serve --base-path "C:\Users\me\AppData\Local\FieldCure\Mcp.Rag"
A single serve process handles all knowledge bases under the base path. Tools accept a kb_id parameter to target a specific KB.
Add to claude_desktop_config.json:
{
"mcpServers": {
"rag": {
"command": "fieldcure-mcp-rag",
"args": ["serve", "--base-path", "C:\\Users\\me\\AppData\\Local\\FieldCure\\Mcp.Rag"],
"env": {
"OPENAI_API_KEY": "sk-...",
"ANTHROPIC_API_KEY": "sk-ant-..."
}
}
}
}
| Field | Description |
|---|---|
| id | Knowledge base identifier |
| name | Display name |
| sourcePaths | List of folders to index (multiple supported) |
| contextualizer.provider | "anthropic", "openai", "ollama", or empty to disable |
| embedding.provider | "openai", "ollama", "gemini", or empty to disable |
| embedding.dimension | Output dimension. 0 = provider default. Gemini supports MRL truncation: 768 / 1536 / 3072. |
| contextualizer.model | Model ID, or empty to disable contextualization |
| contextualizer.apiKeyPreset | Maps to env var: "OpenAI" → OPENAI_API_KEY, "Claude" → ANTHROPIC_API_KEY |
| contextualizer.baseUrl | API base URL override (null = provider default) |
| embedding.* | Same structure as contextualizer |
| embedding.maxChunkChars | Max chars per chunk before pre-split (default: 4000) |
| embedding.batchSize | Max chunks per embedding API call (default: auto from provider table) |
| embedding.keepAlive | Ollama only: VRAM retention duration (default: "5m") |
| embedding.numCtx | Ollama only: context window tokens (default: 8192). Contextualizer only. |
| systemPrompt | Custom system prompt for contextualization (null = built-in default) |
All tools (except list_knowledge_bases) require a kb_id parameter to specify the target knowledge base.
| Tool | Description |
|---|---|
| list_knowledge_bases | List all available KBs with status (file/chunk counts, indexing status) |
| search_documents | Hybrid BM25 + vector search with RRF. Supports search_mode: auto, bm25, vector |
| get_document_chunk | Retrieve full content of a specific chunk by ID |
| start_reindex | Queue an indexing request. Scope merge, force/deferred flags, orchestrator auto-spawn |
| cancel_reindex | Remove a pending (not-yet-started) queue entry |
| get_index_info | Index metadata, queue state (status/position/deferred/last_error), contextualization health |
| check_changes | Dry-run filesystem scan. Lightweight, no API calls |

| search_mode | Behavior |
|---|---|
| auto | Hybrid when embedding available, else BM25. Recommended |
| bm25 | Keyword-only (FTS5). No embedding call |
| vector | Semantic-only. Errors if no embedding provider |
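In hybrid mode the BM25 and vector result lists are merged with Reciprocal Rank Fusion. The Python sketch below shows the standard RRF formula, where each list contributes 1 / (k + rank) to a chunk's score. It is not the code in RrfFusion.cs: the function name is hypothetical, and k = 60 is the conventional constant from the RRF literature, assumed here because this README does not state the value the server uses.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns chunk IDs ordered by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = rrf_fuse([bm25_hits, vector_hits])  # ["b", "a", "d", "c"]
```

RRF uses only ranks, so the BM25 and cosine scores never need to be put on a common scale. Chunks found by both searches ("a" and "b" above) rise above chunks found by only one.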
Document formats are provided by FieldCure.DocumentParsers:
- PDF: … ## Page N headers; OCR fallback for scanned pages (Tesseract, eng+kor)

src/FieldCure.Mcp.Rag/
├── Program.cs # CLI entry (exec | exec-queue | serve | prune-orphans)
├── MultiKbContext.cs # Multi-KB manager (lazy load, Classify, lazy unload)
├── ExecQueueRunner.cs # Deferred queue orchestrator
├── OrphanCleanupRunner.cs # prune-orphans CLI
├── Configuration/
│ ├── RagConfig.cs # config.json model (KeepAlive, NumCtx fields)
│ └── OllamaDefaults.cs # Shared defaults (KeepAlive="5m", NumCtx=8192)
├── Indexing/
│ ├── IndexingEngine.cs # 5-stage pipeline (2-commit model)
│ └── EmbeddingBatchSplitter.cs # Binary-split per-chunk failure isolation
├── Contextualization/
│ ├── IChunkContextualizer.cs
│ ├── OpenAiChunkContextualizer.cs # /v1/chat/completions
│ ├── OllamaChunkContextualizer.cs # /api/chat (keep_alive + num_ctx)
│ ├── AnthropicChunkContextualizer.cs
│ └── NullChunkContextualizer.cs
├── Embedding/
│ ├── IEmbeddingProvider.cs
│ ├── OpenAiCompatibleEmbeddingProvider.cs # /v1/embeddings
│ ├── OllamaEmbeddingProvider.cs # /api/embed (keep_alive)
│ ├── NullEmbeddingProvider.cs
│ └── EmbeddingBatchSizes.cs
├── Storage/
│ └── SqliteVectorStore.cs # SQLite + FTS5 + SIMD cosine similarity
├── Search/
│ ├── HybridSearcher.cs # BM25 + Vector → RRF
│ └── RrfFusion.cs
├── Chunking/
│ ├── TextChunker.cs
│ └── ChunkLimits.cs
└── Tools/
├── ListKnowledgeBasesTool.cs
├── SearchDocumentsTool.cs
├── GetDocumentChunkTool.cs
├── StartReindexTool.cs # Queue entry point + orchestrator spawn
├── CancelReindexTool.cs # Remove pending queue entry
├── GetIndexInfoTool.cs # Includes queue state
└── CheckChangesTool.cs
Knowledge base data is stored at %LOCALAPPDATA%\FieldCure\Mcp.Rag\{kb-id}\:
- config.json: knowledge base configuration
- rag.db: SQLite database (chunks, embeddings, FTS5 index, file hashes, indexing lock)

Queue and lock files at %LOCALAPPDATA%\FieldCure\Mcp.Rag\:

- .deferred-queue.json: pending indexing requests
- orchestrator.lock: PID lock for the queue orchestrator

# Build
dotnet build
# Test
dotnet test
# Pack as dotnet tool
dotnet pack src/FieldCure.Mcp.Rag -c Release
Part of the AssistStudio ecosystem.
Environment variables: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, VOYAGE_API_KEY, GROQ_API_KEY