CAT
/MCP
SkillsMCPMarketplacesDigestToolsAdvertise

This week in Claude

Every Monday: Claude Code, Agent SDK, MCP, and the Anthropic platform moves worth your time.

Skills by Category
Frontend DevelopmentBackend & APIsTesting & QASecurityDevOps & CI/CDGit & Pull RequestsDocumentationCode Review & QualityAI & Agent BuildingSkill Development
MCP Servers by Category
Sales & MarketingWeb & Browser AutomationDatabasesAI & LLM ToolsCloud & InfrastructureCommunication & MessagingDeveloper ToolsDesign & CreativeDocuments & KnowledgeSearch & Web Crawling
Marketplaces by Category
AI Agents & OrchestrationLLM IntegrationDevelopment ToolsFrontend & UIBackend & APIsDatabasesTesting & Code QualityDevOps & CloudSecurity & ComplianceGit & Version Control

Cross AI Tools

Discover Claude Code plugins, extensions, and tools. Automatically updated directory of Anthropic Claude AI marketplaces with development tools, productivity plugins, and integrations.

Resources

  • Browse Skills
  • Browse MCP Servers
  • Browse Marketplaces
  • Plugins Reference

Community

  • About
  • Tools
  • Feedback
  • Privacy Policy
  • Advertise

Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Enterprise Internal Knowledge Base: Production-Ready RAG + MCP

kimsb2429/internal-knowledge-base
registry active
Summary

This is a reference implementation for enterprise RAG that treats eval scores and observability as first-class CI artifacts. It exposes three MCP primitives: a query tool that hits pgvector with optional reranking, document resources keyed by source_id, and a cite_from_chunks prompt. The real value is the merge gate: every PR runs a 110-question golden set in CI and blocks if top-k recall regresses more than 5pp. Langfuse traces are public and shareable. The corpus is 238 VA education manuals already chunked and embedded in a fixture dump, so you can clone and immediately test prompt changes against known-good baselines. Reach for this when you need a working example of how to stop RAG quality from silently degrading between deploys.

CodeRabbit
CodeRabbit
AI writes the code. CodeRabbit catches the slop.
Try For Free →
Keep your Mac awake
Keep your Mac awake
Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.
One time payment $9 →
Context.devContext.dev
Context.dev
Integrate web data into your AI product. One API to scrape website & brand data.
Get API Key Now →
Make your agent a DeFi expert
Make your agent a DeFi expert
Agent, run crypto. Access onchain data & trade routes via 1inch.
Install now →
Make money from your Skills
Make money from your Skills
On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.
Start earning →
AppSignal
AppSignal
Monitor with ease. Code with confidence.
Start Free Trial →
CodeRabbit
CodeRabbit
AI writes the code. CodeRabbit catches the slop.
Try For Free →
Keep your Mac awake
Keep your Mac awake
Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.
One time payment $9 →
Context.devContext.dev
Context.dev
Integrate web data into your AI product. One API to scrape website & brand data.
Get API Key Now →
Make your agent a DeFi expert
Make your agent a DeFi expert
Agent, run crypto. Access onchain data & trade routes via 1inch.
Install now →
Make money from your Skills
Make money from your Skills
On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.
Start earning →
AppSignal
AppSignal
Monitor with ease. Code with confidence.
Start Free Trial →

Enterprise Internal Knowledge Base — Production-Ready RAG + MCP

A public Retrieval-Augmented Generation pipeline exposed as an MCP server. Sample content from Veterans Affairs education manuals.

The repo implements evaluation, observability, and structure-aware ingestion. Cost/latency tuning, tenant-level access control, and other production concerns are discussed in the article linked below.

📖 Full writeup in Towards AI: Enterprise Internal Knowledge Base RAG MCP: POC-to-Production


Why this exists

RAG demos tend to focus on the quality of the retrieval pipeline, without recognizing that production RAG fails on the next ten steps: prompt or model changes that pass code review but tank answer quality, cost and latency drift that cannot be traced to specific queries, cross-tenant leakage that only surfaces in audit. This repo shows what catching them looks like in practice.

The corpus is public (VA Education manuals — 238 documents, 9,000+ chunks) so anyone can clone, run, and adapt the pipeline.


Quickstart

git clone https://github.com/kimsb2429/internal-knowledge-base
cd internal-knowledge-base

# 1. Start Postgres + pgvector
docker compose up -d

# 2. Python env + dependencies
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 3. Restore corpus fixture (~2 min — 238 docs + 9k chunks pre-embedded)
docker exec -i ikb_pgvector pg_restore -U ikb -d ikb < evals/fixture_v1.dump

# 4. Smoke-test the MCP server
python scripts/test_mcp_server.py     # 7/7 tests pass

# 5. Start the MCP server (stdio transport)
python scripts/mcp_server.py

Consuming from Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "ikb": {
      "command": "python",
      "args": ["/absolute/path/to/internal-knowledge-base/scripts/mcp_server.py"]
    }
  }
}

Then ask Claude things like "What RPO handles GI Bill claims in Texas?" — the MCP server returns ranked chunks with citations.


Architecture

Ingestion (one-time per corpus):

graph LR
    A[KnowVA crawler<br/>HTML + PDF] --> B[Source-specific<br/>preprocessor]
    B --> C[Structure-aware<br/>chunker]
    C --> D[mxbai-embed-large<br/>local, 1024-dim]
    D --> E[(pgvector)]
    F[Anthropic Contextual<br/>Retrieval] -.-> E
    E -.-> F
    style E fill:#e1f5fe

Query (per MCP tool call):

graph LR
    A[Claude Desktop<br/>MCP client] --> B[FastMCP server]
    B --> C[pgvector top-K]
    C --> D[Reranker<br/>mxbai or FlashRank]
    D --> E[Claude Sonnet<br/>generation]
    E --> A
    E --> F[Langfuse trace]
    style F fill:#fff9c4

Stack:

  • Vector store: Postgres + pgvector (Docker, port 5433); content_tsv GIN index for hybrid-ready
  • Embeddings: mxbai-embed-large (1024 dims, local via sentence-transformers) — $0 API cost
  • Reranker: mxbai-rerank-base-v2 (full eval) / FlashRank MiniLM (CI fast mode, 22M ONNX, ~2s/query)
  • Generation: Claude Sonnet
  • MCP server: FastMCP 3.2.4 — Tools (query), Resources (document://{source_id}), Prompts (cite_from_chunks)
  • Observability: Langfuse Cloud, per-trace public sharing
  • Eval: DeepEval + 110-query golden set + GitHub Actions merge gate

Eval scores

Full 110-question golden set, contextualized chunks + reranker:

MetricScore
Faithfulness0.95
Answer Relevance0.91
Context Precision0.61
Context Recall0.52
Context Relevance0.56

🔗 Live Langfuse trace (public, no login).

Notable result: Anthropic's Contextual Retrieval pattern produced modest lift on top of reranking (+4.8pp AnsRel, +4.1pp CtxPrec) at this scale — well short of the +35% recall their published numbers suggested. Reported as found; juiced numbers would defeat the point.


Eval-in-CI as a merge gate

Every PR runs the golden set in fast mode (FlashRank reranker, ~3-4 min wall, $0.30 in Sonnet calls) against a fixture DB. PRs that regress more than ±5pp on top1/topk/keyword_recall, or +10pp on idk_rate, are blocked.

Forever-artifact: PR #5 — a deliberate failing-then-passing PR. Red CI catches a 20pp top1 regression; green CI confirms the fix. The Actions tab is the proof.

Workflow: .github/workflows/eval-gate.yml.


What this repo doesn't cover

A few production-shape items are seams, not implementations:

  • Multi-tenant scoping — auth_context parameter present on every MCP tool, typed, currently unused (labels the SSO/ACL seam)
  • Ingestion concurrency — single-threaded chunker + embedder; production would use a modulus-distributed worker pool
  • Hybrid search wiring — content_tsv GIN index is live; BM25 + RRF fusion at query time stays a post-launch addition

The writeup linked above covers these topics.


Repo layout

docs/                    Research, evidence base, deep-dives
data/                    Crawled corpus + golden query set
scripts/
  crawl_knowva.py            eGain v11 API crawler
  enrich_metadata.py         Headings, ACL, authority tier, content_category
  knowva_preprocess.py       Source-specific HTML normalization
  chunk_documents.py         Structure-aware splitter (preserves table colspan/rowspan)
  embed_and_store.py         mxbai-embed-large → pgvector
  contextualize_chunks.py    Anthropic Batches API for Contextual Retrieval
  rerank.py                  mxbai-rerank + FlashRank
  retrieve.py / generate.py  RAG path
  mcp_server.py              FastMCP exposure
  run_eval.py / score_eval.py / check_regression.py   Eval harness + CI gate
evals/                   Fixture DB dump + baseline JSON
.github/workflows/       eval-gate.yml — merge-gate workflow

Reproducing from raw corpus (~30 min)

Each script is idempotent and resume-safe.

python scripts/crawl_knowva.py            # Crawl raw HTML (skip if data/knowva_manuals/articles/ exists)
python scripts/enrich_metadata.py         # Add headings, ACL, authority tier
python scripts/knowva_preprocess.py       # Normalize HTML quirks
python scripts/chunk_documents.py         # Structure-aware split
python scripts/embed_and_store.py         # mxbai → pgvector
python scripts/contextualize_chunks.py    # Anthropic Batches API (~$12, optional but recommended)

Then python scripts/run_eval.py --fast to verify the eval baseline reproduces.


Further reading

  • Full demo writeup: Enterprise Internal Knowledge Base RAG MCP: POC-to-Production (Towards AI, Medium)
  • docs/2026-04-11-engineering-rag-evidence-and-howtos.md — engineering analysis, evidence base, Zero-to-MCP plan
  • docs/2026-04-12-rag-pipeline-buy-vs-build.md — buy-vs-build map per pipeline stage
  • docs/deep-dive/2026-04-16-docs-vs-code-rag-adjudication.md — when unified RAG stops working

License

MIT — see LICENSE.

Featured
CodeRabbit
CodeRabbit
AI writes the code. CodeRabbit catches the slop.
Try For Free →
Keep your Mac awake
Keep your Mac awake
Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.
One time payment $9 →
Context.devContext.dev
Context.dev
Integrate web data into your AI product. One API to scrape website & brand data.
Get API Key Now →
Make your agent a DeFi expert
Make your agent a DeFi expert
Agent, run crypto. Access onchain data & trade routes via 1inch.
Install now →
Make money from your Skills
Make money from your Skills
On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.
Start earning →
AppSignal
AppSignal
Monitor with ease. Code with confidence.
Start Free Trial →
Categories
AI & LLM ToolsDocuments & Knowledge
Registryactive
UpdatedApr 22, 2026
View on GitHub

Related AI & LLM Tools MCP Servers

View all →
SkillFM LLM Cost Optimizer

io.github.ericm1018/skillfm-llm-cost-optimizer-openai-anthropic-usage

LLM cost optimizer for OpenAI, Anthropic, token usage, BYOK, and SkillFM Beacon audits.
Llm Orchestration Agent

io.github.mikerawsonnz/llm-orchestration-agent

Run a prompt through a LangChain (system + human) chain over Gemini on Vertex AI; optional LangSmith
Authenticated Llm Agent

io.github.mikerawsonnz/authenticated-llm-agent

JWT-gated LLM gateway: authenticate (bcrypt/JWT), then run a LangChain-on-Vertex Gemini completion.
Copilot Memory MCP

labforgedev/copilot-memory-mcp

Persistent semantic memory for AI agents using local ChromaDB vector search. No cloud required.
1
Agent Prompt Injection Firewall Mcp

csoai-org/agent-prompt-injection-firewall-mcp

The WAF for agents. Pattern-based + heuristic firewall scans prompts, RAG documents, tool argume...
Authenticated Multi Llm Agent

io.github.mikerawsonnz/authenticated-multi-llm-agent

Google-OAuth-gated LLM gateway: verify a Google ID token, then run a Gemini (Vertex AI) completion f