CAT
/MCP
SkillsMCPMarketplacesDigestToolsAdvertise

This week in Claude

Every Monday: Claude Code, Agent SDK, MCP, and the Anthropic platform moves worth your time.

Skills by Category
Frontend DevelopmentBackend & APIsTesting & QASecurityDevOps & CI/CDGit & Pull RequestsDocumentationCode Review & QualityAI & Agent BuildingSkill Development
MCP Servers by Category
Sales & MarketingWeb & Browser AutomationDatabasesAI & LLM ToolsCloud & InfrastructureCommunication & MessagingDeveloper ToolsDesign & CreativeDocuments & KnowledgeSearch & Web Crawling
Marketplaces by Category
AI Agents & OrchestrationLLM IntegrationDevelopment ToolsFrontend & UIBackend & APIsDatabasesTesting & Code QualityDevOps & CloudSecurity & ComplianceGit & Version Control

Cross AI Tools

Discover Claude Code plugins, extensions, and tools. Automatically updated directory of Anthropic Claude AI marketplaces with development tools, productivity plugins, and integrations.

Resources

  • Browse Skills
  • Browse MCP Servers
  • Browse Marketplaces
  • Plugins Reference

Community

  • About
  • Tools
  • Feedback
  • Privacy Policy
  • Advertise

Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Haiku Rag

ggozad/haiku.rag
538STDIOregistry active
Summary

Haiku.Rag is an agentic retrieval-augmented generation (RAG) system built on LanceDB, Pydantic AI, and Docling that provides hybrid search (vector and full-text), question answering with citations, multi-agent research workflows, and code execution capabilities for complex analytical tasks. The server exposes these capabilities as MCP tools for AI assistants, supporting multiple embedding and LLM providers with local-first operation, document structure awareness, and visual grounding on original page images. It solves the problem of enabling AI systems to perform intelligent document analysis, citation-aware question answering, and multi-step research tasks across document collections through a unified CLI, Python API, and MCP interface.

CodeRabbit
CodeRabbit
AI writes the code. CodeRabbit catches the slop.
Try For Free →
Keep your Mac awake
Keep your Mac awake
Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.
One time payment $9 →
Context.devContext.dev
Context.dev
Integrate web data into your AI product. One API to scrape website & brand data.
Get API Key Now →
Make your agent a DeFi expert
Make your agent a DeFi expert
Agent, run crypto. Access onchain data & trade routes via 1inch.
Install now →
Make money from your Skills
Make money from your Skills
On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.
Start earning →
AppSignal
AppSignal
Monitor with ease. Code with confidence.
Start Free Trial →
CodeRabbit
CodeRabbit
AI writes the code. CodeRabbit catches the slop.
Try For Free →
Keep your Mac awake
Keep your Mac awake
Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.
One time payment $9 →
Context.devContext.dev
Context.dev
Integrate web data into your AI product. One API to scrape website & brand data.
Get API Key Now →
Make your agent a DeFi expert
Make your agent a DeFi expert
Agent, run crypto. Access onchain data & trade routes via 1inch.
Install now →
Make money from your Skills
Make money from your Skills
On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.
Start earning →
AppSignal
AppSignal
Monitor with ease. Code with confidence.
Start Free Trial →

Haiku RAG

Tests codecov

Agentic RAG built on LanceDB, Pydantic AI, and Docling.

New: vision and multimodal search. Picture-aware ingestion captures embedded figure bytes; vision-capable QA models receive them alongside text. Multimodal embedders put picture vectors in the same space as text, enabling text-as-query → figure hits and image-as-query retrieval.

Features

  • Hybrid search — Vector + full-text with Reciprocal Rank Fusion
  • Multimodal & cross-modal search — Multimodal embedders (vLLM, VoyageAI, Cohere) put picture vectors in the same space as text; supports text-as-query → figure hits and image-as-query
  • Question answering — RAG skill with citations (page numbers, section headings)
  • Vision QA — Vision-capable models receive figure bytes alongside chunk text
  • Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
  • Analysis skill — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
  • Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
  • Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
  • Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, Cohere, LM Studio, vLLM (multimodal via multimodal: true on vLLM/VoyageAI/Cohere). QA: any model supported by Pydantic AI
  • Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
  • CLI & Python API — Full functionality from command line or code
  • MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
  • Visual grounding — View chunks highlighted on original page images
  • Production ingester — Long-lived haiku-ingester service with persistent SQLite queue, async worker pool with retries and a dead-letter queue, FS / HTTP / S3 / WebDAV source adapters, FastAPI control plane, and a browser dashboard for operators. See docs/ingester.md.
  • Time travel — Query the database at any historical point with --before
  • Inspector — TUI for browsing documents, chunks, and search results

Installation

Python 3.12 or newer required

Full Package (Recommended)

pip install haiku.rag

Includes all features: document processing, all embedding providers, and rerankers.

Using uv? uv pip install haiku.rag

Slim Package (Minimal Dependencies)

pip install haiku.rag-slim

Install only the extras you need. See the Installation documentation for available options.

Quick Start

Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.

# Index a PDF
haiku-rag add-src paper.pdf

# Search
haiku-rag search "attention mechanism"

# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?"

# Analyze — complex analytical tasks via code execution
haiku-rag analyze "How many documents mention transformers?"

# Interactive chat — multi-turn conversations with memory
haiku-rag chat

# Continuously ingest from configured sources (FS, HTTP, S3, WebDAV)
haiku-ingester serve

See Configuration for customization options.

Python API

from haiku.rag.client import HaikuRAG

async with HaikuRAG("knowledge.lancedb", create=True) as rag:
    # Index documents
    await rag.create_document_from_source("paper.pdf")
    await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

    # Search — returns chunks with provenance
    results = await rag.search("self-attention")
    for result in results:
        print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

    # QA with citations
    answer, citations = await rag.ask("What is the complexity of self-attention?")
    print(answer)
    for cite in citations:
        print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")

For details on the skills the client wraps, see the Skills docs.

MCP Server

Use with AI assistants like Claude Desktop:

haiku-rag mcp --stdio

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["mcp", "--stdio"]
    }
  }
}

Provides tools for document management, search, QA, and analysis directly in your AI assistant.

Examples

See the examples directory for working examples:

  • Docker Setup - Complete Docker deployment with continuous ingestion (haiku-ingester) and MCP server
  • Web Application - Full-stack conversational RAG with CopilotKit frontend

Documentation

Full documentation at: https://ggozad.github.io/haiku.rag/

  • Quickstart - Provider setup and first ingestion
  • Installation - Packages and extras
  • Configuration - YAML reference
  • CLI - Command reference
  • Python API - Complete API docs
  • Skills - The RAG and analysis skills the client wraps
  • Tuning - Retrieval and answer-quality tuning
  • Ingester - Production ingester for continuous indexing from FS, HTTP, S3, and WebDAV
  • MCP - Model Context Protocol integration
  • Remote processing - Offload conversion to docling-serve
  • Applications - Chat TUI, web app, and inspector
  • Benchmarks - Performance benchmarks
  • Changelog - Version history

License

This project is licensed under the MIT License.

mcp-name: io.github.ggozad/haiku-rag

Featured
CodeRabbit
CodeRabbit
AI writes the code. CodeRabbit catches the slop.
Try For Free →
Keep your Mac awake
Keep your Mac awake
Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.
One time payment $9 →
Context.devContext.dev
Context.dev
Integrate web data into your AI product. One API to scrape website & brand data.
Get API Key Now →
Make your agent a DeFi expert
Make your agent a DeFi expert
Agent, run crypto. Access onchain data & trade routes via 1inch.
Install now →
Make money from your Skills
Make money from your Skills
On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.
Start earning →
AppSignal
AppSignal
Monitor with ease. Code with confidence.
Start Free Trial →
Categories
AI & LLM Tools
Registryactive
Packagehaiku-rag
TransportSTDIO
UpdatedJun 9, 2026
View on GitHub

Related AI & LLM Tools MCP Servers

View all →
SkillFM LLM Cost Optimizer

io.github.ericm1018/skillfm-llm-cost-optimizer-openai-anthropic-usage

LLM cost optimizer for OpenAI, Anthropic, token usage, BYOK, and SkillFM Beacon audits.
Llm Orchestration Agent

io.github.mikerawsonnz/llm-orchestration-agent

Run a prompt through a LangChain (system + human) chain over Gemini on Vertex AI; optional LangSmith
Authenticated Llm Agent

io.github.mikerawsonnz/authenticated-llm-agent

JWT-gated LLM gateway: authenticate (bcrypt/JWT), then run a LangChain-on-Vertex Gemini completion.
Copilot Memory MCP

labforgedev/copilot-memory-mcp

Persistent semantic memory for AI agents using local ChromaDB vector search. No cloud required.
1
Agent Prompt Injection Firewall Mcp

csoai-org/agent-prompt-injection-firewall-mcp

The WAF for agents. Pattern-based + heuristic firewall scans prompts, RAG documents, tool argume...
Authenticated Multi Llm Agent

io.github.mikerawsonnz/authenticated-multi-llm-agent

Google-OAuth-gated LLM gateway: verify a Google ID token, then run a Gemini (Vertex AI) completion f