haiku.rag is an agentic retrieval-augmented generation (RAG) system built on LanceDB, Pydantic AI, and Docling. It provides hybrid search (vector and full-text), question answering with citations, multi-agent research workflows, and code execution for complex analytical tasks. Its MCP server exposes these capabilities as tools for AI assistants. It supports multiple embedding and LLM providers, runs local-first, is aware of document structure, and offers visual grounding on original page images. Through a unified CLI, Python API, and MCP interface, it lets AI systems perform document analysis, citation-aware question answering, and multi-step research across document collections.
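Hybrid search has to merge a vector-similarity ranking and a full-text ranking into one result list. A common way to do this is reciprocal rank fusion (RRF). The sketch below is a generic illustration of RRF and does not claim to describe how haiku.rag or LanceDB actually combine scores. The chunk IDs are made up, and `k=60` is simply a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one ranking (generic RRF sketch).

    Each list contributes 1 / (k + rank) for every item it contains, with rank
    starting at 1, so items ranked well by several lists rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a vector search and a full-text search.
vector_hits = ["chunk-3", "chunk-1", "chunk-7"]
fulltext_hits = ["chunk-1", "chunk-9", "chunk-3"]

for chunk_id, score in reciprocal_rank_fusion([vector_hits, fulltext_hits]):
    print(f"{chunk_id}: {score:.4f}")
```

Because RRF works on ranks rather than raw scores, it needs no normalization between cosine similarities and full-text relevance scores, which live on different scales.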
Agentic RAG built on LanceDB, Pydantic AI, and Docling.
New: vision and multimodal search. Picture-aware ingestion captures embedded figure bytes; vision-capable QA models receive them alongside text. Multimodal embedders put picture vectors in the same space as text, enabling text-as-query → figure hits and image-as-query retrieval.
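When picture vectors and text vectors share one embedding space, a single nearest-neighbour search can return both kinds of item. The sketch below illustrates the idea under stated assumptions. The tiny 3-d vectors, the item names, and the `search` helper are all hypothetical stand-ins for real embedder output and are not haiku.rag's API. Items are ranked purely by cosine similarity, whatever their modality.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    modality: str  # "text" or "image"
    vector: list[float]  # hypothetical embedding in the shared space


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vector: list[float], index: list[Item], top_k: int = 2) -> list[Item]:
    # One ranking over all items: only distance in the shared space matters, not modality.
    return sorted(index, key=lambda item: cosine(query_vector, item.vector), reverse=True)[:top_k]


# Toy "embeddings"; a real multimodal embedder would produce these.
index = [
    Item("chunk: results table discussion", "text", [0.9, 0.1, 0.0]),
    Item("figure: attention heatmap", "image", [0.1, 0.95, 0.1]),
    Item("chunk: related work", "text", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of a text query such as "attention visualization".
text_query = [0.2, 0.9, 0.1]
print([item.name for item in search(text_query, index)])
```

Image-as-query retrieval is the same call. The only difference is that the query vector comes from embedding an image instead of text.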
multimodal: true on vLLM/VoyageAI/Cohere). QA: any model supported by Pydantic AI.
The haiku-ingester service provides a persistent SQLite queue, an async worker pool with retries and a dead-letter queue, FS / HTTP / S3 / WebDAV source adapters, a FastAPI control plane, and a browser dashboard for operators. See docs/ingester.md.
Python 3.12 or newer is required.
pip install haiku.rag
The full package includes all features: document processing, all embedding providers, and rerankers.
Using uv? uv pip install haiku.rag
pip install haiku.rag-slim
The slim package lets you install only the extras you need. See the Installation documentation for available options.
Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.
# Index a PDF
haiku-rag add-src paper.pdf
# Search
haiku-rag search "attention mechanism"
# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?"
# Analyze — complex analytical tasks via code execution
haiku-rag analyze "How many documents mention transformers?"
# Interactive chat — multi-turn conversations with memory
haiku-rag chat
# Continuously ingest from configured sources (FS, HTTP, S3, WebDAV)
haiku-ingester serve
See Configuration for customization options.
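The haiku-ingester serve command runs the ingestion service, which is described as an async worker pool with retries and a dead-letter queue. The sketch below shows that pattern in miniature with plain `asyncio` and is not the haiku-ingester implementation. An in-memory `asyncio.Queue` stands in for the persistent SQLite queue, `ingest` is a hypothetical placeholder for real document processing, and `MAX_ATTEMPTS` is an assumed retry budget.

```python
import asyncio

MAX_ATTEMPTS = 3  # assumed retry budget per job


async def ingest(source: str, attempt: int) -> None:
    """Hypothetical ingestion step: "flaky" sources fail once, "broken" ones always fail."""
    if source.startswith("broken") or (source.startswith("flaky") and attempt == 1):
        raise RuntimeError(f"failed to ingest {source}")


async def worker(queue: asyncio.Queue, done: list[str], dead_letter: list[str]) -> None:
    while True:
        source, attempt = await queue.get()
        try:
            await ingest(source, attempt)
            done.append(source)
        except RuntimeError:
            if attempt < MAX_ATTEMPTS:
                # Retry later by re-enqueueing with an incremented attempt counter.
                await queue.put((source, attempt + 1))
            else:
                # Give up and park the job in the dead-letter list for an operator.
                dead_letter.append(source)
        finally:
            queue.task_done()


async def run(sources: list[str], num_workers: int = 2) -> tuple[list[str], list[str]]:
    queue: asyncio.Queue = asyncio.Queue()  # in-memory stand-in for a persistent queue
    done: list[str] = []
    dead_letter: list[str] = []
    for source in sources:
        queue.put_nowait((source, 1))
    workers = [asyncio.create_task(worker(queue, done, dead_letter)) for _ in range(num_workers)]
    await queue.join()  # returns once every job, including retries, is finished
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return done, dead_letter


done, dead = asyncio.run(run(["a.pdf", "flaky.pdf", "broken.pdf"]))
print("done:", sorted(done), "dead-letter:", dead)
```

Retrying by re-enqueueing, rather than looping inside the worker, lets other jobs make progress while a failing source waits its turn. A persistent queue additionally lets pending and dead-lettered jobs survive restarts.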
import asyncio

from haiku.rag.client import HaikuRAG


async def main() -> None:
    async with HaikuRAG("knowledge.lancedb", create=True) as rag:
        # Index documents
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search: returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")


asyncio.run(main())
For details on the skills the client wraps, see the Skills docs.
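In the example above, `ask` returns the answer together with citations that carry a `chunk_id`, `page_numbers`, and `content`. The sketch below turns such citations into numbered footnotes, collapsing repeated citations of the same chunk. The `Citation` dataclass is a hypothetical stand-in with only those three fields, not the library's own class, and `page_numbers` is assumed to be a list of ints. The sample citations are placeholders, not real output.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical stand-in mirroring the fields used in the example above.
    chunk_id: str
    page_numbers: list[int]
    content: str


def format_footnotes(answer: str, citations: list[Citation]) -> str:
    """Append one numbered footnote per distinct chunk, in first-seen order."""
    seen: dict[str, int] = {}
    lines = [answer, ""]
    for cite in citations:
        if cite.chunk_id in seen:
            continue  # skip repeated citations of the same chunk
        seen[cite.chunk_id] = len(seen) + 1
        pages = ", ".join(str(p) for p in cite.page_numbers) or "?"
        lines.append(f"[{seen[cite.chunk_id]}] p. {pages}: {cite.content[:80]}")
    return "\n".join(lines)


citations = [
    Citation("c1", [5], "Example passage one."),
    Citation("c1", [5], "Example passage one."),
    Citation("c2", [6, 7], "Example passage two."),
]
print(format_footnotes("Example answer.", citations))
```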
Use with AI assistants like Claude Desktop:
haiku-rag mcp --stdio
Add to your Claude Desktop configuration:
{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["mcp", "--stdio"]
    }
  }
}
Provides tools for document management, search, QA, and analysis directly in your AI assistant.
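A Claude Desktop configuration may already list other MCP servers, so pasting the snippet above over it can drop them. The sketch below adds the haiku-rag entry while keeping existing servers. The helper name `add_haiku_rag_server` is hypothetical. The config file's location depends on your OS, so the sketch works on a JSON string rather than on a specific path.

```python
import json


def add_haiku_rag_server(config_text: str) -> str:
    """Return config JSON with a "haiku-rag" entry under "mcpServers", keeping other servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command and arguments as the configuration snippet above.
    servers["haiku-rag"] = {"command": "haiku-rag", "args": ["mcp", "--stdio"]}
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other-tool": {"command": "other-tool", "args": []}}}'
print(add_haiku_rag_server(existing))
```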
See the examples directory for working examples:
haiku-ingester) and MCP server
Full documentation at: https://ggozad.github.io/haiku.rag/
This project is licensed under the MIT License.
mcp-name: io.github.ggozad/haiku-rag