Haiku Rag

538 · STDIO · registry active

Summary

Haiku RAG is an agentic retrieval-augmented generation (RAG) system built on LanceDB, Pydantic AI, and Docling. It provides hybrid search (vector and full-text), question answering with citations, multi-agent research workflows, and code execution for complex analytical tasks. The server exposes these capabilities as MCP tools for AI assistants, and supports multiple embedding and LLM providers, local-first operation, document structure awareness, and visual grounding on original page images. Through a unified CLI, Python API, and MCP interface, it lets AI systems analyze documents, answer questions with citations, and run multi-step research tasks across document collections.

Haiku RAG

Agentic RAG built on LanceDB, Pydantic AI, and Docling.

New: vision and multimodal search. Picture-aware ingestion captures embedded figure bytes; vision-capable QA models receive them alongside text. Multimodal embedders put picture vectors in the same space as text, enabling text-as-query → figure hits and image-as-query retrieval.
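
To illustrate what a shared embedding space makes possible, here is a minimal sketch of cross-modal ranking by cosine similarity. It is not haiku.rag's implementation: the three-dimensional vectors, the index keys, and the `cosine` helper are all made up for illustration, and real embedders produce much larger vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical index: text chunks and figures share one vector space.
index = {
    "text:intro": [0.9, 0.1, 0.0],
    "figure:architecture": [0.1, 0.9, 0.2],
    "figure:loss-curve": [0.0, 0.2, 0.9],
}

# Made-up embedding of a text query; an image query would be embedded
# into the same space and ranked in exactly the same way.
query = [0.2, 0.8, 0.1]

ranked = sorted(index, key=lambda key: cosine(query, index[key]), reverse=True)
print(ranked[0])
```

Because text and pictures live in one space, a single similarity function ranks both, which is why a text query can surface a figure as its top hit.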

Features

Hybrid search — Vector + full-text with Reciprocal Rank Fusion
Multimodal & cross-modal search — Multimodal embedders (vLLM, VoyageAI, Cohere) put picture vectors in the same space as text; supports text-as-query → figure hits and image-as-query
Question answering — RAG skill with citations (page numbers, section headings)
Vision QA — Vision-capable models receive figure bytes alongside chunk text
Reranking — MxBAI, Cohere, Zero Entropy, or vLLM
Analysis skill — Complex analytical tasks via sandboxed Python code execution (aggregation, computation, multi-document analysis)
Conversational RAG — Chat TUI and web application for multi-turn conversations with session memory
Document structure — Stores full DoclingDocument, enabling structure-aware context expansion
Multiple providers — Embeddings: Ollama, OpenAI, VoyageAI, Cohere, LM Studio, vLLM (multimodal via multimodal: true on vLLM/VoyageAI/Cohere). QA: any model supported by Pydantic AI
Local-first — Embedded LanceDB, no servers required. Also supports S3, GCS, Azure, and LanceDB Cloud
CLI & Python API — Full functionality from command line or code
MCP server — Expose as tools for AI assistants (Claude Desktop, etc.)
Visual grounding — View chunks highlighted on original page images
Production ingester — Long-lived haiku-ingester service with persistent SQLite queue, async worker pool with retries and a dead-letter queue, FS / HTTP / S3 / WebDAV source adapters, FastAPI control plane, and a browser dashboard for operators. See docs/ingester.md.
Time travel — Query the database at any historical point with --before
Inspector — TUI for browsing documents, chunks, and search results
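
The hybrid search entry above mentions Reciprocal Rank Fusion. As a rough sketch of the general technique (not the code haiku.rag or LanceDB actually runs), each result earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the fused order. The chunk IDs are hypothetical, and k = 60 is an assumed default that is commonly used for RRF; the project may use a different value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers.
vector_hits = ["c3", "c1", "c7"]
fulltext_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)
```

Chunks that appear in both lists rise to the top, while chunks found by only one retriever are still kept further down the fused list.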

Installation

Requires Python 3.12 or newer.

Full Package (Recommended)

pip install haiku.rag

Includes all features: document processing, all embedding providers, and rerankers.

Using uv? Run: uv pip install haiku.rag

Slim Package (Minimal Dependencies)

pip install haiku.rag-slim

Install only the extras you need. See the Installation documentation for available options.

Quick Start

Note: Requires an embedding provider (Ollama, OpenAI, etc.). See the Tutorial for setup instructions.

# Index a PDF
haiku-rag add-src paper.pdf

# Search
haiku-rag search "attention mechanism"

# Ask questions with citations
haiku-rag ask "What datasets were used for evaluation?"

# Analyze — complex analytical tasks via code execution
haiku-rag analyze "How many documents mention transformers?"

# Interactive chat — multi-turn conversations with memory
haiku-rag chat

# Continuously ingest from configured sources (FS, HTTP, S3, WebDAV)
haiku-ingester serve

See Configuration for customization options.

Python API

import asyncio

from haiku.rag.client import HaikuRAG


async def main():
    async with HaikuRAG("knowledge.lancedb", create=True) as rag:
        # Index documents from a local file and a URL
        await rag.create_document_from_source("paper.pdf")
        await rag.create_document_from_source("https://arxiv.org/pdf/1706.03762")

        # Search: returns chunks with provenance
        results = await rag.search("self-attention")
        for result in results:
            print(f"{result.score:.2f} | p.{result.page_numbers} | {result.content[:100]}")

        # QA with citations
        answer, citations = await rag.ask("What is the complexity of self-attention?")
        print(answer)
        for cite in citations:
            print(f"  [{cite.chunk_id}] p.{cite.page_numbers}: {cite.content[:80]}")


# `async with` only works inside a coroutine, so run main() on an event loop
asyncio.run(main())

For details on the skills the client wraps, see the Skills docs.

MCP Server

Use with AI assistants like Claude Desktop:

haiku-rag mcp --stdio

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "haiku-rag": {
      "command": "haiku-rag",
      "args": ["mcp", "--stdio"]
    }
  }
}

The server provides tools for document management, search, QA, and analysis directly in your AI assistant.
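
If your Claude Desktop configuration already lists other MCP servers, the haiku-rag entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dictionary. The `add_mcp_server` helper and the `other-tool` entry are hypothetical, and the sketch deliberately does not read or write the configuration file, since its location depends on your platform.

```python
import json


def add_mcp_server(config, name, command, args):
    """Return a copy of config with one entry added under "mcpServers"."""
    merged = dict(config)
    # Copy the server map so the caller's dictionary is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other", "args": []}}}

updated = add_mcp_server(existing, "haiku-rag", "haiku-rag", ["mcp", "--stdio"])
print(json.dumps(updated, indent=2))
```

The printed JSON can then be saved as the new configuration; entries that were already present are preserved.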

Examples

See the examples directory for working examples:

Docker Setup - Complete Docker deployment with continuous ingestion (haiku-ingester) and MCP server
Web Application - Full-stack conversational RAG with CopilotKit frontend

Documentation

Full documentation at: https://ggozad.github.io/haiku.rag/

Quickstart - Provider setup and first ingestion
Installation - Packages and extras
Configuration - YAML reference
CLI - Command reference
Python API - Complete API docs
Skills - The RAG and analysis skills the client wraps
Tuning - Retrieval and answer-quality tuning
Ingester - Production ingester for continuous indexing from FS, HTTP, S3, and WebDAV
MCP - Model Context Protocol integration
Remote processing - Offload conversion to docling-serve
Applications - Chat TUI, web app, and inspector
Benchmarks - Performance benchmarks
Changelog - Version history

License

This project is licensed under the MIT License.

mcp-name: io.github.ggozad/haiku-rag
