Embgrep

STDIO · registry active

Summary

Gives Claude four semantic search tools over your local codebase without needing API keys or vector databases. Uses fastembed with ONNX Runtime to generate embeddings and stores them in SQLite. The index_directory tool chunks code by functions and docs by headings, semantic_search runs natural language queries with cosine similarity ranking, and update_index does incremental reindexing based on SHA-256 hashes. Supports 15+ file types including Python, JavaScript, TypeScript, Rust, Go, and Markdown. Useful when you want Claude to find relevant code or documentation by meaning rather than exact string matches, especially across large projects where keyword search falls short.

embgrep

Korean documentation · llms.txt

Local semantic search — embedding-powered grep for files, zero external services.

Search your codebase and documentation by meaning, not just keywords. embgrep indexes files into local embeddings and lets you run semantic queries — no API keys, no cloud services, no vector database servers.

Features

Local embeddings — Uses fastembed (ONNX Runtime), no API keys needed
SQLite storage — Single-file index, no external vector DB
Incremental indexing — Only re-indexes changed files (SHA-256 hash comparison)
Smart chunking — Function-level splitting for code, heading-level for docs
MCP native — 4-tool FastMCP server for LLM agent integration
15+ file types — .py, .js, .ts, .java, .go, .rs, .md, .txt, .yaml, .json, .toml, and more
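
The incremental-indexing idea above can be illustrated with a small sketch. This is not embgrep's actual implementation: the helper names `file_sha256` and `plan_update`, and the shape of the stored hash mapping, are assumptions made for illustration. It only shows how comparing SHA-256 digests lets an indexer skip unchanged files.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def plan_update(stored: dict[str, str], root: Path, pattern: str = "*.py"):
    """Compare previously stored hashes (hypothetical mapping of path -> digest)
    against the files currently on disk.

    Returns (changed, removed): paths to (re)index and paths to drop.
    """
    current = {
        str(p): file_sha256(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }
    # New files have no stored hash; edited files have a different one.
    changed = [p for p, h in current.items() if stored.get(p) != h]
    removed = [p for p in stored if p not in current]
    return changed, removed
```

Only the paths in `changed` would need to be chunked and embedded again, which is where most of the indexing time goes.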

Install

pip install embgrep              # core (fastembed + numpy)
pip install embgrep[cli]         # + click/rich CLI
pip install embgrep[mcp]         # + FastMCP server
pip install embgrep[all]         # everything

Quick Start

Python API

from embgrep import EmbGrep

eg = EmbGrep()

# Index a directory
eg.index("./my-project", patterns=["*.py", "*.md"])

# Semantic search
results = eg.search("database connection pooling", top_k=5)
for r in results:
    print(f"{r.file_path}:{r.line_start}-{r.line_end} (score: {r.score:.4f})")
    print(f"  {r.chunk_text[:80]}...")

# Incremental update (only changed files)
eg.update()

# Index statistics
status = eg.status()
print(f"{status.total_files} files, {status.total_chunks} chunks, {status.index_size_mb} MB")

eg.close()

CLI

# Index a project
embgrep index ./my-project --patterns "*.py,*.md"

# Search
embgrep search "error handling patterns"

# Filter by file type
embgrep search "async database query" --path-filter "%.py"

# Check status
embgrep status

# Update changed files
embgrep update

Convenience functions

import embgrep

embgrep.index("./src")
results = embgrep.search("authentication middleware")
status = embgrep.status()
embgrep.update()

MCP Server

Add to your Claude Desktop / MCP client configuration:

{
  "mcpServers": {
    "embgrep": {
      "command": "embgrep-mcp"
    }
  }
}

Or with uvx:

{
  "mcpServers": {
    "embgrep": {
      "command": "uvx",
      "args": ["--from", "embgrep[mcp]", "embgrep-mcp"]
    }
  }
}

MCP Tools

Tool	Description
`index_directory`	Index files in a directory for semantic search
`semantic_search`	Search indexed files using natural language
`index_status`	Get current index statistics
`update_index`	Incremental update — re-index changed files only

How It Works

flowchart TD
    A["📁 Files"] --> B["Smart Chunking\ncode: function-level\ndocs: heading-level"]
    B --> C["fastembed\nlocal embeddings"]
    C --> D["SQLite\nvector index"]
    D --> E["🔍 Query"]
    E --> F["Cosine Similarity\nranked results"]
    F --> G["✅ Matches\nwith context"]
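
To make the "Smart Chunking" stage of the diagram concrete, here is a minimal sketch of function-level and heading-level splitting using only the standard library. The function names and the dictionary layout are hypothetical, and embgrep's own chunker may work differently (for example, it handles languages other than Python, which `ast` cannot parse).

```python
import ast


def chunk_python(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "line_start": node.lineno,      # 1-based line of the def/class
                "line_end": node.end_lineno,    # last line of the body
                "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
            })
    return chunks


def chunk_markdown(text: str) -> list[dict]:
    """Split Markdown into chunks that each begin at a heading line.

    Simplification: any line starting with '#' counts as a heading,
    including '#' comments inside fenced code blocks.
    """
    chunks, current, start = [], [], 1
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#") and current:
            chunks.append({"line_start": start, "line_end": number - 1,
                           "text": "\n".join(current)})
            current, start = [], number
        current.append(line)
    if current:
        chunks.append({"line_start": start, "line_end": start + len(current) - 1,
                       "text": "\n".join(current)})
    return chunks
```

Keeping `line_start` and `line_end` with each chunk is what allows search results to point back at an exact location in the file.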

1. Chunking: Files are split into semantically meaningful chunks:
   - Code files (.py, .js, .ts, etc.): split by function/class boundaries
   - Documents (.md, .txt): split by headings or paragraph breaks
   - Config files: fixed-size chunking
2. Embedding: Each chunk is converted to a 384-dimensional vector using BGE-small-en-v1.5 via ONNX Runtime (no PyTorch needed)
3. Storage: Embeddings are stored as BLOBs in a local SQLite database
4. Search: Query text is embedded and compared against all chunks using cosine similarity
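
The storage and search steps can be sketched with the standard library alone. The table schema, the helper names, and the three-dimensional toy vectors below are assumptions for illustration; the real index stores 384-dimensional vectors produced by fastembed, and its schema is not documented here.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Deserialize a float32 BLOB back into a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_vec, top_k=5):
    """Brute-force scan: score every stored chunk, return the best top_k."""
    rows = conn.execute("SELECT file_path, embedding FROM chunks")
    scored = [(cosine(query_vec, unpack(blob)), path) for path, blob in rows]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical schema and toy embeddings standing in for real model output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (file_path TEXT, embedding BLOB)")
toy = {"db.py": [1.0, 0.0, 0.0], "auth.py": [0.0, 1.0, 0.0], "pool.py": [0.9, 0.1, 0.0]}
conn.executemany("INSERT INTO chunks VALUES (?, ?)",
                 [(path, pack(vec)) for path, vec in toy.items()])

results = search(conn, [1.0, 0.0, 0.0], top_k=2)
for score, path in results:
    print(f"{path}: {score:.4f}")
```

A full scan like this costs time proportional to the number of chunks per query, which is the trade-off of comparing against all chunks without a dedicated vector database.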

Configuration

Parameter	Default	Description
`db_path`	`~/.local/share/embgrep/embgrep.db`	SQLite database location
`model`	`BAAI/bge-small-en-v1.5`	fastembed model name
`max_chunk_size`	1000 chars	Maximum chunk size for fixed-size splitting
`top_k`	5	Number of search results
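
As an illustration of what `max_chunk_size` controls, the sketch below splits text into pieces of at most that many characters, breaking at line boundaries where possible. The function `chunk_fixed` is hypothetical and is not taken from embgrep's source; the library's fixed-size splitter may choose break points differently.

```python
def chunk_fixed(text: str, max_chunk_size: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chunk_size characters,
    preferring to break between lines."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Flush the current chunk before this line would overflow it.
        if current and len(current) + len(line) > max_chunk_size:
            chunks.append(current)
            current = ""
        # Hard-split a single line that is longer than the limit.
        while len(line) > max_chunk_size:
            chunks.append(line[:max_chunk_size])
            line = line[max_chunk_size:]
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the input exactly, so no text is lost or duplicated at chunk boundaries.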

QuartzUnit Ecosystem

Package	Description
markgrab	HTML/YouTube/PDF/DOCX to LLM-ready markdown
snapgrab	URL to screenshot + metadata
docpick	OCR + LLM document structure extraction
browsegrab	Local LLM browser agent
feedkit	RSS feed collection + MCP
embgrep	Local semantic search for files

Used in

newswatch — RSS news monitoring pipeline (feedkit → markgrab → embgrep → diffgrab)

License

MIT

Part of the QuartzUnit ecosystem: composable Python libraries for data collection, extraction, search, and AI agent safety.
