qsearch

Summary

For agents that keep hallucinating because they're reading snippets instead of full pages, this plugs in multi-engine search with trust scoring built from repeated sweeps. You get full content extraction, per-URL provenance showing which engines agreed (Google, Brave, DuckDuckGo, Qwant via SearXNG), and a local Meilisearch corpus that ranks sources by how often they survive independent queries. The sweep endpoint batches queries with priority routing (free SearXNG for broad, paid Brave for focused, academic APIs for papers), the corpus viewer shows trust scores across topics, and everything runs self-hosted with your own API keys. Built for research sprints where you need to see what multiple engines confirm over time, not just what ranks first today.

qsearch

I built this for my own daily research. Across 100+ research sprints, my agent kept hallucinating because it only read 200-char snippets. qsearch gives it full content with multi-engine provenance, running locally and owned by me.

AI agents hallucinate when they read 200-character snippets instead of full pages; even production RAG tools showed 17–33% hallucination rates in a 2024 Stanford audit (details under "Why qsearch exists"). Existing search APIs hide which engines agreed on a result. Existing knowledge graphs are enterprise-priced or vendor-locked.

qsearch is the open-source search layer that gives agents full content with multi-engine provenance — running on your machine, owned by you, ready for MCP today.

✅ v0.4.0 live at qsearch.pro. Multi-engine attribution, trust corpus with per-URL provenance (engines[], sweep_count, trust_score), corpus viewer at /ui, MCP-over-HTTP for Claude Code and any spec-compliant client.

📖 Architecture: ARCHITECTURE.md · Vision: docs/VISION.md · Technical spec: docs/TRUST_MESH.md · Federation deep-dive: docs/FEDERATION_ARCHITECTURE.md

Quick start

# 1. Clone
git clone https://github.com/theYahia/qsearch.git
cd qsearch

# 2. Get a Brave Search API key (BYOK, $5/mo for ~1000 queries)
#    → https://brave.com/search/api/ → sign up → copy key

# 3. Configure
cp .env.example .env.local
# Set BRAVE_API_KEY=your_key
# Set SEARXNG_URL=http://localhost:8888 (for multi-engine attribution)

# 4. Start infrastructure (Meilisearch + Qdrant + SearXNG)
docker compose up -d

# 5. (Optional) Pull Ollama models for local LLM cleaning + embedding rerank
#    Without them, search still works — just no cleaned_markdown and no rerank.
ollama pull qwen2.5:7b-instruct   # ~5GB, cleaner (used by /sweep_context)
ollama pull nomic-embed-text      # 274MB, embedding rerank (Phase B)

# 6. Install & run
npm install
npm start            # → qsearch v0.4.0 on http://localhost:8080

# 7. (Optional) MCP server for Claude Code / Workbench / OpenClaw
npm run start:mcp    # → http://0.0.0.0:8081

# 8. Test multi-engine attribution
curl -X POST http://localhost:8080/sweep \
  -H "Content-Type: text/plain" \
  --data-binary $'t1|self-hosted search engine\n'
# → parsed_snippets.md with "Engines: google, duckduckgo, brave (count=3)"

BYOK design: Brave key + SearXNG + Ollama all stay on your machine. No data exfiltration.

How I use it daily

Every research sprint I run a dual sweep:

# Brave sweep (primary, authoritative)
python research/scripts/brave_sweep.py queries.txt _raw_data/topic_2026-04-28/brave/

# qsearch sweep (secondary, auto-indexes into corpus)
curl -X POST "http://localhost:8080/sweep?topic=my_topic" \
  -H "Content-Type: text/plain" --data-binary @queries.txt

After 10+ sprints on the same domain, /corpus/top?min_engines=3 shows which URLs survived multiple independent search engines across multiple sessions. Those are the ones I actually trust.

Why qsearch exists

Every AI agent today hits the same broken loop:

Agent → Tavily/Exa/Serper API → 200-char snippets → hallucinated answer

Three failures:

1. Snippets aren't enough. Stanford's 2024 production RAG audit measured 17–33% hallucination on Lexis+ AI and Westlaw despite "hallucination-free" claims. On Wikipedia QA, full content beats snippet-RAG by +7.3pp (arXiv 2501.01880).
2. No trust signal. Search APIs return ranked lists without telling you which engines agreed. SEO spam at position 3 looks identical to an authoritative source at position 4.
3. No memory. Every search starts from zero. The same trash gets surfaced again. The same authority goes unrecognized.

qsearch addresses all three:

1. Full content is fetched and cleaned, not just snippets.
2. An engines[] field per result exposes Google + DDG + Brave + Qwant + Startpage attribution (via SearXNG aggregation).
3. A local corpus accumulates: every URL grows a trust profile across sweeps.

How it works

flowchart LR
    A[Your agent] -->|query| Q[qsearch]
    Q -->|fan out| B[Brave Search API]
    Q -->|fan out| S["SearXNG\n(Google, DDG, Brave, Qwant, …)"]
    B -->|results| Q
    S -->|"results + engines[]"| Q
    Q -->|index by URL| C["Local corpus\n(Meilisearch + Qdrant)"]
    C -->|trust score| Q
    Q -->|re-ranked + full content + provenance| A

    style C fill:#fde68a,stroke:#d97706,color:#000
    style Q fill:#93c5fd,stroke:#2563eb,color:#000
    style S fill:#86efac,stroke:#16a34a,color:#000

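The yellow node is your private corpus. A URL's trust score grows with the number of engines that returned it, the number of sweeps that surfaced it, and the number of topics it appeared under (for example 5 engines, 3 sweeps, 4 topics). The score emerges from the data, with no human ranking, no centralized authority, and no cloud round-trip.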
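The README does not spell out the scoring formula (docs/TRUST_MESH.md is the place to look), so the following is only a minimal sketch of how such a score could accumulate. The fields `engines`, `sweep_count`, and `trust_score` come from the text above; the `topics` field, the logarithmic weighting, and the saturation point of 5 are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class UrlProfile:
    """Per-URL provenance accumulated across sweeps (shape is hypothetical)."""
    url: str
    engines: set[str] = field(default_factory=set)  # engines that ever returned the URL
    sweep_count: int = 0                            # sweeps that surfaced the URL
    topics: set[str] = field(default_factory=set)   # assumed: topics it appeared under

    def observe(self, engines: list[str], topic: str) -> None:
        """Merge one sweep's observation into the profile."""
        self.engines.update(engines)
        self.sweep_count += 1
        self.topics.add(topic)


def trust_score(profile: UrlProfile) -> float:
    """Hypothetical score in 0..1 with diminishing returns on each signal."""
    raw = (math.log1p(len(profile.engines))
           + math.log1p(profile.sweep_count)
           + math.log1p(len(profile.topics)))
    cap = 3 * math.log1p(5)  # assumed saturation: 5 engines, 5 sweeps, 5 topics
    return round(min(raw / cap, 1.0), 3)
```

Under these assumptions a URL seen once by one engine scores well below a URL that four engines return across three sweeps, which is the ordering the corpus is meant to produce.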

How qsearch compares

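| Feature | Tavily | Exa | Serper | Brave API | SearXNG | qsearch |
| --- | --- | --- | --- | --- | --- | --- |
| Open source core | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Full content (not snippets) | partial | partial | ❌ | ❌ | ❌ | ✅ |
| Multi-engine attribution | ❌ | ❌ | ❌ | ❌ | partial | ✅ (`engines[]`) |
| Persistent local corpus | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Trust score per URL | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Self-hostable | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| MCP-native | partial | ✅ | ❌ | ✅ | ❌ | ✅ |
| BYOK upstream | ❌ | ❌ | ❌ | N/A | ✅ | ✅ |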

API — v0.4.0

Search endpoints

| Endpoint | Description | Backend |
| --- | --- | --- |
| `POST /search` | Web search + corpus first, trust-weighted re-rank | Brave or SearXNG |
| `POST /sweep` | Batch search with priority/domain routing (see below) | SearXNG / Brave / Academic |
| `POST /cached_sweep` | Same as `/sweep`, with SQLite memcache layer | SearXNG / Brave / Academic |
| `POST /academic_search` | Peer-reviewed papers via arxiv + PubMed + Semantic Scholar | Academic (free, no auth) |
| `POST /sweep_context` | Local LLM page extraction (analogue of Brave LLM Context) | Ollama qwen2.5 |
| `POST /news` | News search | Brave (requires key) |
| `POST /context` | Deep page extraction | Brave (requires key) |
| `POST /index` | Crawl URL or index local `.md` glob | Crawl4AI |
| `GET /trust/:url` | Trust score + provenance for any URL in corpus | n/a |
| `GET /corpus/top` | Top URLs ranked by trust (`?limit=20&min_engines=3`) | n/a |
| `GET /corpus/stats` | Corpus size + counts | n/a |
| `GET /economy_report` | Sprint cost breakdown by backend + savings vs all-Brave | n/a |
| `GET /ui` | Corpus browser: search, trust scores, provenance modal | n/a |
| `GET /health` | Service status | n/a |

/search accepts: query, n_results (1–20), freshness (pd/pw/pm/py), search_lang, country, corpus_first (default true), corpus_only (default false).
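A client-side helper can catch out-of-range values before they reach the server. The sketch below validates the parameters listed above and builds a request body. It assumes `/search` takes a JSON body, which the README does not state explicitly; the helper names `build_search_payload` and `post_search` are made up for this example.

```python
import json
import urllib.request

FRESHNESS = {"pd", "pw", "pm", "py"}  # values listed for the freshness parameter


def build_search_payload(query, n_results=None, freshness=None,
                         corpus_first=True, corpus_only=False,
                         search_lang=None, country=None):
    """Validate /search parameters and return a JSON-serialisable dict."""
    if not query or not query.strip():
        raise ValueError("query must be non-empty")
    if n_results is not None and not 1 <= n_results <= 20:
        raise ValueError("n_results must be between 1 and 20")
    if freshness is not None and freshness not in FRESHNESS:
        raise ValueError(f"freshness must be one of {sorted(FRESHNESS)}")
    payload = {"query": query.strip(),
               "corpus_first": corpus_first,
               "corpus_only": corpus_only}
    optional = {"n_results": n_results, "freshness": freshness,
                "search_lang": search_lang, "country": country}
    # Omit unset optional fields so the server applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def post_search(payload, base_url="http://localhost:8080"):
    """Send the payload to /search. Assumes a JSON request and JSON response."""
    req = urllib.request.Request(
        f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With a local instance running, `post_search(build_search_payload("self-hosted search engine", n_results=5, freshness="pw"))` would issue the request; the response shape is not documented here, so inspect it before relying on specific fields.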

/sweep accepts text/plain body with one query per line in the format label|query[|priority][|domain]:

priority ∈ broad (default, SearXNG, $0) / focused (Brave, ~$0.005) / critical (Brave + LLM Context, ~$0.01)
domain ∈ general (default) / scholarly (arxiv+PubMed+S2, $0) / ru (SearXNG with language=ru-RU bias, $0)

# Examples
bench_a|qdrant production latency benchmarks|focused
sch_a|crispr cas9 off target effects|broad|scholarly
ru_a|tadviser сро рейтинг 2025|broad|ru
crit_a|self-hosted vector DB choice 2026|critical
gen|simple search        # 2-field still works, defaults to broad/general

Auto-indexes results into Meilisearch with engines[] and engine_count filterable.
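To check a queries file before sending it, the line format can be parsed and priced locally. This is a minimal sketch based only on the format and approximate per-query prices listed above. The `SweepQuery` type and function names are hypothetical, and treating `scholarly` and `ru` queries as free regardless of priority is an assumption drawn from the `$0` labels.

```python
from dataclasses import dataclass

# Approximate USD per query, as listed for each priority.
PRIORITY_COST = {"broad": 0.0, "focused": 0.005, "critical": 0.01}
DOMAINS = {"general", "scholarly", "ru"}


@dataclass(frozen=True)
class SweepQuery:
    label: str
    query: str
    priority: str = "broad"
    domain: str = "general"


def parse_sweep_line(line: str) -> SweepQuery:
    """Parse one label|query[|priority][|domain] line, applying defaults."""
    parts = [p.strip() for p in line.split("|")]
    if not 2 <= len(parts) <= 4 or not parts[0] or not parts[1]:
        raise ValueError(f"expected label|query[|priority][|domain]: {line!r}")
    priority = parts[2] if len(parts) > 2 and parts[2] else "broad"
    domain = parts[3] if len(parts) > 3 and parts[3] else "general"
    if priority not in PRIORITY_COST:
        raise ValueError(f"unknown priority {priority!r}")
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain {domain!r}")
    return SweepQuery(parts[0], parts[1], priority, domain)


def parse_sweep_body(body: str) -> list[SweepQuery]:
    """Parse a whole request body, skipping blank lines."""
    return [parse_sweep_line(line) for line in body.splitlines() if line.strip()]


def estimate_cost(queries: list[SweepQuery]) -> float:
    """Rough cost estimate; non-general domains are assumed to be free."""
    return sum(PRIORITY_COST[q.priority] for q in queries if q.domain == "general")
```

Running `estimate_cost(parse_sweep_body(open("queries.txt").read()))` before a sprint gives a ballpark figure; `/economy_report` remains the source of truth for what was actually spent.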

/academic_search accepts JSON: { query, n_results (1-20), sources?: ["arxiv","pubmed","semanticscholar"] }. Fans out to all three in parallel, dedupes by DOI/title, returns interleaved top-N.
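The merge step described above (dedupe by DOI/title, interleaved top-N) could look roughly like the sketch below. It is not taken from the qsearch source: the `doi` and `title` result fields, the title normalisation, and the round-robin order are assumptions chosen to illustrate the idea.

```python
from itertools import zip_longest


def _dedupe_keys(paper: dict) -> set[str]:
    """Return the keys under which a paper counts as already seen."""
    keys = set()
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        keys.add("doi:" + doi)
    # Normalise the title: lowercase, letters and digits only.
    title = "".join(ch for ch in (paper.get("title") or "").lower() if ch.isalnum())
    if title:
        keys.add("title:" + title)
    return keys


def merge_sources(results_by_source: dict[str, list[dict]], n_results: int = 10) -> list[dict]:
    """Round-robin across sources, dropping papers already seen by DOI or title."""
    seen: set[str] = set()
    merged: list[dict] = []
    # zip_longest yields rank 1 from every source, then rank 2, and so on.
    for tier in zip_longest(*results_by_source.values()):
        for paper in tier:
            if paper is None:
                continue
            keys = _dedupe_keys(paper)
            if keys & seen:
                continue
            seen |= keys
            merged.append(paper)
            if len(merged) == n_results:
                return merged
    return merged
```

Tracking both keys means a preprint without a DOI still collapses with its published version when the titles match after normalisation.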

Multi-engine attribution example

curl -X POST http://localhost:8080/sweep \
  -H "Content-Type: text/plain" \
  --data-binary $'t1|self-hosted search engine 2026\n'

Output excerpt (parsed_snippets.md):

**1. GitHub - searxng/searxng**
- URL: https://github.com/searxng/searxng
- Engines: google, duckduckgo, brave, qwant (count=4)
  > A privacy-respecting, hackable metasearch engine...

**2. random-blog.io/seo-spam-2026**
- URL: https://random-blog.io/seo-spam-2026
- Engines: google (count=1)
  > Best self-hosted search engines you must try...

URL #1 has engine_count=4 — found by 4 independent engines. URL #2 has engine_count=1 — found by only one. The trust signal is built into the data, not bolted on.
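If you want to post-process the Markdown output rather than query the corpus, the excerpt format is regular enough to parse. The sketch below assumes every entry follows exactly the three-line layout shown in the excerpt (title, URL, Engines); the real file may contain variations this pattern does not cover.

```python
import re

# Matches one entry in the layout shown in the excerpt above.
ENTRY_RE = re.compile(
    r"^\*\*(?P<rank>\d+)\. (?P<title>.+?)\*\*\s*\n"
    r"- URL: (?P<url>\S+)\s*\n"
    r"- Engines: (?P<engines>[^()\n]+?) \(count=(?P<count>\d+)\)",
    re.MULTILINE,
)

SAMPLE = """\
**1. GitHub - searxng/searxng**
- URL: https://github.com/searxng/searxng
- Engines: google, duckduckgo, brave, qwant (count=4)
  > A privacy-respecting, hackable metasearch engine...

**2. random-blog.io/seo-spam-2026**
- URL: https://random-blog.io/seo-spam-2026
- Engines: google (count=1)
  > Best self-hosted search engines you must try...
"""


def parse_snippets(markdown: str) -> list[dict]:
    """Extract rank, title, URL, and engine attribution from each entry."""
    entries = []
    for m in ENTRY_RE.finditer(markdown):
        entries.append({
            "rank": int(m["rank"]),
            "title": m["title"],
            "url": m["url"],
            "engines": [e.strip() for e in m["engines"].split(",")],
            "engine_count": int(m["count"]),
        })
    return entries


def high_trust(entries: list[dict], min_engines: int = 3) -> list[dict]:
    """Keep only entries confirmed by at least min_engines engines."""
    return [e for e in entries if e["engine_count"] >= min_engines]
```

Applied to `SAMPLE`, `high_trust(parse_snippets(SAMPLE))` keeps the searxng repository and drops the single-engine blog post.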

Filter by trust in Meilisearch

curl -H "Authorization: Bearer masterKey" \
  "http://localhost:7700/indexes/qsearch_corpus/documents?filter=engine_count%20%3E%3D%203"

Returns only URLs found by 3+ engines — your high-trust subset.
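The percent-encoded filter in the curl command is easy to get wrong by hand. A small helper can build the same URL for any threshold; the function name and defaults below are illustrative, while the host, index name, and filter expression are taken from the command above.

```python
from urllib.parse import quote


def corpus_filter_url(min_engines: int,
                      base: str = "http://localhost:7700",
                      index: str = "qsearch_corpus") -> str:
    """Build the Meilisearch documents URL filtered by engine_count."""
    if min_engines < 1:
        raise ValueError("min_engines must be at least 1")
    expression = f"engine_count >= {min_engines}"
    # quote() percent-encodes the spaces and the >= operator.
    return f"{base}/indexes/{index}/documents?filter={quote(expression)}"
```

`corpus_filter_url(3)` reproduces the URL used in the curl example; pass the result to any HTTP client together with the `Authorization: Bearer` header.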

MCP integration

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "qsearch": {
      "type": "http",
      "url": "http://localhost:8081"
    }
  }
}

Available tools:

mcp__qsearch__web_search — web search via Brave or SearXNG
mcp__qsearch__sweep — batch research sweep with multi-engine attribution
mcp__qsearch__academic_search — peer-reviewed papers via arxiv + PubMed + Semantic Scholar
mcp__qsearch__sweep_context — Phase 3 local LLM page extraction (free, Ollama)
mcp__qsearch__economy_report — cost breakdown vs all-Brave baseline
mcp__qsearch__index_research — index local .md files by glob
mcp__qsearch__news_search — news search (Brave key required)
mcp__qsearch__context_search — deep page content (Brave key required)

Other MCP-over-HTTP clients

qsearch serves the MCP Streamable HTTP transport at / on port 8081. It is compatible with Claude Desktop (HTTP mode), OpenClaw, and any spec-compliant MCP client.
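For a client that speaks MCP directly, a tool invocation is a JSON-RPC 2.0 `tools/call` message posted to the endpoint above. The sketch below only builds that message and the headers the Streamable HTTP transport expects. It assumes the server-side tool names are the Claude Code names without the `mcp__qsearch__` prefix and that `web_search` takes a `query` argument; neither is confirmed here, so check the server's `tools/list` output. A real client also has to complete the `initialize` handshake before calling tools.

```python
# Headers a Streamable HTTP client sends with each POST.
MCP_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}


def split_claude_tool_name(name: str) -> tuple[str, str]:
    """Split 'mcp__<server>__<tool>' into (server, tool)."""
    parts = name.split("__", 2)
    if len(parts) != 3 or parts[0] != "mcp":
        raise ValueError(f"not a Claude Code MCP tool name: {name!r}")
    return parts[1], parts[2]


def tools_call_request(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
```

For example, `tools_call_request(split_claude_tool_name("mcp__qsearch__web_search")[1], {"query": "self-hosted search engine"})` yields a body that can be serialised with `json.dumps` and posted to `http://localhost:8081/` using `MCP_HEADERS`.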

Stack

| Component | Tech |
| --- | --- |
| Runtime | Node.js ≥20 |
| Web search | Brave Search API (BYOK) |
| Meta-search | SearXNG (self-hosted, optional) |
| Academic | arxiv + PubMed E-utilities + Semantic Scholar API (free, no auth) |
| Full-text corpus | Meilisearch v1.7 |
| Vector corpus | Qdrant v1.17.1 |
| Crawler | Crawl4AI 0.8.6 (Python subprocess) |
| Embedder (optional) | Ollama `nomic-embed-text` (default) or llama.cpp `/v1/embeddings` |
| LLM cleaner (optional) | Ollama `qwen2.5:7b-instruct` (default; configurable via `OLLAMA_CLEAN_MODEL`) |
| MCP | `@modelcontextprotocol/sdk` |
| License | Apache-2.0 |

Roadmap

| Version | Feature | When |
| --- | --- | --- |
| v0.3.1 | Multi-engine `engines[]` attribution + dual sweep + corpus + MCP | shipped |
| v0.4.0 | Trust layer: `/trust/:url`, `/corpus/top`, `/ui` viewer, trust-weighted re-rank, sort/pagination, corpus merge-on-upsert, snippet sanitization | shipped |
| v0.4.1 | Phase A: academic backend (arxiv + PubMed + S2), 4-field queries (`label\|q\|priority\|domain`), `/academic_search` JSON + MCP tool | shipped |
| v0.4.2 | Phase B: embedding rerank (Ollama nomic-embed-text, gated `QSEARCH_RERANK_ENABLED`); Phase C: RU coverage via SearXNG `language=ru-RU` | shipped |
| v0.4.3 | QVAC SDK ripped out, all local LLM via Ollama (`qwen2.5:7b-instruct` + `nomic-embed-text`) | shipped |
| v0.5 | Launch: awesome list PRs, MCP Registry publish, Show HN, newsletter distribution | in progress |
| v0.6 | Phase B Stage 2: LLM scoring rerank for critical queries; direct Yandex backend; Layer 8 quality gate (rejection threshold) | next |
| v0.7+ | Optional federation (research direction; no timeline until v0.5 validated) | open |

See docs/VISION.md for the full picture and why federation is research-direction-only until we can ship it without overpromise.

Honest trade-offs

- Cold start. The first sweep takes 5–10 seconds (engine fan-out + corpus indexing). Best run as a long-lived daemon.
- Vector search is blocked on Windows. Qdrant requires bare-runtime; not all platforms are supported. Full-text Meilisearch works everywhere.
- SearXNG rate limits. Self-hosting is required because public instances get blocked by Google. Our docker-compose handles this.
- engines[] requires SearXNG. Pure-Brave mode still works but loses the multi-engine signal.
- Full content has a latency cost. ~31s vs ~3s for naive snippet retrieval (Bidirectional RAG study). qsearch makes this opt-in via the /context endpoint.

Follow

🌐 Live demo: qsearch.pro
⭐ Star: github.com/theYahia/qsearch
🐦 X: @TheTieTieTies

License

Apache-2.0 — see LICENSE. Independent. BYOK. Self-hostable. No vendor lock-in.
