Connects Claude to your Thoth systematic literature review workspace in read-only mode. Exposes five tools: list your reviews with their critic and citation-faithfulness scores, pull the full markdown draft of a completed review, retrieve the per-claim citation audit report that flags unsupported references, browse discovered papers with their screening status, and inspect the search queries the discoverer generated. The citation audit is the centerpiece here: every claim in a Thoth review is verified against its source PDF before you see it, and this MCP server lets Claude surface those verdicts directly. Useful when you're iterating on research questions across multiple reviews, or when you want an AI to help you spot patterns in citation failures without opening the web UI.
Agentic systematic literature reviews — with every citation checked against the source.
Named for Thoth, ancient Egypt's ibis-headed god of writing and scribes.
Try the live demo · See a sample review · Public eval dashboard · Connect via MCP
Systematic literature reviews are slow to write — and when you ask an LLM to write one, it confidently invents citations and statistics that aren't in any paper.
Thoth writes the review and checks its own work. Give it a research question and
it discovers relevant papers, reads them, drafts an evidence-grounded review — then
runs a verification pass (cite_check) that compares every cited claim against the
source paper and flags anything unsupported before you read the draft. The result
is a review with a critic score, a citation-faithfulness percentage, and a per-claim
audit you can trust.
It runs as a polished web app, a public eval dashboard, and an authenticated MCP server your AI assistant can call directly.
Claude.ai catches 6 fabricated citations in a real draft — using Thoth's audit:
Connected to Thoth via the official MCP Registry, Claude calls
get_citation_audit on one deliberately weak review (faithfulness 0.13 for that single review) and identifies all 6 unsupported claims. Every one cites the same paper, with invented percentages that aren't in the source. This is cite_check doing its job: it's a single-review audit sample, not the golden-set aggregate (see /evals).
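The pattern Claude found here (every failure pointing at one paper) falls out of a simple group-by over the audit. A minimal sketch, assuming a hypothetical `ClaimVerdict` row shape; the real `get_citation_audit` payload is documented in docs/mcp/tools.md and may use different field names.

```typescript
// Hypothetical shape of one cite_check verdict row (not Thoth's actual schema).
type Verdict = "supported" | "unsupported" | "unclear";

interface ClaimVerdict {
  claim: string;   // sentence from the draft
  paperId: string; // the [paper_id] the claim cites
  verdict: Verdict;
}

// Count unsupported claims per cited paper, so a cluster such as
// "all 6 failures cite the same paper" stands out immediately.
function unsupportedByPaper(audit: ClaimVerdict[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of audit) {
    if (row.verdict !== "unsupported") continue;
    counts.set(row.paperId, (counts.get(row.paperId) ?? 0) + 1);
  }
  return counts;
}

// Usage with made-up rows:
const sample: ClaimVerdict[] = [
  { claim: "X improved recall by 40%", paperId: "p1", verdict: "unsupported" },
  { claim: "X was evaluated on 3 datasets", paperId: "p1", verdict: "supported" },
  { claim: "Y reduced cost by 25%", paperId: "p1", verdict: "unsupported" },
  { claim: "Z replicated the result", paperId: "p2", verdict: "unclear" },
];
// Expected: p1 has 2 unsupported claims; p2 has none.
console.log(unsupportedByPaper(sample));
```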
Every claim, scored against its source — the /showcase review (no login needed). The figures on this card (critic 4.2/5, faithfulness 75%, 8/8 citations checked, 2 unsupported) are this one review's scores — a worked example, not the aggregate:
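The card's numbers are consistent with a simple ratio: 8 citations checked, 2 unsupported, so 6 supported, and 6 / 8 = 75%. A sketch of that arithmetic, assuming faithfulness is supported verdicts over citations checked and that `unclear` counts as not supported (how Thoth scores `unclear` is not stated here):

```typescript
type Verdict = "supported" | "unsupported" | "unclear";

// Assumed definition: supported / checked, with "unclear" treated as not supported.
function faithfulness(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0; // nothing checked, nothing to score
  const supported = verdicts.filter((v) => v === "supported").length;
  return supported / verdicts.length;
}

// The /showcase card: 8 citations checked, 2 unsupported.
const showcase: Verdict[] = [
  ...new Array<Verdict>(6).fill("supported"),
  ...new Array<Verdict>(2).fill("unsupported"),
];
console.log(faithfulness(showcase)); // 0.75
```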
Evaluated in public — /evals tracks citation recall / precision / faithfulness / coverage over an 18-question versioned golden set (7 of 18 populated at this commit), regenerated in CI and published with the last-run date, so a regression is a public, falsifiable signal:
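Citation precision and recall are usually defined as set overlaps between what a review cites and what the golden set expects. A sketch under that assumption, for a single question; how /evals averages across the golden-set questions is not specified here.

```typescript
// Assumed per-question definitions over paper IDs.
function citationPrecisionRecall(
  cited: string[],
  expected: string[],
): { precision: number; recall: number } {
  const citedSet = new Set(cited);
  const expectedSet = new Set(expected);
  // Papers that were both cited and expected.
  const hits = Array.from(citedSet).filter((id) => expectedSet.has(id)).length;
  return {
    // Of what the review cited, how much was expected?
    precision: citedSet.size === 0 ? 0 : hits / citedSet.size,
    // Of what was expected, how much did the review cite?
    recall: expectedSet.size === 0 ? 0 : hits / expectedSet.size,
  };
}

// Made-up example: 4 cited, 3 of them expected, 6 expected in total.
const r = citationPrecisionRecall(["a", "b", "c", "x"], ["a", "b", "c", "d", "e", "f"]);
console.log(r); // { precision: 0.75, recall: 0.5 }
```

Read this way, high precision with lower recall means what the review cites is almost always expected, but some expected papers are missed.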
You approve every step — three human-in-the-loop gates (review plan → review discovered papers → approve included papers); nothing runs unattended:
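The three gates behave like an ordered checklist: the run waits at the first gate that has not been approved. A hypothetical sketch of that ordering only; Thoth implements its gates as durable steps inside a LangGraph StateGraph (see docs/architecture.md), not with this code, and the gate identifiers here are made up.

```typescript
// Hypothetical gate identifiers, in the order the README lists them.
const GATES = ["plan", "discovered_papers", "included_papers"] as const;
type Gate = (typeof GATES)[number];

interface RunState {
  approved: Set<Gate>;
}

// The next gate awaiting a human, or null once all three are approved.
function pendingGate(state: RunState): Gate | null {
  for (const gate of GATES) {
    if (!state.approved.has(gate)) return gate;
  }
  return null;
}

// Approvals must arrive in order; approving a later gate early is rejected.
function approve(state: RunState, gate: Gate): RunState {
  const pending = pendingGate(state);
  if (pending !== gate) {
    throw new Error(`cannot approve "${gate}"; waiting on "${pending}"`);
  }
  const next = new Set(state.approved);
  next.add(gate);
  return { approved: next };
}

let run: RunState = { approved: new Set<Gate>() };
run = approve(run, "plan");
console.log(pendingGate(run)); // discovered_papers
```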
cite_check: verifiable citations. Every [paper_id] in the draft is
scored against the cited paper and labelled supported / unsupported / unclear,
so the LLM can't quietly hallucinate a citation. On the public golden set, the
citations it does surface are accurate (citation precision 97%, recall 74%),
and the verdict report is published per claim, not summarised away. This is the
core differentiator: the citations are measured, not asserted.

The discoverer → fetcher → screener path is wired across OpenAlex, arXiv, and
Exa: it fetches open-access PDFs, OCRs them, and screens each against your plan,
so you can run uploaded-only, hybrid, or fully autonomous discovery. The discovery
and screening axes are v2 and still being calibrated; they're tracked openly on
/evals (both currently at 0%) rather than shipped as a silent claim.

Every eval run is published on /evals, so an eval regression is a public
signal, not a hidden one.

Try it now (nothing to install):
Connect it to your AI assistant — paste this into claude.ai (Pro/Max), Claude Desktop, Cursor, or any MCP client (OAuth runs in your browser; no token to copy):
https://thoth-slr.vercel.app/api/mcp/mcp
- list_reviews: your reviews with critic + faithfulness scores
- get_review_draft: the markdown draft of a completed review
- get_citation_audit: the per-claim cite_check verdict report
- list_discovered_papers (v2): papers the discoverer surfaced, with fetch + screening status
- get_search_queries (v2): the queries the discoverer generated + per-provider errors

Full reference: docs/mcp/tools.md · auth + audit model: docs/mcp/security.md
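Under the hood, an MCP client invokes each of these tools with a JSON-RPC `tools/call` request. A sketch of the request body per the MCP specification; the `reviewId` argument name is a placeholder, so check docs/mcp/tools.md or the server's `tools/list` response for the real input schema.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// "reviewId" is a hypothetical argument name, used here for illustration only.
const req = toolCall(1, "get_citation_audit", { reviewId: "REVIEW_ID" });
console.log(JSON.stringify(req));
```

In practice your MCP client handles the `initialize` handshake, the OAuth flow, and the HTTP transport for you; this only shows what a single call carries.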
Run it locally:
```bash
git clone https://github.com/ahmedEid1/thoth.git && cd thoth
cp .env.example .env # Clerk + Trigger.dev keys + MISTRAL_API_KEY
docker compose up -d # postgres, minio, langfuse
pnpm install && pnpm prisma migrate dev
pnpm dev # Next.js on :3000
pnpm dev:trigger # Trigger.dev worker (separate terminal)
```
Full setup, the agent pipeline, and the v2 flow: docs/architecture.md.
| | |
|---|---|
| Live app | thoth-slr.vercel.app (Clerk sign-in) · sample review at /showcase |
| Public evals | /evals — citation precision 97%, recall 74% on a versioned 18-question golden set (7 of 18 populated at this commit; faithfulness 38% / coverage 32% tracked in the open as the set fills out; discovery/screening v2 under calibration). Regenerated in CI, published with the last-run date — a regression is a public signal. |
| MCP Registry | io.github.ahmedEid1/thoth — status: active |
| Tests | 676 unit/integration + 22 live e2e against the deployed instance (MCP transport, real-browser, authenticated walkthroughs, full agent runs) — all green; tsc + lint clean |
| Audit log | Every MCP call recorded with a SHA-256 input hash; no raw input stored |
| Deploy cost | $0/mo — Vercel + Neon + Cloudflare R2 + Langfuse + Trigger.dev, all free tiers (self-host option) |
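A sketch of how an audit entry can record a SHA-256 input hash without keeping the raw input. The canonicalisation step (sorting object keys so that `{a, b}` and `{b, a}` hash identically) and the `AuditEntry` fields are assumptions, not Thoth's documented scheme; docs/mcp/security.md describes the real audit model.

```typescript
import { createHash } from "node:crypto";

// Assumption: inputs are JSON-serialisable and canonicalised by sorting object keys.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical audit-log row: tool name, hex digest, timestamp. No raw arguments.
interface AuditEntry {
  tool: string;
  inputSha256: string;
  at: string;
}

function auditEntry(tool: string, input: unknown): AuditEntry {
  const inputSha256 = createHash("sha256").update(canonicalJson(input)).digest("hex");
  return { tool, inputSha256, at: new Date().toISOString() };
}

console.log(auditEntry("get_citation_audit", { reviewId: "REVIEW_ID" }));
```

Hashing lets you later check whether two calls had the same input without the log ever holding what that input was.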
Thoth is a LangGraph StateGraph driven by a Trigger.dev worker, with durable
human-in-the-loop gates, a per-run cost cap, and exactly-once gate delivery. Next.js 16
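Two of the guarantees above, sketched in isolation with hypothetical names: a per-run cost cap that refuses a charge once the run would exceed it, and exactly-once gate delivery built the common way, as at-least-once delivery plus de-duplication by decision ID. Thoth's real versions live in its LangGraph + Trigger.dev pipeline; here an in-memory `Set` stands in for what would need to be a durable store (for example, a unique key in Postgres) to survive worker restarts.

```typescript
// Hypothetical per-run budget. Costs are integer cents to avoid float drift.
class RunBudget {
  private spentCents = 0;
  private readonly capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Record an LLM call's cost; refuse if it would push the run over its cap.
  charge(costCents: number): void {
    if (this.spentCents + costCents > this.capCents) {
      throw new Error(`run cap of ${this.capCents} cents would be exceeded`);
    }
    this.spentCents += costCents;
  }

  get spent(): number {
    return this.spentCents;
  }
}

// Hypothetical inbox: a redelivered human decision is applied only once.
class GateInbox {
  private readonly seen = new Set<string>();
  applied = 0;

  deliver(decisionId: string): boolean {
    if (this.seen.has(decisionId)) return false; // duplicate delivery: ignore
    this.seen.add(decisionId);
    this.applied++;
    return true;
  }
}

const budget = new RunBudget(500); // $5.00 cap
budget.charge(300);
try {
  budget.charge(250);
} catch (e) {
  console.log((e as Error).message);
}

const inbox = new GateInbox();
inbox.deliver("approve-plan-1");
inbox.deliver("approve-plan-1"); // retried delivery, ignored
console.log(inbox.applied); // 1
```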
Ibis icon by Delapouite under CC BY 3.0, via game-icons.net.
MIT © 2026 Ahmed Hobeishy