Connects Claude directly to the arXiv preprint repository through four tools: free-text search with boolean operators and field prefixes, batch metadata fetching by paper ID, full-text HTML extraction with LaTeX math rendering, and category taxonomy browsing. Built on the author's mcp-ts-core framework, with request queuing that respects arXiv's 3-second crawl delay and adaptive rate-limit handling. Includes an optional local OAI-PMH mirror mode that caches metadata in SQLite to eliminate rate-limit exposure for search and metadata operations. Available as a public hosted instance at arxiv.caseyjhand.com, or self-hosted over stdio or Streamable HTTP. Useful for literature reviews, citation gathering, or any workflow where you need to query and read academic papers without leaving your LLM conversation.
Public tool metadata (what this MCP server exposes to an agent):

- arxiv_search: Search arXiv papers by query with category and sort filters. Returns paper metadata including title, authors, abstract, categories, and links. Parameters: query (string), start (number), sort_by (string: relevance, submitted, or updated; default relevance), category (string), sort_order (string: ascending or descending; default descending), max_results (number).
- arxiv_get_metadata: Get full metadata for one or more arXiv papers by ID. Use when you have known IDs from citations, prior search results, or memory. Parameters: paper_ids (one or more IDs).
- arxiv_read_paper: Fetch the full text content of an arXiv paper from its HTML rendering. Tries native arXiv HTML first, falls back to ar5iv. Returns raw HTML for direct interpretation. Parameters: paper_id (string), max_characters (number).
- arxiv_list_categories: List arXiv category codes and names. Useful for discovering valid category filters for arxiv_search. Parameters: group (string: cs, econ, eess, math, physics, or q-bio).

Search arXiv, fetch paper metadata, and read full-text content via MCP. STDIO or Streamable HTTP.
Public Hosted Server: https://arxiv.caseyjhand.com/mcp
Four tools for searching and reading arXiv papers:
| Tool Name | Description |
|---|---|
| arxiv_search | Search arXiv papers by query with category and sort filters. |
| arxiv_get_metadata | Get full metadata for one or more arXiv papers by ID. |
| arxiv_read_paper | Fetch the full text content of an arXiv paper from its HTML rendering. |
| arxiv_list_categories | List arXiv category taxonomy, optionally filtered by group. |
arxiv_search: Search for papers using free-text queries with field prefixes and boolean operators.
- Field prefixes: ti: (title), au: (author), abs: (abstract), cat: (category), all: (all fields)
- Boolean operators: AND, OR, ANDNOT

arxiv_get_metadata: Fetch full metadata for one or more papers by known arXiv ID.
- Accepts versioned (2401.12345v2) and unversioned (2401.12345) IDs
- Accepts old-style IDs (hep-th/9901001)

arxiv_read_paper: Read the full HTML content of an arXiv paper.
- Renders math as LaTeX ($…$ inline, $$…$$ block) so the character budget targets paper content
- max_characters defaults to 100,000; raw HTML can reach 500 KB to 3 MB+ for math-heavy papers

arxiv_list_categories: List arXiv category codes and names for discovery.
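The field prefixes and boolean operators above are the arXiv API's own query syntax. As a rough sketch, assuming arxiv_search parameters are forwarded to the `/query` endpoint under ARXIV_API_BASE_URL, the mapping could look like the following. The helper `buildSearchUrl`, the `cat:` clause used for the category filter, the mapping of `submitted`/`updated` onto arXiv's `submittedDate`/`lastUpdatedDate`, and the default of 10 results are illustrative assumptions, not the server's documented behavior.

```typescript
// Hypothetical sketch: maps arxiv_search-style parameters onto the arXiv API's
// /query endpoint. Parameter names follow the tool metadata; defaults and the
// category/sort mappings are assumptions, not the server's code.
type SortBy = "relevance" | "submitted" | "updated";
type SortOrder = "ascending" | "descending";

interface SearchParams {
  query: string;         // e.g. "ti:transformer ANDNOT au:smith"
  category?: string;     // e.g. "cs.CL"
  start?: number;        // zero-based result offset
  max_results?: number;
  sort_by?: SortBy;
  sort_order?: SortOrder;
}

// The arXiv API accepts sortBy = relevance | lastUpdatedDate | submittedDate.
const SORT_FIELD: Record<SortBy, string> = {
  relevance: "relevance",
  submitted: "submittedDate",
  updated: "lastUpdatedDate",
};

function buildSearchUrl(
  p: SearchParams,
  baseUrl = "https://export.arxiv.org/api", // ARXIV_API_BASE_URL default
): string {
  // Assumption: the category filter is ANDed onto the user's query as a cat: clause.
  const searchQuery = p.category ? `(${p.query}) AND cat:${p.category}` : p.query;
  const params = new URLSearchParams({
    search_query: searchQuery,
    start: String(p.start ?? 0),
    max_results: String(p.max_results ?? 10), // assumed default
    sortBy: SORT_FIELD[p.sort_by ?? "relevance"],
    sortOrder: p.sort_order ?? "descending",
  });
  // URLSearchParams encodes spaces as "+" and parentheses as %28/%29.
  return `${baseUrl}/query?${params.toString()}`;
}

const url = buildSearchUrl({
  query: "ti:transformer ANDNOT au:smith",
  category: "cs.CL",
  sort_by: "submitted",
});
console.log(url);
```

Letting `URLSearchParams` do the encoding keeps grouped boolean expressions intact, since the arXiv API expects parentheses percent-encoded and spaces as `+`.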
Resources:

| URI Pattern | Description |
|---|---|
| arxiv://paper/{paperId} | Paper metadata by arXiv ID. |
| arxiv://categories | Full arXiv category taxonomy. |
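Old-style IDs contain a slash, so a resource URI such as arxiv://paper/hep-th/9901001 has one more path segment than a new-style one. A minimal sketch of resolving the two URI patterns and validating the ID, assuming the whole remainder after `arxiv://paper/` is the paper ID; `resolveResource` and `parseArxivId` are hypothetical names, and the real server's matching may differ.

```typescript
// Hypothetical sketch: resolves the resource URIs above and validates paper IDs.
interface ArxivId {
  base: string;      // "2401.12345" or "hep-th/9901001"
  version?: number;  // 2 for "...v2"; undefined when unversioned
}

// New-style IDs (since April 2007): YYMM.NNNN, or YYMM.NNNNN from 2015 on.
const NEW_STYLE = /^(\d{4}\.\d{4,5})(?:v(\d+))?$/;
// Old-style IDs: archive[.SUBJECT-CLASS]/YYMMNNN, e.g. hep-th/9901001 or math.GT/0309136.
const OLD_STYLE = /^([a-z-]+(?:\.[A-Z]{2})?\/\d{7})(?:v(\d+))?$/;

function parseArxivId(raw: string): ArxivId | null {
  const m = NEW_STYLE.exec(raw) ?? OLD_STYLE.exec(raw);
  if (!m) return null;
  return m[2] !== undefined ? { base: m[1], version: Number(m[2]) } : { base: m[1] };
}

type Resource = { kind: "paper"; id: ArxivId } | { kind: "categories" };

function resolveResource(uri: string): Resource | null {
  if (uri === "arxiv://categories") return { kind: "categories" };
  const prefix = "arxiv://paper/";
  if (!uri.startsWith(prefix)) return null;
  // Everything after the prefix is the ID, so an old-style ID keeps its "/".
  let raw: string;
  try {
    raw = decodeURIComponent(uri.slice(prefix.length));
  } catch {
    return null; // malformed percent-encoding
  }
  const id = parseArxivId(raw);
  return id ? { kind: "paper", id } : null;
}
```

For example, `resolveResource("arxiv://paper/hep-th/9901001")` returns `{ kind: "paper", id: { base: "hep-th/9901001" } }`, and `resolveResource("arxiv://paper/2401.12345v2")` returns `{ kind: "paper", id: { base: "2401.12345", version: 2 } }`.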
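Built on @cyanheads/mcp-ts-core:
- Authentication modes (none, jwt, oauth)

arXiv-specific:
- Request queuing that respects arXiv's 3-second crawl delay, with adaptive rate-limit handling (Retry-After)
- Optional local OAI-PMH mirror for arxiv_search and arxiv_get_metadata. See Optional: Local Mirror.

A public instance is available at https://arxiv.caseyjhand.com/mcp; no installation is required. Point any MCP client at it via Streamable HTTP: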
```json
{
  "mcpServers": {
    "arxiv-mcp-server": {
      "type": "streamable-http",
      "url": "https://arxiv.caseyjhand.com/mcp"
    }
  }
}
```
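Streamable HTTP carries JSON-RPC 2.0 messages in HTTP POST bodies. The sketch below only builds a `tools/call` message for one of the tools above; an MCP client must first complete the `initialize` handshake (and send back any `Mcp-Session-Id` the server assigns), which client SDKs normally handle for you. The helper `toolCall`, the request id, and the argument values are illustrative.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client POSTs to invoke a tool.
// Assumes the initialize handshake has already completed.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Streamable HTTP clients must accept both plain JSON and SSE responses.
const STREAMABLE_HTTP_HEADERS = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream",
};

// Illustrative call: read at most 20,000 characters of one paper.
const body = toolCall(2, "arxiv_read_paper", { paper_id: "2401.12345", max_characters: 20000 });
console.log(JSON.stringify(body));
```

Within an initialized session, sending it amounts to a POST of `JSON.stringify(body)` to https://arxiv.caseyjhand.com/mcp with those headers.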
Add to your MCP client config (e.g., claude_desktop_config.json):
```json
{
  "mcpServers": {
    "arxiv-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/arxiv-mcp-server@latest"]
    }
  }
}
```
Or install from source:

```bash
git clone https://github.com/cyanheads/arxiv-mcp-server.git
cd arxiv-mcp-server
bun install
```
All configuration is optional — the server works out of the box with sensible defaults.
| Variable | Description | Default |
|---|---|---|
| ARXIV_API_BASE_URL | arXiv API base URL. | https://export.arxiv.org/api |
| ARXIV_REQUEST_DELAY_MS | Minimum delay between arXiv API requests (ms). | 3000 |
| ARXIV_CONTENT_TIMEOUT_MS | Timeout for HTML content fetches (ms). | 30000 |
| ARXIV_API_TIMEOUT_MS | Timeout for API search/metadata requests (ms). | 15000 |
| ARXIV_MIRROR_ENABLED | Enable local OAI-PMH metadata mirror for search and metadata. | false |
| ARXIV_MIRROR_PATH | SQLite path for the mirror. | ./data/arxiv-mirror.db |
| ARXIV_MIRROR_REFRESH_CRON | UTC cron expression for in-process daily refresh (HTTP mode only). | unset |
| ARXIV_MIRROR_FALLBACK_LIVE | Fall through to live API on local ID-lookup miss. | true |
| ARXIV_MIRROR_RECENT_DAYS_LIVE | Route sortBy=submitted descending queries within this window (days) to the live API. | 2 |
| ARXIV_MIRROR_OAI_BASE_URL | arXiv OAI-PMH endpoint base URL. | https://oaipmh.arxiv.org/oai |
| ARXIV_MIRROR_OAI_REQUEST_DELAY_MS | Minimum delay between OAI-PMH requests (ms). | 3000 |
| ARXIV_MIRROR_REFRESH_TIMEOUT_MS | Abort budget for one scheduled refresh subprocess (ms; 2 h). | 7200000 |
| MCP_TRANSPORT_TYPE | Transport: stdio or http. | stdio |
| MCP_HTTP_PORT | Port for HTTP server. | 3010 |
| MCP_AUTH_MODE | Auth mode: none, jwt, or oauth. | none |
| MCP_LOG_LEVEL | Log level (RFC 5424). | info |
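ARXIV_REQUEST_DELAY_MS sets the minimum spacing between arXiv requests. One way to enforce such spacing, sketched here as a hypothetical `SpacedQueue` rather than the actual ArxivService code, is a promise chain that runs one request at a time, waits until the next allowed start time, and lets a `Retry-After` response header push that time further out. The sketch only handles the delay-seconds form of `Retry-After` and ignores the HTTP-date form.

```typescript
// Hypothetical sketch, not ArxivService: a FIFO queue that runs one task at a
// time and starts tasks at least `delayMs` apart (start-to-start).
class SpacedQueue {
  private readonly delayMs: number;
  private nextSlot = 0; // earliest epoch-ms at which the next task may start
  private tail: Promise<void> = Promise.resolve();

  constructor(delayMs: number) {
    this.delayMs = delayMs;
  }

  // Runs `task` after every previously enqueued task has settled.
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(async () => {
      const wait = this.nextSlot - Date.now();
      if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
      this.nextSlot = Date.now() + this.delayMs;
      return task();
    });
    // Keep the chain alive even when a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }

  // Pushes the next slot out by a Retry-After value given in seconds.
  backOff(retryAfter: string | null): void {
    if (retryAfter === null) return;
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds > 0) {
      this.nextSlot = Math.max(this.nextSlot, Date.now() + seconds * 1000);
    }
  }
}

// 3000 ms mirrors the documented ARXIV_REQUEST_DELAY_MS default.
const arxivQueue = new SpacedQueue(3000);
```

A fetch wrapper would enqueue each call as `arxivQueue.run(() => fetch(url))` and, on a rate-limit response, call `arxivQueue.backOff(res.headers.get("Retry-After"))` so the following request waits. Serializing in one place is what makes per-tool delays unnecessary.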
Build and run:
```bash
bun run build
bun run start:http   # or start:stdio
```
Run checks and tests:
```bash
bun run devcheck     # Lint, format, typecheck, audit
bun run test         # Vitest
```
For self-hosted deployments behind a single egress IP, arXiv's ~3-second per-IP crawl delay serializes concurrent users. An optional local mirror eliminates rate-limit exposure for arxiv_search and arxiv_get_metadata by serving from a SQLite + FTS5 store harvested via OAI-PMH. arxiv_read_paper continues to use the live API — full-content harvest is forbidden by arXiv's data policy.
Disabled by default. To enable:
```bash
# 1. Cold-start harvest (~4.4h sequential, resumable from checkpoint). One-time per installation.
bun run mirror:init

# 2. Enable the mirror.
export ARXIV_MIRROR_ENABLED=true

# 3. Start the server. Reads switch to the mirror once the harvest completes.
bun run start:http
```
Daily incremental refresh (small delta; duration depends on arXiv's OAI-PMH page pacing) via:
```bash
bun run mirror:refresh   # wire to cron / systemd timer / launchd, OR
# set ARXIV_MIRROR_REFRESH_CRON to schedule it in HTTP mode (spawned as a child process)

bun run mirror:verify    # PRAGMA integrity_check + quick_check
```
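At the protocol level, an incremental refresh is a paged OAI-PMH `ListRecords` harvest. The sketch below builds the request URLs, assuming a date-bounded harvest via the `from` argument and the `arXiv` metadata format; which format and arguments the harvester actually uses is not stated here, and the helper names are hypothetical. OAI-PMH makes `resumptionToken` an exclusive argument, so follow-up pages send only `verb` and the token, and an empty token marks the final page.

```typescript
// Hypothetical sketch of OAI-PMH ListRecords paging for an incremental harvest.
// The base URL matches the ARXIV_MIRROR_OAI_BASE_URL default; the metadata
// format is an assumption.
const OAI_BASE = "https://oaipmh.arxiv.org/oai";

// First page: records added or changed since `from` (YYYY-MM-DD).
function firstPageUrl(from: string, metadataPrefix = "arXiv", base = OAI_BASE): string {
  const params = new URLSearchParams({ verb: "ListRecords", metadataPrefix, from });
  return `${base}?${params.toString()}`;
}

// Later pages: resumptionToken is exclusive, so only `verb` may accompany it.
function nextPageUrl(resumptionToken: string, base = OAI_BASE): string {
  const params = new URLSearchParams({ verb: "ListRecords", resumptionToken });
  return `${base}?${params.toString()}`;
}

// Returns the page's resumptionToken, or null on the last page (empty or
// self-closing token element). A regex is a simplification: a real harvester
// would use an XML parser and decode entities inside the token.
function extractToken(xml: string): string | null {
  const m = /<resumptionToken[^>]*>([^<]*)<\/resumptionToken>/.exec(xml);
  const token = m ? m[1].trim() : "";
  return token === "" ? null : token;
}
```

A harvest loop would fetch `firstPageUrl(lastRunDate)`, store the records, then keep fetching `nextPageUrl(token)` while `extractToken` returns a token, pausing ARXIV_MIRROR_OAI_REQUEST_DELAY_MS between requests.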
Behavior notes:
- Ranking divergence: FTS5 BM25 differs from arXiv's internal ranking, so sortBy=relevance against the mirror returns a different top-K than the live API.
- Recent-window routing: queries sorted by submitted descending within ARXIV_MIRROR_RECENT_DAYS_LIVE days route to the live API to cover the nightly-update gap.
- Refresh resilience: after the initial cold harvest completes, an in-progress or failed daily refresh keeps serving the existing dataset from the mirror; arxiv_search and arxiv_get_metadata don't drop to the live API during the refresh window (#21).
- Non-blocking refresh: the scheduled HTTP-mode refresh runs in a child process, so the harvest's synchronous SQLite writes never block the request event loop, and search and metadata stay responsive throughout (#22).
- Versions: the mirror stores the latest version only; per-version reads continue to use the live API.

See #12 for the full design.
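The routing rules in these notes can be summarized as a small decision function. This is a hypothetical restatement of the documented behavior, not the server's code; in particular, `reachesRecentWindow` stands in for however the server decides that a submitted-descending query falls within ARXIV_MIRROR_RECENT_DAYS_LIVE days, which is not described here.

```typescript
// Hypothetical restatement of the mirror routing rules; names are illustrative.
type Operation = "arxiv_search" | "arxiv_get_metadata" | "arxiv_read_paper";
type Backend = "mirror" | "live";

interface RouteInput {
  op: Operation;
  mirrorEnabled: boolean;    // ARXIV_MIRROR_ENABLED
  coldHarvestDone: boolean;  // true once mirror:init has completed
  sortBy?: "relevance" | "submitted" | "updated";
  sortOrder?: "ascending" | "descending";
  reachesRecentWindow?: boolean; // assumed input, see lead-in
  versioned?: boolean;       // a specific version such as 2401.12345v2 was requested
}

function chooseBackend(r: RouteInput): Backend {
  // Full text is never harvested, so reads always go live.
  if (r.op === "arxiv_read_paper") return "live";
  // No mirror, or the cold harvest has not finished yet. A refresh in
  // progress does not matter here: the existing dataset keeps serving.
  if (!r.mirrorEnabled || !r.coldHarvestDone) return "live";
  // The mirror stores only the latest version of each paper.
  if (r.versioned) return "live";
  // Newest-first listings may need papers the nightly refresh has not picked up.
  if (
    r.op === "arxiv_search" &&
    r.sortBy === "submitted" &&
    (r.sortOrder ?? "descending") === "descending" &&
    r.reachesRecentWindow
  ) {
    return "live";
  }
  return "mirror";
}

// ID lookups that miss locally fall through to the live API only when
// ARXIV_MIRROR_FALLBACK_LIVE is true (its default).
function onIdMiss(fallbackLive: boolean): Backend | "not_found" {
  return fallbackLive ? "live" : "not_found";
}
```

Because `coldHarvestDone` stays true while a daily refresh runs, the sketch keeps serving from the mirror during refreshes, matching the resilience note.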
To run with Docker:

```bash
docker build -t arxiv-mcp-server .
docker run -p 3010:3010 arxiv-mcp-server
```
| Directory | Purpose |
|---|---|
| src/mcp-server/tools/definitions/ | Tool definitions (*.tool.ts). |
| src/mcp-server/resources/definitions/ | Resource definitions (*.resource.ts). |
| src/services/arxiv/ | ArxivService: live arXiv API client (search, metadata, HTML). |
| src/services/arxiv/mirror/ | Optional OAI-PMH mirror: harvester, SQLite + FTS5 store, query translator, runner. |
| src/config/ | Environment variable parsing and validation with Zod. |
| scripts/arxiv-mirror-*.ts | Mirror lifecycle scripts (init, refresh, verify). |
| tests/ | Unit and integration tests. |
| docs/ | Design document and directory structure. |
See CLAUDE.md for development guidelines and architectural rules. The short version:
- No try/catch in tool logic
- Use ctx.log for domain-specific logging
- Rate limiting is handled in ArxivService; don't add per-tool delays

Issues and pull requests are welcome. Run checks before submitting:
```bash
bun run devcheck
bun run test
```
Apache-2.0 — see LICENSE for details.
Additional HTTP settings:

| Variable | Description | Default |
|---|---|---|
| MCP_HTTP_HOST | Hostname for the HTTP server. | 127.0.0.1 |
| MCP_HTTP_ENDPOINT_PATH | Endpoint path for the MCP server. | /mcp |