CRW Web Scraper

158STDIOregistry active

Summary

A self-hosted Rust crawler that gives Claude direct access to scrape, crawl, extract, map, and search operations through a Firecrawl-compatible API. The MCP server runs embedded with no external dependencies, idles at roughly 50 MB RAM, and exposes the same /v1/scrape and /v1/crawl endpoints you'd hit over REST. Reach for this when you need structured data extraction from websites during agent workflows without spinning up Node.js or browser sidecars. It ships as a single static binary you can run via npx crw-mcp for zero-install MCP integration, or point at api.fastcrw.com if you want managed infrastructure. The AGPL license means you can self-host free or pay for the hosted version to avoid license obligations.

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

fastCRW

Self-hosted, Rust-native web crawler & scraper for AI agents

The open-source alternative to Firecrawl. One static binary, ~50 MB RAM idle, Firecrawl-compatible REST API on both /v1/* and /v2/* (scrape, crawl, map, search, extract, plus v2 batch & parse) — a drop-in for the official Firecrawl SDKs — plus first-class MCP. Self-host free under AGPL-3.0, or hit our managed API at api.fastcrw.com. Reproducible 63.74% truth-recall on the public 1,000-URL dataset (diagnose_3way.py, 2026-05-08) — see fastcrw.com/benchmarks. Built in Rust because every millisecond of agent latency compounds.

Works with: Claude Code · Cursor · Windsurf · Cline · Copilot · Continue.dev · Codex · Gemini CLI

fastCRW vs Crawl4AI vs Firecrawl on Firecrawl's public 1,000-URL dataset — fastCRW leads on truth-recall, p50, and fast-mode p90 latency

_{Firecrawl's own 1,000-URL public dataset (diagnose_3way.py) — fastCRW leads on truth-recall, median latency, and fast-mode p90. Full numbers and one-command repro ↓}

Why fastCRW?

Rust-native, single static binary — no Redis, no Node.js, no Python venv, no headless-browser sidecar in the request path. One binary, one config file, one process.
~50 MB RAM idle — leaves headroom on a $5 VPS. Browser-render-first stacks (Firecrawl, Crawl4AI) carry a Chromium heap baseline measured in hundreds of MB before a single request lands.
Firecrawl-compatible drop-in — both the /v1/* and /v2/* surfaces (scrape, crawl, map, search, extract; plus v2-only batch & parse) with compatible request/response shapes. The v2 API is a drop-in for the official firecrawl-py v4 SDK (FirecrawlApp(api_url="https://api.fastcrw.com")) — swap the base URL and keep your code.
Change tracking & monitoring — diff a page against a prior snapshot (markdown git-diff, per-field JSON, or both) with an optional LLM "meaningful-change" judge. Stateless changeTracking primitive in the engine; scheduled monitors + signed-webhook/email alerts on the managed platform. See the Monitoring docs.
AGPL-3.0 open core + managed option — self-host free, or point at api.fastcrw.com for managed proxy network, dashboard, and SLA without the AGPL obligations on your application code.

Comparison Table

Qualitative positioning vs. the three most-cited alternatives. Numerical claims trace to the inline sources noted; everything else is descriptive.

	fastCRW	Firecrawl	Crawl4AI	Spider
Language	Rust	Node.js + Playwright	Python + Playwright	Rust
License	AGPL-3.0 (commercial avail.)	AGPL-3.0 (commercial avail.)	Apache-2.0	Source-available / commercial (spider.cloud)
Self-host install size	Single static binary (~8 MB)	Multi-container (~500 MB+ image)	~2 GB image (browser bundled)	Managed-first; self-host via crate
Memory baseline (idle)	~50 MB	Large (Chromium heap)	Large (Chromium heap)	Light (Rust)
Firecrawl-compat API	Yes — v1 + v2 (`/v1/` and `/v2/`)	Native	No	No
MCP server	Built-in (`crw-mcp`)	Separate package	Community add-on	No first-party
Hosted option	`api.fastcrw.com` (BYOK or managed)	firecrawl.dev	None official	spider.cloud (primary product)
Reproducible public benchmark	Yes — 63.74% truth-recall on 1,000-URL dataset (`diagnose_3way.py`, 2026-05-08)	Vendor-published only	Vendor-published only	Vendor-published only

Pricing/spec cells where claimed link to the vendor page; everything else is the qualitative architectural shape, not a comparison number.

Quickstart

Hit the managed API at api.fastcrw.com, or self-host the same binary.

# /v1/scrape — URL → markdown / HTML / JSON / links
curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}'

# /v1/scrape + formats:["json"] — structured JSON extraction via a JSON Schema
curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url":"https://example.com",
    "formats":["json"],
    "jsonSchema":{
      "type":"object",
      "properties":{"title":{"type":"string"}}
    }
  }'

# /v1/crawl — async multi-page job (returns a job id; poll with /v1/crawl/:id)
curl -X POST https://api.fastcrw.com/v1/crawl \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","maxDepth":2,"maxPages":50}'

# Self-host (no auth, localhost) — single docker command
docker run -p 3000:3000 ghcr.io/us/crw
curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

Other install paths (each documented under Install further down):

npx crw-mcp                           # zero install — runs the embedded engine
pip install crw                        # Python SDK (auto-downloads binary)
brew install us/crw/crw                # Homebrew
cargo install crw-cli                  # Cargo
curl -fsSL https://fastcrw.com/install | sh

Why Rust?

Cold start is sub-second and the resident memory ceiling is bounded by the crawl queue, not by a JavaScript runtime or a headless browser parked in the background. An agent that issues N scrapes per task pays the network floor N times — anything you add on top (process spawn, JIT warmup, browser navigation overhead) multiplies. Pushing the request-path language down to Rust strips that surcharge out of every call. The same property lets one static binary saturate a $5 VPS instead of needing a multi-container compose stack, which is why the idle footprint is in the tens of MB rather than the hundreds.

MCP + SDK quickstart

fastCRW ships a built-in MCP server so any MCP-compatible agent (Claude Code, Cursor, Windsurf, Cline, Continue.dev, Codex, Gemini CLI) can call scraping tools without bespoke glue. Embedded mode runs the engine in-process — no server, no API key, no setup. The crw Python SDK and the crw-mcp Node binary both shell to the same Rust core.

npm install -g crw-mcp          # MCP server (Node wrapper)
pip install crw                 # Python SDK (auto-downloads binary)
claude mcp add crw -- npx -y crw-mcp                                       # Claude Code, embedded
claude mcp add crw \
  -e CRW_API_URL=https://api.fastcrw.com -e CRW_API_KEY=… \
  -- npx -y crw-mcp                                                        # Claude Code, managed

Per-client config recipes (Claude Desktop, Cursor, Windsurf, Cline, Continue.dev) live under docs.fastcrw.com/mcp-clients/.

Agent Skills

Beyond raw MCP tools, fastCRW ships a set of agent skills — reusable instruction packs that teach AI coding agents when and how to scrape, crawl, map, search, parse, extract, and change-track the web. Install into any agent (Claude Code, Codex, Cursor, OpenCode, Gemini CLI, Windsurf + more) with one command:

npx skills add us/crw                 # all 12 skills, into every detected agent
npx skills add us/crw@crw-scrape      # just one
npx skills add -g us/crw              # global (user-level)

Tier	Skills
Core / verb ladder	`crw` (hub) · `crw-search` · `crw-scrape` · `crw-map` · `crw-crawl` · `crw-parse` · `crw-extract` · `crw-watch`
Quality / meta	`crw-dynamic-search` (context-isolating filter — the biggest token-saver) · `crw-best-practices`
Migration / ops	`crw-migrate` (one-line Firecrawl `base_url` swap) · `crw-self-host`

The skills drive the crw CLI, the crw-mcp tools, or the REST API — pick whatever surface you have. No API key needed for self-hosted search (SearXNG). Full catalog and per-skill docs: skills/. Also packaged as a plugin marketplace for Claude Code, Codex, and Cursor (.claude-plugin/, .codex-plugin/, .cursor-plugin/).

Self-host vs Managed

	Self-host (free)	Managed — `api.fastcrw.com`
Best when	You want full data residency, AGPL is fine, you can run your own proxy strategy, latency to your infra matters more than ours.	You want zero infra, a global proxy network, a dashboard, usage metering, and AGPL carve-out for closed-source product code.
Install	`docker run -p 3000:3000 ghcr.io/us/crw` or `cargo install crw-server`.	Sign up at fastcrw.com — 500 free credits, no card.
Search	Bundled SearXNG sidecar (`docker compose up`).	Managed search backend.
Proxy rotation	Bring your own pool (`proxy_list` + `proxy_rotation`: round_robin / random / sticky_per_host) — rotated across the HTTP and JS/Chrome paths for scrape, crawl, and map; per-request BYOP supported. LightPanda can't proxy, so it's skipped (fail-closed) when a proxy is active.	Managed proxy network.
Cost	$0 + your hosting bill.	From $13/mo; pricing on fastcrw.com/pricing.
License obligations	AGPL-3.0 applies if you expose the API to third parties.	AGPL carve-out included.

The binary is the same in both modes — you can develop against your self-hosted instance and ship to managed without code changes.

Install

MCP server (`crw-mcp`) — recommended for AI agents

npx crw-mcp                              # zero install (npm)
pip install crw                          # Python SDK (auto-downloads binary)
brew install us/crw/crw-mcp              # Homebrew
cargo install crw-mcp                    # Cargo (full embedded, ~17 MB)
docker run -i ghcr.io/us/crw crw-mcp     # Docker

Lean browser-free proxy build (~4.2 MB, no headless browser engine — proxy/cloud mode only):

cargo build --profile release-small --no-default-features -p crw-mcp

CLI (`crw`) — scrape URLs from your terminal

brew install us/crw/crw

# One-line install (auto-detects OS & arch):
curl -fsSL https://fastcrw.com/install | CRW_BINARY=crw sh

# APT (Debian/Ubuntu):
curl -fsSL https://apt.fastcrw.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/crw.gpg
echo "deb [signed-by=/usr/share/keyrings/crw.gpg] https://apt.fastcrw.com stable main" \
  | sudo tee /etc/apt/sources.list.d/crw.list
sudo apt update && sudo apt install crw

cargo install crw-cli

API server (`crw-server`) — Firecrawl-compatible REST API

For serving multiple apps, other languages (Node.js, Go, Java), or as a shared microservice.

brew install us/crw/crw-server

# One-line install:
curl -fsSL https://fastcrw.com/install | CRW_BINARY=crw-server sh

# Docker:
docker run -p 3000:3000 ghcr.io/us/crw

Docker Compose ships with lightpanda by default; chrome is opt-in:

docker compose up -d                                         # http + lightpanda
docker compose --profile heavy up -d                         # + chrome failover
docker compose -f docker-compose.yml \
  -f docker-compose.stealth.yml --profile stealth up -d      # browserless stealth tier

There's also an optional Camoufox stealth tier (REST sidecar, opt-in) for fingerprint-blocked targets the CDP renderers can't pass — off by default and never touches the auto chain unless you turn it on. Build with --features camoufox; see JS rendering → Camoufox.

See the self-hosting guide for production hardening, auth, reverse proxy, and resource tuning.

API endpoints

Method	Endpoint	Description
`POST`	`/v1/scrape`	Scrape a single URL, optionally with LLM extraction or summary
`POST`	`/v1/crawl`	Start async BFS crawl (returns job ID)
`GET`	`/v1/crawl/:id`	Check crawl status and retrieve results
`DELETE`	`/v1/crawl/:id`	Cancel a running crawl job
`POST`	`/v1/map`	Discover all URLs on a site
`POST`	`/v1/search`	Web search via SearXNG sidecar, with optional content scraping
`POST`	`/v1/change-tracking/diff`	Diff a scrape against a supplied snapshot (the monitoring primitive) — single or batch
`GET`	`/health`	Health check (no auth required)
`POST`	`/mcp`	Streamable HTTP MCP transport

Firecrawl v2 surface — scrape, crawl, map, search are also served under /v2/* with Firecrawl v2 request/response shapes, plus v2-only POST /v2/batch/scrape, POST /v2/parse (PDF/doc → markdown), and GET /v2/crawl/active. This makes the official firecrawl-py v4 SDK a drop-in: FirecrawlApp(api_url="https://api.fastcrw.com").

Full reference at docs.fastcrw.com/#rest-api. The Firecrawl compatibility matrix (field-by-field diff) lives in COMPATIBILITY-firecrawl.md.

Benchmark

Reproduce it yourself first — the canonical harness is diagnose_3way.py (matches truth text against md + strip_md_links(md), applied identically to all three tools — a fairness control, not a looser number):

cd ~/coding/crw/crw-opencore
docker compose -f docker-compose.yml -f docker-compose.override.yml \
               -f docker-compose.stealth.yml --profile stealth up -d
docker start crawl4ai-bench
cd ~/coding/crw/competitors/firecrawl && docker compose up -d

cd ~/coding/crw/crw-opencore
uv run python bench/diagnose_3way.py \
  --max-urls 1000 --tools crw,crawl4ai,firecrawl \
  --concurrency 5 --timeout 120 \
  --out bench/server-runs/diag3w-1000-full.jsonl

3-way scrape benchmark, full 1,000-URL run on Firecrawl's scrape-content-dataset-v1 (diagnose_3way.py, 2026-05-08, concurrency 5, timeout 120s):

Metric	fastCRW	crawl4ai	Firecrawl
Truth-recall (522/819 labeled URLs) ^{recall mode}	63.74%	59.95%	56.04%
p50 latency	1914 ms	1916 ms	2305 ms
p90 latency ^{fast mode}	4348 ms	4754 ms	6937 ms
Thrown errors (3,000 requests)	0	0	0

fastCRW leads on every axis — top truth-recall, fastest median, and the lowest p90 tail — with 0 thrown errors across all 3,000 requests, and it uniquely recovers 34 URLs the other two miss (70% more than crawl4ai and Firecrawl combined). The 63.74% denominator is 819 labeled/matchable URLs, not 3,000 requests, not 1,000.

Two tunable modes, one engine, one config toggle. Recall mode (the full ladder) maximizes truth-recall, recovering the long tail of hard pages the others miss. Fast mode (LightPanda-only, no Chrome tier) optimizes the latency tail — p90 4348 ms, the lowest of the three (diagnose_3way.py, N=1000). Same binary, same API; pick accuracy or latency per workload.

Full result of record: bench/server-runs/RESULT_3WAY_1000_FULL.md.

SDKs and integrations

Python

pip install crw

from crw import CrwClient

# Managed (includes web search):
client = CrwClient(api_url="https://api.fastcrw.com", api_key="YOUR_API_KEY")
# Local (embedded, no server needed):
# client = CrwClient()

result = client.scrape("https://example.com", formats=["markdown", "links"])
pages = client.crawl("https://docs.example.com", max_depth=2, max_pages=50)
urls = client.map("https://example.com")
results = client.search("AI news", limit=10, sources=["web", "news"])

Requires Python 3.10+. Local mode auto-downloads crw-mcp on first use.

Framework extras:

pip install crw[crewai]    # CRW scraping tools for CrewAI agents
pip install crw[langchain] # CRW document loader for LangChain

TypeScript / Node.js

npm install crw-sdk

SDK examples →

Frameworks & platforms

CrewAI · LangChain · Agno · Dify · n8n · Flowise

All integrations →

Architecture

┌─────────────────────────────────────────────┐
│                 crw-server                  │
│         Axum HTTP API + Auth + MCP          │
├──────────┬──────────┬───────────────────────┤
│ crw-crawl│crw-extract│    crw-renderer      │
│ BFS crawl│ HTML→MD   │  HTTP + CDP(WS)      │
│ robots   │ LLM/JSON  │  LightPanda/Chrome   │
│ sitemap  │ clean/read│  auto-detect SPA     │
├──────────┴──────────┴───────────────────────┤
│                 crw-core                    │
│        Types, Config, Errors                │
└─────────────────────────────────────────────┘

Crate	Description
`crw-core`	Core types, config, and error handling
`crw-renderer`	HTTP + CDP browser rendering engine
`crw-extract`	HTML → markdown/plaintext extraction
`crw-crawl`	Async BFS crawler with robots.txt & sitemap
`crw-server`	Axum API server (Firecrawl-compatible)
`crw-mcp`	MCP stdio server (embedded + proxy mode)
`crw-cli`	Standalone CLI (`crw` binary, no server)

Full architecture docs →

Security

SSRF protection — blocks loopback, private IPs, cloud metadata (169.254.x.x), IPv6 mapped addresses, and non-HTTP schemes (file://, data:)
Auth — optional Bearer token with constant-time comparison
robots.txt — RFC 9309 compliant with wildcard patterns
Rate limiting — token-bucket algorithm, returns 429 with error_code
Resource limits — max body 1 MB, max crawl depth 10, max pages 1,000

Full security docs →

Contributing

Contributions are welcome — issues and PRs both.

Fork the repository
Install pre-commit hooks: make hooks
Create your feature branch (git checkout -b feat/my-feature)
Commit your changes (git commit -m 'feat: add my feature')
Push to the branch (git push origin feat/my-feature)
Open a Pull Request

The pre-commit hook runs the same checks as CI (cargo fmt, cargo clippy, cargo test). Run manually with make check.

Contributors

License

fastCRW is open source under AGPL-3.0. If you embed fastCRW in a closed-source product or expose it as a hosted service to third parties and you can't comply with AGPL's source-availability requirements, the managed offering at fastcrw.com includes a commercial carve-out, and standalone commercial licenses are available on request — write to hello@fastcrw.com.

Star History

It is the sole responsibility of end users to respect websites' policies when scraping. Users are advised to adhere to applicable privacy policies and terms of use. By default, fastCRW respects robots.txt directives.

Featured

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

fastCRW

Self-hosted, Rust-native web crawler & scraper for AI agents

Works with: Claude Code · Cursor · Windsurf · Cline · Copilot · Continue.dev · Codex · Gemini CLI

fastCRW vs Crawl4AI vs Firecrawl on Firecrawl's public 1,000-URL dataset — fastCRW leads on truth-recall, p50, and fast-mode p90 latency

_{Firecrawl's own 1,000-URL public dataset (diagnose_3way.py) — fastCRW leads on truth-recall, median latency, and fast-mode p90. Full numbers and one-command repro ↓}

Why fastCRW?

Rust-native, single static binary — no Redis, no Node.js, no Python venv, no headless-browser sidecar in the request path. One binary, one config file, one process.
~50 MB RAM idle — leaves headroom on a $5 VPS. Browser-render-first stacks (Firecrawl, Crawl4AI) carry a Chromium heap baseline measured in hundreds of MB before a single request lands.
Firecrawl-compatible drop-in — both the /v1/* and /v2/* surfaces (scrape, crawl, map, search, extract; plus v2-only batch & parse) with compatible request/response shapes. The v2 API is a drop-in for the official firecrawl-py v4 SDK (FirecrawlApp(api_url="https://api.fastcrw.com")) — swap the base URL and keep your code.
Change tracking & monitoring — diff a page against a prior snapshot (markdown git-diff, per-field JSON, or both) with an optional LLM "meaningful-change" judge. Stateless changeTracking primitive in the engine; scheduled monitors + signed-webhook/email alerts on the managed platform. See the Monitoring docs.
AGPL-3.0 open core + managed option — self-host free, or point at api.fastcrw.com for managed proxy network, dashboard, and SLA without the AGPL obligations on your application code.

Comparison Table

Qualitative positioning vs. the three most-cited alternatives. Numerical claims trace to the inline sources noted; everything else is descriptive.

	fastCRW	Firecrawl	Crawl4AI	Spider
Language	Rust	Node.js + Playwright	Python + Playwright	Rust
License	AGPL-3.0 (commercial avail.)	AGPL-3.0 (commercial avail.)	Apache-2.0	Source-available / commercial (spider.cloud)
Self-host install size	Single static binary (~8 MB)	Multi-container (~500 MB+ image)	~2 GB image (browser bundled)	Managed-first; self-host via crate
Memory baseline (idle)	~50 MB	Large (Chromium heap)	Large (Chromium heap)	Light (Rust)
Firecrawl-compat API	Yes — v1 + v2 (`/v1/` and `/v2/`)	Native	No	No
MCP server	Built-in (`crw-mcp`)	Separate package	Community add-on	No first-party
Hosted option	`api.fastcrw.com` (BYOK or managed)	firecrawl.dev	None official	spider.cloud (primary product)
Reproducible public benchmark	Yes — 63.74% truth-recall on 1,000-URL dataset (`diagnose_3way.py`, 2026-05-08)	Vendor-published only	Vendor-published only	Vendor-published only

Pricing/spec cells where claimed link to the vendor page; everything else is the qualitative architectural shape, not a comparison number.

Quickstart

Hit the managed API at api.fastcrw.com, or self-host the same binary.

# /v1/scrape — URL → markdown / HTML / JSON / links
curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":["markdown"]}'

# /v1/scrape + formats:["json"] — structured JSON extraction via a JSON Schema
curl -X POST https://api.fastcrw.com/v1/scrape \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url":"https://example.com",
    "formats":["json"],
    "jsonSchema":{
      "type":"object",
      "properties":{"title":{"type":"string"}}
    }
  }'

# /v1/crawl — async multi-page job (returns a job id; poll with /v1/crawl/:id)
curl -X POST https://api.fastcrw.com/v1/crawl \
  -H "Authorization: Bearer $CRW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","maxDepth":2,"maxPages":50}'

# Self-host (no auth, localhost) — single docker command
docker run -p 3000:3000 ghcr.io/us/crw
curl http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

Other install paths (each documented under Install further down):

npx crw-mcp                           # zero install — runs the embedded engine
pip install crw                        # Python SDK (auto-downloads binary)
brew install us/crw/crw                # Homebrew
cargo install crw-cli                  # Cargo
curl -fsSL https://fastcrw.com/install | sh

Why Rust?

MCP + SDK quickstart

npm install -g crw-mcp          # MCP server (Node wrapper)
pip install crw                 # Python SDK (auto-downloads binary)
claude mcp add crw -- npx -y crw-mcp                                       # Claude Code, embedded
claude mcp add crw \
  -e CRW_API_URL=https://api.fastcrw.com -e CRW_API_KEY=… \
  -- npx -y crw-mcp                                                        # Claude Code, managed

Per-client config recipes (Claude Desktop, Cursor, Windsurf, Cline, Continue.dev) live under docs.fastcrw.com/mcp-clients/.

Agent Skills

npx skills add us/crw                 # all 12 skills, into every detected agent
npx skills add us/crw@crw-scrape      # just one
npx skills add -g us/crw              # global (user-level)

Tier	Skills
Core / verb ladder	`crw` (hub) · `crw-search` · `crw-scrape` · `crw-map` · `crw-crawl` · `crw-parse` · `crw-extract` · `crw-watch`
Quality / meta	`crw-dynamic-search` (context-isolating filter — the biggest token-saver) · `crw-best-practices`
Migration / ops	`crw-migrate` (one-line Firecrawl `base_url` swap) · `crw-self-host`

Self-host vs Managed

	Self-host (free)	Managed — `api.fastcrw.com`
Best when	You want full data residency, AGPL is fine, you can run your own proxy strategy, latency to your infra matters more than ours.	You want zero infra, a global proxy network, a dashboard, usage metering, and AGPL carve-out for closed-source product code.
Install	`docker run -p 3000:3000 ghcr.io/us/crw` or `cargo install crw-server`.	Sign up at fastcrw.com — 500 free credits, no card.
Search	Bundled SearXNG sidecar (`docker compose up`).	Managed search backend.
Proxy rotation	Bring your own pool (`proxy_list` + `proxy_rotation`: round_robin / random / sticky_per_host) — rotated across the HTTP and JS/Chrome paths for scrape, crawl, and map; per-request BYOP supported. LightPanda can't proxy, so it's skipped (fail-closed) when a proxy is active.	Managed proxy network.
Cost	$0 + your hosting bill.	From $13/mo; pricing on fastcrw.com/pricing.
License obligations	AGPL-3.0 applies if you expose the API to third parties.	AGPL carve-out included.

The binary is the same in both modes — you can develop against your self-hosted instance and ship to managed without code changes.

Install

MCP server (`crw-mcp`) — recommended for AI agents

npx crw-mcp                              # zero install (npm)
pip install crw                          # Python SDK (auto-downloads binary)
brew install us/crw/crw-mcp              # Homebrew
cargo install crw-mcp                    # Cargo (full embedded, ~17 MB)
docker run -i ghcr.io/us/crw crw-mcp     # Docker

Lean browser-free proxy build (~4.2 MB, no headless browser engine — proxy/cloud mode only):

cargo build --profile release-small --no-default-features -p crw-mcp

CLI (`crw`) — scrape URLs from your terminal

brew install us/crw/crw

# One-line install (auto-detects OS & arch):
curl -fsSL https://fastcrw.com/install | CRW_BINARY=crw sh

# APT (Debian/Ubuntu):
curl -fsSL https://apt.fastcrw.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/crw.gpg
echo "deb [signed-by=/usr/share/keyrings/crw.gpg] https://apt.fastcrw.com stable main" \
  | sudo tee /etc/apt/sources.list.d/crw.list
sudo apt update && sudo apt install crw

cargo install crw-cli

API server (`crw-server`) — Firecrawl-compatible REST API

For serving multiple apps, other languages (Node.js, Go, Java), or as a shared microservice.

brew install us/crw/crw-server

# One-line install:
curl -fsSL https://fastcrw.com/install | CRW_BINARY=crw-server sh

# Docker:
docker run -p 3000:3000 ghcr.io/us/crw

Docker Compose ships with lightpanda by default; chrome is opt-in:

docker compose up -d                                         # http + lightpanda
docker compose --profile heavy up -d                         # + chrome failover
docker compose -f docker-compose.yml \
  -f docker-compose.stealth.yml --profile stealth up -d      # browserless stealth tier

See the self-hosting guide for production hardening, auth, reverse proxy, and resource tuning.

API endpoints

Method	Endpoint	Description
`POST`	`/v1/scrape`	Scrape a single URL, optionally with LLM extraction or summary
`POST`	`/v1/crawl`	Start async BFS crawl (returns job ID)
`GET`	`/v1/crawl/:id`	Check crawl status and retrieve results
`DELETE`	`/v1/crawl/:id`	Cancel a running crawl job
`POST`	`/v1/map`	Discover all URLs on a site
`POST`	`/v1/search`	Web search via SearXNG sidecar, with optional content scraping
`POST`	`/v1/change-tracking/diff`	Diff a scrape against a supplied snapshot (the monitoring primitive) — single or batch
`GET`	`/health`	Health check (no auth required)
`POST`	`/mcp`	Streamable HTTP MCP transport

Full reference at docs.fastcrw.com/#rest-api. The Firecrawl compatibility matrix (field-by-field diff) lives in COMPATIBILITY-firecrawl.md.

Benchmark

cd ~/coding/crw/crw-opencore
docker compose -f docker-compose.yml -f docker-compose.override.yml \
               -f docker-compose.stealth.yml --profile stealth up -d
docker start crawl4ai-bench
cd ~/coding/crw/competitors/firecrawl && docker compose up -d

cd ~/coding/crw/crw-opencore
uv run python bench/diagnose_3way.py \
  --max-urls 1000 --tools crw,crawl4ai,firecrawl \
  --concurrency 5 --timeout 120 \
  --out bench/server-runs/diag3w-1000-full.jsonl

3-way scrape benchmark, full 1,000-URL run on Firecrawl's scrape-content-dataset-v1 (diagnose_3way.py, 2026-05-08, concurrency 5, timeout 120s):

Metric	fastCRW	crawl4ai	Firecrawl
Truth-recall (522/819 labeled URLs) ^{recall mode}	63.74%	59.95%	56.04%
p50 latency	1914 ms	1916 ms	2305 ms
p90 latency ^{fast mode}	4348 ms	4754 ms	6937 ms
Thrown errors (3,000 requests)	0	0	0

Full result of record: bench/server-runs/RESULT_3WAY_1000_FULL.md.

SDKs and integrations

Python

pip install crw

from crw import CrwClient

# Managed (includes web search):
client = CrwClient(api_url="https://api.fastcrw.com", api_key="YOUR_API_KEY")
# Local (embedded, no server needed):
# client = CrwClient()

result = client.scrape("https://example.com", formats=["markdown", "links"])
pages = client.crawl("https://docs.example.com", max_depth=2, max_pages=50)
urls = client.map("https://example.com")
results = client.search("AI news", limit=10, sources=["web", "news"])

Requires Python 3.10+. Local mode auto-downloads crw-mcp on first use.

Framework extras:

pip install crw[crewai]    # CRW scraping tools for CrewAI agents
pip install crw[langchain] # CRW document loader for LangChain

TypeScript / Node.js

npm install crw-sdk

SDK examples →

Frameworks & platforms

CrewAI · LangChain · Agno · Dify · n8n · Flowise

All integrations →

Architecture

┌─────────────────────────────────────────────┐
│                 crw-server                  │
│         Axum HTTP API + Auth + MCP          │
├──────────┬──────────┬───────────────────────┤
│ crw-crawl│crw-extract│    crw-renderer      │
│ BFS crawl│ HTML→MD   │  HTTP + CDP(WS)      │
│ robots   │ LLM/JSON  │  LightPanda/Chrome   │
│ sitemap  │ clean/read│  auto-detect SPA     │
├──────────┴──────────┴───────────────────────┤
│                 crw-core                    │
│        Types, Config, Errors                │
└─────────────────────────────────────────────┘

Crate	Description
`crw-core`	Core types, config, and error handling
`crw-renderer`	HTTP + CDP browser rendering engine
`crw-extract`	HTML → markdown/plaintext extraction
`crw-crawl`	Async BFS crawler with robots.txt & sitemap
`crw-server`	Axum API server (Firecrawl-compatible)
`crw-mcp`	MCP stdio server (embedded + proxy mode)
`crw-cli`	Standalone CLI (`crw` binary, no server)

Full architecture docs →

Security

SSRF protection — blocks loopback, private IPs, cloud metadata (169.254.x.x), IPv6 mapped addresses, and non-HTTP schemes (file://, data:)
Auth — optional Bearer token with constant-time comparison
robots.txt — RFC 9309 compliant with wildcard patterns
Rate limiting — token-bucket algorithm, returns 429 with error_code
Resource limits — max body 1 MB, max crawl depth 10, max pages 1,000

Full security docs →

Contributing

Contributions are welcome — issues and PRs both.

Fork the repository
Install pre-commit hooks: make hooks
Create your feature branch (git checkout -b feat/my-feature)
Commit your changes (git commit -m 'feat: add my feature')
Push to the branch (git push origin feat/my-feature)
Open a Pull Request

The pre-commit hook runs the same checks as CI (cargo fmt, cargo clippy, cargo test). Run manually with make check.

CRW Web Scraper

fastCRW

Why fastCRW?

Comparison Table

Quickstart

Why Rust?

MCP + SDK quickstart

Agent Skills

Self-host vs Managed

Install

MCP server (crw-mcp) — recommended for AI agents

CLI (crw) — scrape URLs from your terminal

API server (crw-server) — Firecrawl-compatible REST API

API endpoints

Benchmark

SDKs and integrations

Python

TypeScript / Node.js

Frameworks & platforms

Architecture

Security

Contributing

Contributors

License

Links

Star History

CRW Web Scraper

fastCRW

Why fastCRW?

Comparison Table

Quickstart

Why Rust?

MCP + SDK quickstart

Agent Skills

Self-host vs Managed

Install

MCP server (crw-mcp) — recommended for AI agents

CLI (crw) — scrape URLs from your terminal

API server (crw-server) — Firecrawl-compatible REST API

API endpoints

Benchmark

SDKs and integrations

Python

TypeScript / Node.js

Frameworks & platforms

Architecture

Security

Contributing

Contributors

License

Links

Star History

Related Search & Web Crawling MCP Servers

Related Search & Web Crawling MCP Servers

MCP server (`crw-mcp`) — recommended for AI agents

CLI (`crw`) — scrape URLs from your terminal

API server (`crw-server`) — Firecrawl-compatible REST API

MCP server (`crw-mcp`) — recommended for AI agents

CLI (`crw`) — scrape URLs from your terminal

API server (`crw-server`) — Firecrawl-compatible REST API