If you need Claude to scrape JavaScript-heavy sites without spinning up a full Chrome instance, this is a Rust-based headless browser that exposes extract and fetch operations over MCP. It runs V8 for JS execution, strips navigation and ads to isolate main content, and returns clean markdown or structured JSON. The extract command handles SPAs with a delay flag for hydration, caps output with max-chars to fit context windows, and includes a stealth mode that randomizes fingerprints and blocks trackers. You can also run it as a CDP server on port 9222 and connect Puppeteer or Playwright if you need programmatic control outside of Claude.
Cloaked headless browser for AI agents.
Lightweight, stealthy, built in Rust. Based on Obscura.
Kloakt is a headless browser built for AI agents. It runs JavaScript via V8, extracts clean markdown from any page (including SPAs), and exposes tools via MCP for Claude Code and other AI systems.
| Metric | Kloakt | Headless Chrome |
|---|---|---|
| Memory | 30 MB | 200+ MB |
| Binary size | 70 MB | 300+ MB |
| Anti-detect | Built-in | None |
| Page load | 85 ms | ~500 ms |
| Startup | Instant | ~2s |
| SPA extract | Yes | Manual |
git clone https://github.com/KultMember6Banger/kloakt.git
cd kloakt
cargo build --release
# With stealth mode (anti-detection + tracker blocking)
cargo build --release --features stealth
Requires Rust 1.75+ (rustup.rs). First build takes ~5 min (V8 compiles from source, cached after).
# Clean markdown from any page
kloakt extract https://example.com --main
# Structured JSON with metadata
kloakt extract https://example.com --main --json
# Cap output for agent context windows
kloakt extract https://en.wikipedia.org/wiki/Rust --main --json --max-chars 3000
# Wait for SPA hydration
kloakt extract https://example.com --delay 2000 --json
# Get the page title
kloakt fetch https://example.com --eval "document.title"
# Extract all links
kloakt fetch https://example.com --dump links
# Render JavaScript and dump markdown
kloakt fetch https://news.ycombinator.com --dump markdown
# Wait for dynamic content
kloakt fetch https://example.com --wait-until networkidle0
kloakt serve --port 9222
# With stealth mode
kloakt serve --port 9222 --stealth
kloakt scrape url1 url2 url3 ... \
--concurrency 25 \
--eval "document.querySelector('h1').textContent" \
--format json
The extract command uses a multi-phase pipeline optimized for AI agents. It works on static HTML, server-rendered pages, and pure client-side SPAs (React, Vue, etc.).
from kloakt import extract, extract_fields, fetch, scrape
# Extract clean markdown
page = extract("https://example.com")
print(page.title, page.content, page.meta)
# Cap output length
page = extract("https://example.com", max_chars=3000)
# Wait for SPA content
page = extract("https://example.com", delay=2000)
# Structured field extraction via CSS selectors
data = extract_fields("https://news.ycombinator.com", {
"title": "title",
"stories": ".titleline > a[]", # [] => list of all matches
"links": ".titleline > a[]@href" # @href => an attribute
})
print(data["data"]["stories"])
# Raw fetch
html = fetch("https://example.com", dump="html")
title = fetch("https://example.com", eval_js="document.title")
# Parallel scrape
results = scrape(["https://a.com", "https://b.com"], concurrency=5)
Kloakt includes an MCP server for use as a Claude Code tool:
{
"mcpServers": {
"kloakt": {
"command": "python3",
"args": ["/path/to/kloakt/mcp_server.py"]
}
}
}
Exposes kloakt_extract and kloakt_fetch as native tools.
The CDP server embeds a per-session token in the WebSocket path (like Chrome). Connect via
browserURL so the client discovers the token from /json/version automatically — don't
hardcode the ws://.../devtools/browser path.
import puppeteer from 'puppeteer-core';
const browser = await puppeteer.connect({
browserURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
const stories = await page.evaluate(() =>
Array.from(document.querySelectorAll('.titleline > a'))
.map(a => ({ title: a.textContent, url: a.href }))
);
await browser.disconnect();
import { chromium } from 'playwright-core';
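// Passing the endpoint as the first argument is the current Playwright form;
// the endpointURL option is deprecated.
const browser = await chromium.connectOverCDP(
'http://127.0.0.1:9222' // discovers the tokenized ws endpoint
);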
const page = await browser.newContext().then(ctx => ctx.newPage());
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());
await browser.close();
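Both clients above resolve the WebSocket endpoint from /json/version instead of hardcoding it. The sketch below shows the same discovery step in Python. It assumes kloakt mirrors Chrome's /json/version payload, where the endpoint is reported in the `webSocketDebuggerUrl` field; the helper names and the token in the usage note are illustrative, not taken from kloakt.

```python
import json
from urllib.request import urlopen


def ws_url_from_version(payload: str) -> str:
    """Pull the tokenized WebSocket URL out of a /json/version response body."""
    info = json.loads(payload)
    # Assumption: same field name Chrome uses for its browser-level endpoint.
    return info["webSocketDebuggerUrl"]


def discover_ws_url(base_url: str = "http://127.0.0.1:9222") -> str:
    """Ask a running CDP server for its current WebSocket endpoint."""
    with urlopen(f"{base_url}/json/version", timeout=5) as resp:
        return ws_url_from_version(resp.read().decode("utf-8"))
```

With `kloakt serve --port 9222` running, `discover_ws_url()` should return something shaped like `ws://127.0.0.1:9222/devtools/browser/<token>`, where the token changes per session.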
Enable with --features stealth.
- navigator.userAgentData (Chrome 145, high-entropy values)
- event.isTrusted = true for dispatched events
- Function.prototype.toString() → [native code]
- navigator.webdriver = undefined

kloakt extract <URL>

| Flag | Default | Description |
|---|---|---|
| --format | markdown | Output: markdown, text, or links |
| --main | off | Strip nav, header, footer, sidebar |
| --json | off | Structured JSON: title, URL, content, meta |
| --max-chars | unlimited | Truncate content to N characters |
| --delay | 0 | Extra ms to wait after load |
| --stealth | off | Anti-detection mode |
| --selector | — | Wait for CSS selector |
| --wait-until | load | load, domcontentloaded, networkidle0 (bounded by --wait) |
| --schema | — | Extract structured fields as JSON (see below) |
| --har | — | Write captured network activity to a HAR file |
| --cache-ttl | 0 | Cache the result on disk and reuse it for N seconds |
--schema

Pass a JSON object mapping field names to CSS selectors. Suffix a selector with [] to
return all matches as a list, and with @attr to return an attribute instead of text:
kloakt extract https://news.ycombinator.com \
--schema '{"title":"title","stories":".titleline > a[]","first_link":".titleline > a@href"}'
# => { "url": ..., "data": { "title": "...", "stories": [...], "first_link": "..." }, "elapsed_ms": ... }
This is also exposed through the MCP kloakt_extract tool via an optional schema argument.
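To make the suffix convention concrete, here is a minimal sketch of how a schema selector could be split into its CSS part, an optional attribute, and a list flag. This is an illustration of the syntax described above, not kloakt's actual parser; `FieldSpec` and `parse_field` are hypothetical names.

```python
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    css: str              # plain CSS selector with the suffixes removed
    attr: Optional[str]   # attribute name after "@", or None for text content
    many: bool            # True when the selector carried the "[]" suffix


def parse_field(selector: str) -> FieldSpec:
    """Split a schema selector into CSS, optional attribute, and list flag."""
    attr = None
    # Treat a trailing "@name" as an attribute request. Requiring an
    # identifier-like name avoids misreading an "@" that sits inside an
    # attribute selector such as a[href*="@"].
    head, sep, tail = selector.rpartition("@")
    if sep and tail and tail.replace("-", "").replace("_", "").isalnum():
        selector, attr = head, tail
    # A trailing "[]" asks for every match instead of the first one.
    many = selector.endswith("[]")
    if many:
        selector = selector[:-2]
    return FieldSpec(selector.strip(), attr, many)
```

Under these assumptions, `".titleline > a[]@href"` means "the href attribute of every match", while `".titleline > a@href"` means "the href attribute of the first match".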
kloakt fetch <URL>

| Flag | Default | Description |
|---|---|---|
| --dump | html | Output: html, text, links, markdown |
| --eval | — | JavaScript expression to evaluate |
| --wait-until | load | Wait condition |
| --selector | — | Wait for CSS selector |
| --stealth | off | Anti-detection mode |
| --quiet | off | Suppress banner |
kloakt serve

| Flag | Default | Description |
|---|---|---|
| --port | 9222 | WebSocket port |
| --proxy | — | HTTP/SOCKS5 proxy URL |
| --stealth | off | Anti-detection + tracker blocking |
| --workers | 1 | Parallel workers |
kloakt scrape <URL...>

| Flag | Default | Description |
|---|---|---|
| --concurrency | 10 | Parallel workers |
| --eval | — | JS expression per page |
| --format | json | Output: json or text |
kloakt benchmark <URL...>

Measure load performance per URL — average/min/max load time, request count, bytes, and DOM
node count — as a table or --json.
kloakt benchmark https://example.com https://news.ycombinator.com --runs 3
| Flag | Default | Description |
|---|---|---|
| --runs | 1 | Runs per URL (reports the average) |
| --json | off | Emit JSON instead of a table |
| --wait-until | load | load, domcontentloaded, or networkidle0 |
kloakt extract --json includes a "challenge" field reporting a detected captcha or bot
wall (recaptcha, hcaptcha, turnstile, cloudflare, datadome, perimeterx) or
null. This is detection only — kloakt tells you a page is gated so an agent can stop
and back off; it does not attempt to solve or evade challenges. (Also surfaced via the MCP
kloakt_extract output and the Python Page.challenge field.)
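A caller might act on that field roughly as follows. This is a sketch that only relies on the "challenge" key described above; the function names and the sample JSON in the usage note are illustrative, and the rest of the output shape is not assumed.

```python
import json
from typing import Optional


def detected_challenge(extract_json: str) -> Optional[str]:
    """Return the challenge name from `kloakt extract --json` output, or None."""
    page = json.loads(extract_json)
    # A missing key is treated the same as an explicit null.
    return page.get("challenge")


def next_action(extract_json: str) -> str:
    """Decide whether an agent should keep going or stop and back off."""
    challenge = detected_challenge(extract_json)
    if challenge is not None:
        return f"back off: page is gated by {challenge}"
    return "continue"
```

For example, `next_action('{"challenge": "turnstile"}')` reports that the page is gated, while a null challenge lets the agent continue.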
| Flag | Default | Description |
|---|---|---|
| --obey-robots | off | Respect robots.txt — refuse to fetch disallowed paths |
| --allow-private | off | Allow private/internal/loopback hosts (disables the SSRF guard) |
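For readers unfamiliar with what obeying robots.txt involves, the check can be sketched with Python's standard `urllib.robotparser`. This illustrates the general rule only and says nothing about how kloakt implements --obey-robots; the user-agent string "kloakt" and the sample rules are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the rules in an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: everything under /private/ is off limits to all agents.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

With these sample rules, `https://example.com/private/data` is disallowed and `https://example.com/public` is allowed.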
Security note: by default kloakt refuses to fetch private, loopback, link-local, and cloud-metadata addresses (SSRF protection), and rejects
file:// URLs. Use --allow-private only when you intentionally need to reach internal services. The CDP server binds to 127.0.0.1 and validates the Host header to block DNS-rebinding.
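The address classes named in the security note map onto checks that Python's standard `ipaddress` module exposes directly. The sketch below shows that classification only; it is not kloakt's guard, and `is_blocked_address` is a hypothetical name. A real guard would also need to resolve hostnames and check every resolved address.

```python
import ipaddress


def is_blocked_address(host_ip: str) -> bool:
    """Return True for private, loopback, or link-local IP addresses."""
    addr = ipaddress.ip_address(host_ip)
    # The common cloud metadata address 169.254.169.254 is link-local,
    # so it is covered by the last check.
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

Under this check, 127.0.0.1, 10.0.0.5, 169.254.169.254, and ::1 are all refused, while a public address such as 8.8.8.8 passes.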
Full Chrome DevTools Protocol support for Puppeteer/Playwright compatibility.
| Domain | Methods |
|---|---|
| Target | createTarget, closeTarget, attachToTarget, createBrowserContext, disposeBrowserContext |
| Page | navigate, getFrameTree, addScriptToEvaluateOnNewDocument, lifecycleEvents |
| Runtime | evaluate, callFunctionOn, getProperties, addBinding |
| DOM | getDocument, querySelector, querySelectorAll, getOuterHTML, resolveNode |
| Network | enable, setCookies, getCookies, setExtraHTTPHeaders, setUserAgentOverride |
| Fetch | enable, continueRequest, fulfillRequest, failRequest |
| Storage | getCookies, setCookies, deleteCookies |
| Input | dispatchMouseEvent, dispatchKeyEvent |
Apache 2.0 — Based on Obscura by h4ckf0r0day.