Kloakt

Transport: STDIO · Registry: active

Summary

If you need Claude to scrape JavaScript-heavy sites without spinning up a full Chrome instance, this is a Rust-based headless browser that exposes extract and fetch operations over MCP. It runs V8 for JS execution, strips navigation and ads to isolate main content, and returns clean markdown or structured JSON. The extract command handles SPAs with a delay flag for hydration, caps output with max-chars to fit context windows, and includes a stealth mode that randomizes fingerprints and blocks trackers. You can also run it as a CDP server on port 9222 and connect Puppeteer or Playwright if you need programmatic control outside of Claude.

Kloakt

Cloaked headless browser for AI agents.
Lightweight, stealthy, built in Rust. Based on Obscura.

Kloakt is a headless browser built for AI agents. It runs JavaScript via V8, extracts clean markdown from any page (including SPAs), and exposes tools via MCP for Claude Code and other AI systems.

Why Kloakt?

| Metric | Kloakt | Headless Chrome |
| --- | --- | --- |
| Memory | 30 MB | 200+ MB |
| Binary size | 70 MB | 300+ MB |
| Anti-detect | Built-in | None |
| Page load | 85 ms | ~500 ms |
| Startup | Instant | ~2 s |
| SPA extract | Yes | Manual |

Install

Build from source

git clone https://github.com/KultMember6Banger/kloakt.git
cd kloakt
cargo build --release

# With stealth mode (anti-detection + tracker blocking)
cargo build --release --features stealth

Requires Rust 1.75+ (rustup.rs). First build takes ~5 min (V8 compiles from source, cached after).

Quick Start

Extract content (AI agent use)

# Clean markdown from any page
kloakt extract https://example.com --main

# Structured JSON with metadata
kloakt extract https://example.com --main --json

# Cap output for agent context windows
kloakt extract https://en.wikipedia.org/wiki/Rust --main --json --max-chars 3000

# Wait for SPA hydration
kloakt extract https://example.com --delay 2000 --json
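
If you drive the CLI from a script rather than a shell, the flags above map directly onto an argument list. The following is a minimal sketch, assuming the kloakt binary is on your PATH; `build_extract_args` is a hypothetical helper written for this example, not part of Kloakt.

```python
from typing import List, Optional


def build_extract_args(
    url: str,
    main: bool = False,
    as_json: bool = False,
    max_chars: Optional[int] = None,
    delay_ms: int = 0,
) -> List[str]:
    """Build an argv list for `kloakt extract` using only the flags shown above."""
    args = ["kloakt", "extract", url]
    if main:
        args.append("--main")          # strip nav, header, footer, sidebar
    if as_json:
        args.append("--json")          # structured JSON instead of markdown
    if max_chars is not None:
        args += ["--max-chars", str(max_chars)]
    if delay_ms > 0:
        args += ["--delay", str(delay_ms)]  # extra wait for SPA hydration
    return args
```

Pass the returned list to `subprocess.run(args, capture_output=True, text=True)` to capture the output. Building a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.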

Fetch a page

# Get the page title
kloakt fetch https://example.com --eval "document.title"

# Extract all links
kloakt fetch https://example.com --dump links

# Render JavaScript and dump markdown
kloakt fetch https://news.ycombinator.com --dump markdown

# Wait for dynamic content
kloakt fetch https://example.com --wait-until networkidle0

Start the CDP server

kloakt serve --port 9222

# With stealth mode
kloakt serve --port 9222 --stealth

Scrape in parallel

kloakt scrape url1 url2 url3 ... \
  --concurrency 25 \
  --eval "document.querySelector('h1').textContent" \
  --format json

Smart Extraction

The extract command uses a multi-phase pipeline optimized for AI agents:

1. Noise removal: strips cookie banners, ads, popups, nav, and social widgets.
2. Content scoring: a text-density algorithm (Readability-like) finds the main content block.
3. Markdown conversion: DOM-to-markdown with absolute URL resolution.
4. SPA fallback: when JS rendering fails, extracts from meta tags, Open Graph, JSON-LD, and noscript content.

Works on static HTML, server-rendered pages, and pure client-side SPAs (React, Vue, etc.).
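
To make the noise-removal and content-scoring phases concrete, here is a rough sketch of a text-density scorer built on Python's standard `html.parser`. It is not Kloakt's implementation (which is written in Rust and is not shown in this README). The tag sets, the scoring rule (characters of non-link text per block), and all names are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
BLOCK_TAGS = {"article", "main", "section", "div"}


class DensityScorer(HTMLParser):
    """Collects text per block element, skipping anything inside noise tags."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0   # > 0 while inside a noise element
        self.link_depth = 0    # > 0 while inside an <a>
        self.open_blocks = []  # stack of blocks currently open
        self.blocks = []       # blocks that have been closed

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS and self.noise_depth == 0:
            self.open_blocks.append({"text": [], "link_chars": 0})

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS and self.noise_depth == 0 and self.open_blocks:
            self.blocks.append(self.open_blocks.pop())

    def handle_data(self, data):
        text = data.strip()
        if not text or self.noise_depth or not self.open_blocks:
            return
        block = self.open_blocks[-1]  # credit text to the innermost open block
        block["text"].append(text)
        if self.link_depth:
            block["link_chars"] += len(text)


def score(block) -> int:
    """Characters of non-link text: link-heavy blocks (menus, sidebars) score low."""
    return sum(len(t) for t in block["text"]) - block["link_chars"]


def main_content(html: str) -> str:
    parser = DensityScorer()
    parser.feed(html)
    parser.close()
    candidates = parser.blocks + parser.open_blocks  # include unclosed blocks
    if not candidates:
        return ""
    return " ".join(max(candidates, key=score)["text"])
```

A production extractor would also weigh class names, paragraph counts, and nesting, but the core idea is the same: drop known noise first, then prefer the block with the most prose that is not link text.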

Python API

from kloakt import extract, extract_fields, fetch, scrape

# Extract clean markdown
page = extract("https://example.com")
print(page.title, page.content, page.meta)

# Cap output length
page = extract("https://example.com", max_chars=3000)

# Wait for SPA content
page = extract("https://example.com", delay=2000)

# Structured field extraction via CSS selectors
data = extract_fields("https://news.ycombinator.com", {
    "title": "title",
    "stories": ".titleline > a[]",   # [] => list of all matches
    "links": ".titleline > a[]@href" # @href => an attribute
})
print(data["data"]["stories"])

# Raw fetch
html = fetch("https://example.com", dump="html")
title = fetch("https://example.com", eval_js="document.title")

# Parallel scrape
results = scrape(["https://a.com", "https://b.com"], concurrency=5)

MCP Server (Claude Code)

Kloakt includes an MCP server for use as a Claude Code tool:

{
  "mcpServers": {
    "kloakt": {
      "command": "python3",
      "args": ["/path/to/kloakt/mcp_server.py"]
    }
  }
}

Exposes kloakt_extract and kloakt_fetch as native tools.
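
If you already have other MCP servers configured, the kloakt entry has to be merged into the existing `mcpServers` object rather than replacing it. A small sketch of that merge, operating on plain dictionaries; `add_kloakt_server` is a hypothetical helper, and where your client stores its configuration file is not covered here.

```python
def add_kloakt_server(config: dict, script_path: str) -> dict:
    """Return a copy of an MCP client config with the kloakt entry added.

    Existing servers are preserved; the input dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["kloakt"] = {"command": "python3", "args": [script_path]}
    merged["mcpServers"] = servers
    return merged
```

Serialize the result with `json.dumps(merged, indent=2)` before writing it back.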

Puppeteer / Playwright

Puppeteer

The CDP server embeds a per-session token in the WebSocket path (like Chrome). Connect via browserURL so the client discovers the token from /json/version automatically — don't hardcode the ws://.../devtools/browser path.

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});

const page = await browser.newPage();
await page.goto('https://news.ycombinator.com');
const stories = await page.evaluate(() =>
  Array.from(document.querySelectorAll('.titleline > a'))
    .map(a => ({ title: a.textContent, url: a.href }))
);
await browser.disconnect();
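
For clients that do not support browserURL, the same discovery step can be done by hand. The sketch below assumes Kloakt's /json/version response carries the endpoint in a `webSocketDebuggerUrl` field, as Chrome's does; the README only says the behaviour is "like Chrome", so verify the field name against a running server. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_ws_endpoint(version_json: str) -> str:
    """Return the WebSocket URL advertised in a /json/version response body."""
    info = json.loads(version_json)
    ws_url = info.get("webSocketDebuggerUrl")  # assumed field name (Chrome's)
    if not ws_url:
        raise ValueError("response has no webSocketDebuggerUrl field")
    return ws_url


def discover_ws_endpoint(browser_url: str = "http://127.0.0.1:9222") -> str:
    """Fetch /json/version from a running CDP server and extract the endpoint."""
    with urlopen(f"{browser_url}/json/version", timeout=5) as resp:
        return parse_ws_endpoint(resp.read().decode("utf-8"))
```

Because the token is per-session, call `discover_ws_endpoint()` each time the server restarts instead of caching the URL.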

Playwright

import { chromium } from 'playwright-core';

const browser = await chromium.connectOverCDP({
  endpointURL: 'http://127.0.0.1:9222', // discovers the tokenized ws endpoint
});

const page = await browser.newContext().then(ctx => ctx.newPage());
await page.goto('https://en.wikipedia.org/wiki/Web_scraping');
console.log(await page.title());
await browser.close();

Stealth Mode

Compile with --features stealth (see Install), then pass --stealth to extract, fetch, or serve to turn it on at runtime.

- Per-session fingerprint randomization (GPU, screen, canvas, audio, battery)
- Realistic navigator.userAgentData (Chrome 145, high-entropy values)
- event.isTrusted = true for dispatched events
- Native function masking (Function.prototype.toString() → [native code])
- navigator.webdriver = undefined
- 3,520 tracker domains blocked

CLI Reference

`kloakt extract <URL>`

| Flag | Default | Description |
| --- | --- | --- |
| `--format` | `markdown` | Output: `markdown`, `text`, or `links` |
| `--main` | off | Strip nav, header, footer, sidebar |
| `--json` | off | Structured JSON: title, URL, content, meta |
| `--max-chars` | unlimited | Truncate content to N characters |
| `--delay` | `0` | Extra ms to wait after load |
| `--stealth` | off | Anti-detection mode |
| `--selector` | none | Wait for CSS selector |
| `--wait-until` | `load` | `load`, `domcontentloaded`, `networkidle0` (bounded by `--wait`) |
| `--schema` | none | Extract structured fields as JSON (see below) |
| `--har` | none | Write captured network activity to a HAR file |
| `--cache-ttl` | `0` | Cache the result on disk and reuse it for N seconds |

Structured extraction with `--schema`

Pass a JSON object mapping field names to CSS selectors. Suffix a selector with [] to return all matches as a list, and with @attr to return an attribute instead of text:

kloakt extract https://news.ycombinator.com \
  --schema '{"title":"title","stories":".titleline > a[]","first_link":".titleline > a@href"}'
# => { "url": ..., "data": { "title": "...", "stories": [...], "first_link": "..." }, "elapsed_ms": ... }

This is also exposed through the MCP kloakt_extract tool via an optional schema argument.
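
The suffix syntax is compact enough to parse with a single regular expression. The sketch below shows one way to split a field spec into its selector, list flag, and attribute name. It illustrates the syntax described above and is not Kloakt's parser; `FieldSpec`, `parse_field`, and `parse_schema` are hypothetical names.

```python
import json
import re
from typing import Dict, NamedTuple, Optional

# selector, then an optional "[]" suffix, then an optional "@attr" suffix
_FIELD = re.compile(r"(?P<selector>.+?)(?P<many>\[\])?(?:@(?P<attr>[A-Za-z_][\w:.-]*))?")


class FieldSpec(NamedTuple):
    selector: str        # the CSS selector with suffixes removed
    many: bool           # True when the spec ended in "[]" (return all matches)
    attr: Optional[str]  # attribute name after "@", or None for text content


def parse_field(spec: str) -> FieldSpec:
    match = _FIELD.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"invalid field spec: {spec!r}")
    return FieldSpec(match["selector"].strip(), match["many"] is not None, match["attr"])


def parse_schema(schema_json: str) -> Dict[str, FieldSpec]:
    """Parse a --schema style JSON object into per-field specs."""
    return {name: parse_field(spec) for name, spec in json.loads(schema_json).items()}
```

Note that `[]` (empty brackets) is distinct from a CSS attribute selector such as `a[href]`, which stays part of the selector.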

`kloakt fetch <URL>`

| Flag | Default | Description |
| --- | --- | --- |
| `--dump` | `html` | Output: `html`, `text`, `links`, `markdown` |
| `--eval` | none | JavaScript expression to evaluate |
| `--wait-until` | `load` | Wait condition |
| `--selector` | none | Wait for CSS selector |
| `--stealth` | off | Anti-detection mode |
| `--quiet` | off | Suppress banner |

`kloakt serve`

| Flag | Default | Description |
| --- | --- | --- |
| `--port` | `9222` | WebSocket port |
| `--proxy` | none | HTTP/SOCKS5 proxy URL |
| `--stealth` | off | Anti-detection + tracker blocking |
| `--workers` | `1` | Parallel workers |

`kloakt scrape <URL...>`

| Flag | Default | Description |
| --- | --- | --- |
| `--concurrency` | `10` | Parallel workers |
| `--eval` | none | JS expression per page |
| `--format` | `json` | Output: `json` or `text` |

`kloakt benchmark <URL...>`

Measure load performance per URL — average/min/max load time, request count, bytes, and DOM node count — as a table or --json.

kloakt benchmark https://example.com https://news.ycombinator.com --runs 3

| Flag | Default | Description |
| --- | --- | --- |
| `--runs` | `1` | Runs per URL (reports the average) |
| `--json` | off | Emit JSON instead of a table |
| `--wait-until` | `load` | `load`, `domcontentloaded`, or `networkidle0` |
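
With `--runs` greater than 1, the per-URL figures are aggregated across runs. The arithmetic is simple; the sketch below shows it for load times only. The dictionary keys are hypothetical and are not the field names Kloakt emits with `--json`.

```python
from statistics import mean
from typing import Dict, List


def summarize(load_times_ms: List[float]) -> Dict[str, float]:
    """Aggregate per-run load times into the average/min/max the table reports."""
    if not load_times_ms:
        raise ValueError("need at least one run")
    return {
        "runs": len(load_times_ms),
        "avg_ms": mean(load_times_ms),
        "min_ms": min(load_times_ms),
        "max_ms": max(load_times_ms),
    }
```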

Challenge / bot-wall detection

kloakt extract --json includes a "challenge" field reporting a detected captcha or bot wall (recaptcha, hcaptcha, turnstile, cloudflare, datadome, perimeterx) or null. This is detection only — kloakt tells you a page is gated so an agent can stop and back off; it does not attempt to solve or evade challenges. (Also surfaced via the MCP kloakt_extract output and the Python Page.challenge field.)
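
An agent consuming this output can branch on the field before using the content. A minimal sketch, assuming the "challenge" value is either null or a string naming the provider; the exact JSON shape beyond that field, and the `check_challenge` helper itself, are assumptions.

```python
import json
from typing import Optional, Tuple


def check_challenge(extract_output: str) -> Tuple[bool, Optional[str]]:
    """Inspect `kloakt extract --json` output for a detected bot wall.

    Returns (should_back_off, challenge_name). The caller is expected to
    stop and back off when the first element is True, not to retry harder.
    """
    page = json.loads(extract_output)
    challenge = page.get("challenge")  # None when the key is absent or null
    return (challenge is not None, challenge)
```

For example, a gated page might yield `(True, "turnstile")`, at which point the agent should report that the page is gated and move on.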

Global flags

| Flag | Default | Description |
| --- | --- | --- |
| `--obey-robots` | off | Respect `robots.txt`: refuse to fetch disallowed paths |
| `--allow-private` | off | Allow private/internal/loopback hosts (disables the SSRF guard) |
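
For a sense of what a robots.txt check involves, the sketch below uses Python's standard `urllib.robotparser` on an already-downloaded robots.txt body. It illustrates the kind of rule matching that `--obey-robots` implies, not Kloakt's own logic; the user agent string Kloakt matches against is not documented here, so the example defaults to the wildcard.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt body permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```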

Security note: by default kloakt refuses to fetch private, loopback, link-local, and cloud-metadata addresses (SSRF protection), and rejects file:// URLs. Use --allow-private only when you intentionally need to reach internal services. The CDP server binds to 127.0.0.1 and validates the Host header to block DNS-rebinding.
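
The address classes named in the security note correspond to properties in Python's `ipaddress` module, which makes the guard easy to illustrate. This sketch only handles IP literals and URL schemes. A real guard, presumably including Kloakt's, must also resolve hostnames and check every resolved address, which is omitted here. The function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_address(host_ip: str) -> bool:
    """True for private, loopback, and link-local addresses.

    The cloud metadata address 169.254.169.254 falls in the link-local range.
    Raises ValueError if `host_ip` is not an IP literal.
    """
    ip = ipaddress.ip_address(host_ip)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def is_blocked_url(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True  # rejects file:// and any other scheme
    host = parts.hostname or ""
    try:
        return is_blocked_address(host)
    except ValueError:
        # Not an IP literal. A real guard would resolve the name and check
        # each resulting address; this sketch lets hostnames through.
        return False
```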

CDP API

Chrome DevTools Protocol support for Puppeteer/Playwright compatibility, covering the following domains and methods:

| Domain | Methods |
| --- | --- |
| Target | createTarget, closeTarget, attachToTarget, createBrowserContext, disposeBrowserContext |
| Page | navigate, getFrameTree, addScriptToEvaluateOnNewDocument, lifecycleEvents |
| Runtime | evaluate, callFunctionOn, getProperties, addBinding |
| DOM | getDocument, querySelector, querySelectorAll, getOuterHTML, resolveNode |
| Network | enable, setCookies, getCookies, setExtraHTTPHeaders, setUserAgentOverride |
| Fetch | enable, continueRequest, fulfillRequest, failRequest |
| Storage | getCookies, setCookies, deleteCookies |
| Input | dispatchMouseEvent, dispatchKeyEvent |

License

Apache 2.0 — Based on Obscura by h4ckf0r0day.

Registry: active

Package: kloakt

Transport: STDIO

Updated: Apr 28, 2026
