Playwright meets accessibility trees for browser automation through Claude. You get 8 tools: navigate, click, type, scroll, snapshot, extract_content, go_back, and wait. The hook here is token efficiency for local LLMs. Instead of dumping full HTML into your context window, it sends accessibility trees with ref IDs like e1, e2 for stable element targeting, plus MarkGrab for clean markdown extraction. The project claims roughly 500 to 1,300 tokens per step versus 4,000 to 10,000 for browser-use. Built by the QuartzUnit folks who made MarkGrab, SnapGrab, and DocPick. Reach for this when you need Claude to navigate websites but don't want to burn tokens on raw DOM dumps or coordinate hunting.
Token-efficient browser agent for local LLMs — Playwright + accessibility tree + MarkGrab, MCP native.
browsegrab is a lightweight browser automation library designed for local LLMs (8B-35B parameters). It combines Playwright's accessibility tree with MarkGrab's HTML-to-markdown conversion to achieve 5-8x fewer tokens per step compared to alternatives like browser-use.
Elements are targeted by ref IDs (e1, e2, ...) without vision models. The core depends on playwright + httpx.
pip install browsegrab
playwright install chromium
With optional features:
pip install "browsegrab[mcp]" # MCP server support
pip install "browsegrab[content]" # MarkGrab content extraction
pip install "browsegrab[cli]" # CLI with rich output
pip install "browsegrab[all]" # Everything
import asyncio

from browsegrab import BrowseSession

async def main():
    async with BrowseSession() as session:
        # Navigate and get accessibility tree snapshot
        await session.navigate("https://example.com")
        snap = await session.snapshot()
        print(snap.tree_text)
        # - heading "Example Domain" [level=1]
        # - link "Learn more": [ref=e1]

        # Click using ref ID
        result = await session.click("e1")
        print(result.url) # https://www.iana.org/help/example-domains

        # Type into search box (ref IDs come from the latest snapshot)
        await session.navigate("https://en.wikipedia.org")
        snap = await session.snapshot()
        await session.type("e4", "Python programming", submit=True)

        # Extract compressed content (AX tree + markdown)
        content = await session.extract_content()

asyncio.run(main())
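The snapshot text in the sample output tags interactive elements with `[ref=eN]`. If you want those refs as data, for example to check that a ref chosen by the model actually exists before clicking it, a small regex over text of that shape is enough. This is a minimal sketch that assumes the line format shown in the comments above; `parse_refs` is a hypothetical helper, not part of browsegrab.

```python
import re

# Matches lines shaped like:  - link "Learn more": [ref=e1]
# The line format is an assumption based on the sample snapshot output.
REF_LINE = re.compile(
    r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>e\d+)\]'
)

def parse_refs(tree_text: str) -> dict[str, tuple[str, str]]:
    """Map each ref ID to its (role, accessible name)."""
    refs: dict[str, tuple[str, str]] = {}
    for line in tree_text.splitlines():
        match = REF_LINE.search(line)
        if match:
            refs[match["ref"]] = (match["role"], match["name"])
    return refs

sample = '- heading "Example Domain" [level=1]\n- link "Learn more": [ref=e1]'
refs = parse_refs(sample)
print(refs)  # {'e1': ('link', 'Learn more')}
```

Lines without a ref, such as the heading, are skipped, so the resulting dictionary lists only elements that can be targeted.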
# Accessibility tree snapshot
browsegrab snapshot https://example.com
# JSON output
browsegrab snapshot https://example.com -f json
# Extract content (AX tree + markdown)
browsegrab extract https://en.wikipedia.org/wiki/Python
# Agentic browse (requires LLM endpoint)
browsegrab browse https://example.com "Find the about page"
browsegrab-mcp # Start MCP server (stdio)
Claude Desktop / Cursor / VS Code config:
{
"mcpServers": {
"browsegrab": {
"command": "browsegrab-mcp"
}
}
}
8 MCP tools: browser_navigate, browser_click, browser_type, browser_snapshot, browser_scroll, browser_extract_content, browser_go_back, browser_wait
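If your client config already lists other servers, merge the browsegrab entry instead of overwriting the file. The sketch below uses only the standard library. The file name and the `other` server are made up for the demo, and the real config location differs per client, so treat the path as an assumption.

```python
import json
import tempfile
from pathlib import Path

def add_browsegrab(config_path: Path) -> dict:
    """Add the browsegrab entry to an MCP client config, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["browsegrab"] = {"command": "browsegrab-mcp"}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config

# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp_config.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-mcp"}}}))
    merged = add_browsegrab(path)
    on_disk = json.loads(path.read_text())

print(json.dumps(merged, indent=2))
```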
flowchart LR
A["🌐 URL + Goal"] --> B["Navigate"]
B --> C["AX Tree Snapshot\n~200–500 tokens"]
C --> D{"LLM\nDecision"}
D -->|"click / type / scroll"| E["Execute Action"]
E --> C
D -->|"goal reached"| F["Extract Content\n(MarkGrab)"]
F --> G["✅ Result"]
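The loop in the flowchart above can be sketched in a few lines. This illustrates the control flow only and is not browsegrab's internal code: `Action`, `run_loop`, and the `snapshot`/`decide`/`execute` callables are hypothetical stand-ins, and `max_steps` plays the role of a step cap.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    kind: str       # "click", "type", "scroll", or "done"
    ref: str = ""   # target ref ID such as "e1"
    text: str = ""  # text to type, if any

def run_loop(
    snapshot: Callable[[], str],
    decide: Callable[[str, str], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 10,
) -> list[Action]:
    """Snapshot, decide, execute; stop when the model says done or steps run out."""
    taken: list[Action] = []
    for _ in range(max_steps):
        tree = snapshot()
        action = decide(goal, tree)
        if action.kind == "done":
            break
        execute(action)
        taken.append(action)
    return taken

# Toy run with stubbed components: the "model" clicks e1 once, then stops.
state = {"page": "home"}

def fake_snapshot() -> str:
    return f"page={state['page']}"

def fake_decide(goal: str, tree: str) -> Action:
    return Action("click", "e1") if "home" in tree else Action("done")

def fake_execute(action: Action) -> None:
    state["page"] = "about"

steps = run_loop(fake_snapshot, fake_decide, fake_execute, "Find the about page")
print(steps)
```

Passing the three stages in as callables keeps the loop testable without a browser or an LLM endpoint.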
browsegrab separates structure (accessibility tree) from content (MarkGrab markdown), sending only what the LLM needs:
flowchart TD
A["Raw HTML"] --> B["Accessibility Tree"]
A --> C["MarkGrab Markdown"]
B --> D["Structure: ~200–500 tokens\nInteractive elements with ref IDs"]
C --> E["Content: ~300–800 tokens\nClean markdown · on-demand"]
D --> F["Combined: ~500–1,300 tokens/step\n⚡ 5–8× fewer than browser-use"]
E --> F
| Page | Interactive elements | Tokens | browser-use equivalent |
|---|---|---|---|
| example.com | 1 | ~60 | ~500+ |
| Wikipedia article | 452 | ~1,254 | ~10,000+ |
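To see how the two rows relate to the claimed reduction, divide the browser-use figure by the browsegrab figure. Keep in mind that the browser-use numbers carry a "+" in the table, so they are lower bounds, and every value is an approximation quoted from the table rather than a fresh measurement.

```python
# (page, browsegrab tokens, browser-use lower bound), copied from the table
rows = [
    ("example.com", 60, 500),
    ("Wikipedia article", 1254, 10_000),
]

def reduction(ours: int, theirs: int) -> float:
    """How many times fewer tokens the first figure is than the second."""
    return theirs / ours

for page, ours, theirs in rows:
    print(f"{page}: at least {reduction(ours, theirs):.1f}x fewer tokens")
# example.com: at least 8.3x fewer tokens
# Wikipedia article: at least 8.0x fewer tokens
```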
browsegrab/
├── config.py # Dataclass configs (env var loading)
├── result.py # Result types (ActionResult, BrowseResult, ...)
├── session.py # BrowseSession orchestrator
├── browser/
│ ├── manager.py # Playwright lifecycle (async context manager)
│ ├── snapshot.py # Accessibility tree + ref system
│ ├── selectors.py # 4-strategy selector resolver
│ └── actions.py # navigate, click, type, scroll, go_back, wait
├── dom/
│ ├── ref_map.py # ref ID ↔ element bidirectional mapping
│ └── compress.py # AX tree + MarkGrab → compressed context
├── llm/
│ ├── base.py # LLMProvider ABC
│ ├── provider.py # vLLM, Ollama, OpenAI-compatible
│ ├── prompt.py # System prompts (~400 tokens)
│ └── parse.py # 5-stage JSON fallback parser
├── agent/
│ ├── history.py # Sliding window history compression
│ ├── cache.py # Domain-based success pattern cache
│ └── loop_guard.py # Duplicate action detection
├── __main__.py # CLI (click)
└── mcp_server.py # FastMCP server (8 tools)
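Small local models often wrap JSON in prose or code fences, which is presumably why `llm/parse.py` is described as a staged fallback parser. The listing does not say what the five stages are, so the three stages below are assumptions chosen for illustration, and `parse_action` is a hypothetical name rather than the library's function.

```python
import json
import re
from typing import Optional

# Three backticks, an optional "json" tag, then the fenced body.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)

def parse_action(raw: str) -> Optional[dict]:
    """Try progressively looser strategies to recover a JSON object."""
    candidates = [raw.strip()]                      # stage 1: reply is pure JSON
    fenced = FENCE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())  # stage 2: JSON in a code fence
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        candidates.append(raw[start:end + 1])       # stage 3: outermost {...} span
    for text in candidates:
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None

print(parse_action('I will click. {"action": "click", "ref": "e4"} Done.'))
```

Returning `None` instead of raising lets the caller decide whether to retry the model or abort the step.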
All settings via environment variables (BROWSEGRAB_* prefix):
# Browser
BROWSEGRAB_BROWSER_HEADLESS=true
BROWSEGRAB_BROWSER_TIMEOUT_MS=30000
# LLM (for agentic browse)
BROWSEGRAB_LLM_PROVIDER=vllm # vllm | ollama | openai
BROWSEGRAB_LLM_BASE_URL=http://localhost:8000/v1
BROWSEGRAB_LLM_MODEL=Qwen/Qwen3.5-32B-AWQ
# Agent
BROWSEGRAB_AGENT_MAX_STEPS=10
BROWSEGRAB_AGENT_ENABLE_CACHE=true
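The layout lists `config.py` as dataclass configs loaded from environment variables. A pattern along those lines might look like the sketch below. `AgentSettings` and `_env_bool` are hypothetical names, and the defaults simply reuse the example values above, so the library's real classes and defaults may differ.

```python
import os
from dataclasses import dataclass

def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment, accepting common spellings."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True)
class AgentSettings:
    headless: bool
    timeout_ms: int
    max_steps: int

    @classmethod
    def from_env(cls) -> "AgentSettings":
        # Defaults mirror the example values above (an assumption).
        return cls(
            headless=_env_bool("BROWSEGRAB_BROWSER_HEADLESS", True),
            timeout_ms=int(os.environ.get("BROWSEGRAB_BROWSER_TIMEOUT_MS", "30000")),
            max_steps=int(os.environ.get("BROWSEGRAB_AGENT_MAX_STEPS", "10")),
        )

os.environ["BROWSEGRAB_AGENT_MAX_STEPS"] = "5"
settings = AgentSettings.from_env()
print(settings.max_steps)  # 5
```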
| Library | Role |
|---|---|
| markgrab | Passive extraction (URL → markdown) |
| snapgrab | Passive capture (URL → screenshot) |
| docpick | Document OCR → structured JSON |
| browsegrab | Active automation (goal → browser actions → results) |
git clone https://github.com/QuartzUnit/browsegrab.git
cd browsegrab
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
playwright install chromium
# Unit tests (no browser needed)
pytest tests/ -m "not e2e"
# Full suite including E2E
pytest tests/ -v
Part of the QuartzUnit ecosystem — composable Python libraries for data collection, extraction, search, and AI agent safety.