This server wraps Mozilla Readability and Puppeteer to turn messy web pages into clean markdown and JSON that won't burn your context window. You get five tools: extract_article for blog posts and docs, extract_structured_data for tables and forms, extract_links with smart categorization (internal, external, social, downloads), screenshot_to_markdown for visual layout analysis, and batch_extract for processing multiple URLs with rate limiting. All responses include timing metrics and token counts. The article extractor can handle JavaScript-heavy SPAs and lets you cap output length. Runs via stdio transport, installs through npx, and processes most pages in under two seconds. Built for agents that need to read the web without choking on raw HTML.
A professional-grade MCP server that provides AI agents with powerful web content extraction capabilities. Built specifically for the agent economy by Agenson Horrowitz.
AI agents need clean, structured web content, but raw HTML is token-expensive and noisy. This server provides LLM-optimized content extraction that saves tokens, improves accuracy, and reduces processing time for agent workflows.
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "web-content-extractor": {
      "command": "npx",
      "args": ["@agenson-horrowitz/web-content-extractor-mcp"]
    }
  }
}
Add to your Cline MCP settings:
{
  "mcpServers": {
    "web-content-extractor": {
      "command": "npx",
      "args": ["@agenson-horrowitz/web-content-extractor-mcp"]
    }
  }
}
npm install -g @agenson-horrowitz/web-content-extractor-mcp
Deploy instantly on MCPize with built-in billing and authentication.
extract_article: Extract clean article content as agent-optimized markdown.
Perfect for: News articles, blog posts, documentation, research papers
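Features: Readability-based cleanup of messy pages, optional JavaScript rendering for SPAs (javascript_enabled), an output length cap (max_length), and optional metadata (include_metadata).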
Example:
{
  "url": "https://example.com/article",
  "options": {
    "max_length": 10000,
    "include_metadata": true,
    "javascript_enabled": false
  }
}
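Under the hood, an MCP client delivers these arguments inside a JSON-RPC 2.0 tools/call request. The sketch below builds that message by hand to show the wire shape; buildToolCall and frameForStdio are hypothetical helpers, and in practice the MCP SDK's Client constructs and frames the request for you.

```javascript
// Hypothetical helper: wrap tool arguments in an MCP `tools/call` JSON-RPC 2.0 request.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// Hypothetical helper: MCP's stdio transport sends one JSON message per line,
// so the message is serialized without embedded newlines and terminated by '\n'.
function frameForStdio(message) {
  return JSON.stringify(message) + '\n';
}

const request = buildToolCall(1, 'extract_article', {
  url: 'https://example.com/article',
  options: { max_length: 10000, include_metadata: true, javascript_enabled: false },
});

process.stdout.write(frameForStdio(request));
```

A real session must complete the MCP initialize handshake before the first tools/call, which this sketch omits.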
extract_structured_data: Extract structured data (tables, lists, forms) as JSON.
Perfect for: Pricing tables, feature comparisons, directory listings, form analysis
Supported data types: tables, lists, and forms.
Example:
{
  "url": "https://example.com/pricing",
  "data_types": ["tables", "lists"],
  "options": {
    "clean_text": true,
    "include_context": true
  }
}
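Since the tool documents exactly three data types, a client can reject typos before spending an extraction. A minimal sketch; buildStructuredDataArgs is a hypothetical client-side helper, and the server's own validation may behave differently.

```javascript
// Data types documented for extract_structured_data.
const SUPPORTED_DATA_TYPES = new Set(['tables', 'lists', 'forms']);

// Hypothetical client-side helper: build tool arguments, failing fast on unknown
// data types and dropping duplicates so the request stays minimal.
function buildStructuredDataArgs(url, dataTypes, options = {}) {
  const unknown = dataTypes.filter((t) => !SUPPORTED_DATA_TYPES.has(t));
  if (unknown.length > 0) {
    throw new Error(`Unsupported data_types: ${unknown.join(', ')}`);
  }
  return { url, data_types: [...new Set(dataTypes)], options };
}

const args = buildStructuredDataArgs('https://example.com/pricing', ['tables', 'lists'], {
  clean_text: true,
  include_context: true,
});
console.log(JSON.stringify(args));
```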
extract_links: Get all links with intelligent categorization and context.
Perfect for: Competitive analysis, site mapping, link discovery, SEO analysis
Link categories: internal, external, social, and downloads.
Example:
{
  "url": "https://example.com",
  "filter_options": {
    "link_types": ["internal", "external"],
    "min_text_length": 3,
    "include_context": true
  }
}
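The four categories can be approximated client-side with the standard URL API, for example to post-filter or spot-check results. This is not the server's algorithm: the social host list and download extensions below are illustrative assumptions, and categorizeLink is a hypothetical helper.

```javascript
// Illustrative assumptions only; the server's real rules are not documented here.
const SOCIAL_HOSTS = ['twitter.com', 'x.com', 'facebook.com', 'linkedin.com', 'youtube.com', 'instagram.com'];
const DOWNLOAD_EXTENSIONS = ['.pdf', '.zip', '.tar.gz', '.dmg', '.exe', '.csv'];

// Treat "www.example.com" and "example.com" as the same site.
const bareHost = (u) => u.hostname.replace(/^www\./, '');

// Hypothetical helper: classify an href found on pageUrl into one of the
// documented categories: internal, external, social, or downloads.
function categorizeLink(href, pageUrl) {
  const link = new URL(href, pageUrl); // resolves relative hrefs against the page
  const page = new URL(pageUrl);
  const path = link.pathname.toLowerCase();

  if (DOWNLOAD_EXTENSIONS.some((ext) => path.endsWith(ext))) return 'downloads';

  const host = bareHost(link);
  if (SOCIAL_HOSTS.some((s) => host === s || host.endsWith('.' + s))) return 'social';

  return host === bareHost(page) ? 'internal' : 'external';
}

for (const href of ['/about', 'https://other.org/', 'https://x.com/example', '/files/report.PDF']) {
  console.log(href, '->', categorizeLink(href, 'https://example.com'));
}
```

Downloads are checked first so that a PDF hosted on the same site is reported as a download rather than an internal page.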
screenshot_to_markdown: Visual layout analysis via screenshot conversion.
Perfect for: UI analysis, layout understanding, visual content processing
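Features: a configurable viewport (viewport_width, viewport_height) and an optional description of the page layout (describe_layout).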
Example:
{
  "url": "https://example.com",
  "options": {
    "viewport_width": 1280,
    "viewport_height": 720,
    "describe_layout": true
  }
}
batch_extract: Process multiple URLs in parallel with error recovery.
Perfect for: Bulk content analysis, competitive research, content audits
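Features: parallel processing capped by concurrent_limit, rate limiting, and error recovery controlled by continue_on_error.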
Example:
{
  "urls": [
    "https://competitor1.com",
    "https://competitor2.com",
    "https://competitor3.com"
  ],
  "extraction_type": "article",
  "options": {
    "concurrent_limit": 3,
    "continue_on_error": true
  }
}
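One plausible way concurrent_limit and continue_on_error could interact is a small worker pool: at most N URLs in flight, with each failure recorded instead of aborting the batch. This is a sketch under that assumption, not the server's implementation; batchExtract is hypothetical, and extractOne stands in for any async per-URL extraction function.

```javascript
// Hypothetical sketch: run extractOne over urls with at most `limit` calls in flight.
// With continueOnError, a failure becomes a { success: false } entry in place;
// otherwise the first error rejects the whole batch (in-flight calls are not cancelled).
async function batchExtract(urls, extractOne, { limit = 3, continueOnError = true } = {}) {
  const results = new Array(urls.length);
  let next = 0;

  async function worker() {
    while (next < urls.length) {
      const i = next++; // claimed synchronously, so no two workers take the same index
      try {
        results[i] = { success: true, url: urls[i], content: await extractOne(urls[i]) };
      } catch (err) {
        if (!continueOnError) throw err;
        const message = err instanceof Error ? err.message : String(err);
        results[i] = { success: false, url: urls[i], error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input urls
}

// Usage with a fake extractor in which one URL fails.
const fakeExtract = async (url) => {
  if (url.includes('competitor2')) throw new Error('timeout');
  return `content of ${url}`;
};

batchExtract(
  ['https://competitor1.com', 'https://competitor2.com', 'https://competitor3.com'],
  fakeExtract,
  { limit: 2 },
).then((results) => console.log(results));
```

Results keep input order regardless of completion order, which makes it easy to match failures back to their URLs.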
Overage pricing: $0.02 per extraction beyond your plan limits.
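Overage cost is a simple product of extra extractions and the per-extraction rate. Plan limits are not listed here, so the included quota below is a hypothetical parameter; working in integer cents avoids floating-point rounding.

```javascript
const OVERAGE_CENTS_PER_EXTRACTION = 2; // $0.02, from the pricing note

// Overage charge in cents for one billing period.
// includedQuota is a hypothetical plan limit; substitute your plan's actual value.
function overageCents(extractionsUsed, includedQuota) {
  return Math.max(0, extractionsUsed - includedQuota) * OVERAGE_CENTS_PER_EXTRACTION;
}

const cents = overageCents(1250, 1000); // 250 extractions over the quota
console.log(`$${(cents / 100).toFixed(2)}`); // prints $5.00
```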
# Clone and test locally
git clone https://github.com/agenson-horrowitz/web-content-extractor-mcp
cd web-content-extractor-mcp
npm install
npm run build
npm test
To run the globally installed binary directly instead of through npx, add to claude_desktop_config.json:
{
  "mcpServers": {
    "web-extractor": {
      "command": "web-content-extractor-mcp"
    }
  }
}
Automatically detected when installed globally.
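const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');
const client = new Client({ name: 'my-agent', version: '1.0.0' });
client.connect(new StdioClientTransport({ command: 'web-content-extractor-mcp' })); // returns a Promise; await it before calling tools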
All tools return consistent response formats:
{
  "success": true,
  "url": "https://example.com",
  "content": "...",
  "metadata": {
    "extraction_time_ms": 1500,
    "word_count": 2500,
    "processing_stats": "..."
  }
}
Error responses:
{
  "success": false,
  "url": "https://example.com",
  "error": "Detailed error message",
  "tool": "extract_article"
}
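Because every tool shares this envelope, a client can branch once on success. A minimal sketch, assuming the envelope has already been parsed from the tool result (for example from its text content); unwrapExtraction is a hypothetical helper.

```javascript
// Hypothetical helper: return the useful parts of a success envelope,
// or throw an Error built from the error envelope.
function unwrapExtraction(response) {
  if (response.success === true) {
    const { content, metadata = {} } = response;
    return { content, timeMs: metadata.extraction_time_ms, words: metadata.word_count };
  }
  // The error shape names the failing tool and URL, which helps when batching.
  throw new Error(`${response.tool ?? 'unknown tool'} failed for ${response.url}: ${response.error}`);
}

const ok = unwrapExtraction({
  success: true,
  url: 'https://example.com',
  content: '# Title\n\nBody text',
  metadata: { extraction_time_ms: 1500, word_count: 2500 },
});
console.log(ok.timeMs, ok.words);
```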
MIT License - feel free to use in commercial AI agent deployments.
Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.