Web Content Extractor

agenson-tools/web-content-extractor-mcp
STDIO · registry active
Summary

This server wraps Mozilla Readability and Playwright to turn messy web pages into clean markdown and JSON that won't burn your context window. You get five tools: extract_article for blog posts and docs, extract_structured_data for tables, lists, and forms, extract_links with categorization (internal, external, email, social, download, phone), screenshot_to_markdown for visual layout analysis, and batch_extract for processing multiple URLs with rate limiting. Responses carry metadata such as extraction time and word count. The article extractor can optionally render JavaScript-heavy SPAs and lets you cap output length. It runs over stdio transport, installs through npx, and reports an average response time under two seconds. Built for agents that need to read the web without choking on raw HTML.


Web Content Extractor MCP Server (Agent-Optimized)

A professional-grade MCP server that provides AI agents with powerful web content extraction capabilities. Built specifically for the agent economy by Agenson Horrowitz.

🤖 Why This Exists

AI agents need clean, structured web content, but raw HTML is noisy and expensive in tokens. This server returns LLM-oriented extractions that save tokens, improve accuracy, and cut processing time in agent workflows.

⚡ Key Features

  • Advanced Article Extraction: Clean markdown with metadata using Mozilla Readability
  • Structured Data Parsing: Extract tables, lists, forms as JSON with context
  • Intelligent Link Analysis: Categorized link extraction with context and filtering
  • Visual Layout Analysis: Screenshot-to-markdown for UI understanding
  • High-Performance Batch Processing: Process multiple URLs with rate limiting
  • Agent-Optimized Output: Average response times under 2 seconds, token-efficient formatting
  • JavaScript Support: Optional JavaScript rendering for SPA content

🚀 Installation

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "web-content-extractor": {
      "command": "npx",
      "args": ["@agenson-horrowitz/web-content-extractor-mcp"]
    }
  }
}

Cline Configuration

Add to your Cline MCP settings:

{
  "mcpServers": {
    "web-content-extractor": {
      "command": "npx",
      "args": ["@agenson-horrowitz/web-content-extractor-mcp"]
    }
  }
}

Via npm

npm install -g @agenson-horrowitz/web-content-extractor-mcp

Via MCPize (One-click deployment)

Deploy instantly on MCPize with built-in billing and authentication.

🛠️ Available Tools

1. extract_article

Extract clean article content as agent-optimized markdown.

Perfect for: News articles, blog posts, documentation, research papers

Features:

  • Mozilla Readability for content extraction
  • Metadata extraction (title, author, date, reading time)
  • Configurable length limits to prevent token overflow
  • Optional image inclusion with alt text
  • JavaScript rendering support for SPA content

Example:

{
  "url": "https://example.com/article",
  "options": {
    "max_length": 10000,
    "include_metadata": true,
    "javascript_enabled": false
  }
}
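
For clients that speak MCP directly rather than through an SDK, the arguments above travel inside a JSON-RPC `tools/call` request. The sketch below shows one way to build that message. `buildToolCall` is a hypothetical helper, and the argument names are copied from the example above rather than from a published schema.

```javascript
// Hypothetical helper: wraps tool arguments in an MCP `tools/call` JSON-RPC request.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Arguments mirror the extract_article example above.
const articleRequest = buildToolCall(1, "extract_article", {
  url: "https://example.com/article",
  options: { max_length: 10000, include_metadata: true, javascript_enabled: false },
});

// Over the stdio transport, each message is written as a single line of JSON.
const wireMessage = JSON.stringify(articleRequest);
console.log(wireMessage);
```

The same helper works for the other four tools by changing the tool name and arguments.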

2. extract_structured_data

Extract structured data (tables, lists, forms) as JSON.

Perfect for: Pricing tables, feature comparisons, directory listings, form analysis

Supported data types:

  • Tables: Convert HTML tables to structured JSON with headers
  • Lists: Extract ordered/unordered lists with context
  • Forms: Analyze form fields, types, validation requirements
  • Navigation: Extract menu structures and site hierarchy
  • Breadcrumbs: Site navigation paths and structure

Example:

{
  "url": "https://example.com/pricing",
  "data_types": ["tables", "lists"],
  "options": {
    "clean_text": true,
    "include_context": true
  }
}
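
To make the "structured JSON with headers" idea concrete, here is a minimal sketch of how an extracted table might be consumed. The `{ headers, rows }` shape is an assumption for illustration only. The actual output schema of extract_structured_data is not documented here, so adapt the field names to what the server returns.

```javascript
// Assumed shape (not confirmed by the README): { headers: string[], rows: string[][] }.
function tableToRecords(table) {
  return table.rows.map((row) => {
    const record = {};
    table.headers.forEach((header, i) => {
      // Missing cells become null so every record has the same keys.
      record[header] = i < row.length ? row[i] : null;
    });
    return record;
  });
}

// Hypothetical pricing table, as it might look after extraction.
const pricingTable = {
  headers: ["Plan", "Price"],
  rows: [["Free", "$0"], ["Pro", "$9/month"], ["Scale"]],
};

const pricingRecords = tableToRecords(pricingTable);
console.log(pricingRecords);
```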

3. extract_links

Get all links with intelligent categorization and context.

Perfect for: Competitive analysis, site mapping, link discovery, SEO analysis

Link categories:

  • Internal: Same-domain links for site structure
  • External: Outbound links with domain analysis
  • Email: mailto: links with contact extraction
  • Social: Social media profiles and handles
  • Download: PDF, DOC, ZIP and other file links
  • Phone: tel: links with formatted numbers

Example:

{
  "url": "https://example.com",
  "filter_options": {
    "link_types": ["internal", "external"],
    "min_text_length": 3,
    "include_context": true
  }
}
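
The categories above can be approximated on the client side with the standard `URL` class. The sketch below illustrates the classification idea and is not the server's implementation. The social host list and the download extensions are assumptions chosen for the example.

```javascript
// Assumed lists for illustration only; the server's own rules are not documented here.
const SOCIAL_HOSTS = ["twitter.com", "x.com", "linkedin.com", "facebook.com"];
const DOWNLOAD_EXTENSIONS = [".pdf", ".doc", ".docx", ".zip"];

function categorizeLink(href, pageUrl) {
  const link = new URL(href, pageUrl); // resolves relative hrefs against the page URL
  const page = new URL(pageUrl);
  if (link.protocol === "mailto:") return "email";
  if (link.protocol === "tel:") return "phone";
  const path = link.pathname.toLowerCase();
  if (DOWNLOAD_EXTENSIONS.some((ext) => path.endsWith(ext))) return "download";
  const host = link.hostname.replace(/^www\./, "");
  if (SOCIAL_HOSTS.includes(host)) return "social";
  return link.hostname === page.hostname ? "internal" : "external";
}

const pageUrl = "https://example.com/";
for (const href of ["/about", "https://other.org/page", "mailto:hi@example.com", "/files/report.pdf"]) {
  console.log(href, "->", categorizeLink(href, pageUrl));
}
```

Checking the scheme first matters because `mailto:` and `tel:` links have no hostname to compare against.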

4. screenshot_to_markdown

Visual layout analysis via screenshot conversion.

Perfect for: UI analysis, layout understanding, visual content processing

Features:

  • Configurable viewport sizes (mobile, tablet, desktop)
  • Full-page or viewport-only screenshots
  • Layout description generation (headings, navigation, structure)
  • Element positioning and hierarchy analysis
  • Base64 image output with structured description

Example:

{
  "url": "https://example.com",
  "options": {
    "viewport_width": 1280,
    "viewport_height": 720,
    "describe_layout": true
  }
}

5. batch_extract

Process multiple URLs in parallel with error recovery.

Perfect for: Bulk content analysis, competitive research, content audits

Features:

  • Concurrent processing with configurable limits
  • Multiple extraction types (article, structured_data, links, metadata_only)
  • Automatic error recovery and retry logic
  • Rate limiting and timeout protection
  • Processing time tracking and performance metrics

Example:

{
  "urls": [
    "https://competitor1.com",
    "https://competitor2.com", 
    "https://competitor3.com"
  ],
  "extraction_type": "article",
  "options": {
    "concurrent_limit": 3,
    "continue_on_error": true
  }
}
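
The combination of `concurrent_limit` and `continue_on_error` can be pictured as a small worker pool. The sketch below is one way a client could apply the same pattern itself. `mapWithLimit`, `fakeExtract`, and the per-item result shape are hypothetical and are not taken from the server's source.

```javascript
// Runs `task` over `items` with at most `limit` calls in flight.
// Failures are recorded per item instead of aborting the whole batch.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const index = next++; // no race: this runs synchronously between awaits
      try {
        results[index] = { success: true, value: await task(items[index]) };
      } catch (err) {
        results[index] = { success: false, error: String(err && err.message ? err.message : err) };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for a real extraction call; fails for one URL to show error recovery.
async function fakeExtract(url) {
  if (url.includes("competitor2")) throw new Error("timeout");
  return `content of ${url}`;
}

const batchPromise = mapWithLimit(
  ["https://competitor1.com", "https://competitor2.com", "https://competitor3.com"],
  3,
  fakeExtract
);
batchPromise.then((results) => console.log(results));
```

Results keep the order of the input URLs, so a failed entry can be matched back to its URL by index.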

💰 Pricing

Free Tier

  • 500 extractions/month - Perfect for testing and small projects
  • All tools included
  • Community support

Pro Tier - $9/month

  • 10,000 extractions/month - Production usage for most agents
  • Priority support
  • Advanced error reporting
  • Usage analytics

Scale Tier - $29/month

  • 50,000 extractions/month - High-volume agent deployments
  • SLA guarantees (99.5% uptime)
  • Custom rate limits
  • Direct technical support

Overage pricing: $0.02 per extraction beyond your plan limits
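
As a worked example of the overage arithmetic, the sketch below estimates a monthly bill from the figures listed above. It assumes overage is billed per extraction on top of the base price for every tier, including Free, which the pricing section does not state explicitly. Amounts are kept in cents to avoid floating-point rounding.

```javascript
// Plan figures copied from the pricing section; prices in cents.
const PLANS = {
  free: { baseCents: 0, included: 500 },
  pro: { baseCents: 900, included: 10000 },
  scale: { baseCents: 2900, included: 50000 },
};
const OVERAGE_CENTS = 2; // $0.02 per extraction beyond the plan limit

// Returns the estimated monthly cost in dollars.
function monthlyCost(planName, extractions) {
  const plan = PLANS[planName];
  if (!plan) throw new Error(`Unknown plan: ${planName}`);
  const extra = Math.max(0, extractions - plan.included);
  return (plan.baseCents + extra * OVERAGE_CENTS) / 100;
}

console.log(monthlyCost("pro", 10000)); // 9
console.log(monthlyCost("pro", 11000)); // 29
console.log(monthlyCost("scale", 11000)); // 29
```

Under these assumptions, Pro with overage matches the Scale base price at 11,000 extractions per month, so Scale is the cheaper plan above that volume.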

🔐 Authentication & Payment

MCPize (Easiest)

  • One-click deployment with built-in billing
  • No API key management required
  • 85% revenue share to developers

Direct API Access

  • Get API keys at agensonhorrowitz.cc
  • Stripe-powered metered billing
  • Real-time usage tracking

Crypto Micropayments

  • Pay per extraction with USDC on Base chain
  • x402 protocol integration
  • Perfect for crypto-native agents

📊 Performance

  • Average response time: < 2 seconds
  • Uptime SLA: 99.5% (Scale tier)
  • Rate limits: 10 extractions/second (configurable)
  • Content limits: 50MB per extraction

🧪 Testing

# Clone and test locally
git clone https://github.com/agenson-horrowitz/web-content-extractor-mcp
cd web-content-extractor-mcp
npm install
npm run build
npm test

🤝 Integration Examples

Claude Desktop

With the package installed globally (see Via npm), you can call the binary directly in claude_desktop_config.json:

{
  "mcpServers": {
    "web-extractor": {
      "command": "web-content-extractor-mcp"
    }
  }
}

Cline VS Code Extension

Automatically detected when installed globally.

Custom Applications

const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
// Use standard MCP client connection

🔧 API Reference

All tools return consistent response formats:

{
  "success": true,
  "url": "https://example.com",
  "content": "...",
  "metadata": {
    "extraction_time_ms": 1500,
    "word_count": 2500,
    "processing_stats": "..."
  }
}

Error responses:

{
  "success": false,
  "url": "https://example.com",
  "error": "Detailed error message",
  "tool": "extract_article"
}
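
Because success and error responses share the `success` flag, a caller can branch on it before touching `content`. The helper below is a minimal sketch based only on the two shapes shown above. `unwrapResponse` is a hypothetical name, and no fields beyond those in the examples are assumed.

```javascript
// Returns content and metadata on success; throws with the tool name and message on failure.
function unwrapResponse(response) {
  if (response.success === true) {
    return { content: response.content, metadata: response.metadata || {} };
  }
  const tool = response.tool || "unknown tool";
  throw new Error(`${tool} failed for ${response.url}: ${response.error}`);
}

const okResponse = {
  success: true,
  url: "https://example.com",
  content: "...",
  metadata: { extraction_time_ms: 1500, word_count: 2500 },
};
console.log(unwrapResponse(okResponse).metadata.word_count); // 2500

const failedResponse = {
  success: false,
  url: "https://example.com",
  error: "Detailed error message",
  tool: "extract_article",
};
try {
  unwrapResponse(failedResponse);
} catch (err) {
  console.log(err.message);
}
```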

🛟 Support

  • Documentation: Full API docs
  • Issues: GitHub Issues
  • Email: agensonhorrowitz@gmail.com
  • Community: Discord

📝 License

MIT License - feel free to use in commercial AI agent deployments.

🏗️ Built With

  • Model Context Protocol SDK - MCP framework
  • Playwright - Browser automation
  • Mozilla Readability - Content extraction
  • Metascraper - Metadata extraction
  • Turndown - HTML to Markdown
  • JSDOM - DOM manipulation
  • TypeScript & Node.js

Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.

Categories: Search & Web Crawling
Registry: active
Package: @agenson-horrowitz/web-content-extractor-mcp
Transport: STDIO
Updated: Apr 2, 2026
