
HTML to Markdown MCP Server

sunshad0w/html2md-mcp
5 · 1 tool · STDIO · registry active
Summary

Converts web pages to clean Markdown with a typical 90-95% size reduction, making HTML digestible for AI context windows. Built on trafilatura and BeautifulSoup4, it strips scripts, styles, and navigation while preserving tables, images, and links. The Playwright integration handles JavaScript-heavy SPAs and authenticated pages by executing client-side code and reusing your browser profile and its cookies. Stream processing and configurable size limits (1MB-50MB) keep large pages manageable, and optional caching speeds up repeated conversions. Reach for this when you need to feed web content into Claude but the raw HTML is too bloated or needs JavaScript rendering to display properly.


Tools

Public tool metadata for what this MCP can expose to an agent.

1 tool

text_convert_html_to_markdown (1 param)

Use this when you need to convert HTML to clean Markdown text. Returns the converted markdown in JSON:
  1. markdown (converted text)
  2. inputLength (character count of HTML)
  3. outputLength (…

Parameters (* required)

  • html (string): HTML content to convert to markdown
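For reference, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request, sent over stdio as one JSON message per line. The sketch below builds such a request in Python. It follows the registry metadata above (tool name `text_convert_html_to_markdown`, single `html` argument) rather than the `html_to_markdown`/`url` names used in the README's usage examples; the helper `build_tool_call` is hypothetical, and a real client must complete the `initialize` handshake before sending it.

```python
import json


def build_tool_call(request_id: int, html: str) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for the html2md tool.

    MCP's stdio transport sends one JSON message per line, so the result
    is a single line terminated by a newline.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            # Tool name as listed in the registry metadata.
            "name": "text_convert_html_to_markdown",
            "arguments": {"html": html},
        },
    }
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return json.dumps(message) + "\n"


line = build_tool_call(1, "<h1>Hello</h1>\n<p>World</p>")
print(line, end="")
```

The server's reply carries the same `id`; the exact layout of the JSON result it returns is not shown here.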

HTML to Markdown MCP Server

MCP (Model Context Protocol) server for converting HTML webpages to clean Markdown. It reduces HTML size by roughly 90-95% while preserving tables, images, and important content, which makes it well suited for AI context windows.

Features

  • Converts HTML from URLs to clean Markdown
  • Preserves tables, images, and links
  • Removes unnecessary elements (scripts, styles, navigation, footers, headers)
  • Significant size reduction (typically 90-95% compression)
  • Configurable options for images, tables, and links
  • Built with trafilatura and BeautifulSoup4 for robust extraction
  • Stream processing for efficient handling of large pages
  • Size limits to prevent downloading excessively large content (1MB-50MB)
  • Optional caching to speed up repeated conversions of the same URLs
  • 🌐 Browser mode with Playwright - Handles JavaScript-heavy sites and authenticated pages
    • Execute JavaScript (perfect for SPAs: React, Vue, Angular)
    • Use your browser profile with cookies (access authenticated pages!)
    • Support for Chrome, Firefox, WebKit
    • Configurable wait strategies for dynamic content
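The stream processing and size-limit features above can be illustrated with a short sketch: read the body in fixed-size chunks and abort as soon as the running total exceeds `max_size`, instead of buffering the whole page first. This is an assumption about how such a limit might work, not the project's code. `read_limited` and `ContentTooLargeError` are hypothetical names, and the function accepts any binary file-like object so it can be tested offline.

```python
import io
from typing import BinaryIO

MB = 1024 * 1024  # Assumption: "MB" means 2**20 bytes.


class ContentTooLargeError(Exception):
    """Raised when a download exceeds the configured size limit."""


def read_limited(stream: BinaryIO, max_size: int = 10 * MB, chunk_size: int = 64 * 1024) -> bytes:
    """Read from stream in chunks, failing fast once max_size is exceeded."""
    if not 1 * MB <= max_size <= 50 * MB:
        raise ValueError("max_size must be between 1MB and 50MB")
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # End of stream.
        total += len(chunk)
        if total > max_size:
            # Stop immediately; the rest of the body is never read.
            raise ContentTooLargeError(f"content exceeds {max_size} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


data = read_limited(io.BytesIO(b"<html>ok</html>"), max_size=1 * MB)
print(data)  # b'<html>ok</html>'
```

With a live request, the same function can wrap the response object returned by `urllib.request.urlopen(url, timeout=30)`, since it exposes `read(n)`.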

Installation

Prerequisites

  • Python 3.10 or higher
  • uv package manager (recommended) or pip

Install with uv (recommended)

# Clone the repository
git clone <your-repo-url>
cd html2md

# Create a virtual environment and install dependencies
uv venv
uv pip install -e .

# Install Playwright browsers (required for browser mode)
uv run playwright install chromium

Install with pip

# Clone the repository
git clone <your-repo-url>
cd html2md

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Install Playwright browsers (required for browser mode)
playwright install chromium

Docker Installation (Recommended for Production)

The easiest way to use html2md is with Docker:

# Build the image
docker build -t html2md .

# Or use pre-built image (when published)
docker pull your-registry/html2md:latest

For Claude Desktop, configure with Docker:

{
  "mcpServers": {
    "html2md": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "html2md"
      ]
    }
  }
}

Docker Image Features:

  • Pre-installed Playwright with Chromium
  • Size-optimized image (~1GB)
  • Non-root user for security
  • Ready to use - no additional setup required

Configuration

Add the server to your Claude Desktop configuration file:

macOS

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "html2md": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/html2md",
        "run",
        "html2md"
      ]
    }
  }
}

Windows

Edit %APPDATA%/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "html2md": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\absolute\\path\\to\\html2md",
        "run",
        "html2md"
      ]
    }
  }
}

Linux

Edit ~/.config/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "html2md": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/html2md",
        "run",
        "html2md"
      ]
    }
  }
}
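If you prefer not to edit the file by hand, a small script can merge the `html2md` entry into an existing configuration without discarding other servers. This is a hedged sketch: the config locations are the ones listed above, `default_config_path` and `add_html2md_server` are hypothetical helpers, and the project path must be replaced with your own absolute path.

```python
import json
import os
import sys
from pathlib import Path


def default_config_path() -> Path:
    """Return the Claude Desktop config path for the current OS (per the list above)."""
    if sys.platform == "darwin":
        return Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    return Path.home() / ".config/Claude/claude_desktop_config.json"


def add_html2md_server(config_path: Path, project_dir: str) -> dict:
    """Add or update the html2md entry, keeping any other configured servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["html2md"] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "html2md"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Call `add_html2md_server(default_config_path(), "/absolute/path/to/html2md")`, then restart Claude Desktop so it picks up the change.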

Usage

Once configured, the MCP server is available in Claude Desktop, and you can ask Claude to use the html_to_markdown tool:

Example 1: Basic conversion

Convert this webpage to markdown: https://example.com/article

Example 2: With options

Use the html_to_markdown tool with:
- url: https://example.com/docs
- include_images: false
- include_tables: true

Example 3: Browser mode for JavaScript-heavy sites

Use the html_to_markdown tool with:
- url: https://spa-application.com
- fetch_method: playwright
- wait_for: networkidle

Example 4: Access authenticated pages

Use the html_to_markdown tool with:
- url: https://private-site.com/dashboard
- fetch_method: playwright
- use_user_profile: true
- browser_type: chromium

Note: For use_user_profile=true, make sure Chrome is closed before running.

Tool Parameters

Basic Parameters:

  • url (required): URL of the webpage to convert
  • include_images (optional, default: true): Include images in Markdown
  • include_tables (optional, default: true): Include tables in Markdown
  • include_links (optional, default: true): Include links in Markdown
  • timeout (optional, default: 30): Request timeout in seconds (5-120)

Performance Parameters:

  • max_size (optional, default: 10MB): Maximum size of content to download in bytes (1MB-50MB)
  • use_cache (optional, default: false): Enable caching for faster repeated conversions
  • cache_ttl (optional, default: 3600): Cache time-to-live in seconds (60-86400)

Browser Mode Parameters:

  • fetch_method (optional, default: "fetch"): Fetch method - "fetch" (fast) or "playwright" (handles JS, auth)
  • browser_type (optional, default: "chromium"): Browser to use - "chromium", "firefox", or "webkit"
  • headless (optional, default: true): Run browser in headless mode
  • wait_for (optional, default: "networkidle"): Wait strategy - "load", "domcontentloaded", or "networkidle"
  • use_user_profile (optional, default: false): Use your browser profile with cookies (requires Chrome closed)
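The documented defaults and ranges can be captured in one place. The sketch below is a hypothetical `ConversionOptions` dataclass that mirrors the parameter lists above and rejects out-of-range values; it is not taken from the project source, and it assumes 1MB = 1024 × 1024 bytes.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # Assumption: the README's "MB" means 2**20 bytes.


@dataclass
class ConversionOptions:
    """Hypothetical container mirroring the documented tool parameters."""

    url: str
    include_images: bool = True
    include_tables: bool = True
    include_links: bool = True
    timeout: int = 30
    max_size: int = 10 * MB
    use_cache: bool = False
    cache_ttl: int = 3600
    fetch_method: str = "fetch"
    browser_type: str = "chromium"
    headless: bool = True
    wait_for: str = "networkidle"
    use_user_profile: bool = False

    def __post_init__(self) -> None:
        # Enforce the ranges and choices listed in the parameter reference.
        if not 5 <= self.timeout <= 120:
            raise ValueError("timeout must be 5-120 seconds")
        if not 1 * MB <= self.max_size <= 50 * MB:
            raise ValueError("max_size must be 1MB-50MB")
        if not 60 <= self.cache_ttl <= 86400:
            raise ValueError("cache_ttl must be 60-86400 seconds")
        if self.fetch_method not in ("fetch", "playwright"):
            raise ValueError("fetch_method must be 'fetch' or 'playwright'")
        if self.browser_type not in ("chromium", "firefox", "webkit"):
            raise ValueError("browser_type must be chromium, firefox, or webkit")
        if self.wait_for not in ("load", "domcontentloaded", "networkidle"):
            raise ValueError("wait_for must be load, domcontentloaded, or networkidle")


opts = ConversionOptions(url="https://spa-application.com", fetch_method="playwright")
print(opts.wait_for)  # networkidle
```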

Development

Install development dependencies

uv pip install -e ".[dev]"

Run tests

pytest

Code formatting

# Format with black
black src/ tests/

# Lint with ruff
ruff check src/ tests/

Type checking

mypy src/

Architecture

The project consists of five main modules:

converter.py

Core HTML to Markdown conversion functionality:

  • fetch_html(): Downloads HTML from URL
  • clean_html(): Removes unnecessary elements with BeautifulSoup
  • convert_to_markdown(): Converts cleaned HTML to Markdown with trafilatura
  • html_to_markdown(): Main workflow combining all steps
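The cleaning step can be sketched without third-party packages. The version below swaps BeautifulSoup for the standard library's `html.parser` purely so it runs stand-alone; it drops the element types the Features list mentions (scripts, styles, navigation, headers, footers) together with their contents and re-emits everything else. It illustrates the idea and is not the project's `clean_html()`.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements removed together with everything inside them.
DROP_TAGS = {"script", "style", "nav", "header", "footer"}


class _Cleaner(HTMLParser):
    """Re-emits HTML while skipping the subtrees of DROP_TAGS elements."""

    def __init__(self) -> None:
        # Keep entities such as &amp; as-is so the output is still valid HTML.
        super().__init__(convert_charrefs=False)
        self.out: List[str] = []
        self.skip_tag: Optional[str] = None  # Tag whose subtree is being dropped.
        self.skip_depth = 0  # Nesting depth of skip_tag inside that subtree.

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is None and tag in DROP_TAGS:
            self.skip_tag, self.skip_depth = tag, 1
        elif tag == self.skip_tag:
            self.skip_depth += 1
        elif self.skip_tag is None:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a subtree.
        if self.skip_tag is None and tag not in DROP_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is None:
            self.out.append(f"</{tag}>")
        elif tag == self.skip_tag:
            self.skip_depth -= 1
            if self.skip_depth == 0:
                self.skip_tag = None

    def handle_data(self, data):
        if self.skip_tag is None:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_tag is None:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_tag is None:
            self.out.append(f"&#{name};")


def clean_html(html: str) -> str:
    """Return html with script, style, nav, header, and footer elements removed."""
    parser = _Cleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = "<header>Site</header><article><h1>Title</h1><script>x()</script><p>A &amp; B</p></article><footer>(c)</footer>"
print(clean_html(page))  # <article><h1>Title</h1><p>A &amp; B</p></article>
```

Tracking only the dropped tag's own nesting depth keeps the skip logic robust when the removed region contains unclosed tags such as `<li>`.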

server.py

MCP server implementation:

  • Registers the html_to_markdown tool
  • Handles tool calls and error responses
  • Runs async MCP server with stdio transport

utils.py

Utility functions:

  • Hash calculation for caching
  • Text formatting and truncation
  • Domain extraction
  • Filename sanitization
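The utility list is terse, so here is one plausible reading of three of its items: hashing a URL plus its options into a cache key, extracting the domain, and sanitizing a filename. All function names and the exact rules (SHA-256, the allowed filename characters, the length cap) are assumptions for illustration.

```python
import hashlib
import json
import re
from urllib.parse import urlsplit


def cache_key(url: str, **params) -> str:
    """Hash the URL with its options; sorted keys make the key order-independent."""
    payload = json.dumps({"url": url, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def extract_domain(url: str) -> str:
    """Return the lowercase host name of a URL, without port or credentials."""
    return urlsplit(url).hostname or ""


def sanitize_filename(name: str, max_length: int = 100) -> str:
    """Replace runs of characters outside [A-Za-z0-9._-] with '_' and trim."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")
    return cleaned[:max_length] or "untitled"


print(extract_domain("https://Example.com:8080/docs"))  # example.com
print(sanitize_filename("My Page: v2?.md"))  # My_Page_v2_.md
```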

cache.py

In-memory caching system:

  • SimpleCache class with TTL support
  • Global cache instance management
  • Automatic expiration of old entries
  • Hash-based cache keys for URL + parameters
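A minimal in-memory TTL cache along the lines described might look like this. The class name comes from the README, but its methods, the injectable clock, the lazy expiry-on-read policy, and the `get_cache()` accessor are assumptions; the project's actual `SimpleCache` may differ.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SimpleCache:
    """In-memory cache whose entries expire ttl seconds after being stored."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # Injectable so expiry can be tested without sleeping.
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float = 3600) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry lazily on read.
            del self._entries[key]
            return None
        return value

    def purge_expired(self) -> int:
        """Remove all expired entries and return how many were removed."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._entries.items() if now >= exp]
        for k in expired:
            del self._entries[k]
        return len(expired)


_cache: Optional[SimpleCache] = None


def get_cache() -> SimpleCache:
    """Return the process-wide cache instance, creating it on first use."""
    global _cache
    if _cache is None:
        _cache = SimpleCache()
    return _cache
```

Keying entries with a hash of the URL plus the conversion options means the same page converted with different options gets its own entry.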

browser.py

Playwright browser automation:

  • fetch_html_playwright(): Async browser-based HTML fetching
  • Support for Chromium, Firefox, WebKit
  • User profile integration for authenticated access
  • Configurable wait strategies for dynamic content
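A browser-based fetch with Playwright's async API can be sketched as follows. The Playwright calls used (`launch`, `launch_persistent_context`, `new_context`, `page.goto(..., wait_until=..., timeout=...)`, `page.content()`) are real, but the function signature, the `profile_dir` argument, and the validation are assumptions rather than the project's `fetch_html_playwright()`. Playwright is imported inside the function so the argument checks work even where it is not installed.

```python
from typing import Optional

BROWSERS = ("chromium", "firefox", "webkit")
WAIT_STRATEGIES = ("load", "domcontentloaded", "networkidle")


async def fetch_html_playwright(
    url: str,
    browser_type: str = "chromium",
    headless: bool = True,
    wait_for: str = "networkidle",
    timeout: float = 30,
    profile_dir: Optional[str] = None,  # Hypothetical: path to an existing browser profile.
) -> str:
    """Render url in a real browser and return the resulting HTML."""
    if browser_type not in BROWSERS:
        raise ValueError(f"unsupported browser_type: {browser_type}")
    if wait_for not in WAIT_STRATEGIES:
        raise ValueError(f"unsupported wait_for: {wait_for}")

    from playwright.async_api import async_playwright  # Imported lazily.

    async with async_playwright() as p:
        launcher = getattr(p, browser_type)  # p.chromium, p.firefox, or p.webkit
        browser = None
        if profile_dir:
            # Reuses cookies and logins; the browser that owns the profile must be closed.
            context = await launcher.launch_persistent_context(profile_dir, headless=headless)
        else:
            browser = await launcher.launch(headless=headless)
            context = await browser.new_context()
        try:
            page = await context.new_page()
            # Playwright timeouts are in milliseconds.
            await page.goto(url, wait_until=wait_for, timeout=timeout * 1000)
            return await page.content()
        finally:
            await context.close()
            if browser is not None:
                await browser.close()
```

Run it with `asyncio.run(fetch_html_playwright("https://spa-application.com"))`; the returned HTML can then go through the same cleaning and conversion steps as a plain fetch.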

Troubleshooting

Server not appearing in Claude Desktop

  1. Check that the path in claude_desktop_config.json is absolute and correct
  2. Restart Claude Desktop completely
  3. Check Claude Desktop logs for errors

Installation issues

# Verify Python version
python --version  # Should be 3.10+

# Try reinstalling dependencies
uv pip install --force-reinstall -e .

Conversion errors

  • Timeout errors: Increase the timeout parameter
  • Empty content: Some websites may block automated requests or use JavaScript rendering
    • Solution: Use fetch_method: playwright to execute JavaScript
  • Parse errors: The webpage structure may be unusual or malformed
  • Content too large: Increase the max_size parameter (up to 50MB); pages larger than 50MB exceed the hard limit and cannot be converted
  • Cache issues: Disable caching with use_cache: false if you need fresh content

Browser mode issues

  • Playwright not installed: Run playwright install chromium
  • Browser launch fails: Check that you have sufficient permissions and disk space
  • User profile error: Make sure Chrome is completely closed before using use_user_profile: true
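  • Page doesn't load fully: Try different wait_for strategies:
    • "domcontentloaded" - fastest, waits until the HTML has been parsed and the DOM is ready
    • "load" - waits for the page load event, after images, stylesheets, and other resources have loaded
    • "networkidle" - slowest but most reliable for dynamic content, waits until there has been no network activity for at least 500 ms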
  • Authentication not working: Ensure you're using browser_type: chromium and use_user_profile: true

Performance

Typical conversion results:

  • Original HTML: ~500KB - 2MB
  • Markdown output: ~25KB - 100KB
  • Compression: 90-95%
  • Processing time: 2-10 seconds (depending on page size and network)
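The compression figure is the share of bytes removed, 1 - output/input. A quick check against the typical sizes above, treating KB as 1,000 bytes for round numbers (`compression_ratio` is a hypothetical helper):

```python
def compression_ratio(input_bytes: int, output_bytes: int) -> float:
    """Fraction of the original size removed by conversion."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return 1 - output_bytes / input_bytes


print(f"{compression_ratio(500_000, 25_000):.0%}")     # 95%
print(f"{compression_ratio(2_000_000, 100_000):.0%}")  # 95%
```

Both ends of the typical range land at the upper 95% figure; pages with less boilerplate compress less.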

License

MIT

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Credits

Built with:

  • MCP SDK - Model Context Protocol
  • trafilatura - Web content extraction
  • BeautifulSoup4 - HTML parsing
Categories: Web & Browser Automation, Documents & Knowledge
Registry: active
Package: docker.io/sunshad0w/html2md-mcp
Transport: STDIO
Updated: Nov 1, 2025

Related Web & Browser Automation MCP Servers

Browser Use

therealtimex/browser-use

AI browser automation - navigate, click, type, extract content, and run autonomous web tasks
Fetcher

jae-jae/fetcher-mcp

Fetch web page content using a Playwright headless browser with intelligent content extraction and Markdown/HTML output.
1k
Puppeteer

merajmehrabi/puppeteer-mcp-server

This MCP server provides browser automation capabilities through Puppeteer, allowing interaction with both new browser instances and existing Chrome windows.
449
Playwright Mcp Server

com.thenextgennexus/playwright-mcp-server

Headless browser primitives for AI agents when sites need real JS rendering.
Browser

saik0s/mcp-browser-use

Provides a browser automation MCP server that lets AI assistants control a real browser for navigation, form interaction, data extraction, and more.
933
Browser Use

kontext-dev/browser-use-mcp-server

Browse the web, directly from Cursor etc.
822