This is a web scraping toolkit that turns messy HTML into structured data. You get 11 tools split across fetching (with custom headers and auth support), conversion (HTML to Markdown, JSON to Markdown, text extraction), and extraction (article content via readability, metadata, links, images, JSON-LD structured data). Runs over stdio, so it plugs into Claude Desktop or any MCP client. Useful when you need to pull clean content from web pages without writing your own scraper, whether that's grabbing article text, converting documentation to Markdown, or extracting SEO metadata. Built on Python 3.11+ and ships with automated publishing scripts for PyPI.
A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.
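To give a feel for what a link-extraction tool does, here is a minimal sketch using only the Python standard library. It is not the server's implementation; `LinkCollector` and `extract_links` are hypothetical names chosen for illustration, and the real tool may handle more cases (for example `<base>` tags or `rel` attributes).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (illustrative only)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every link found in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/docs">Docs</a><a href="https://example.org/x">X</a>'
    print(extract_links(page, "https://example.com/a/"))
```

Resolving relative links against the page URL is the part that is easiest to get wrong by hand, which is one reason to reach for a ready-made tool.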
From MCP Registry (Recommended)
This server is available in the Model Context Protocol Registry. Install it using your MCP client.
mcp-name: io.github.huoshuiai42/huoshui-fetch
# Using uv (recommended)
uv sync
# Or install from GitHub
pip install git+https://github.com/yourusername/huoshui-fetch.git
# From the repository
uvx --from . huoshui-fetch
# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch
# Using uv
uv run python -m huoshui_fetch
# Or if installed
python -m huoshui_fetch
The server communicates over standard input/output (stdio), so it integrates with Claude Desktop and other MCP-compatible clients.
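As a rough illustration of what travels over stdio, the sketch below builds and frames a JSON-RPC 2.0 `tools/call` request as one line of JSON, which is how MCP's stdio transport delimits messages. The helper names are hypothetical, and the `url` argument name for `fetch_url` is an assumption; check the tool's schema as reported by the server. A real session also starts with an `initialize` handshake, which is omitted here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking the server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def encode_message(message: dict) -> bytes:
    """Serialize a message as a single newline-terminated line of JSON."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_message(line: bytes) -> dict:
    """Parse one line read from the peer back into a dict."""
    return json.loads(line.decode("utf-8"))


if __name__ == "__main__":
    # "url" is an assumed argument name, used here only for illustration.
    request = build_tool_call(1, "fetch_url", {"url": "https://example.com"})
    frame = encode_message(request)
    print(frame)
```

In practice an MCP client library handles this framing, so you rarely write it yourself; the sketch only shows why a plain stdin/stdout pipe is enough for integration.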
Add to your Claude Desktop configuration:
{
"mcpServers": {
"huoshui-fetch": {
"command": "uvx",
"args": ["--no-cache", "--from", ".", "huoshui-fetch"],
"cwd": "/path/to/huoshui-fetch"
}
}
}
Or if installed from GitHub:
{
"mcpServers": {
"huoshui-fetch": {
"command": "uvx",
"args": [
"--from",
"git+https://github.com/yourusername/huoshui-fetch.git",
"huoshui-fetch"
]
}
}
}
Once configured, you can use the tools in Claude Desktop:
// Fetch a webpage
fetch_url("https://example.com")
// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")
// Extract article content
extract_article_tool(html_content, "https://example.com/article")
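For intuition about the HTML-to-Markdown step, here is a deliberately tiny standard-library sketch that handles only `<h1>` to `<h3>` and `<p>`. It is not how `html_to_markdown_tool` is implemented; `TinyMarkdown` and `tiny_html_to_markdown` are hypothetical names, and a real converter also covers links, lists, emphasis, code, and nesting.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3")


class TinyMarkdown(HTMLParser):
    """Convert headings and paragraphs to Markdown; ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self.prefix: str | None = None  # None while outside a supported block
        self.buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.prefix = "#" * int(tag[1]) + " "  # h2 -> "## "
            self.buffer = []
        elif tag == "p":
            self.prefix = ""
            self.buffer = []

    def handle_data(self, data):
        if self.prefix is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if (tag in HEADINGS or tag == "p") and self.prefix is not None:
            # Collapse runs of whitespace, as a browser would when rendering.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None


def tiny_html_to_markdown(html: str) -> str:
    """Return Markdown blocks separated by blank lines."""
    parser = TinyMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    print(tiny_html_to_markdown("<h1>Hello</h1><p>World</p>"))
```

Run on the input from the example above, the sketch yields a level-one heading `# Hello` followed by the paragraph `World`.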
This project includes scripts that automate building, testing, and publishing the package to PyPI.
# Complete automated workflow (TestPyPI + PyPI)
uv run python scripts/publish.py --include-pypi
# TestPyPI only (recommended for testing)
uv run python scripts/publish.py
# Bump version and publish
uv run python scripts/publish.py --version-bump patch --include-pypi
# Version management
uv run python scripts/version_manager.py --check
uv run python scripts/version_manager.py --bump patch
# Setup PyPI credentials (first time)
uv run python scripts/credentials_setup.py
# Build package
uv run python scripts/build.py
# Run comprehensive tests
uv run python scripts/test.py
# Upload to PyPI
uv run python scripts/upload.py
uv publish (supports .pypirc files)
See PUBLISHING.md for detailed documentation.
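The `patch` argument in the version commands above presumably follows the usual `MAJOR.MINOR.PATCH` convention. The sketch below shows what such a bump means under that assumption; `bump_version` is a hypothetical function and not the code in `scripts/version_manager.py`, which may also handle pre-release suffixes or other formats.

```python
def bump_version(version: str, part: str) -> str:
    """Increment one component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # lower components reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump_version("1.2.3", part))
```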
This project supports DXT (Desktop Extensions) format for easy distribution and installation.
To build the DXT extension:
python build_dxt.py
This will create a huoshui-fetch-{version}.dxt file that can be installed in compatible AI desktop applications.
MIT