WebShift

x-monk/webshift
Summary

WebShift gives Claude web search that returns clean text, not raw HTML garbage. It runs as a native Rust binary and exposes three tools: webshift_query for full search pipelines (search, fetch, clean, rerank), webshift_fetch for single-page grabs, and webshift_onboarding for configuration help. The pipeline strips scripts, ads, and navigation bloat, then uses BM25 reranking to surface relevant content within strict token budgets. You can plug in SearXNG locally via Docker or use cloud backends like Brave, Tavily, Exa, or SerpAPI. If you need Claude to search the web without flooding context windows with markup noise, this is the tool. No Python runtime, no dependencies, just a single static binary.


WebShift



What is WebShift

WebShift is a Rust library and MCP server that shifts noisy web pages into clean, right-sized text for LLM consumption.

Raw HTML is mostly junk: scripts, ads, navigation menus, cookie banners, tracking pixels. Feeding it directly to an LLM floods the context window with tens of thousands of useless tokens and leaves no room for reasoning. WebShift strips all that noise, sterilizes the text, and enforces strict size budgets so the model receives only the content that matters.
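To make the "sterilizes the text" step concrete, here is a minimal sketch of Unicode/BiDi sterilization followed by whitespace collapsing. It is not WebShift's implementation: the function names and the exact set of code points removed are assumptions chosen for illustration.

```rust
/// Returns true for characters this sketch treats as invisible control noise:
/// zero-width characters and BiDi embedding/override/isolate controls.
/// (The exact set is an assumption, not WebShift's list.)
fn is_invisible_control(c: char) -> bool {
    matches!(
        c,
        '\u{200B}'..='\u{200F}'   // zero-width space and joiners, LRM, RLM
        | '\u{202A}'..='\u{202E}' // BiDi embeddings and overrides
        | '\u{2060}'              // word joiner
        | '\u{2066}'..='\u{2069}' // BiDi isolates
        | '\u{FEFF}'              // zero-width no-break space (BOM)
    )
}

/// Drops invisible control characters, then collapses every run of
/// whitespace (spaces, tabs, newlines) into a single space.
fn sterilize(input: &str) -> String {
    let visible: String = input
        .chars()
        .filter(|c| !is_invisible_control(*c))
        .collect();
    visible.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

One trade-off of a blanket filter like this: removing U+200D (zero-width joiner) also splits joined emoji sequences, so a production cleaner may want a narrower list.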

What you get

Depending on the features you enable, WebShift can be four things:

| Use case | Crate | Feature flags | What it does |
| --- | --- | --- | --- |
| HTML denoiser | webshift | default-features = false | clean(): pure Rust HTML-to-text pipeline. Strips noise elements, sterilizes Unicode/BiDi, collapses whitespace. Zero network, zero config. Drop into any Rust project that processes web content for LLMs. |
| HTML text rewriter | webshift | features = ["text-map"] | extract_text_nodes() + replace_text_nodes(): extract individual text nodes from HTML, manipulate them (translate, rewrite, simplify), and rebuild the HTML with structure intact. Tags, attributes, and links are never touched. |
| Web content client | webshift | default or features = ["llm"] | fetch() + query(): streaming HTTP fetcher with size caps, 8 search backends, BM25 reranking, optional LLM query expansion and summarization. Full pipeline from search query to structured results. |
| MCP server | webshift-mcp | all features | Native binary (mcp-webshift) that exposes webshift_query, webshift_fetch, and webshift_onboarding over MCP stdio. Single static binary, zero runtime dependencies. |

When to use WebShift

  • You're building an AI agent that needs web search and you want clean, budget-controlled text — not raw HTML.
  • You're processing web pages in a Rust pipeline and need a reliable HTML-to-text cleaner that strips noise without losing real content.
  • You need an LLM to translate, rewrite, or simplify text inside HTML without corrupting the markup — text-map gives you a safe round-trip.
  • You want an MCP web search server that works as a single binary — no Python, no pip, no venv, no Docker (unless you want it).
  • You need hard guarantees on output size: per-page caps, total budget caps, streaming download limits.
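The size guarantees in the last bullet can be pictured with a small sketch: cap each page, then spend a shared total budget in ranked order. This is a simplified greedy allocation written for illustration; `cap_chars` and `apply_budgets` are hypothetical names, and WebShift's own adaptive allocation may distribute the budget differently.

```rust
/// Truncates `text` to at most `max_chars` characters (not bytes).
/// A cap of 0 means "no cap", mirroring the `clean(html, 0)` convention.
/// Returns the text and whether it was truncated.
fn cap_chars(text: &str, max_chars: usize) -> (String, bool) {
    if max_chars == 0 {
        return (text.to_string(), false);
    }
    // Find the byte offset of the first character past the cap, if any.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => (text[..byte_idx].to_string(), true),
        None => (text.to_string(), false),
    }
}

/// Applies the per-page cap, then spends the shared `total` budget in the
/// order given (assumed to be ranked best-first). Pages that no longer fit
/// are cut short; once the budget is exhausted, the rest are dropped.
fn apply_budgets(pages: &[&str], per_page: usize, total: usize) -> Vec<String> {
    let mut remaining = total;
    let mut out = Vec::new();
    for page in pages {
        if remaining == 0 {
            break;
        }
        let (capped, _) = cap_chars(page, per_page);
        let (fitted, _) = cap_chars(&capped, remaining);
        remaining -= fitted.chars().count();
        out.push(fitted);
    }
    out
}
```

Counting characters rather than bytes keeps the cut on a valid UTF-8 boundary, which matters for non-ASCII pages.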

When NOT to use WebShift

  • You need a headless browser that renders JavaScript-heavy SPAs. WebShift parses static HTML — it doesn't execute JS.
  • You need to render or screenshot a page preserving its visual layout. WebShift processes HTML structure but does not render CSS or compute layout. (Note: text-map does preserve the DOM structure for text rewriting use cases.)
  • You're building a web scraper that needs to navigate across pages, fill forms, or handle authentication flows.

How it works

Question
  |
  +- (optional) LLM query expansion -> multiple search variants
  |
  +- Search via backend (SearXNG, Brave, Tavily, Exa, SerpAPI, Google, Bing, HTTP)
  |
  +- Deduplicate + filter binary URLs
  |
  +- Streaming fetch with per-page size cap
  |
  +- HTML cleaning -> plain text (noise elements, scripts, nav removed)
  |
  +- Unicode/BiDi sterilization
  |
  +- BM25 deterministic reranking
  |   +- (optional) LLM-assisted tier-2 reranking
  |
  +- Budget-aware truncation across all sources
  |
  +- (optional) LLM Markdown summary with inline citations
  |
  +- Structured JSON output
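As an illustration of the "Deduplicate + filter binary URLs" stage, the sketch below drops URLs whose path ends in a binary-looking extension and removes repeats before anything is fetched. The extension list and the trailing-slash normalization are assumptions made for the example, not WebShift's actual rules.

```rust
use std::collections::HashSet;

/// Extensions treated as binary in this sketch (an illustrative list).
const BINARY_EXTENSIONS: [&str; 6] = ["pdf", "zip", "exe", "png", "jpg", "mp4"];

/// Returns true when the URL path ends in a known binary extension.
fn looks_binary(url: &str) -> bool {
    // Ignore the query string and fragment so "file.pdf?dl=1" is still caught.
    let path = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    match path.rsplit_once('.') {
        Some((_, ext)) => BINARY_EXTENSIONS.iter().any(|b| ext.eq_ignore_ascii_case(b)),
        None => false,
    }
}

/// Drops binary-looking URLs and repeats, keeping first-seen order.
/// Two URLs count as the same if they differ only by a trailing slash.
fn dedupe_and_filter(urls: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    urls.iter()
        .filter(|u| !looks_binary(u))
        .filter(|u| seen.insert(u.trim_end_matches('/').to_string()))
        .map(|u| u.to_string())
        .collect()
}
```

Because the check runs on the URL string alone, it costs no network request, which matches the "filtered before any network request" behaviour described for the binary filter.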

For a detailed explanation of each pipeline stage, BM25 parameters, adaptive budget allocation, and real compression metrics, see Under the Hood. For the full configuration reference (TOML, env vars, CLI args), see Configuration. For ready-to-use examples, see Use Cases.
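For readers unfamiliar with BM25, the following is a compact sketch of textbook Okapi BM25 ranking. The tokenizer, the function names, and the parameter values (k1 = 1.2, b = 0.75, the conventional defaults) are assumptions for illustration; WebShift's own parameters are documented in Under the Hood.

```rust
/// Lowercases and splits on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Scores every document against the query with Okapi BM25 and returns
/// document indices sorted from most to least relevant.
fn bm25_rank(query: &str, docs: &[&str]) -> Vec<usize> {
    const K1: f64 = 1.2; // term-frequency saturation (conventional default)
    const B: f64 = 0.75; // length normalization (conventional default)

    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n.max(1.0);

    let mut scores = vec![0.0_f64; docs.len()];
    for term in tokenize(query) {
        // Document frequency: how many documents contain the term at all.
        let df = tokenized.iter().filter(|d| d.contains(&term)).count() as f64;
        let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
        for (i, doc) in tokenized.iter().enumerate() {
            let tf = doc.iter().filter(|t| **t == term).count() as f64;
            // Longer-than-average documents are penalized via `norm`.
            let norm = 1.0 - B + B * doc.len() as f64 / avg_len.max(1.0);
            scores[i] += idf * tf * (K1 + 1.0) / (tf + K1 * norm);
        }
    }

    let mut order: Vec<usize> = (0..docs.len()).collect();
    // Stable sort keeps the original order for ties, so output is deterministic.
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    order
}
```

The scoring uses no randomness and no model calls, which is what makes this kind of reranking deterministic: the same query and pages always produce the same order.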


Installation

Binary (MCP server)

cargo install webshift-mcp

The binary is called mcp-webshift.

From source

cargo install --path crates/webshift-mcp

As a library

# Add ONE of the following lines under [dependencies] in Cargo.toml.

# Full pipeline (search + fetch + clean + rerank)
webshift = "0.2"

# Cleaner + fetcher only (no search backends)
webshift = { version = "0.2", default-features = false }

# Text-map only (extract/replace text nodes in HTML)
webshift = { version = "0.2", default-features = false, features = ["text-map"] }

# Everything including LLM features
webshift = { version = "0.2", features = ["llm"] }

Quick start

1. Set up a search backend

The easiest option is SearXNG — free, self-hosted, no API key:

docker run -d -p 8080:8080 searxng/searxng

No Docker? Use a cloud backend — see Search backends.

2. Configure your MCP client

{
  "mcpServers": {
    "webshift": {
      "command": "mcp-webshift",
      "args": ["--default-backend", "searxng"]
    }
  }
}

That's it. The agent now has webshift_query, webshift_fetch, and webshift_onboarding.

For client-specific setup see docs/integrations/.


MCP tools

| Tool | Description |
| --- | --- |
| webshift_query | Full search pipeline: search + fetch + clean + rerank + (optional) summarize |
| webshift_fetch | Single page fetch and clean |
| webshift_onboarding | Returns a JSON guide for the agent (budgets, backends, tips) |

webshift_query parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| queries | string or list | required | Search query or list of queries |
| num_results_per_query | integer | 5 | Results per query |
| lang | string | none | Language filter (e.g. "en") |
| backend | string | config default | Override search backend |

Configuration

Resolution order (highest priority first):

  1. CLI args — --default-backend, --brave-api-key, etc.
  2. Environment variables — WEBSHIFT_* prefix
  3. Config file — webshift.toml (current dir, then ~/webshift.toml)
  4. Built-in defaults
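The resolution order above amounts to "first value that is set wins". A minimal sketch, assuming each layer has already been read into an `Option` (the `resolve` helper is hypothetical and not part of WebShift's API):

```rust
/// Picks the first value that is set, in priority order:
/// CLI argument, environment variable, config file, built-in default.
fn resolve(cli: Option<&str>, env: Option<&str>, file: Option<&str>, default: &str) -> String {
    cli.or(env).or(file).unwrap_or(default).to_string()
}
```

For example, with no CLI flag, `WEBSHIFT_DEFAULT_BACKEND=brave` in the environment, and `default = "tavily"` in webshift.toml, this rule selects `brave`, because environment variables outrank the config file.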

Config file

[server]
max_query_budget    = 32000   # total char budget across all sources
max_result_length   = 8000    # per-page char cap
max_total_results   = 20      # hard cap on results per call
max_download_mb     = 1       # streaming cap per page (MB)
search_timeout      = 8       # seconds
results_per_query   = 5
oversampling_factor = 2
adaptive_budget     = "auto"  # "auto" | "on" | "off" — budget allocation mode

[backends]
default = "searxng"

[backends.searxng]
url = "http://localhost:8080"

[backends.brave]
api_key = "BSA-..."

[backends.tavily]
api_key = "tvly-..."

[backends.exa]
api_key = "..."

[backends.serpapi]
api_key = "..."
engine  = "google"    # google | bing | duckduckgo | yandex

[backends.google]
api_key = "..."
cx      = "..."       # Custom Search Engine ID

[backends.bing]
api_key = "..."
market  = "en-US"

[backends.http]
url           = "https://my-search.example.com/api/search"
query_param   = "q"
count_param   = "limit"
results_path  = "data.items"     # dot-path to results array in JSON response
title_field   = "title"
url_field     = "link"
snippet_field = "description"

[backends.http.headers]
"Authorization" = "Bearer my-token"

[llm]
enabled               = false
base_url              = "http://localhost:11434/v1"   # OpenAI-compatible
api_key               = ""
model                 = "gemma3:27b"
timeout               = 60
expansion_enabled     = true
summarization_enabled = true
llm_rerank_enabled    = false

For every setting with all three config methods (TOML, env vars, CLI args) and plain-language descriptions, see the full Configuration Reference. Ready-to-use config examples are in Use Cases and examples/.
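The `results_path` setting of the generic HTTP backend is described as a dot-path to the results array. The sketch below shows one plausible way such a path could be resolved against a parsed response. The `Json` enum is a tiny stand-in written for this example (a real implementation would use a JSON library), and `resolve_path` is a hypothetical helper, not WebShift's code.

```rust
use std::collections::HashMap;

/// A tiny stand-in for a parsed JSON value, just enough for this sketch.
#[derive(Debug, PartialEq)]
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(HashMap<String, Json>),
}

/// Follows a dot-separated path such as "data.items" through nested objects.
/// Returns None if a segment is missing or a non-object is hit on the way.
fn resolve_path<'a>(root: &'a Json, path: &str) -> Option<&'a Json> {
    path.split('.').try_fold(root, |node, key| match node {
        Json::Object(map) => map.get(key),
        _ => None,
    })
}
```

With `results_path = "data.items"`, a response shaped like `{"data": {"items": [...]}}` would resolve to the inner array, from which `title_field`, `url_field`, and `snippet_field` could then be read per item.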

Key environment variables

WEBSHIFT_DEFAULT_BACKEND=searxng
WEBSHIFT_SEARXNG_URL=http://localhost:8080
WEBSHIFT_BRAVE_API_KEY=BSA-xxx
WEBSHIFT_GOOGLE_API_KEY=xxx
WEBSHIFT_GOOGLE_CX=xxx
WEBSHIFT_BING_API_KEY=xxx
WEBSHIFT_LLM_ENABLED=true
WEBSHIFT_LLM_BASE_URL=http://localhost:11434/v1
WEBSHIFT_LLM_MODEL=gemma3:27b

Search backends

| Backend | Auth | Notes |
| --- | --- | --- |
| SearXNG | none | Self-hosted, free. Default: http://localhost:8080 |
| Brave | API key | Free tier. brave.com/search/api |
| Tavily | API key | AI-oriented. tavily.com |
| Exa | API key | Neural search. exa.ai |
| SerpAPI | API key | Multi-engine proxy (Google, Bing, DDG...). serpapi.com |
| Google | API key + CX | Custom Search. Free: 100 req/day. programmablesearchengine.google.com |
| Bing | API key | Web Search API. Free: 1,000 req/month. Microsoft Azure |
| HTTP | configurable | Generic REST backend: no code required, TOML-only config |

LLM features (optional)

All LLM features are opt-in and disabled by default. Nothing is sent to an LLM endpoint unless you enable them.

| Feature | What it does |
| --- | --- |
| Query expansion | Single query -> N complementary search variants |
| Summarization | Markdown report with inline [1] [2] citations |
| LLM reranking | Tier-2 reranking on top of deterministic BM25 |

Cross-language normalization (bonus): when BM25 reranking surfaces pages in foreign languages (e.g. Chinese, Japanese, Arabic), the LLM summarizer still produces the final report in the prompt language. The agent receives clean, readable output regardless of the language mix in the source pages.

Works with any OpenAI-compatible API (OpenAI, Ollama, vLLM, LM Studio, etc.):

[llm]
enabled  = true
base_url = "http://localhost:11434/v1"
model    = "gemma3:27b"

Anti-flooding protections

Always active — the core value proposition:

| Protection | Description |
| --- | --- |
| max_download_mb | Streaming cap: never buffers the full response |
| max_result_length | Hard cap on characters per cleaned page |
| max_query_budget | Total character budget across all sources |
| max_total_results | Hard cap on results per call |
| Binary filter | .pdf, .zip, .exe, etc. filtered before any network request |
| Unicode sterilization | BiDi control chars, zero-width chars removed |
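The streaming cap behind max_download_mb can be sketched with the standard library alone: read through a limited adapter so the buffer can never grow past the cap, however large the response is. `read_capped` is a hypothetical helper for illustration; WebShift's fetcher works on HTTP response streams rather than a plain `Read`.

```rust
use std::io::{self, Read};

/// Reads at most `max_bytes` from `source` and reports whether more data
/// remained. The body is never buffered beyond the cap (plus one probe byte).
fn read_capped<R: Read>(source: R, max_bytes: u64) -> io::Result<(Vec<u8>, bool)> {
    let mut buf = Vec::new();
    // Allow one extra byte so "exactly at the cap" and "over the cap" differ.
    source
        .take(max_bytes.saturating_add(1))
        .read_to_end(&mut buf)?;
    let truncated = buf.len() as u64 > max_bytes;
    if truncated {
        buf.truncate(max_bytes as usize);
    }
    Ok((buf, truncated))
}
```

With `max_download_mb = 1`, the equivalent call would pass `1024 * 1024` as `max_bytes`, so a multi-gigabyte response costs at most about one megabyte of memory.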

Library usage

use webshift::{Config, clean, fetch, query};

// Clean raw HTML — cap output at 8000 chars
let result = clean("<html><body><p>Hello world</p></body></html>", 8000);
println!("{}", result.text);

// Pass 0 to disable the per-page cap entirely (no truncation)
let full = clean("<html><body><p>Hello world</p></body></html>", 0);
assert!(!full.truncated);

// Fetch and clean a single page.
// fetch() and query() are async: call them from inside an async fn that returns a Result.
let config = Config::default();
let page = fetch("https://example.com", &config).await?;

// Full search pipeline
let results = query(&["rust async programming"], &config).await?;
for source in &results.sources {
    println!("[{}] {} — {} chars", source.id, source.title, source.content.len());
}

// Backend partial failures (CAPTCHA, rate-limits, engine outages) are surfaced
// in `warnings` rather than as Err. Empty sources + non-empty warnings means
// "all backends blocked" — distinct from a legitimate "no matches found".
if results.sources.is_empty() && !results.warnings.is_empty() {
    eprintln!("all backends failed: {:?}", results.warnings);
}

Text-map: rewrite HTML content without breaking markup

Extract text nodes, manipulate them (translate, rewrite, simplify), and rebuild the HTML with structure, attributes, and links intact.

use webshift::{extract_text_nodes, replace_text_nodes, TextReplacement};

let html = r#"<p>Hello <a href="https://example.com">world</a></p>"#;
let map = extract_text_nodes(html);
// map.nodes = [(0, "Hello"), (1, "world")]

let replacements = vec![
    TextReplacement { id: 0, text: "Ciao".into() },
    TextReplacement { id: 1, text: "mondo".into() },
];
let result = replace_text_nodes(html, &replacements).unwrap();
// → <p>Ciao <a href="https://example.com">mondo</a></p>
// href untouched, tags intact, only text changed.

Requires features = ["text-map"]. See Use Cases #11 for a full translation example.

Feature flags

| Feature | Default | Enables |
| --- | --- | --- |
| backends | on | All search backends + query pipeline |
| llm | off | LLM client, expander, summarizer, LLM reranking |
| text-map | off | extract_text_nodes() + replace_text_nodes(): DOM round-trip for content rewriting |

Integrations

| Platform | Guide |
| --- | --- |
| Claude Desktop, Claude Code, Cursor, Windsurf, VS Code | IDE Integration |
| Zed (native extension with auto-download and Configure Server modal) | Zed Extension |
| Gemini CLI, Claude CLI, custom agents | Agent Integration |

Beta Status

WebShift is in beta. Core functionality is stable and the server is used daily, but the API surface may still change before 1.0.

Feedback is very welcome. If something doesn't work as expected, behaves oddly, or you have a use case that isn't covered:

Open an issue on GitHub

Bug reports, configuration questions, and feature requests all help shape the roadmap.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for detailed guidelines on:

  • Development setup and workflow
  • Code style and conventions
  • Testing requirements
  • Documentation standards
  • Pull request process

License

MIT License — see LICENSE for details.

Links

  • GitHub Repository — Source code and issues
  • Docs.rs — API documentation
  • MCP Registry — WebShift on Model Context Protocol Registry
  • MCP Protocol — Model Context Protocol specification

Need help? Check the documentation or open an issue on GitHub.

Category: Search & Web Crawling
Registry: active
Updated: Apr 1, 2026
