Web Scraper

264 installs, 310 stars

Summary

Straightforward Python script that fetches web pages and converts them to markdown. Good for when you need to read articles or extract content from URLs, especially if Claude's built-in fetching hits network restrictions. It strips out navigation and ads, preserves formatting like headings and code blocks, and can handle multiple URLs in parallel with basic shell scripting. Supports custom timeouts and has a raw HTML mode if you need it. Won't help with JavaScript-heavy sites since it's just HTTP requests and HTML parsing, but for static content it gets the job done without dependencies on external services.

Install to Claude Code

npx -y skills add zephyrwang6/myskill --skill web-scraper --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md

Web Scraper

Fetch web page content and convert to clean markdown format.

Usage

Run the fetch script to get web content:

python3 scripts/fetch_url.py <url> [options]

Options

--timeout <seconds>: Request timeout (default: 30)
--max-length <chars>: Maximum output length (default: 100000)
--raw: Output raw HTML instead of markdown
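As an illustration of how the options above fit together, here is a minimal sketch of a command-line parser with the same flags and defaults. It is not taken from fetch_url.py: the function names `build_parser` and `truncate` are hypothetical, and cutting the output at exactly `--max-length` characters is an assumption about how that limit is applied.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; the real script may differ.
    parser = argparse.ArgumentParser(
        prog="fetch_url.py",
        description="Fetch a web page and print it as markdown.",
    )
    parser.add_argument("url", help="Page to fetch")
    parser.add_argument("--timeout", type=int, default=30,
                        help="Request timeout in seconds")
    parser.add_argument("--max-length", type=int, default=100000,
                        help="Maximum output length in characters")
    parser.add_argument("--raw", action="store_true",
                        help="Output raw HTML instead of markdown")
    return parser


def truncate(text: str, max_length: int) -> str:
    # Assumed behavior: keep at most max_length characters of output.
    return text if len(text) <= max_length else text[:max_length]
```

With this sketch, `build_parser().parse_args(["https://example.com/article", "--timeout", "60"])` yields a timeout of 60 and leaves the other options at their documented defaults.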

Examples

Fetch single URL:

python3 scripts/fetch_url.py "https://example.com/article"

Fetch with custom timeout:

python3 scripts/fetch_url.py "https://example.com/article" --timeout 60

Fetch multiple URLs in parallel:

for url in "https://url1.com" "https://url2.com"; do
  python3 scripts/fetch_url.py "$url" &
done
wait
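The same parallel pattern can be driven from Python when the results need to be collected per URL. The sketch below assumes that scripts/fetch_url.py is reachable from the current directory and prints its result to standard output. The helper names `run_fetch_script` and `fetch_all` are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_fetch_script(url: str) -> str:
    # Assumption: the script lives at scripts/fetch_url.py and writes to stdout.
    result = subprocess.run(
        [sys.executable, "scripts/fetch_url.py", url],
        capture_output=True,
        text=True,
        check=False,  # a failed fetch yields whatever was printed, possibly ""
    )
    return result.stdout


def fetch_all(
    urls: Iterable[str],
    fetch_one: Callable[[str], str] = run_fetch_script,
    max_workers: int = 4,
) -> Dict[str, str]:
    # Run one fetch per URL in a small thread pool and map each URL to its output.
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(fetch_one, url_list))
    return dict(zip(url_list, outputs))
```

Threads are sufficient here because each worker mostly waits on a child process. The `fetch_one` parameter exists so the fetch step can be swapped out, for example in tests.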

Workflow

Single URL: Run fetch_url.py with the URL
Multiple URLs: Run multiple fetch commands in parallel using background processes
Handle errors: If a URL fails, check:
- Network connectivity
- URL validity
- Website may block automated requests (try different User-Agent or use browser automation)
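The User-Agent suggestion above can be sketched with the standard library alone. This is not the code from fetch_url.py, and the documented options do not include a User-Agent flag, so the helper names and the `BROWSER_UA` string are assumptions made for illustration.

```python
import urllib.request

# Hypothetical browser-like User-Agent string, used only for illustration.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    # Attach an explicit User-Agent instead of urllib's default one.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 30, user_agent: str = BROWSER_UA) -> str:
    # Fetch the page and decode it with the charset the server declares, if any.
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Changing the header only helps with sites that filter on it. Sites that need JavaScript or use stronger bot detection still call for browser automation.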

Output Format

The script converts HTML to clean markdown:

Headings → #, ##, ###, etc.
Lists → - for unordered, 1. for ordered
Bold/Italic → **bold**, *italic*
Code blocks preserved
Navigation, footer, and ads removed
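To make the mapping above concrete, here is a small converter built on the standard library's `html.parser`. It is a sketch under stated assumptions, not the script's actual implementation: the class name `MarkdownSketch` is hypothetical, and it drops only `nav`, `footer`, `aside`, `script`, and `style` elements, with no attempt at ad detection.

```python
import re
from html.parser import HTMLParser

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable


class MarkdownSketch(HTMLParser):
    SKIP = {"nav", "footer", "aside", "script", "style"}
    INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self):
        super().__init__()
        self.parts, self.lists = [], []  # output pieces; stack of [tag, counter]
        self.skip_depth, self.in_pre = 0, False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if re.fullmatch(r"h[1-6]", tag):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.lists.append([tag, 0])
        elif tag == "li":
            if self.lists and self.lists[-1][0] == "ol":
                self.lists[-1][1] += 1
                self.parts.append(f"\n{self.lists[-1][1]}. ")
            else:
                self.parts.append("\n- ")
        elif tag == "pre":
            self.in_pre = True
            self.parts.append("\n\n" + FENCE + "\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if re.fullmatch(r"h[1-6]", tag) or tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            if self.lists:
                self.lists.pop()
            self.parts.append("\n\n")
        elif tag == "pre":
            self.in_pre = False
            self.parts.append("\n" + FENCE + "\n\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Keep whitespace verbatim inside <pre>, collapse it elsewhere.
        self.parts.append(data if self.in_pre else re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return re.sub(r"\n{3,}", "\n\n", "".join(parser.parts)).strip()
```

For example, `html_to_markdown("<nav>Home</nav><h1>Title</h1><p>Some <b>bold</b> text.</p>")` drops the navigation and returns a `# Title` heading followed by the paragraph with `**bold**` preserved.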

Troubleshooting

403 Forbidden: The website is blocking automated requests. Things to consider:

- The site may require JavaScript rendering, which this script does not support
- Try accessing the page from a different network

Timeout errors: Increase timeout with --timeout 60

Empty content: Website may require JavaScript to render content
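The troubleshooting cases above can also be expressed as a small lookup from exception to hint. The sketch assumes the fetch is done with `urllib`, which the source does not confirm, and the function name `explain_failure` is hypothetical.

```python
import socket
import urllib.error


def explain_failure(exc: Exception) -> str:
    # HTTPError must be checked before URLError because it is a subclass of it.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 403:
            return "403 Forbidden: the site likely blocks automated requests"
        return f"HTTP {exc.code}: check that the URL is valid"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Timeout: retry with a larger value, e.g. --timeout 60"
    if isinstance(exc, urllib.error.URLError):
        # urllib can wrap a timeout inside URLError.reason.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "Timeout: retry with a larger value, e.g. --timeout 60"
        return "Network error: check connectivity and the host name"
    if isinstance(exc, ValueError):
        return "Invalid URL: check the scheme and spelling"
    return f"Unexpected error: {exc!r}"
```

An empty result is not an exception, so it is not covered here. As noted above, it usually means the page needs JavaScript to render its content.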


First Seen: Jun 3, 2026
