Wraps the Scrapling CLI for extracting content from web pages as HTML, Markdown, or plain text. Starts with static fetching and only escalates to browser automation when JavaScript rendering is actually needed. Includes a diagnostic script that checks your install health before you waste time debugging the wrong layer. The workflow is opinionated: verify the tool works, pick the lightest fetcher that'll succeed, save to a file, then validate what you actually got instead of trusting the exit code. Has specific handling for WeChat public articles and TLS trust store problems. If you're pulling article text or need to decide between curl-style and Playwright-style fetching, this gives you the decision tree and smoke tests up front.
npx -y skills add daymade/claude-code-skills --skill scrapling-skill --agent claude-code
Installs into .claude/skills of the current project.
Use Scrapling through its CLI as the default path. Start with the smallest working command, validate the saved output, and only escalate to browser-backed fetching when the static fetch does not contain the real page content.
Do not assume the user's Scrapling install is healthy. Verify it first.
Copy this checklist and keep it updated while working:
Scrapling Progress:
- [ ] Step 1: Diagnose the local Scrapling install
- [ ] Step 2: Fix CLI extras or browser runtime if needed
- [ ] Step 3: Choose static or dynamic fetch
- [ ] Step 4: Save output to a file
- [ ] Step 5: Validate file size and extracted content
- [ ] Step 6: Escalate only if the previous path failed
Run the bundled diagnostic script first:
python3 scripts/diagnose_scrapling.py
Use the result as the source of truth for the next step.
If scrapling --help fails because click is missing, or prints a message about installing Scrapling with extras, reinstall it with the CLI extra:
uv tool uninstall scrapling
uv tool install 'scrapling[shell]'
Do not default to scrapling[all] unless the user explicitly needs the broader feature set.
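The check above can be scripted. The following is a minimal Python sketch. The substrings "click" and "extras" are assumptions based on the symptoms described here, not Scrapling's documented error text. Both needs_cli_extra and check_cli are hypothetical helpers.

```python
import shutil
import subprocess


def needs_cli_extra(returncode: int, output: str) -> bool:
    """Heuristic: does a failed `scrapling --help` point at missing CLI extras?

    Assumption: the failure mentions the missing `click` module or a hint
    about installing Scrapling with extras.
    """
    if returncode == 0:
        return False
    text = output.lower()
    return "click" in text or "extras" in text


def check_cli() -> str:
    """Run `scrapling --help` and suggest a next step (hypothetical helper)."""
    if shutil.which("scrapling") is None:
        return "scrapling not on PATH: uv tool install 'scrapling[shell]'"
    proc = subprocess.run(["scrapling", "--help"], capture_output=True, text=True)
    if needs_cli_extra(proc.returncode, proc.stdout + proc.stderr):
        return "reinstall: uv tool uninstall scrapling && uv tool install 'scrapling[shell]'"
    if proc.returncode != 0:
        return "unrecognized failure: run python3 scripts/diagnose_scrapling.py"
    return "ok"
```

The helper only suggests the `[shell]` extra, consistent with the advice not to default to `scrapling[all]`.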
Install the Playwright runtime:
scrapling install
If the install looks slow or opaque, read references/troubleshooting.md before guessing. Do not claim success until either:
- scrapling install reports that dependencies are already installed, or
Use this decision rule:
- extract get for normal pages, article pages, and most WeChat public articles.
- extract fetch when the static HTML does not contain the real content or the page depends on JavaScript rendering.
- extract stealthy-fetch only after fetch still fails because of anti-bot or challenge behavior. Do not make it the default.
Always quote URLs in shell commands. This is mandatory in zsh when the URL contains ?, &, or other special characters.
scrapling extract get 'https://example.com' page.html
scrapling extract get 'https://example.com' article.md -s 'main'
scrapling extract fetch 'https://example.com' page.html --timeout 20000
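The fetcher decision rule amounts to a short escalation ladder. The sketch below encodes it in Python, lightest first. The function name next_fetcher and the antibot_suspected flag are hypothetical. Deciding whether a failure really comes from anti-bot behavior stays a human judgment.

```python
from typing import Optional


def next_fetcher(current: Optional[str], antibot_suspected: bool = False) -> Optional[str]:
    """Return the next `scrapling extract` mode to try, lightest first.

    current: the mode whose output just failed validation, or None before
    any attempt. antibot_suspected: the caller's judgment that `fetch`
    failed because of anti-bot or challenge behavior.
    """
    if current is None:
        return "get"  # static fetch is always the starting point
    if current == "get":
        return "fetch"  # the page may need JavaScript rendering
    if current == "fetch" and antibot_suspected:
        return "stealthy-fetch"  # only with evidence of anti-bot behavior
    return None  # stop escalating and diagnose instead
```

Returning None after `fetch` without evidence of anti-bot behavior mirrors the rule that stealthy-fetch is never the default.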
For WeChat public articles on mp.weixin.qq.com, use #js_content first. It is the default selector for article body extraction on those pages.
scrapling extract get 'https://mp.weixin.qq.com/s/ARTICLE_ID?scene=1' article.md -s '#js_content'
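The same commands can be built from Python. The sketch below picks `#js_content` automatically for mp.weixin.qq.com URLs. It only uses the `-s` and `--timeout` options shown above. The helper names default_selector and extract_argv are hypothetical. Passing the list straight to subprocess.run avoids shell quoting entirely. shlex.join shows the properly quoted form for pasting into zsh.

```python
import shlex
from typing import List, Optional
from urllib.parse import urlparse


def default_selector(url: str) -> Optional[str]:
    """Return '#js_content' for WeChat public articles, else None."""
    host = urlparse(url).hostname or ""
    return "#js_content" if host == "mp.weixin.qq.com" else None


def extract_argv(mode: str, url: str, out: str,
                 selector: Optional[str] = None,
                 timeout_ms: Optional[int] = None) -> List[str]:
    """Build an argv list for `scrapling extract <mode>` (no shell involved)."""
    selector = selector or default_selector(url)
    argv = ["scrapling", "extract", mode, url, out]
    if selector:
        argv += ["-s", selector]
    if timeout_ms is not None:
        argv += ["--timeout", str(timeout_ms)]  # as in the fetch example above
    return argv


# shlex.join quotes only the arguments that need it, such as the URL with '?'
print(shlex.join(extract_argv("get", "https://mp.weixin.qq.com/s/ARTICLE_ID?scene=1", "article.md")))
# prints: scrapling extract get 'https://mp.weixin.qq.com/s/ARTICLE_ID?scene=1' article.md -s '#js_content'
```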
After every extraction, verify the file instead of assuming success:
wc -c article.md
sed -n '1,40p' article.md
For HTML output, check that the expected title, container, or selector target is actually present:
rg -n '<title>|js_content|rich_media_title|main' page.html
If the file is tiny, empty, or missing the expected container, the extraction did not succeed. Go back to Step 3 and switch fetchers or selectors.
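The same validation as `wc -c` and `rg` can be done in Python. In this sketch, validate_output is a hypothetical helper, and the 200-byte floor is an arbitrary assumption; tune it per site. An empty result means the file passed.

```python
from pathlib import Path
from typing import Iterable, List


def validate_output(path: str, markers: Iterable[str] = (), min_bytes: int = 200) -> List[str]:
    """Return a list of problems with an extracted file; empty means it passed."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: missing"]
    problems = []
    size = p.stat().st_size
    if size < min_bytes:  # tiny or empty output means extraction failed
        problems.append(f"{path}: only {size} bytes")
    text = p.read_text(encoding="utf-8", errors="replace")
    for marker in markers:  # e.g. "<title>", "js_content", "rich_media_title"
        if marker not in text:
            problems.append(f"{path}: expected marker not found: {marker!r}")
    return problems
```

A non-empty result is the signal to return to Step 3 and switch fetchers or selectors.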
If extract get fails with curl: (60) SSL certificate problem, treat it as a local trust-store problem first, not a Scrapling content failure.
Retry the same command with:
--no-verify
Only do this after confirming the failure matches the local certificate verification error pattern. Do not silently disable verification by default.
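The rule can be made explicit in code: add `--no-verify` only when the error matches the local certificate pattern, and never twice. This is a sketch that assumes the curl error text surfaces unchanged in the captured stderr. no_verify_retry is a hypothetical helper.

```python
from typing import List, Optional

CERT_ERROR = "curl: (60) SSL certificate problem"


def no_verify_retry(argv: List[str], stderr: str) -> Optional[List[str]]:
    """Return a retry argv with --no-verify, or None if that retry is not justified."""
    if CERT_ERROR not in stderr:
        return None  # not a local trust-store failure: do not disable verification
    if "--no-verify" in argv:
        return None  # already retried without verification
    return argv + ["--no-verify"]
```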
For mp.weixin.qq.com:
- try extract get before extract fetch
- use -s '#js_content' for the article body
If extract fetch fails:
- run python3 scripts/diagnose_scrapling.py
- use stealthy-fetch only if the site behavior justifies it
Diagnostic script examples:
python3 scripts/diagnose_scrapling.py --url 'https://example.com'
python3 scripts/diagnose_scrapling.py \
--url 'https://mp.weixin.qq.com/s/ARTICLE_ID?scene=1' \
--selector '#js_content' \
--no-verify
python3 scripts/diagnose_scrapling.py \
--url 'https://example.com' \
--dynamic
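To call the diagnostic script from other Python tooling, the invocations above can be assembled as an argv list. This sketch only uses the `--url`, `--selector`, `--no-verify` and `--dynamic` flags shown above. Whether every combination of them is accepted is an assumption. The helper names diagnose_argv and run_diagnose are hypothetical.

```python
import subprocess
from typing import List, Optional


def diagnose_argv(url: Optional[str] = None, selector: Optional[str] = None,
                  no_verify: bool = False, dynamic: bool = False) -> List[str]:
    """Build the command line for scripts/diagnose_scrapling.py."""
    argv = ["python3", "scripts/diagnose_scrapling.py"]
    if url:
        argv += ["--url", url]
    if selector:
        argv += ["--selector", selector]
    if no_verify:
        argv.append("--no-verify")  # only after a confirmed curl (60) error
    if dynamic:
        argv.append("--dynamic")
    return argv


def run_diagnose(**kwargs) -> int:
    """Run the diagnostic script and return its exit code."""
    return subprocess.run(diagnose_argv(**kwargs)).returncode
```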
Bundled resources:
scripts/diagnose_scrapling.py
references/troubleshooting.md