A proper scraping cascade: it tries trafilatura first, falls back to requests with rotated user agents, then escalates to Playwright with stealth mode if the site requires JavaScript rendering or runs basic anti-bot checks. The code is clean and tracks which method succeeded. The anti-bot landscape section is honest about what playwright-stealth actually handles (navigator.webdriver patches, fingerprint evasion) versus what it doesn't (TLS fingerprinting, Cloudflare Turnstile). The async Playwright variant for Jupyter notebooks is a nice touch, since sync Playwright breaks inside a notebook's already-running event loop. This won't beat DataDome or other sophisticated bot management, but it covers the 80% case where you just need content extraction with reasonable resilience.
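To make the cascade idea concrete, here is a minimal sketch of the control flow only. It is not the skill's actual code: the names `ScrapeResult`, `scrape_with_cascade`, and the `min_chars` threshold are hypothetical, and the tiers are passed in as plain callables so the example runs with the standard library alone. In practice each callable would presumably wrap trafilatura, requests with a rotated user agent, or Playwright with stealth.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple

# A tier is a (name, extractor) pair. The extractor takes a URL and
# returns extracted text, or None when it found nothing usable.
Extractor = Callable[[str], Optional[str]]


@dataclass
class ScrapeResult:
    url: str
    text: Optional[str]
    method: Optional[str]  # name of the tier that succeeded, None if all failed
    errors: Dict[str, str] = field(default_factory=dict)  # tier name -> reason


def scrape_with_cascade(
    url: str,
    tiers: Sequence[Tuple[str, Extractor]],
    min_chars: int = 200,  # assumed cutoff for "real content", tune per site
) -> ScrapeResult:
    """Try each tier in order and record which one produced content."""
    errors: Dict[str, str] = {}
    for name, extract in tiers:
        try:
            text = extract(url)
        except Exception as exc:  # one failing tier should not stop the cascade
            errors[name] = f"{type(exc).__name__}: {exc}"
            continue
        if text and len(text.strip()) >= min_chars:
            return ScrapeResult(url, text, name, errors)
        errors[name] = "empty or too-short content"
    return ScrapeResult(url, None, None, errors)
```

A caller would list the cheapest tier first, for example `[("trafilatura", fetch_fast), ("requests", fetch_with_ua), ("playwright", fetch_rendered)]`, where the three functions are placeholders for your own wrappers. Recording the failure reason per tier makes it easy to see afterwards which sites forced the escalation to a browser.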
npx -y skills add jamditis/claude-skills-journalism --skill web-scraping --agent claude-code
Installs into .claude/skills of the current project.