Web Crawler

starchild-ai-agent/official-skills
1.8k installs · 13 stars
Summary

This is the fallback scraper to reach for when web_fetch hits a wall or you need social media data. It routes to ScrapeCreators for 27+ platforms (TikTok, Instagram, YouTube, LinkedIn, Reddit, Threads, and more), including YouTube transcripts, and to Firecrawl when a page blocks basic fetching. Routing is intent-based: you specify the intent (profile, posts, single video), the skill fetches the OpenAPI spec for the matching endpoint, then makes the call. Every request costs money, so use native fetch first and invoke this skill only when you need profiles, comments, transcripts, or a page behind anti-bot protection. The scope is broad but the use case is clear: structured extraction from platforms that resist scraping.

Install to Claude Code

npx -y skills add starchild-ai-agent/official-skills --skill web-crawler --agent claude-code

Installs into .claude/skills of the current project.

Files
SKILL.md

Preferred entry: call exports.py (don't hand-roll requests)

Ready-made helpers live in skills/web-crawler/exports.py. Prefer them over writing your own proxied_get/proxied_post calls — they already inject the proxy credentials, so there is no API key to find (don't read $SCRAPECREATORS_API_KEY / $FIRECRAWL_API_KEY, don't check .env, don't ask the user).

import sys; sys.path.insert(0, "/data/workspace/skills/web-crawler")
from exports import scrape_markdown, youtube_transcript, sc_get
scrape_markdown("https://example.com/article")          # Firecrawl fallback
youtube_transcript("https://youtube.com/watch?v=ID")     # ScrapeCreators
sc_get("/v1/tiktok/profile", handle="charlidamelio")     # any SC endpoint

from exports import archive_fallback                      # paywall / Firecrawl-403
archive_fallback("https://www.nytimes.com/.../article.html")  # archive snapshot

Named wrappers exist for the high-frequency actions (YouTube/TikTok transcript & video, IG/Twitter/Reddit posts, profiles, Google/Reddit search). For any other ScrapeCreators endpoint use sc_get(path, **params) — it auto-strips leading @/# from handles/hashtags. The intent-routing tables below still tell you which endpoint to pass. Pass caller_id="chat:<thread>" (or job:/preview:) for cost tracking.

Quick trigger rules (read this first)

Use this skill immediately when any of these conditions is true:

  • web_fetch returns HTTP 401/403/429/5xx
  • Response is an anti-bot challenge page (for example Cloudflare "Attention Required", "Just a moment", or challenge/captcha pages)
  • The page is JS-heavy and the first fetch misses required detail fields (for example publish time, author, listing code, updated time, price breakdown)
  • Search results do not contain the requested field and the value must be extracted from the target page itself

Fallback rule:

  • If ordinary fetch is blocked or incomplete, switch to Firecrawl fallback in this skill before asking the user for screenshots/manual text.
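The trigger conditions above can be checked mechanically before deciding to switch. The following is a minimal sketch, not part of the skill: `needs_fallback` and `CHALLENGE_MARKERS` are hypothetical names, and the marker list only covers the phrases quoted in the rules. It does not cover the "JS-heavy page misses fields" case, which needs a content-level check.

```python
# Phrases quoted in the trigger rules; illustrative, not exhaustive.
CHALLENGE_MARKERS = ("attention required", "just a moment", "captcha")


def needs_fallback(status_code: int, body: str) -> bool:
    """Return True when an ordinary fetch result matches a trigger condition."""
    # HTTP 401/403/429 or any 5xx means the plain fetch was refused or failed.
    if status_code in (401, 403, 429) or 500 <= status_code <= 599:
        return True
    # A 200 response can still be an anti-bot challenge page.
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

A `True` result means the next step is the Firecrawl fallback rather than asking the user for screenshots or pasted text.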

Error signatures -> action

| Signature | Action |
| --- | --- |
| web_fetch HTTP 401/403/429/5xx | Call Firecrawl POST /v2/scrape once with formats:["markdown","links"] + onlyMainContent:true |
| Cloudflare/challenge page text in body | Same Firecrawl call as above |
| Markdown still misses key fields | Retry once with formats:["rawHtml"] |
| Firecrawl returns 403 / empty for a social media URL | Check the intent-routing tables below for a ScrapeCreators platform-specific endpoint for this domain (e.g. sc_get('/v1/instagram/post', url=...), sc_get('/v2/tiktok/video', url=...)). If one exists, use it; these have dedicated extraction that bypasses anti-bot. If no dedicated endpoint exists, fall through to archive_fallback or ask the user. |
| Firecrawl itself returns 403 / empty (hard paywall: NYT, WSJ, Economist, FT, Bloomberg) | Call archive_fallback(url) to recover full text from a web archive snapshot |
| Need structured data from a China app (抖音/小红书/微博/B站/京东/淘宝/1688/闲鱼/得物 etc.) | Call apify_run(); Apify Store has purpose-built actors for these platforms that Firecrawl/ScrapeCreators don't cover |
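The first three rows describe a two-step escalation for Firecrawl request bodies. As a rough sketch (the helper name `firecrawl_payload` is hypothetical, and leaving `onlyMainContent` out of the rawHtml retry is an assumption, since the table does not say either way):

```python
from typing import Optional


def firecrawl_payload(url: str, attempt: int) -> Optional[dict]:
    """Build the JSON body for Firecrawl POST /v2/scrape, or None when out of attempts."""
    if attempt == 1:
        # First call: readable Markdown plus the page's links.
        return {"url": url, "formats": ["markdown", "links"], "onlyMainContent": True}
    if attempt == 2:
        # Single retry when the Markdown misses key fields.
        return {"url": url, "formats": ["rawHtml"]}
    # No third attempt: move on to another backend or report the failure.
    return None
```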

Paywall / Firecrawl-blocked fallback chain (use archive_fallback)

When Firecrawl can't get the page either (it returns 403, or markdown comes back empty) the site is behind a hard paywall or aggressive WAF. Do NOT keep retrying Firecrawl. Recover the article from a web archive snapshot instead:

from exports import archive_fallback
res = archive_fallback("https://www.nytimes.com/.../article.html")
res["markdown"]      # full text, or "" if no snapshot exists anywhere
res["source"]        # "archive.today" | "wayback" | None
res["snapshot_url"]  # the snapshot that was scraped

How it works (and why this order):

  • archive.today first (archive.ph / archive.is mirrors). User-triggered, real-browser captures; historically preserves full text behind paywalls. Best bet for NYT/WSJ/Economist. We scrape its /newest/ snapshot via Firecrawl (archive.today has its own Cloudflare, so scrape it through Firecrawl, never web_fetch it directly).
  • Wayback Machine second (archive.org). Automated crawler that honors robots.txt and paywalls, so it often has NO full text for hard paywalls — but it's a good fallback for ordinary 403/Cloudflare pages that aren't paywalled.

Limitation: archives only return text someone already saved. If res["markdown"] == "", no snapshot exists — stop, tell the user, and try the outlet's official API/RSS or a different source. Do not fabricate the article.
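The ordering logic can be pictured as a loop over archives that stops at the first non-empty snapshot. This is a sketch under stated assumptions, not the code in exports.py: `first_snapshot` and the injected `scrape` callable are hypothetical, the snapshot URL patterns are assumed, and returning `None` for `snapshot_url` on a miss is a guess.

```python
from typing import Callable

# Assumed snapshot URL patterns; exports.py may build them differently.
ARCHIVES = [
    ("archive.today", "https://archive.ph/newest/{url}"),
    ("wayback", "https://web.archive.org/web/{url}"),
]


def first_snapshot(url: str, scrape: Callable[[str], str]) -> dict:
    """Try each archive in order; `scrape` returns Markdown or "" for a snapshot URL."""
    for source, pattern in ARCHIVES:
        snapshot_url = pattern.format(url=url)
        markdown = scrape(snapshot_url)
        if markdown.strip():
            return {"markdown": markdown, "source": source, "snapshot_url": snapshot_url}
    # Nothing saved anywhere: report an empty result instead of inventing text.
    return {"markdown": "", "source": None, "snapshot_url": None}
```

In practice `scrape` would be a Firecrawl-backed helper such as scrape_markdown, because archive.today sits behind its own Cloudflare.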

China-app structured data (use apify_run)

Firecrawl and ScrapeCreators don't cover China apps: Douyin video search, Xiaohongshu notes, Weibo posts, Bilibili videos, JD/Taobao product prices, 1688 wholesale listings, Xianyu second-hand, Dewu sneakers, and so on. When you need structured data from these platforms, use the Apify Store fallback instead. Apify is a serverless scraper marketplace with hundreds of community-maintained actors that run real browsers and proxy pools against Chinese platforms.

Auth: No user-supplied key needed. sc-proxy injects the platform Apify token automatically. The Authorization: Bearer header can be any fake value — the proxy replaces it with the real token. Do NOT read $APIFY_TOKEN from env, do NOT check .env, do NOT ask the user for an Apify key.

When to use Apify (vs Firecrawl/ScrapeCreators):

  • ✅ China apps: 抖音, 小红书, 微博, B站, 京东, 淘宝, 1688, 闲鱼, 得物, 携程, 知乎, 豆瓣, 雪球, 快手, 爱奇艺, 优酷
  • ✅ Southeast Asia e-commerce: Shopee, Lazada, Temu
  • ❌ Western social media (TikTok/Instagram/YouTube/X/Reddit) → use ScrapeCreators first (cheaper)
  • ❌ Generic web page scraping → use Firecrawl first (cheaper)
  • ❌ Hard paywall articles → use archive_fallback (Apify doesn't help here)
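One way to apply this checklist is a domain lookup that picks the cheapest backend that covers the site. The sketch below is illustrative only: `pick_backend` and both domain sets are hypothetical and incomplete, and a hard paywall cannot be detected from the domain alone, so those URLs still start with Firecrawl and move to archive_fallback on a 403 or empty result.

```python
from urllib.parse import urlparse

# Illustrative, incomplete domain lists (assumptions, not taken from the skill).
APIFY_DOMAINS = {"douyin.com", "xiaohongshu.com", "weibo.com", "bilibili.com",
                 "jd.com", "taobao.com", "1688.com", "temu.com"}
SC_DOMAINS = {"tiktok.com", "instagram.com", "youtube.com", "x.com",
              "twitter.com", "reddit.com"}


def _matches(host: str, domains: set) -> bool:
    # Match the domain itself or any subdomain, but not look-alike hosts.
    return any(host == d or host.endswith("." + d) for d in domains)


def pick_backend(url: str) -> str:
    """Return "apify", "scrapecreators", or "firecrawl" for a target URL."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, APIFY_DOMAINS):
        return "apify"
    if _matches(host, SC_DOMAINS):
        return "scrapecreators"
    return "firecrawl"
```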

How to pick an actor: The reliable-actors catalog is at output/apify_china_reliable.json (sorted by 30-day success count). Pick the top actor for the target platform. A few common ones:

| Platform | Actor ID | Input key |
| --- | --- | --- |
| 抖音 search | zen-studio~douyin-search-scraper | {"keywords": [...], "maxResultsPerQuery": N} |
| 小红书 search | zen-studio~rednote-search-scraper | {"keywords": [...], "maxResults": N} |
| 小红书 note detail | sian.agency~xiaohongshu-rednote-scraper | {"operation": "noteDetail", "noteId": "...", "xsecToken": "..."} |
| 微博 hot search | gentle_cloud~weibo-hot-search-scraper | {"mode": "hot_band", "includeScores": true} |
| 微博 posts | zhorex~weibo-scraper | (see actor input schema) |
| B站 videos | zhorex~bilibili-scraper | (see actor input schema) |
| 京东 search | zen-studio~jd-com-search-scraper | {"keyword": "...", "maxProducts": N} |
| 京东 products | sian.agency~jd-com-product-scraper | {"operation": "productSearch", "keyword": "...", "maxPages": 1} |
| 淘宝 products | sian.agency~taobao-tmall-product-scraper | {"operation": "keywordSearch", "keyword": "...", "maxPages": 1} |
| 1688 wholesale | zen-studio~1688-wholesale-scraper | (see actor input schema) |
| 闲鱼 search | zen-studio~goofish-xianyu-search-scraper | (see actor input schema) |
| TikTok | clockworks~tiktok-scraper | {"hashtags": [...]} or {"profiles": [...]} |

Two-step Xiaohongshu workflow (search → note detail): The search scraper (zen-studio~rednote-search-scraper) returns only a truncated desc (~60 chars). For the full post body, make a second call to sian.agency~xiaohongshu-rednote-scraper in noteDetail mode, passing the id and xsec_token from the search result row as noteId and xsecToken. This is the only reliable way to get the full note text needed for price, lodging, and itinerary details.
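The hand-off between the two calls is a small field mapping. A minimal sketch, assuming the search row exposes `id` and `xsec_token` as top-level keys (the helper name `note_detail_input` is hypothetical):

```python
def note_detail_input(search_row: dict) -> dict:
    """Turn one search result row into the input for the noteDetail actor call."""
    return {
        "operation": "noteDetail",
        "noteId": search_row["id"],              # note id from the search row
        "xsecToken": search_row["xsec_token"],   # token that must accompany the id
    }
```

The returned dict is what would be passed as the second argument to apify_run("sian.agency~xiaohongshu-rednote-scraper", ...).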

Usage:

import sys
sys.path.insert(0, "/data/workspace/skills/web-crawler")
from exports import apify_run

# Sync run (small batches, ≤100 results): blocks until done, returns result list
results = apify_run("zen-studio~douyin-search-scraper",
                     {"keywords": ["MacBook"], "maxResultsPerQuery": 5})
for r in results:
    print(r.get("text", "")[:80])

# To discover the right input fields for an unfamiliar actor, fetch its
# input-schema page via Firecrawl first:
from exports import scrape_markdown
schema_md = scrape_markdown("https://apify.com/<user>/<actor>/input-schema")

Billing: Apify uses pay-per-event billing (actor-start + per-result + add-ons), billed at 2× the real Apify cost in credits. A typical small search (5–10 results) costs $0.02–0.05. The proxy auto-calculates the charge from the response — you don't need to estimate it yourself.

Error handling:

  • 400 run-failed → bad input (wrong field name). Fetch the actor's input-schema page and fix.
  • 401 → proxy misconfigured (should not happen). Report to user.
  • Empty result [] → actor ran but found nothing. Try different keywords or another actor.
  • Timeout → increase timeout param (default 180s). Some actors are slow.

Cost discipline: Apify actors are more expensive than Firecrawl/ScrapeCreators. Only use Apify when the cheaper options can't get the data (China apps, structured e-commerce fields). For a single web page, always try Firecrawl first.

What each service is for

ScrapeCreators — Social media data extraction (27+ platforms)

Use for any request involving social media profiles, posts, videos, comments, transcripts, search, ads, trending content, or engagement metrics. Covers TikTok, Instagram, YouTube, LinkedIn, Facebook, Twitter/X, Reddit, Threads, Bluesky, Pinterest, Snapchat, Twitch, Kick, Truth Social, TikTok Shop, Google search, and link-in-bio services (Linktree, Komi, Pillar, Linkbio, Linkme, Amazon Shop).

Base URL: https://api.scrapecreators.com

Auth: No user-supplied key needed. sc-proxy injects platform credentials automatically; just send the request. The x-api-key header can be any value or omitted entirely. Do NOT bail out or ask the user for a key if $SCRAPECREATORS_API_KEY looks unset; that env var is intentionally not required.

Method: All endpoints use GET requests with query params. Responses are JSON.

Firecrawl — Fallback web page scraper

Firecrawl is only a fallback crawler for a single web page when ordinary fetching fails. Call POST /v2/scrape with a single url and focused formats such as markdown, html, rawHtml, links, or summary, or with constrained json/question/highlights extraction.

Auth: No user-supplied key needed. sc-proxy injects the Firecrawl credential automatically when you call through core.http_client.proxied_post — just send the request. Do NOT read $FIRECRAWL_API_KEY from env, do NOT check .env for it, and do NOT ask the user for a Firecrawl key if it looks unset; that env var is intentionally not required. The same proxy-injection model as ScrapeCreators applies here.

Do not use Firecrawl crawl/map/search/agent/browser endpoints. Do not request screenshots, audio, branding, images, or browser actions unless the proxy policy is expanded later.


ScrapeCreators — Intent routing

Map user intent to the right endpoint. Endpoint paths follow the pattern /v{n}/platform/action: most are v1, but a few use v2 or v3, so copy the version exactly as listed in the tables.

Important: After selecting an endpoint from the tables below, fetch its OpenAPI spec at https://docs.scrapecreators.com/{path}/openapi.json for full parameter details, types, and example response before making the actual API call. For example: https://docs.scrapecreators.com/v1/tiktok/profile/openapi.json
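Building the spec URL from an endpoint path is a simple string join. A small sketch (the helper name `spec_url` is hypothetical; the URL pattern is the one quoted above):

```python
DOCS_BASE = "https://docs.scrapecreators.com"


def spec_url(endpoint_path: str) -> str:
    """Return the per-endpoint OpenAPI spec URL for a routing-table path."""
    # Strip slashes so "/v1/tiktok/profile" and "v1/tiktok/profile" both work.
    return f"{DOCS_BASE}/{endpoint_path.strip('/')}/openapi.json"
```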

Profiles / User Info

| Platform | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| TikTok | /v1/tiktok/profile | handle | stoolpresidente |
| Instagram | /v1/instagram/profile | handle | jane |
| YouTube | /v1/youtube/channel | handle, channelId, or url | ThePatMcAfeeShow |
| LinkedIn (person) | /v1/linkedin/profile | url | https://www.linkedin.com/in/parrsam/ |
| LinkedIn (company) | /v1/linkedin/company | url | https://linkedin.com/company/shopify |
| Facebook | /v1/facebook/profile | url | https://www.facebook.com/mantraindianfolsom |
| Twitter/X | /v1/twitter/profile | handle | elonmusk |
| Reddit | /v1/reddit/subreddit/details | subreddit or url | AskReddit |
| Threads | /v1/threads/profile | handle | zuck |
| Bluesky | /v1/bluesky/profile | handle | jay.bsky.team |
| Pinterest | /v1/pinterest/user/boards | handle | pinterest |
| Truth Social | /v1/truthsocial/profile | handle | realDonaldTrump |
| Twitch | /v1/twitch/profile | handle | ninja |
| Snapchat | /v1/snapchat/profile | handle | djkhaled |

Posts / Content Feeds

| Platform | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| TikTok videos | /v3/tiktok/profile/videos | handle | stoolpresidente |
| Instagram posts | /v2/instagram/user/posts | handle | jane |
| Instagram reels | /v1/instagram/user/reels | handle or user_id | jane or 2700692569 |
| Instagram highlights | /v1/instagram/user/highlights | handle or user_id | jane or 2700692569 |
| YouTube videos | /v1/youtube/channel/videos | handle or channelId | ThePatMcAfeeShow |
| YouTube shorts | /v1/youtube/channel/shorts | handle or channelId | starterstory |
| YouTube playlist | /v1/youtube/playlist | playlist_id | PLP32wGpgzmIlInfgKVFfCwVsxgGqZNIiS |
| LinkedIn posts | /v1/linkedin/company/posts | url | https://linkedin.com/company/shopify |
| Facebook posts | /v1/facebook/profile/posts | url or pageId | https://www.facebook.com/pacemorby |
| Facebook reels | /v1/facebook/profile/reels | url | https://www.facebook.com/Spurs |
| Facebook photos | /v1/facebook/profile/photos | url | https://www.facebook.com/Spurs |
| Facebook group posts | /v1/facebook/group/posts | url or group_id | 742354120555345 |
| Twitter tweets | /v1/twitter/user/tweets | handle | elonmusk |
| Reddit posts | /v1/reddit/subreddit | subreddit | AskReddit |
| Threads posts | /v1/threads/user/posts | handle | zuck |
| Bluesky posts | /v1/bluesky/user/posts | handle or user_id | jay.bsky.team |
| Truth Social posts | /v1/truthsocial/user/posts | handle or user_id | realDonaldTrump |
| Pinterest board | /v1/pinterest/board | url | https://www.pinterest.com/... |

Single Post / Video Details

| Platform | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| TikTok | /v2/tiktok/video | url | https://www.tiktok.com/@randomspamvideos25/video/7251387037834595630 |
| Instagram | /v1/instagram/post | url | https://www.instagram.com/reel/DOq6eV6iIgD |
| Instagram highlight | /v1/instagram/user/highlight/detail | id | 18067016518767507 |
| YouTube | /v1/youtube/video | url | https://www.youtube.com/watch?v=Y2Ah_DFr8cw |
| YouTube community post | /v1/youtube/community-post | url | https://www.youtube.com/post/Ugkxvj2KoApYAXoqLWnKVr6zZe5JjeHrQeP8 |
| LinkedIn | /v1/linkedin/post | url | https://www.linkedin.com/pulse/being-father-has-made-me-better-leader... |
| Facebook | /v1/facebook/post | url | https://www.facebook.com/reel/1535656380759655 |
| Twitter/X | /v1/twitter/tweet | url | https://twitter.com/elonmusk/status/... |
| Twitter/X community | /v1/twitter/community | url | https://twitter.com/i/communities/... |
| Twitter/X community tweets | /v1/twitter/community/tweets | url | https://twitter.com/i/communities/... |
| Reddit | /v1/reddit/post/comments | url | https://www.reddit.com/r/AskReddit/comments/... |
| Threads | /v1/threads/post | url | https://www.threads.net/@zuck/post/... |
| Bluesky | /v1/bluesky/post | url | https://bsky.app/profile/.../post/... |
| Truth Social | /v1/truthsocial/post | url | https://truthsocial.com/@realDonaldTrump/posts/... |
| Pinterest | /v1/pinterest/pin | url | https://www.pinterest.com/pin/... |
| Twitch clip | /v1/twitch/clip | url | https://clips.twitch.tv/... |
| Kick clip | /v1/kick/clip | url | https://kick.com/... |

Comments

| Platform | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| TikTok | /v1/tiktok/video/comments | url | https://www.tiktok.com/@stoolpresidente/video/7499229683859426602 |
| Instagram | /v2/instagram/post/comments | url | https://www.instagram.com/reel/DOq6eV6iIgD |
| YouTube | /v1/youtube/video/comments | url | https://www.youtube.com/watch?v=dQw4w9WgXcQ |
| Facebook | /v1/facebook/post/comments | url or feedback_id | https://www.facebook.com/reel/753347914167361 |
| Reddit | /v1/reddit/post/comments | url | https://www.reddit.com/r/AskReddit/comments/... |

Transcripts

| Platform | Endpoint | Example | Note |
| --- | --- | --- | --- |
| TikTok | /v1/tiktok/video/transcript | url=https://www.tiktok.com/...&lang=en | also via /v2/tiktok/video with get_transcript=true |
| Instagram | /v2/instagram/media/transcript | url=https://www.instagram.com/reel/... | AI-powered, 10-30s, under 2min |
| YouTube | /v1/youtube/video/transcript | url=https://www.youtube.com/watch?v=bjVIDXPP7Uk | also included in /v1/youtube/video response |
| Facebook | /v1/facebook/post/transcript | url=https://www.facebook.com/reel/... | under 2min only |
| Twitter/X | /v1/twitter/tweet/transcript | url=https://twitter.com/... | AI-powered, slow |

Search

| Platform | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| TikTok users | /v1/tiktok/search/users | query | funny |
| TikTok videos (keyword) | /v1/tiktok/search/keyword | query | funny |
| TikTok videos (hashtag) | /v1/tiktok/search/hashtag | hashtag | fyp |
| TikTok top (photos+videos) | /v1/tiktok/search/top | query | funny |
| Instagram reels | /v2/instagram/reels/search | query | dogs |
| YouTube | /v1/youtube/search | query | funny |
| YouTube hashtag | /v1/youtube/search/hashtag | hashtag | funny |
| Reddit (all) | /v1/reddit/search | query | best programming languages |
| Reddit (in subreddit) | /v1/reddit/subreddit/search | subreddit + query | AskReddit + funny |
| Threads posts | /v1/threads/search | query | AI |
| Threads users | /v1/threads/search/users | query | zuck |
| Pinterest | /v1/pinterest/search | query | home decor |
| Google | /v1/google/search | query | best restaurants in NYC |

Ad Libraries

| Platform | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| Facebook ads search | /v1/facebook/adLibrary/search/ads | query | running |
| Facebook company ads | /v1/facebook/adLibrary/company/ads | pageId or companyName | Lululemon |
| Facebook ad detail | /v1/facebook/adLibrary/ad | id or url | 702369045530963 |
| Facebook find companies | /v1/facebook/adLibrary/search/companies | query | Nike |
| Google company ads | /v1/google/company/ads | domain or advertiser_id | nike.com |
| Google ad detail | /v1/google/ad | url | https://adstransparency.google.com/... |
| Google find advertisers | /v1/google/adLibrary/advertisers/search | query | Nike |
| LinkedIn ads search | /v1/linkedin/ads/search | company or keyword | Shopify |
| LinkedIn ad detail | /v1/linkedin/ad | url | https://www.linkedin.com/ad/... |
| Reddit ads search | /v1/reddit/ads/search | query | gaming |
| Reddit ad detail | /v1/reddit/ad | id | t3_abc123 |

Trending / Popular

| Content | Endpoint | Param | Example |
| --- | --- | --- | --- |
| Trending feed | /v1/tiktok/get-trending-feed | region (required) | US |
| Popular videos | /v1/tiktok/videos/popular | | |
| Popular creators | /v1/tiktok/creators/popular | | |
| Popular hashtags | /v1/tiktok/hashtags/popular | | |
| Popular songs | /v1/tiktok/songs/popular | | |
| Song details | /v1/tiktok/song | clipId | 7439295283975702544 |
| Videos using song | /v1/tiktok/song/videos | clipId | 7439295283975702544 |
| Trending shorts (YT) | /v1/youtube/shorts/trending | | |

Followers / Following / Live (TikTok only)

| Type | Endpoint | Example |
| --- | --- | --- |
| Following | /v1/tiktok/user/following | handle=stoolpresidente |
| Followers | /v1/tiktok/user/followers | handle=stoolpresidente |
| Audience demographics | /v1/tiktok/user/audience (26 credits!) | handle=shakira |
| Live stream | /v1/tiktok/user/live | handle=thejustalex |

TikTok Shop

| Type | Endpoint | Primary Param | Example |
| --- | --- | --- | --- |
| Search products | /v1/tiktok/shop/search | query | shoes |
| Store products | /v1/tiktok/shop/products | url | https://www.tiktok.com/shop/store/goli-nutrition/7495794203056835079 |
| Product detail | /v1/tiktok/product | url | https://www.tiktok.com/shop/pdp/goli-ashwagandha-gummies.../1729587769570529799 |
| Product reviews | /v1/tiktok/shop/product/reviews | url or product_id | 1731578642912612516 |
| User showcase | /v1/tiktok/user/showcase | handle | mrtiktokreviews |

Link-in-Bio / Other

| Service | Endpoint | Param | Example |
| --- | --- | --- | --- |
| Linktree | /v1/linktree | url | https://linktr.ee/... |
| Komi | /v1/komi | url | https://komi.io/... |
| Pillar | /v1/pillar | url | https://pillar.io/... |
| Linkbio | /v1/linkbio | url | https://linkbio.co/... |
| Linkme | /v1/linkme | url | https://linkme.bio/... |
| Amazon Shop | /v1/amazon/shop | url | https://www.amazon.com/shop/... |
| Instagram basic profile | /v1/instagram/basic/profile | userId | 314216 |
| Instagram embed HTML | /v1/instagram/user/embed | handle | jane |
| Age/Gender detect | /v1/detect/age-gender | url (social profile) | https://www.tiktok.com/@charlidamelio |
| Credit balance | /v1/credit/balance | (none) | |

ScrapeCreators pagination

Paginated endpoints return a cursor/token in the response. Pass it back as a query param to get the next page.

| Cursor Field | Used By |
| --- | --- |
| cursor | TikTok comments/search/song videos, Instagram comments, Reddit subreddit search, Pinterest, Bluesky, Facebook reels/photos/posts/comments, TikTok Shop products/user showcase |
| max_cursor | TikTok profile videos |
| min_time | TikTok following/followers |
| continuationToken | YouTube (all paginated endpoints) |
| after | Reddit posts, Reddit search |
| next_max_id | Instagram posts, Truth Social posts |
| max_id | Instagram reels |
| page | TikTok popular/shop, Instagram reels search, LinkedIn company posts, TikTok Shop reviews |
| paginationToken | LinkedIn ads |
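The cursor hand-back can be written once and reused across endpoints. The sketch below is a rough illustration, not the skill's code: `paginate` and `fetch_page` are hypothetical, and it assumes the cursor sits at the top level of the JSON response under the same name as the query param, which the per-endpoint OpenAPI spec should confirm. Endpoints that use a numeric `page` param need a counter instead.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    cursor_field: str,
    max_pages: int = 3,
) -> Iterator[dict]:
    """Yield pages until the cursor runs out or max_pages is reached."""
    cursor = None  # the first request carries no cursor
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield page
        cursor = page.get(cursor_field)
        if not cursor:
            break  # no cursor in the response means this was the last page
```

The low default for `max_pages` is deliberate: every page is a paid request, so the caller opts in to deeper pagination.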

ScrapeCreators known limitations

  • Handles: pass without the @ symbol. Use charlidamelio not @charlidamelio. Applies to TikTok, Instagram, Twitter, Threads, Bluesky, Snapchat, Twitch, Pinterest, Truth Social
  • YouTube handles: pass without the @ symbol. Use ThePatMcAfeeShow not @ThePatMcAfeeShow. You can also pass a channelId or full URL instead
  • Hashtags: pass without the # symbol. Use fyp not #fyp. Applies to TikTok and YouTube hashtag search endpoints
  • Twitter: returns ~100 most popular tweets, not chronological/latest
  • Threads: only last 20-30 posts visible publicly
  • Facebook posts: only 3 posts per page (API limitation)
  • Facebook group posts: only 3 posts per page (same limitation)
  • LinkedIn company posts: max 7 pages
  • Instagram play counts: IG-only views (excludes cross-posted FB views)
  • Truth Social: only prominent users (Trump, Vance, etc.) work publicly
  • Transcripts: all transcript endpoints require video under 2 minutes
  • Reddit subreddit names: case-sensitive! Use "AskReddit" not "askreddit"
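When calling endpoints directly rather than through sc_get, the handle and hashtag rules can be applied with two small helpers. A minimal sketch with hypothetical names; note that neither helper changes letter case, since subreddit names are case-sensitive:

```python
def normalize_handle(handle: str) -> str:
    """Drop surrounding whitespace and any leading @ from a handle."""
    return handle.strip().lstrip("@")


def normalize_hashtag(tag: str) -> str:
    """Drop surrounding whitespace and any leading # from a hashtag."""
    return tag.strip().lstrip("#")
```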

Access patterns

ScrapeCreators (social media)

Use Python with core.http_client.proxied_get so sc-proxy injects credentials and bills correctly. Include a typed SC-CALLER-ID header (chat:, job:, preview:, etc.) for cost tracking. Do not read $SCRAPECREATORS_API_KEY from env and do not ask the user for a key — the proxy handles it.

from core.http_client import proxied_get

headers = {"SC-CALLER-ID": "chat:youtube-transcript"}

transcript = proxied_get(
    "https://api.scrapecreators.com/v1/youtube/video/transcript",
    params={"url": "https://www.youtube.com/watch?v=VIDEO_ID", "language": "en"},
    headers=headers,
    timeout=20,
).json()

Bash/curl works too (proxy is transparent), but Python is preferred for cost tracking:

curl -s "https://api.scrapecreators.com/v1/tiktok/profile?handle=charlidamelio" \
  -H "x-api-key: any"

Each endpoint has its own OpenAPI spec at https://docs.scrapecreators.com/{path}/openapi.json. Always fetch the per-endpoint spec first to get full parameter details before making the actual API call. The full spec is at https://docs.scrapecreators.com/openapi.json (large file — prefer per-endpoint specs).

Common optional params:

  • trim (boolean): reduces response payload size. Use when you only need key metrics.
  • region (string): 2-letter country code for proxy location. Does NOT filter by region — just routes through that country's proxy.

Firecrawl (web page fallback via transparent proxy)

No Firecrawl API key required — sc-proxy injects it. Don't look it up in .env or ask the user. Just call:

from core.http_client import proxied_post

headers = {"SC-CALLER-ID": "chat:web-crawl-fallback"}

page = proxied_post(
    "https://api.firecrawl.dev/v2/scrape",
    json={
        "url": "https://example.com/article",
        "formats": ["markdown", "links"],
        "onlyMainContent": True,
        "timeout": 60000,
    },
    headers=headers,
).json()

Decision rules

Route every request to the right backend. The user should never need to specify which API to use.

Social media request (profile, posts, comments, search, ads, trending, transcripts)

Use ScrapeCreators. Match the user's intent to an endpoint from the routing tables above. Strip @ from handles and # from hashtags before calling. Fetch the per-endpoint OpenAPI spec first for full param details.

YouTube URL or YouTube content request

Use ScrapeCreators for YouTube (channel info, videos, shorts, playlists, comments, transcripts, search, trending shorts). Match the user's intent to the appropriate YouTube endpoint from the routing tables above.

For a YouTube URL, use /v1/youtube/video to get video details and transcript. If the user's goal is content analysis, summarization, quote extraction, or topic mining, the transcript is included in the video response.

For a YouTube topic query, use /v1/youtube/search to find relevant videos, then call /v1/youtube/video only for the videos needed. Avoid fetching many videos by default.

Blocked or JS-heavy web page

Use Firecrawl once with formats:["markdown","links"] and onlyMainContent:true. Treat the returned Markdown as the extraction substrate, not as final truth: parse the title, price/value fields, specs, body description, image URLs, outbound links, and obvious contact/location hints from the page structure.

General web-page extraction lessons:

  • Many listing/detail pages render important content with JavaScript, image galleries, hidden sections, or repeated UI labels. web_fetch may return boilerplate while Firecrawl can still recover the real main content.
  • Do not hard-code site-specific labels. Convert page text into a generic structured summary: what it is, where it is, key numbers, evidence snippets, media/links, and caveats.
  • Preserve source URLs for images and links when they help verify the page, but do not download or batch-process every media asset unless the user asks.
  • If Markdown misses important layout or structured fields, retry once with rawHtml; use json, question, or highlights only when the user asked for narrow extraction and the schema/prompt is specific.
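A site-agnostic first pass over the returned Markdown might pull out just the title, image URLs, and links, leaving field interpretation to later steps. This is a sketch under simplifying assumptions: `summarize_markdown` is a hypothetical helper, and the regexes handle only plain inline Markdown syntax without titles or nested parentheses in URLs.

```python
import re

IMAGE_RE = re.compile(r"!\[[^\]]*\]\((\S+?)\)")        # ![alt](url)
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\((\S+?)\)")    # [text](url), excluding images
TITLE_RE = re.compile(r"^#\s+(.+)$", re.MULTILINE)     # first level-1 heading


def summarize_markdown(md: str) -> dict:
    """Extract a generic skeleton (title, images, links) from scraped Markdown."""
    title = TITLE_RE.search(md)
    return {
        "title": title.group(1).strip() if title else None,
        "images": IMAGE_RE.findall(md),
        "links": LINK_RE.findall(md),
    }
```

The URLs are kept as references only; nothing here downloads media.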

Cost discipline

ScrapeCreators — most endpoints cost 1 credit per request. Exceptions: /v1/tiktok/user/audience costs 26 credits; /v1/tiktok/video/transcript with use_ai_as_fallback=true costs +10 credits; /v1/google/company/ads with get_ad_details=true costs 25 credits. Warn users before calling expensive endpoints.
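The pricing exceptions are few enough to encode in a pre-flight check. A sketch using only the figures stated above; `estimate_credits` and `_flag` are hypothetical helpers, and the real charge is whatever the proxy bills:

```python
def _flag(value) -> bool:
    """Treat True or the string "true" (any case) as an enabled option."""
    return value is True or str(value).lower() == "true"


def estimate_credits(path: str, params: dict) -> int:
    """Estimate ScrapeCreators credits for one request, per the stated exceptions."""
    if path == "/v1/tiktok/user/audience":
        return 26
    if path == "/v1/google/company/ads" and _flag(params.get("get_ad_details")):
        return 25
    credits = 1  # default cost for most endpoints
    if path == "/v1/tiktok/video/transcript" and _flag(params.get("use_ai_as_fallback")):
        credits += 10  # AI fallback surcharge
    return credits
```

A result above 1 is the cue to warn the user before making the call.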

Firecrawl — billed per page plus expensive modifiers.

Keep calls tight: one page, one video, or a small shortlist. Never batch-crawl whole websites or bulk-scrape entire feeds with this skill.

If the proxy returns 403, the request is outside the allowed use case. Change the approach instead of retrying.

If the proxy returns 429, back off; do not parallelize around the limit.

If the upstream returns a failure, report the exact failure and avoid repeated paid retries unless one parameter change is clearly justified.
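The 403 and 429 rules translate into a sequential retry wrapper. The following is one possible sketch, not the skill's behavior: `call_with_backoff`, `OutOfPolicyError`, the attempt limit, and the delay values are all assumptions. Any other status, including upstream failures, is returned to the caller untouched so it can be reported rather than retried.

```python
import time
from typing import Any, Callable, Tuple


class OutOfPolicyError(RuntimeError):
    """Proxy answered 403: the request is outside the allowed use case."""


def call_with_backoff(
    call: Callable[[], Tuple[int, Any]],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Tuple[int, Any]:
    """Run `call` (returns (status, payload)) one at a time, backing off on 429."""
    status, payload = 0, None
    for attempt in range(max_attempts):
        status, payload = call()
        if status == 403:
            # Retrying cannot help; the approach itself has to change.
            raise OutOfPolicyError("proxy returned 403; change the approach")
        if status != 429:
            return status, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, payload  # still rate-limited after the last attempt
```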

Categories
AI & Agent Building
First Seen: May 16, 2026