This skill connects Claude to Apify's web scraping platform through its REST API. You get access to both general-purpose scrapers and specialized Actors for Google Search, Instagram, Amazon, and other platforms. The skill documentation walks through async and sync run patterns, including the full workflow of starting a scraper, polling for completion, and fetching results from datasets. It's useful when you need to extract structured data from websites without writing scraping code yourself. The examples are thorough, but watch out for the placeholder IDs in the docs: you need to capture the actual run and dataset IDs from API responses. Apify handles the infrastructure and browser automation while you configure what to scrape and where.
npx -y skills add vm0-ai/vm0-skills --skill apify --agent claude-code
Installs into .claude/skills of the current project.
If requests fail, run zero doctor check-connector --env-name APIFY_TOKEN or zero doctor check-connector --url https://api.apify.com/v2/acts/apify~web-scraper/runs --method POST
Start an Actor run asynchronously:
Write to /tmp/apify_request.json:
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 10,
"pageFunction": "async function pageFunction(context) { const { request, log, jQuery } = context; const $ = jQuery; const title = $(\"title\").text(); return { url: request.url, title }; }"
}
Then run:
curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json
Response contains id (run ID) and defaultDatasetId for fetching results.
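The "Write to /tmp/apify_request.json" step can itself be scripted with a heredoc. The following is a minimal sketch: the helper name `write_request` is hypothetical, and it assumes the URL contains no double quotes or backslashes, since it is inserted into the JSON verbatim. The pageFunction is a shortened form of the one in the example above.

```shell
# Hypothetical helper (not part of the skill): writes a Web Scraper request
# body to the given file. The URL is inserted verbatim, so it must not
# contain double quotes or backslashes.
write_request() {
  local file="$1" url="$2" max_pages="${3:-10}"
  # Unquoted heredoc: ${url} and ${max_pages} are expanded, while \$ keeps
  # the literal jQuery "$" inside the page function.
  cat > "$file" <<EOF
{
  "startUrls": [{"url": "${url}"}],
  "maxPagesPerCrawl": ${max_pages},
  "pageFunction": "async function pageFunction(context) { const { request, jQuery } = context; const \$ = jQuery; const title = \$('title').text(); return { url: request.url, title }; }"
}
EOF
}
```

For example, `write_request /tmp/apify_request.json "https://example.com" 10` produces a file equivalent to the request body shown above.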
Wait for completion and get results directly (max 5 min):
Write to /tmp/apify_request.json:
{
"startUrls": [{"url": "https://news.ycombinator.com"}],
"maxPagesPerCrawl": 1,
"pageFunction": "async function pageFunction(context) { const { request, log, jQuery } = context; const $ = jQuery; const title = $(\"title\").text(); return { url: request.url, title }; }"
}
Then run:
curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/run-sync-get-dataset-items" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json
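A sync call that hits the time limit or is rejected returns an error body rather than dataset items, so it can help to check the HTTP status code before using the output. The sketch below is one way to do that with curl's `-o` and `-w '%{http_code}'` options; the helper names `post_sync` and `run_sync` are hypothetical, and treating any 2xx code as success is an assumption.

```shell
# Hypothetical helpers (names are not part of the skill).
# post_sync writes the response body to out_file and prints the HTTP status code.
post_sync() {
  local actor="$1" request_file="$2" out_file="$3" timeout="${4:-300}"
  curl -s -o "$out_file" -w '%{http_code}' -X POST \
    "https://api.apify.com/v2/acts/${actor}/run-sync-get-dataset-items?timeout=${timeout}" \
    --header "Authorization: Bearer $APIFY_TOKEN" \
    --header "Content-Type: application/json" \
    -d @"$request_file"
}

# run_sync prints the dataset items on a 2xx response; otherwise it reports
# the status code and the error body on stderr and returns 1.
run_sync() {
  local out_file="$3" code
  code=$(post_sync "$@")
  case "$code" in
    2*) cat "$out_file" ;;
    *)  echo "Sync run failed with HTTP $code" >&2
        if [ -f "$out_file" ]; then cat "$out_file" >&2; fi
        return 1 ;;
  esac
}
```

Usage might look like `run_sync apify~web-scraper /tmp/apify_request.json /tmp/apify_items.json`.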
⚠️ Important: The {runId} below is a placeholder - replace it with the actual run ID from your async run response (found in .data.id). See the complete workflow example below.
Poll the run status:
# Replace {runId} with actual ID like "HG7ML7M8z78YcAPEB"
curl -s "https://api.apify.com/v2/actor-runs/{runId}" --header "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.status'
Complete workflow example (capture run ID and check status):
Write to /tmp/apify_request.json:
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 10
}
Then run:
# Step 1: Start an async run and capture the run ID
RUN_ID=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json | jq -r '.data.id')
# Step 2: Check the run status
curl -s "https://api.apify.com/v2/actor-runs/${RUN_ID}" --header "Authorization: Bearer $APIFY_TOKEN" | jq '.data.status'
Statuses: READY, RUNNING, SUCCEEDED, FAILED, ABORTED, TIMED-OUT
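When scripting against these statuses, it is convenient to separate the ones that mean the run is finished from the ones that mean it is still in progress. A minimal sketch, assuming only the statuses listed above matter; the function name `is_terminal_status` is hypothetical.

```shell
# Returns 0 when a run status is terminal (the run will not change state
# again). Only the statuses listed above are handled; anything else is
# treated as still in progress.
is_terminal_status() {
  case "$1" in
    SUCCEEDED|FAILED|ABORTED|TIMED-OUT) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller can then write `if is_terminal_status "$STATUS"; then ...; fi` instead of comparing against each value separately.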
⚠️ Important: The {datasetId} below is a placeholder - do not use it literally! You must replace it with the actual dataset ID from your run response (found in .data.defaultDatasetId). See the complete workflow example below for how to capture and use the real ID.
Fetch results from a completed run:
# Replace {datasetId} with actual ID like "WkzbQMuFYuamGv3YF"
curl -s "https://api.apify.com/v2/datasets/{datasetId}/items" --header "Authorization: Bearer $APIFY_TOKEN"
Complete workflow example (run async, wait, and fetch results):
Write to /tmp/apify_request.json:
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 10
}
Then run:
# Step 1: Start async run and capture IDs
RESPONSE=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json)
RUN_ID=$(echo "$RESPONSE" | jq -r '.data.id')
DATASET_ID=$(echo "$RESPONSE" | jq -r '.data.defaultDatasetId')
# Step 2: Wait for completion (poll status)
while true; do
  STATUS=$(curl -s "https://api.apify.com/v2/actor-runs/${RUN_ID}" --header "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.status')
  echo "Status: $STATUS"
  [[ "$STATUS" == "SUCCEEDED" ]] && break
  # Stop on every unsuccessful terminal status, including TIMED-OUT
  [[ "$STATUS" == "FAILED" || "$STATUS" == "ABORTED" || "$STATUS" == "TIMED-OUT" ]] && exit 1
  sleep 5
done
# Step 3: Fetch the dataset items
curl -s "https://api.apify.com/v2/datasets/${DATASET_ID}/items" --header "Authorization: Bearer $APIFY_TOKEN"
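A polling loop like the one above can also be bounded so that it cannot wait indefinitely. The sketch below wraps the status check in a function and gives up after a fixed number of polls; the names `fetch_status` and `wait_for_run`, the default of 60 attempts, and the return codes are all illustrative assumptions.

```shell
# Hypothetical helpers (names are not part of the skill).
# fetch_status prints the current status of a run.
fetch_status() {
  curl -s "https://api.apify.com/v2/actor-runs/$1" \
    --header "Authorization: Bearer $APIFY_TOKEN" | jq -r '.data.status'
}

# wait_for_run polls until the run succeeds (returns 0), ends in another
# terminal state (returns 1), or max_attempts polls have been made (returns 2).
wait_for_run() {
  local run_id="$1" max_attempts="${2:-60}" interval="${3:-5}"
  local attempt=0 status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(fetch_status "$run_id")
    case "$status" in
      SUCCEEDED) return 0 ;;
      FAILED|ABORTED|TIMED-OUT)
        echo "Run ended with status $status" >&2
        return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "Gave up after $max_attempts polls" >&2
  return 2
}
```

With the defaults, `wait_for_run "$RUN_ID"` waits roughly five minutes (60 polls, 5 seconds apart) before giving up.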
With pagination:
# Replace {datasetId} with actual ID
curl -s "https://api.apify.com/v2/datasets/{datasetId}/items?limit=100&offset=0" --header "Authorization: Bearer $APIFY_TOKEN"
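To read a whole dataset, the limit and offset parameters can be advanced in a loop until a page comes back short. This is a sketch only: the helper names are hypothetical, and it assumes the items endpoint accepts `format=jsonl` (one JSON object per line), which is not shown in this document and should be checked against the Apify API reference. Counting lines keeps the loop independent of jq.

```shell
# Hypothetical helpers (names are not part of the skill).
# fetch_page prints one page of dataset items, one JSON object per line.
# Assumption: format=jsonl is supported by the dataset items endpoint.
fetch_page() {
  local dataset_id="$1" limit="$2" offset="$3"
  curl -s "https://api.apify.com/v2/datasets/${dataset_id}/items?format=jsonl&limit=${limit}&offset=${offset}" \
    --header "Authorization: Bearer $APIFY_TOKEN"
}

# fetch_all_items prints every item by advancing offset until a page comes
# back empty or with fewer than limit items.
fetch_all_items() {
  local dataset_id="$1" limit="${2:-100}" offset=0 page count
  while true; do
    page=$(fetch_page "$dataset_id" "$limit" "$offset")
    if [ -z "$page" ]; then break; fi
    printf '%s\n' "$page"
    # Count non-empty lines to see how many items this page held
    count=$(printf '%s\n' "$page" | grep -c .)
    if [ "$count" -lt "$limit" ]; then break; fi
    offset=$((offset + limit))
  done
}
```

For example, `fetch_all_items "$DATASET_ID" 100 > items.jsonl` would collect all items into one file.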
Write to /tmp/apify_request.json:
{
"queries": "web scraping tools",
"maxPagesPerQuery": 1,
"resultsPerPage": 10
}
Then run:
curl -s -X POST "https://api.apify.com/v2/acts/apify~google-search-scraper/run-sync-get-dataset-items?timeout=120" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json
Write to /tmp/apify_request.json:
{
"startUrls": [{"url": "https://docs.example.com"}],
"maxCrawlPages": 10,
"crawlerType": "cheerio"
}
Then run:
curl -s -X POST "https://api.apify.com/v2/acts/apify~website-content-crawler/run-sync-get-dataset-items?timeout=300" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json
Write to /tmp/apify_request.json:
{
"directUrls": ["https://www.instagram.com/apaborotnikov/"],
"resultsType": "posts",
"resultsLimit": 10
}
Then run:
curl -s -X POST "https://api.apify.com/v2/acts/apify~instagram-scraper/runs" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json
Write to /tmp/apify_request.json:
{
"categoryOrProductUrls": [{"url": "https://www.amazon.com/dp/B0BSHF7WHW"}],
"maxItemsPerStartUrl": 1
}
Then run:
curl -s -X POST "https://api.apify.com/v2/acts/junglee~amazon-crawler/runs" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json
Get recent Actor runs:
curl -s "https://api.apify.com/v2/actor-runs?limit=10&desc=true" --header "Authorization: Bearer $APIFY_TOKEN" | jq '.data.items[] | {id, actId, status, startedAt}'
⚠️ Important: The {runId} below is a placeholder - replace it with the actual run ID. See the complete workflow example below.
Stop a running Actor:
# Replace {runId} with actual ID like "HG7ML7M8z78YcAPEB"
curl -s -X POST "https://api.apify.com/v2/actor-runs/{runId}/abort" --header "Authorization: Bearer $APIFY_TOKEN"
Complete workflow example (start a run and abort it):
Write to /tmp/apify_request.json:
{
"startUrls": [{"url": "https://example.com"}],
"maxPagesPerCrawl": 100
}
Then run:
# Step 1: Start an async run and capture the run ID
RUN_ID=$(curl -s -X POST "https://api.apify.com/v2/acts/apify~web-scraper/runs" --header "Authorization: Bearer $APIFY_TOKEN" --header "Content-Type: application/json" -d @/tmp/apify_request.json | jq -r '.data.id')
echo "Started run: $RUN_ID"
# Step 2: Abort the run
curl -s -X POST "https://api.apify.com/v2/actor-runs/${RUN_ID}/abort" --header "Authorization: Bearer $APIFY_TOKEN"
Browse public Actors:
curl -s "https://api.apify.com/v2/store?limit=20&category=ECOMMERCE" --header "Authorization: Bearer $APIFY_TOKEN" | jq '.data.items[] | {name, username, title}'
| Actor ID | Description |
|---|---|
| apify/web-scraper | General web scraper |
| apify/website-content-crawler | Crawl entire websites |
| apify/google-search-scraper | Google search results |
| apify/instagram-scraper | Instagram posts/profiles |
| junglee/amazon-crawler | Amazon products |
| apify/twitter-scraper | Twitter/X posts |
| apify/youtube-scraper | YouTube videos |
| apify/linkedin-scraper | LinkedIn profiles |
| lukaskrivka/google-maps | Google Maps places |
Find more at: https://apify.com/store
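The table lists Actor IDs as `username/actor-name`, while the curl examples in this document put them in the URL as `username~actor-name`. A small conversion helper avoids mixing the two forms. The function names below are hypothetical.

```shell
# Hypothetical helper: converts "username/actor-name" (as listed in the
# table) to the "username~actor-name" form used in the API URLs in this
# document. IDs without a slash are passed through unchanged.
actor_path() {
  case "$1" in
    */*) printf '%s~%s\n' "${1%%/*}" "${1#*/}" ;;
    *)   printf '%s\n' "$1" ;;
  esac
}

# Builds the async run endpoint for an Actor.
runs_url() {
  printf 'https://api.apify.com/v2/acts/%s/runs\n' "$(actor_path "$1")"
}
```

For example, `runs_url junglee/amazon-crawler` prints the same endpoint used in the Amazon example above.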
| Parameter | Type | Description |
|---|---|---|
| timeout | number | Run timeout in seconds |
| memory | number | Memory in MB (128, 256, 512, 1024, 2048, 4096) |
| maxItems | number | Max items to return (for sync endpoints) |
| build | string | Actor build tag (default: "latest") |
| waitForFinish | number | Wait time in seconds (for async runs) |
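These parameters go in the query string of the run URL, as with `?timeout=120` in the Google Search example. A minimal sketch of assembling such a URL follows; the helper name `with_params` is hypothetical, and values are not URL-encoded, so it only suits simple values such as numbers and build tags.

```shell
# Hypothetical helper: appends key=value pairs to a URL as a query string.
# Values are not URL-encoded.
with_params() {
  local url="$1" sep="?" pair
  shift
  for pair in "$@"; do
    url="${url}${sep}${pair}"
    sep="&"
  done
  printf '%s\n' "$url"
}
```

For example, `with_params "https://api.apify.com/v2/acts/apify~web-scraper/runs" timeout=120 memory=1024` prints the runs URL followed by `?timeout=120&memory=1024`, which can be passed to curl in place of the bare endpoint.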
Run object:
{
"data": {
"id": "HG7ML7M8z78YcAPEB",
"actId": "HDSasDasz78YcAPEB",
"status": "SUCCEEDED",
"startedAt": "2024-01-01T00:00:00.000Z",
"finishedAt": "2024-01-01T00:01:00.000Z",
"defaultDatasetId": "WkzbQMuFYuamGv3YF",
"defaultKeyValueStoreId": "tbhFDFDh78YcAPEB"
}
}
- Use run-sync-get-dataset-items for quick tasks (<5 min) and async runs for longer jobs
- Use limit and offset to page through large datasets