ByteDance's Seedance 2.0 Pro generates 4–15 second cinematic video clips with native lip-synced audio, and its real strength is multi-modal references: you can feed it up to 9 images, 3 videos, and 3 audio files in a single call. The prompting model is sensible: stable identity goes in image_url, evolving narrative goes in the text prompt. It's the right pick when you need a spokesperson ad or dialogue piece with consistent branding across languages, or when you want camera-shot grammar without manual compositing. Resolution caps at 720p on the playground tier, and you'll hit schema errors if your reference videos or audio fall outside the 2–15 second window.
npx -y skills add agentspace-so/runcomfy-agent-skills --skill seedance-v2 --agent claude-code
Installs into .claude/skills of the current project.
ByteDance Seedance 2.0 Pro — multimodal cinematic video generator with native lip-synced audio — hosted on the RunComfy Model API.
npx skills add agentspace-so/runcomfy-skills --skill seedance-v2 -g
Seedance 2.0 Pro's distinct strength is multi-modal cinematic short-form: combine character images + scene videos + reference audio into one coherent shot. Pick it when fidelity to a reference identity / scene matters and you want native lip-sync.
| You want | Use |
|---|---|
| Lip-synced spokesperson / dialogue ad | Seedance 2.0 Pro |
| Multi-modal references (image + video + audio) | Seedance 2.0 Pro |
| Brand-consistent multi-language narrative | Seedance 2.0 Pro |
| Currently-#1 blind-vote video quality | HappyHorse 1.0 |
| Audio-driven lip-sync from your own track | Wan 2.7 (audio_url) |
| Motion editing on existing footage | Kling Video O1 |
| Ultra-fast iteration | LTX 2 |
If the user said "Seedance" / "Seedance 2" / "ByteDance video" explicitly, route here regardless.
npm i -g @runcomfy/cli
runcomfy login opens a browser device-code flow.
In CI or containers, set RUNCOMFY_TOKEN=<token> instead of running runcomfy login.
Endpoint: bytedance/seedance-v2/pro
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | none | CN ≤ 500 chars OR EN ≤ 1000 words. |
| image_url | array | no | [] | 0–9 references (JPEG/PNG/WebP/BMP/TIFF/GIF). |
| video_url | array | no | [] | 0–3 clips (MP4/MOV), 2–15s each. |
| audio_url | array | no | [] | 0–3 audio refs (WAV/MP3), 2–15s, < 15MB each. |
| aspect_ratio | enum | no | adaptive | adaptive, 16:9, 9:16, 4:3, 3:4, 1:1, 21:9. |
| duration | int | no | 5 | 4–15 (whole seconds). |
| resolution | enum | no | 720p | 480p or 720p. |
| generate_audio | bool | no | true | In-pass synchronized speech / SFX / music. |
| seed | int | no | none | Reproducibility. |
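The limits in the table can be checked locally before a request is submitted. The following is a minimal sketch, not part of the RunComfy CLI: check_seedance_input is a hypothetical helper name, and it covers only the duration, resolution, and reference-count limits listed above.

```shell
# Hypothetical pre-flight check mirroring the limits in the schema table.
# Usage: check_seedance_input <duration> <resolution> <n_images> <n_videos> <n_audio>
check_seedance_input() {
  # All numeric arguments must be non-negative whole numbers.
  for csi_n in "$1" "$3" "$4" "$5"; do
    case "$csi_n" in
      ''|*[!0-9]*)
        echo "expected a non-negative whole number, got '$csi_n'" >&2
        return 1 ;;
    esac
  done
  if [ "$1" -lt 4 ] || [ "$1" -gt 15 ]; then
    echo "duration must be 4-15 whole seconds" >&2
    return 1
  fi
  case "$2" in
    480p|720p) ;;
    *) echo "resolution must be 480p or 720p" >&2; return 1 ;;
  esac
  if [ "$3" -gt 9 ]; then echo "at most 9 image references" >&2; return 1; fi
  if [ "$4" -gt 3 ]; then echo "at most 3 video references" >&2; return 1; fi
  if [ "$5" -gt 3 ]; then echo "at most 3 audio references" >&2; return 1; fi
  return 0
}
```

For example, `check_seedance_input 8 720p 1 0 0` succeeds, while a duration of 3 or a resolution of 1080p fails with a message on stderr. The server-side schema remains the source of truth; this only catches obvious mistakes earlier.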
Default (text only, 5s, 720p with audio):
runcomfy run bytedance/seedance-v2/pro \
--input '{"prompt": "<user prompt>"}' \
--output-dir <absolute/path>
Lip-synced ad with character reference (image-stable, text-evolves):
runcomfy run bytedance/seedance-v2/pro \
--input '{
"prompt": "Medium close-up. The woman explains today'\''s special in a warm friendly tone, slow push-in, soft window light, gentle cafe ambience.",
"image_url": ["https://.../barista-headshot.jpg"],
"duration": 8,
"aspect_ratio": "9:16"
}' \
--output-dir <absolute/path>
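The '\'' sequence in the prompt above is needed only because the JSON sits inside single quotes. One possible alternative, sketched here under the assumption of a POSIX shell, is to hold the body in a quoted here-document so apostrophes need no escaping. seedance_ad_input is a hypothetical function name; the image URL is the elided placeholder from the example above.

```shell
# Hypothetical helper: print the JSON body from the example above.
# The quoted delimiter ('JSON') disables all shell expansion inside the
# here-document, so the apostrophe in "today's" is passed through literally.
seedance_ad_input() {
  cat <<'JSON'
{
  "prompt": "Medium close-up. The woman explains today's special in a warm friendly tone, slow push-in, soft window light, gentle cafe ambience.",
  "image_url": ["https://.../barista-headshot.jpg"],
  "duration": 8,
  "aspect_ratio": "9:16"
}
JSON
}

# Usage (same flags as the example above):
#   runcomfy run bytedance/seedance-v2/pro \
#     --input "$(seedance_ad_input)" \
#     --output-dir <absolute/path>
```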
Multi-modal (image + video + audio refs):
runcomfy run bytedance/seedance-v2/pro \
--input '{
"prompt": "Subject from image 1 walks through the café from video 1, voice tone matches audio 1.",
"image_url": ["https://.../subject.jpg"],
"video_url": ["https://.../cafe-locked-shot.mp4"],
"audio_url": ["https://.../voice-ref.mp3"]
}' \
--output-dir <absolute/path>
The CLI submits the request, polls until it completes, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URLs into --output-dir.
Image vs text division. This is the single most important rule. Stable identity (face, costume, brand mark, logo) → put in image_url. Evolving narrative (action, mood, lighting, camera) → put in prompt. Trying to verbally describe a face in detail wastes tokens and produces drift.
Camera + motion in plain language. "Medium close-up", "slow push-in", "handheld follow", "locked-off wide" all work as directives. Combine: "Medium close-up. Slow push-in over 3 seconds. Handheld, slight breathing motion."
Audio direction with generate_audio: true — say the tone: "warm friendly conversational", "calm instructional", "crisp newsroom delivery". For ambient: "gentle cafe chatter, distant traffic, no foreground music".
Reference media specs. Videos must be 2–15s; audio must be under 15MB and 2–15s. Out-of-range files are rejected. Match the aspect ratio of refs to your output to avoid crops.
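When a reference file is available locally before it is hosted, the size and duration limits can be checked up front. This is a rough sketch under stated assumptions: ref_size_ok and ref_duration_ok are hypothetical helper names, "15MB" is read conservatively as 15,000,000 bytes (the source does not say whether it means MB or MiB), and the optional duration probe assumes ffprobe from FFmpeg is installed.

```shell
# Hypothetical local pre-checks for reference media.

# True when the file exists and is smaller than 15,000,000 bytes (assumed limit).
ref_size_ok() {
  [ -f "$1" ] || return 1
  rs_bytes=$(($(wc -c < "$1")))
  [ "$rs_bytes" -lt 15000000 ]
}

# True when a duration in seconds (may be fractional) lies within 2-15s.
ref_duration_ok() {
  awk -v d="$1" 'BEGIN { exit !(d + 0 >= 2 && d + 0 <= 15) }'
}

# If ffprobe is installed, the duration can be read like this:
#   dur=$(ffprobe -v error -show_entries format=duration \
#         -of default=noprint_wrappers=1:nokey=1 voice-ref.mp3)
#   ref_duration_ok "$dur" && ref_size_ok voice-ref.mp3
```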
Anti-patterns:
Describing a face, costume, or logo in detail in the prompt: put it in image_url instead.
| Use case | Why Seedance 2.0 Pro |
|---|---|
| Spokesperson / dialogue ads | Native in-pass lip-sync, no separate TTS step |
| Brand-consistent multi-language narratives | Image refs hold identity; text drives translation |
| Cinematic short-form film previs | Camera-shot grammar + multi-modal refs |
| Ad creatives with reference music / VO tone | Audio refs guide voice / mood without locking lip-sync |
| Reproducible variant testing | Seed control + fixed schema |
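For the variant-testing row, one possible approach is to hold the prompt fixed and vary only seed, writing each result to its own directory. The sketch below uses only the runcomfy run flags shown earlier; seed_sweep is a hypothetical helper, and it assumes the prompt contains no double quotes or backslashes so that it can be embedded in JSON with printf.

```shell
# Hypothetical seed sweep: same prompt, one run per seed, one directory per seed.
# Assumption: the prompt has no double quotes or backslashes; use a JSON tool
# to build the body if the text is arbitrary.
# Usage: seed_sweep "<prompt>" <absolute/output/root> <seed> [<seed>...]
seed_sweep() {
  ss_prompt=$1
  ss_out=$2
  shift 2
  for ss_seed in "$@"; do
    mkdir -p "$ss_out/seed-$ss_seed"
    runcomfy run bytedance/seedance-v2/pro \
      --input "$(printf '{"prompt": "%s", "seed": %d}' "$ss_prompt" "$ss_seed")" \
      --output-dir "$ss_out/seed-$ss_seed"
  done
}
```

Keeping every other field identical across runs means any difference between outputs can be attributed to the seed.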
Default playground example:
Golden hour on a quiet cafe terrace: a barista wipes the counter, then
looks up and explains today's special in a friendly tone, natural
lip-sync. Medium close-up, slow push-in; warm side light, soft bokeh
through glass, gentle cafe ambience and subtle film grain.
Multi-modal lip-sync (text + image):
Same person as image 1 in a softly-lit recording booth, leaning into
the mic, says: "We just shipped the biggest update of the year."
Calm conversational tone. Medium close-up, locked tripod, shallow DOF,
warm key light from camera-left.
No @-syntax for character binding: identity relies on image refs + prompt alignment.
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
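Because only exit code 75 is marked retryable in the table, a wrapper script can retry on that code and pass every other code straight through. The following is a minimal sketch, not a feature of the CLI: retry_on_75 is a hypothetical helper, and the doubling backoff is an assumed policy rather than a documented recommendation.

```shell
# Hypothetical helper: rerun a command only when it exits with 75
# ("retryable: timeout / 429" in the table above).
# Usage: retry_on_75 <max_attempts> <base_delay_seconds> <command> [args...]
retry_on_75() {
  r75_max=$1
  r75_delay=$2
  shift 2
  r75_try=1
  while :; do
    r75_rc=0
    "$@" || r75_rc=$?
    # Success and non-retryable failures are returned unchanged.
    [ "$r75_rc" -ne 75 ] && return "$r75_rc"
    # Give up once the attempt budget is spent.
    [ "$r75_try" -ge "$r75_max" ] && return 75
    sleep "$r75_delay"
    r75_delay=$((r75_delay * 2))
    r75_try=$((r75_try + 1))
  done
}

# Usage:
#   retry_on_75 4 5 runcomfy run bytedance/seedance-v2/pro \
#     --input '{"prompt": "<user prompt>"}' \
#     --output-dir <absolute/path>
```

Codes 64, 65, and 77 indicate a problem with the arguments, input, or credentials, so repeating the same call would not be expected to help.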
The skill invokes runcomfy run bytedance/seedance-v2/pro with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/bytedance/seedance-v2/pro, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
The prompt is passed as JSON via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
Network access is limited to model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
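A script can check which credential source is in play before it calls the CLI. This is a minimal sketch that assumes the token path and variable name described above; runcomfy_auth_source is a hypothetical helper, and it only detects that a credential is present, not that the token is valid.

```shell
# Hypothetical guard: report where credentials would come from.
# Prints "env", "file", or "none"; returns non-zero when nothing is found.
runcomfy_auth_source() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo env
  elif [ -f "$HOME/.config/runcomfy/token.json" ]; then
    echo file
  else
    echo none
    return 1
  fi
}

# Usage in a CI step:
#   runcomfy_auth_source >/dev/null || { echo "set RUNCOMFY_TOKEN first" >&2; exit 77; }
```

The usage line exits with 77 to match the "not signed in or token rejected" code from the exit-code table.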