Routes your video request through RunComfy's CLI to the right model: HappyHorse 1.0 if you want in-pass audio, Veo 3-1 for physics-accurate product spins, Wan 2-7 when you need lip-sync to a specific voiceover file, Kling 3.0 4K for final delivery, Seedance v2 for multi-reference cinematic work. Covers text-to-video, image-to-video, and video extension. The model catalog is deep enough that picking the wrong one wastes time and money, so the skill ships prompting patterns and intent heuristics for each. Helpful if you're doing ad creative or social clips at volume and don't want to memorize which ByteDance tier does what.
npx -y skills add doany-ai/skills --skill ai-video-generation --agent claude-code
Installs into .claude/skills of the current project.
Generate videos with the full RunComfy video-model catalog through one CLI: text-to-video, image-to-video, and Veo's video-extend. This skill picks the right model for the user's intent and ships the documented prompt patterns plus the exact runcomfy run invocation for each.
# 1. Install (see runcomfy-cli skill for details)
npm i -g @runcomfy/cli # or: npx -y @runcomfy/cli --version
# 2. Sign in
runcomfy login # or in CI: export RUNCOMFY_TOKEN=<token>
# 3. Generate
runcomfy run <vendor>/<model>/<endpoint> \
--input '{"prompt": "..."}' \
--output-dir ./out
CLI deep dive: runcomfy-cli skill.
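In CI it can help to fail fast when the token is missing instead of discovering it at request time. A minimal sketch, assuming a POSIX shell. `require_token` is a hypothetical helper name, not part of the CLI, and returning 77 is only a convention chosen here to mirror the CLI's documented "not signed in" exit code:

```shell
# Hypothetical helper (not part of @runcomfy/cli): fail fast when the CI token is missing.
# It reports only whether the variable is set and never prints its value.
require_token() {
  if [ -z "${RUNCOMFY_TOKEN:-}" ]; then
    echo "RUNCOMFY_TOKEN is not set" >&2
    return 77  # assumption of this sketch: reuse the CLI's "not signed in" code
  fi
  return 0
}
```

A CI step could then run `require_token && runcomfy run <vendor>/<model>/<endpoint> --input '{"prompt": "..."}' --output-dir ./out`.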
npx skills add agentspace-so/runcomfy-agent-skills --skill ai-video-generation -g
HappyHorse 1.0 — happyhorse/happyhorse-1-0/text-to-video (default)
Currently #1 on Artificial Analysis Video Arena. Native synchronized audio generated in-pass (no separate Foley step). Native 1080p, up to ~15s, strong multi-shot character consistency. Pick for: general-purpose t2v, ad creative with audio, social-media clips, multi-shot narratives. Avoid for: audio-driven lip-sync to a specific voiceover MP3 — use Wan 2-7.
Kling 3.0 4K — kling/kling-3.0/4k/text-to-video
Kling's latest, 4K output, strong multi-shot character identity, premium camera language. Pick for: hero shots, final-delivery 4K cuts, multi-shot character narratives. Avoid for: cost-sensitive iteration — drop to Kling 2-6 Pro or Standard i2v.
Seedance v2 Pro — bytedance/seedance-v2/pro
ByteDance flagship — multi-modal (up to 9 reference images, 3 reference videos, 3 reference audio), in-pass synchronized audio, cinematic motion refinement, lens language honored. Pick for: cinematic ad frames, multi-reference composition (subject + scene + audio refs), 21:9 anamorphic looks. Avoid for: simple "single prompt → clip" jobs — overpowered, slower.
Seedance v2 Fast — bytedance/seedance-v2/fast
Faster variant of Seedance v2 Pro, same multi-modal capabilities. Pick for: iteration on Seedance v2 compositions before locking a final on Pro. Avoid for: hero-shot final delivery.
Wan 2-7 — wan-ai/wan-2-7/text-to-video
Open-weights flagship, audio_url field for audio-driven lip-sync, pairs natively with Wan image models. Pick for: dialog scenes where mouth must sync to a specific voiceover file; open-weights pipeline requirement. Avoid for: in-pass audio generation (no MP3 input); use HappyHorse 1.0.
Kling 2-6 Pro — kling/kling-2-6/pro/text-to-video
Previous Kling tier — still strong quality at much lower cost than 3.0 4K. Pick for: production at scale where 3.0 4K is too expensive. Avoid for: top-tier hero shots — use Kling 3.0 4K.
Seedance 1-5 Pro — bytedance/seedance-1-5/pro/text-to-video
Previous Seedance generation, cheaper. Pick for: identity-stable batches between 1-5 generations; cost-sensitive baseline. Avoid for: new work — prefer Seedance v2 Pro or Fast.
HappyHorse 1.0 I2V — happyhorse/happyhorse-1-0/image-to-video (default)
Animate any still with in-pass audio described in prompt, strong identity preservation. Pick for: animating a generated portrait or product still, vertical social clips, voiceover-described audio. Avoid for: physics-accurate object motion — use Veo 3-1.
Veo 3-1 — google-deepmind/veo-3-1/image-to-video
Google's flagship: physics-respecting motion, strong object permanence ("rotates 180 degrees" = 180°), pairs with extend-video for longer clips. Pick for: product spins, physics-accurate motion, scenes where "no other motion" must hold. Avoid for: audio-driven dialog; use Wan 2-7 or HappyHorse.
Veo 3-1 Fast — google-deepmind/veo-3-1/fast/image-to-video
Faster Veo 3-1 variant. Pick for: iteration on Veo compositions. Avoid for: hero delivery — use full Veo 3-1.
Kling 3.0 4K I2V — kling/kling-3.0/4k/image-to-video
Multi-shot character identity, 4K output from a still. Pick for: 4K hero shots, character-narrative cuts. Avoid for: cost iteration — drop to Pro or Standard.
Kling 3.0 Pro I2V — kling/kling-3.0/pro/image-to-video
Default Kling 3.0 quality tier. Pick for: high-quality i2v at moderate cost. Avoid for: 4K final delivery.
Kling 3.0 Standard I2V — kling/kling-3.0/standard/image-to-video
Cheapest 3.0 i2v tier. Pick for: concepting / drafts on Kling 3.0. Avoid for: final delivery.
Hailuo 2-3 Pro — minimax/hailuo-2-3/pro/image-to-video
MiniMax Hailuo latest — natural motion, strong on real-world subjects. Pick for: lifelike motion of real-people / real-product subjects. Avoid for: stylized characters — use Kling or Dreamina.
Dreamina 3-0 Pro — bytedance/dreamina-3-0/pro/image-to-video
ByteDance Dreamina i2v — illustration / stylized character lean. Pick for: animating illustrated heroes, painterly stills. Avoid for: photoreal motion.
Seedance 1-0 Pro Fast — bytedance/seedance-1-0/pro/fast/image-to-video
Older Seedance i2v generation, cheap. Pick for: cost-sensitive batch i2v on Seedance. Avoid for: new work — Seedance v2 Pro is more capable (t2v + i2v + multi-modal).
Veo 3-1 Extend — google-deepmind/veo-3-1/extend-video
Continue an existing Veo clip with consistent motion / lighting / identity. Pick for: extending a video past Veo's per-call duration cap; chained narrative shots.
Veo 3-1 Fast Extend — google-deepmind/veo-3-1/fast/extend-video
Faster Veo extend variant. Pick for: extending Veo Fast clips at matching latency tier.
For dedicated treatment of extend (input video preparation, frame-anchor strategy, chained extends), see the video-extend skill.
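The catalog above can be read as a lookup from intent to endpoint. The sketch below is one way to write that lookup in a POSIX shell. The intent keywords (`lipsync`, `multi-ref`, and so on) are illustrative labels invented for this example, not CLI or API vocabulary, and the skill's real heuristics may weigh more signals than a single keyword:

```shell
# Hypothetical intent-to-endpoint lookup built from the catalog entries above.
# Intent names are made up for this sketch; endpoints are copied from the catalog.
pick_model() {
  case "$1" in
    lipsync)       echo "wan-ai/wan-2-7/text-to-video" ;;
    multi-ref)     echo "bytedance/seedance-v2/pro" ;;
    hero-4k)       echo "kling/kling-3.0/4k/text-to-video" ;;
    product-spin)  echo "google-deepmind/veo-3-1/image-to-video" ;;
    stylized-i2v)  echo "bytedance/dreamina-3-0/pro/image-to-video" ;;
    animate-still) echo "happyhorse/happyhorse-1-0/image-to-video" ;;
    extend)        echo "google-deepmind/veo-3-1/extend-video" ;;
    *)             echo "happyhorse/happyhorse-1-0/text-to-video" ;;  # catalog default for t2v
  esac
}
```

Anything unrecognized falls through to the HappyHorse t2v default, which matches the "(default)" marker in the catalog.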
Model: happyhorse/happyhorse-1-0/text-to-video
Catalog: happyhorse-1-0
Currently #1 on the Artificial Analysis Video Arena — RunComfy's recommended default for general-purpose t2v. Native synchronized audio is generated in-pass (no separate Foley step).
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | - | Subject-first, describe motion + scene + audio in one declarative |
| duration | int | no | 5 | Seconds. Up to ~15s |
| aspect_ratio | enum | no | 16:9 | 16:9, 9:16, 1:1 typical |
| resolution | enum | no | 1080p | 720p, 1080p |
| seed | int | no | - | Reproducibility |
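A pre-flight check against the field table can catch a bad value before a paid request is sent. This is a rough sketch, not the CLI's own validation. `check_happyhorse_input` is a hypothetical name, treating 15 as a hard cap is an assumption (the table says "up to ~15s"), and the aspect ratios are only the ones the table lists as typical, so the live schema may accept more:

```shell
# Hypothetical pre-flight check for HappyHorse t2v inputs, based on the field table above.
# Usage: check_happyhorse_input DURATION ASPECT_RATIO RESOLUTION
check_happyhorse_input() {
  duration=$1 aspect=$2 resolution=$3
  case "$duration" in
    ''|*[!0-9]*) echo "duration must be a whole number of seconds" >&2; return 1 ;;
  esac
  # Assumption: "up to ~15s" is treated as a hard 15-second cap.
  if [ "$duration" -lt 1 ] || [ "$duration" -gt 15 ]; then
    echo "duration out of range (1-15)" >&2; return 1
  fi
  case "$aspect" in
    16:9|9:16|1:1) ;;   # only the values the table lists as typical
    *) echo "unexpected aspect_ratio: $aspect" >&2; return 1 ;;
  esac
  case "$resolution" in
    720p|1080p) ;;
    *) echo "unexpected resolution: $resolution" >&2; return 1 ;;
  esac
}
```

For example, `check_happyhorse_input 8 16:9 1080p` returns 0, while a 20-second duration or a 4:3 aspect ratio returns 1 with a message on stderr.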
runcomfy run happyhorse/happyhorse-1-0/text-to-video \
--input '{
"prompt": "A red kite tumbles across a windy beach at golden hour, kids chasing it laughing, surf in the background. Audio: wind, gulls, distant laughter.",
"duration": 8,
"aspect_ratio": "16:9",
"resolution": "1080p"
}' \
--output-dir ./out
Describe the audio inline at the end of the prompt, e.g. "Audio: wind, gulls, distant laughter." HappyHorse generates audio in-pass.
Model: wan-ai/wan-2-7/text-to-video
Catalog: wan-2-7 · wan-models collection
Pick Wan 2-7 when you have a specific voiceover / dialog audio file and want the on-screen subject's mouth to sync to it. The audio_url field drives the lip motion.
With audio-driven lip-sync:
runcomfy run wan-ai/wan-2-7/text-to-video \
--input '{
"prompt": "Studio portrait of a woman in her 30s speaking confidently to camera, soft window light.",
"audio_url": "https://your-cdn.example/voiceover.mp3",
"duration": 6
}' \
--output-dir ./out
Plain t2v (no audio):
runcomfy run wan-ai/wan-2-7/text-to-video \
--input '{"prompt": "Drone shot over forest canopy at sunrise, soft fog drifting between trees"}' \
--output-dir ./out
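The two Wan invocations differ only in whether audio_url is present, so the input body can be built by one helper. A minimal sketch, assuming a POSIX shell with sed available. `json_escape` and `wan_input` are hypothetical names, and the escaping is deliberately minimal (backslashes and double quotes only), so prompts containing newlines or control characters would need a real JSON tool:

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Minimal on purpose: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build the Wan 2-7 input body, adding audio_url only when given.
# Usage: wan_input PROMPT [AUDIO_URL]
wan_input() {
  prompt=$(json_escape "$1")
  if [ -n "${2:-}" ]; then
    audio=$(json_escape "$2")
    printf '{"prompt": "%s", "audio_url": "%s"}\n' "$prompt" "$audio"
  else
    printf '{"prompt": "%s"}\n' "$prompt"
  fi
}
```

The result can be passed straight to the documented flag, for instance `runcomfy run wan-ai/wan-2-7/text-to-video --input "$(wan_input "$PROMPT" "$AUDIO_URL")" --output-dir ./out`, where `PROMPT` and `AUDIO_URL` are placeholder variables.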
Model: bytedance/seedance-v2/pro (or /fast)
Catalog: seedance-v2 Pro · seedance collection
Pick Seedance v2 Pro when the user needs multi-modal conditioning: up to 9 reference images, 3 reference videos, and 3 reference audio tracks, with audio synthesized in-pass and cinematic motion refinement.
runcomfy run bytedance/seedance-v2/pro \
--input '{
"prompt": "Anamorphic 35mm shot — a vintage car drives down a coastal road at dusk, lens flares from oncoming headlights, cinematic color grade.",
"duration": 10,
"aspect_ratio": "21:9"
}' \
--output-dir ./out
When passing references, state each one's role in the prompt, e.g. "subject from ref image 1, mood from ref video 2, score from ref audio 1".
Model: happyhorse/happyhorse-1-0/image-to-video
Catalog: happyhorse-1-0 i2v
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
--input '{
"image_url": "https://your-cdn.example/portrait.jpg",
"prompt": "She turns her head slowly to look at the camera and smiles. Wind through her hair. Audio: gentle breeze.",
"duration": 6,
"aspect_ratio": "9:16"
}' \
--output-dir ./out
Model: google-deepmind/veo-3-1/image-to-video (or /fast/image-to-video)
Catalog: veo-3-1 i2v · veo-3 collection
Pick Veo when physics / realism / object permanence matters most. Veo 3-1 supports 8s clips, and longer ones through the extend-video companion endpoint.
runcomfy run google-deepmind/veo-3-1/image-to-video \
--input '{
"image_url": "https://your-cdn.example/product.jpg",
"prompt": "The bottle slowly rotates 180 degrees on a marble surface, soft daylight, no other motion."
}' \
--output-dir ./out
Model: kling/kling-3.0/{4k,pro,standard}/image-to-video
Catalog: kling collection
Three tiers — pick by quality / cost trade-off:
| Tier | Endpoint | When |
|---|---|---|
| 4K | kling/kling-3.0/4k/image-to-video | Hero shots, final delivery at 4K |
| Pro | kling/kling-3.0/pro/image-to-video | Default — high quality at lower cost |
| Standard | kling/kling-3.0/standard/image-to-video | Concepting, drafts |
runcomfy run kling/kling-3.0/pro/image-to-video \
--input '{
"image_url": "https://your-cdn.example/character.jpg",
"prompt": "The character walks toward the camera, soft handheld feel, end on a medium close-up."
}' \
--output-dir ./out
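Because the three Kling 3.0 i2v endpoints differ only in the tier segment, a script can draft on one tier and finalize on another by swapping a single argument. A small sketch, with `kling_i2v_endpoint` as a hypothetical helper name:

```shell
# Hypothetical helper: map a Kling 3.0 tier name to its image-to-video endpoint.
# Usage: kling_i2v_endpoint 4k|pro|standard
kling_i2v_endpoint() {
  case "$1" in
    4k|pro|standard) echo "kling/kling-3.0/$1/image-to-video" ;;
    *) echo "unknown tier: $1 (expected 4k, pro, or standard)" >&2; return 1 ;;
  esac
}
```

For example, `runcomfy run "$(kling_i2v_endpoint standard)" ...` for drafts, then the same call with `4k` once the prompt is locked.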
| Endpoint | When |
|---|---|
| minimax/hailuo-2-3/pro/image-to-video · /standard/image-to-video | MiniMax Hailuo: natural motion, strong on real-world subjects |
| bytedance/dreamina-3-0/pro/image-to-video | Dreamina: illustrative / concept art lean |
| bytedance/seedance-1-0/pro/fast/image-to-video | Seedance 1-0: cheaper baseline |
| kling/kling-video-o1/standard | Kling Video O1: reasoning-style video model |
| kling/kling-2-6/motion-control-pro | Transfer motion from a reference video onto a target character |
Schemas live on each model page; pass the field set through the CLI verbatim.
Vertical social clip (HappyHorse i2v): aspect_ratio: "9:16", duration: 6, audio described inline
Product spin (Veo 3-1): "rotates 180 degrees, no other motion"; Veo respects physics
Lip-sync to a voiceover (Wan 2-7): audio_url pointing at your voiceover MP3
Video extension: video-extend skill
Talking-head video: ai-avatar-video skill for OmniHuman + HappyHorse + Wan composition
Browse by brand: kling · seedance · veo-3 · hailuo · wan-models · dreamina brand collections
Browse by capability: /models/feature/lip-sync · /feature/character-swap · /feature/upscale-video capability tags
Exit codes:
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
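Since exit code 75 is marked retryable, a script can wrap the call in a small retry loop and let every other code pass through unchanged. The sketch below is one possible wrapper, not something the CLI provides. `retry_on_75` and the `RETRY_DELAY` variable are hypothetical names, and the doubling backoff is an arbitrary choice:

```shell
# Hypothetical wrapper: re-run a command while it exits with 75 (retryable: timeout / 429).
# Usage: retry_on_75 MAX_ATTEMPTS COMMAND [ARGS...]
retry_on_75() {
  max=$1; shift
  attempt=1
  delay=${RETRY_DELAY:-2}   # initial wait in seconds; doubled after each retry
  while :; do
    rc=0
    "$@" || rc=$?
    # Stop on success, on any non-retryable code, or when attempts run out.
    if [ "$rc" -ne 75 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    echo "attempt $attempt exited 75; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

A call might look like `retry_on_75 4 runcomfy run happyhorse/happyhorse-1-0/text-to-video --input '{"prompt": "..."}' --output-dir ./out`. Codes such as 64, 65, and 77 are returned immediately, since retrying will not fix bad arguments or a rejected token.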
The skill classifies the user request into one of the t2v / i2v / extend routes above and invokes runcomfy run <model_id> with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, fetches the result, and downloads any .runcomfy.net / .runcomfy.com URLs into --output-dir. Ctrl-C cancels the remote request before exit.
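The skill's actual classification logic is not spelled out here. One plausible heuristic is to pick the route from which source media the request carries. A minimal sketch under that assumption, with `pick_route` as a hypothetical name:

```shell
# Hypothetical route picker: choose t2v / i2v / extend from the source media in the request.
# Usage: pick_route [IMAGE_URL] [VIDEO_URL]   (pass "" for a missing value)
pick_route() {
  image_url=${1:-}
  video_url=${2:-}
  if [ -n "$video_url" ]; then
    echo "extend"   # an existing clip to continue
  elif [ -n "$image_url" ]; then
    echo "i2v"      # a still to animate
  else
    echo "t2v"      # prompt only
  fi
}
```

The route then narrows the catalog to the matching endpoints before a specific model is chosen.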
Install only via npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf.
runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
Prompts are passed as JSON via --input. The CLI does not shell-expand prompt content, so prompt content adds no shell-injection surface.
Outbound requests go only to model-api.runcomfy.net and *.runcomfy.net / *.runcomfy.com. No telemetry, no callbacks.
allowed-tools: Bash(runcomfy *). The skill never instructs the agent to run anything other than runcomfy <subcommand>; install lines are one-time operator setup.
Related skills:
runcomfy-cli: the underlying CLI, schema discovery, polling modes, scripting
ai-image-generation: text-to-image / image-to-image sibling
ai-avatar-video: talking-head / lip-sync video specialist
image-to-video: animate a still (i2v-focused router)
video-edit: restyle / motion-control / identity edit on existing video
video-extend: continue an existing clip via Veo extend
lipsync · face-swap: narrow technique routers