This is an intent router for image-to-video generation on RunComfy that picks between three models based on what the user actually wants. If you need general portrait or product animation with native audio, it calls HappyHorse 1.0 (Arena #1, Elo 1392). If you have a custom voiceover track and need lip-sync, it routes to Wan 2.7 with audio_url. If you're composing a shot from multiple reference inputs (image + video + audio), it uses Seedance 2.0 Pro. The real value is that it bundles each model's documented prompting patterns so you don't waste iterations picking the wrong endpoint or writing prompts that don't match the model's expectations.
npx -y skills add agentspace-so/runcomfy-agent-skills --skill image-to-video --agent claude-code

Installs into .claude/skills of the current project.
Image-to-video, intent-routed. This skill doesn't lock you to one model — it picks the right i2v model in the RunComfy catalog based on what the user actually wants: portrait animation, custom-voiceover lip-sync, or multi-modal composition.
npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g
| User intent | Model | Why |
|---|---|---|
| Animate a portrait — keep identity stable | HappyHorse 1.0 I2V | #1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity |
| Product reveal / 360 / macro motion | HappyHorse 1.0 I2V | Geometry preservation + smooth camera moves |
| Native synchronized ambient audio in one pass | HappyHorse 1.0 I2V | In-pass audio synthesis |
| Animate and lip-sync to a custom voiceover track | Wan 2.7 + audio_url | Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it |
| Multi-language dub variants (same image, different audio per call) | Wan 2.7 + audio_url | Same shot, swap audio_url per language |
| Multi-modal — image + reference video + reference audio together | Seedance 2.0 Pro | Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs |
| Brand-consistent narrative with character ref + scene ref + voice ref | Seedance 2.0 Pro | Image holds identity, video holds scene, audio holds voice |
| Default if unspecified | HappyHorse 1.0 I2V | Best all-round quality + native audio |
The agent reads this table, classifies the user's intent, and picks the matching subsection below.
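The table lookup can be sketched as a plain dictionary. This is a minimal illustration, not the skill's actual implementation: the model IDs come from the sections of this document, while the intent labels and the `pick_model` helper are hypothetical names introduced here.

```python
from typing import Optional

# Model IDs as given in this document.
HAPPYHORSE = "happyhorse/happyhorse-1-0/image-to-video"
WAN_LIPSYNC = "wan-ai/wan-2-7/text-to-video"
SEEDANCE = "bytedance/seedance-v2/pro"

# Hypothetical intent labels, one per row of the routing table.
ROUTES = {
    "portrait_animation": HAPPYHORSE,
    "product_motion": HAPPYHORSE,
    "native_audio": HAPPYHORSE,
    "custom_voiceover_lipsync": WAN_LIPSYNC,
    "multi_language_dub": WAN_LIPSYNC,
    "multi_modal_composition": SEEDANCE,
    "brand_narrative": SEEDANCE,
}


def pick_model(intent: Optional[str]) -> str:
    """Return the model ID for a classified intent; fall back to HappyHorse."""
    return ROUTES.get(intent, HAPPYHORSE)
```

An unspecified or unrecognized intent falls through to HappyHorse, mirroring the "Default if unspecified" row.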
- Install the CLI: npm i -g @runcomfy/cli
- Sign in: runcomfy login opens a browser device-code flow.
- Alternatively, set RUNCOMFY_TOKEN=<token> (for example in CI or containers).

Model: happyhorse/happyhorse-1-0/image-to-video · Arena rank: #1 (Elo 1392)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_url | string | yes | — | JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB. |
| prompt | string | yes | — | ≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description. |
| resolution | enum | no | 1080P | 720P or 1080P. |
| duration | int | no | 5 | 3–15 seconds. |
| seed | int | no | 0 | Reuse for variant comparisons. |
| watermark | bool | no | true | Provider watermark toggle. |
Output aspect = input aspect. No independent reframing.
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
--input '{
"image_url": "https://.../portrait.jpg",
"prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
}' \
--output-dir <absolute/path>
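Checking an input against the documented HappyHorse limits before submitting can save a failed call. The sketch below is illustrative only: `check_happyhorse_input` is a hypothetical helper, and it assumes "Min 300px" refers to the shorter side of the image, which the field table does not state explicitly.

```python
def check_happyhorse_input(width, height, duration=5, resolution="1080P"):
    """Return a list of problems; an empty list means the input fits the limits."""
    problems = []
    # Assumption: the 300px minimum applies to the shorter image side.
    if min(width, height) < 300:
        problems.append("image side below 300px")
    # Documented aspect range is 1:2.5 to 2.5:1 (width:height).
    aspect = width / height
    if not (1 / 2.5 <= aspect <= 2.5):
        problems.append("aspect ratio outside 1:2.5 to 2.5:1")
    if not (3 <= duration <= 15):
        problems.append("duration outside 3-15 seconds")
    if resolution not in ("720P", "1080P"):
        problems.append("resolution must be 720P or 1080P")
    return problems
```

Because the output aspect equals the input aspect, the aspect check is also the only place to catch a framing problem before generation.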
Wan 2.7 + audio_url: when the user has a custom voiceover

Model: wan-ai/wan-2-7/text-to-video (NOT /image-to-video; Wan 2.7's t2v endpoint accepts an audio_url that drives lip-sync)
Note on i2v with Wan 2.7: Wan 2.7's primary i2v animation isn't on a dedicated endpoint here. For pure i2v (image animated by motion prompt only), prefer HappyHorse i2v. Use Wan 2.7 specifically when the user has a custom audio track they want lip-synced to a generated talking-head clip.
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | — | Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion. |
| audio_url | string | yes (for lip-sync) | — | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. |
| aspect_ratio | enum | no | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4. |
| resolution | enum | no | 1080p | 720p or 1080p. |
| duration | enum | no | 5 | 2–15 (whole seconds). Match your audio length. |
| negative_prompt | string | no | — | Concrete issues to avoid (e.g. "no subtitles, no flicker"). |
| seed | int | no | — | Reproducibility. |
runcomfy run wan-ai/wan-2-7/text-to-video \
--input '{
"prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
"audio_url": "https://.../voiceover-en.mp3",
"duration": 12,
"aspect_ratio": "9:16"
}' \
--output-dir <absolute/path>
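The field table asks for duration to match the audio length, and the routing table suggests swapping audio_url per language for dub variants. A minimal sketch of building one request body per language follows. `wan_dub_payloads` is a hypothetical helper, the audio lengths are assumed to be known in advance, and the seed value is arbitrary.

```python
import math


def wan_dub_payloads(prompt, audio_by_lang, seed=42):
    """Build one Wan 2.7 request body per language.

    audio_by_lang maps a language code to (audio_url, audio_seconds).
    """
    payloads = {}
    for lang, (audio_url, audio_seconds) in audio_by_lang.items():
        # Round up to whole seconds, then clamp to the documented 2-15 range.
        duration = max(2, min(15, math.ceil(audio_seconds)))
        payloads[lang] = {
            "prompt": prompt,
            "audio_url": audio_url,
            "duration": duration,
            "seed": seed,  # same seed for every language variant
        }
    return payloads
```

Audio longer than 15 seconds cannot be fully covered by the documented duration range, so this sketch simply clamps. How the model treats the remaining audio is not stated here.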
- Match duration to audio length; the clip will be silent past the audio if it runs too long.
- Use negative_prompt for concrete issues: "no subtitles, no flicker, no distorted hands".
- For multi-language variants, swap audio_url per call. Lock seed for visual consistency across languages.

Model: bytedance/seedance-v2/pro
Use when the user wants a single clip that combines: a subject image + scene from a reference video + voice tone from a reference audio.
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | — | CN ≤500 chars OR EN ≤1000 words. |
| image_url | array | yes (for i2v) | [] | 0–9 images. First is the primary subject. |
| video_url | array | no | [] | 0–3 reference clips (MP4/MOV), 2–15s each. |
| audio_url | array | no | [] | 0–3 reference audio (WAV/MP3), 2–15s, < 15MB each. |
| aspect_ratio | enum | no | adaptive | adaptive, 16:9, 9:16, 4:3, 3:4, 1:1, 21:9. |
| duration | int | no | 5 | 4–15 (whole seconds). |
| resolution | enum | no | 720p | 480p or 720p. |
| generate_audio | bool | no | true | In-pass synchronized speech / SFX / music. |
| seed | int | no | — | Reproducibility. |
runcomfy run bytedance/seedance-v2/pro \
--input '{
"prompt": "Subject from image 1 walks through the café in video 1, voice tone matches audio 1. Medium close-up, slow push-in, warm light, gentle ambience.",
"image_url": ["https://.../subject.jpg"],
"video_url": ["https://.../cafe-locked-shot.mp4"],
"audio_url": ["https://.../voice-tone.mp3"],
"duration": 8
}' \
--output-dir <absolute/path>
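The reference-count limits in the field table are easy to exceed when assembling inputs programmatically. The following sketch builds a Seedance request body and rejects counts outside the documented ranges. `seedance_payload` is a hypothetical helper; requiring at least one image reflects the "yes (for i2v)" note and is an interpretation, not a documented rule.

```python
def seedance_payload(prompt, images, videos=(), audios=(), duration=5):
    """Build a Seedance 2.0 Pro request body, enforcing the documented limits."""
    # Assumption: i2v needs at least one image; the first is the primary subject.
    if not 1 <= len(images) <= 9:
        raise ValueError("expected 1-9 image references")
    if len(videos) > 3:
        raise ValueError("at most 3 reference videos")
    if len(audios) > 3:
        raise ValueError("at most 3 reference audio clips")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be 4-15 whole seconds")
    return {
        "prompt": prompt,
        "image_url": list(images),
        "video_url": list(videos),
        "audio_url": list(audios),
        "duration": duration,
    }
```

The returned dictionary has the same shape as the JSON body in the command above and can be serialized with `json.dumps` for `--input`.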
- Use image_url for what must stay stable (face, costume, brand); use prompt for what should evolve (action, mood, lighting).
- Refer to each input by index in the prompt: "subject from image 1, lighting from video 1, voice from audio 1". Seedance routes cues correctly.

[…] wan-2-7, seedance-v2) instead of forcing it through here.

| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
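Since only exit code 75 is marked retryable, a caller can wrap the CLI in a small retry loop. This is a sketch under assumptions: `run_with_retry` is a hypothetical helper, the backoff schedule is arbitrary, and `run_once` stands for any callable that runs the command and returns its exit code.

```python
import time

RETRYABLE = {75}  # timeout / 429, per the exit-code table


def run_with_retry(run_once, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call run_once() until it returns a non-retryable code or attempts run out."""
    code = None
    for attempt in range(max_attempts):
        code = run_once()
        if code not in RETRYABLE:
            return code
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return code
```

Codes such as 65 (schema mismatch) or 77 (not signed in) are returned immediately, since repeating the same request would not change the outcome.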
The skill picks one of HappyHorse 1.0 I2V / Wan 2.7 t2v+audio / Seedance 2.0 Pro based on user intent and invokes runcomfy run <model_id> with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
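When the CLI is driven from a script rather than a shell, the JSON body can be passed as a single argument, which avoids shell quoting altogether. The sketch below uses only the `runcomfy run <model_id> --input ... --output-dir ...` form shown in this document. `build_command` and `run_model` are hypothetical helper names, and `run_model` assumes the runcomfy CLI is installed and on PATH.

```python
import json
import subprocess


def build_command(model_id, payload, output_dir):
    """Return the argv list for `runcomfy run`, with the JSON body as one argument."""
    return [
        "runcomfy", "run", model_id,
        "--input", json.dumps(payload),
        "--output-dir", output_dir,
    ]


def run_model(model_id, payload, output_dir):
    """Run the CLI without a shell and return its exit code."""
    return subprocess.run(build_command(model_id, payload, output_dir)).returncode
```

Because the argument list bypasses the shell, an apostrophe in the prompt needs no escaping of the kind used in the shell examples.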
- runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
- The prompt is passed as JSON via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Outbound traffic goes to model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.