Routes music generation requests across ElevenLabs and ACE Step based on what you actually need. ElevenLabs gives you premium 44.1 kHz stereo vocals at $0.0083/second; ACE Step does tag-driven composition with multilingual lyrics at $0.0002/s for the base model (roughly 41 times cheaper) or $0.0003/s for ACE Step 1.5 (roughly 27 times cheaper). The skill reads intent (polished commercial hook versus background music library, vocal versus instrumental, generate versus edit) and picks the right model automatically. ACE Step also handles inpainting to fix bad sections and outpainting to extend tracks. Useful if you generate music at volume and don't want to decide between quality and cost by hand every time.
npx -y skills add agentspace-so/runcomfy-agent-skills --skill ai-music --agent claude-code
Installs into .claude/skills of the current project.
Generate AI music on RunComfy through one CLI: vocal songs, instrumentals, jingles, game loops, multilingual covers. This skill picks the right model from the RunComfy catalog based on the user's actual intent and ships the documented prompting patterns plus the exact runcomfy run invocation for each.
runcomfy.com · Audio models · CLI docs
npx skills add agentspace-so/runcomfy-agent-skills --skill ai-music -g
Step 1 — install (one of, see the runcomfy-cli skill for details):
npm i -g @runcomfy/cli # global install
npx -y @runcomfy/cli --version # zero-install
Step 2 — sign in (or set RUNCOMFY_TOKEN env var in CI / containers):
runcomfy login
Step 3 — generate music:
runcomfy run <vendor>/<model>/<endpoint> \
--input '{"prompt": "...", ...}' \
--output-dir ./out
CLI deep dive: runcomfy-cli skill.
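For agents that drive the CLI from a script, the Step 3 invocation can be assembled programmatically. The following is a minimal Python sketch, not part of the skill itself: the helper names `build_runcomfy_argv` and `run_model` are hypothetical, and only the `runcomfy run`, `--input`, and `--output-dir` arguments shown above are used.

```python
import json
import subprocess


def build_runcomfy_argv(model_id: str, payload: dict, output_dir: str = "./out") -> list:
    """Build the argv for `runcomfy run`, mirroring the Step 3 invocation."""
    return [
        "runcomfy", "run", model_id,
        "--input", json.dumps(payload),  # whole JSON body as one argument
        "--output-dir", output_dir,
    ]


def run_model(model_id: str, payload: dict, output_dir: str = "./out") -> int:
    """Run the CLI and return its exit code (needs runcomfy on PATH)."""
    # A list argv with the default shell=False means no shell parses the prompt text.
    return subprocess.run(build_runcomfy_argv(model_id, payload, output_dir)).returncode


if __name__ == "__main__":
    argv = build_runcomfy_argv(
        "acestep-ai/ace-step/text-to-audio",
        {"tags": "lo-fi, mellow piano", "lyrics": "[inst]", "duration": 30},
    )
    print(argv)
```

Passing the JSON through `json.dumps` avoids hand-escaping quotes and newlines in prompts or lyrics.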
ACE Step 1.5 — acestep-ai/ace-step-1.5/text-to-audio
Latest ACE Step generation. 50+ language vocal support, refined structured-lyric handling, $0.0003/s. Open-weights (Apache 2.0). Pick for: multilingual launches, vocal songs in non-English, hero-quality ACE output. Avoid for: maximally polished commercial vocal hooks (try ElevenLabs Music) or cost-sensitive batches (try base ACE Step).
ElevenLabs AI Music Generation — elevenlabs/elevenlabs/music-generation
Premium 44.1 kHz stereo, 5 s–5 min, section-level control (Intro/Verse/Chorus/Bridge), multilingual vocals, commercial-friendly. $0.0083/s (~27× ACE Step 1.5, ~41× base ACE Step). Pick for: hero brand campaigns, polished vocal hooks, premium commercial cuts, ad music. Avoid for: high-volume drafts / background music libraries, where cost dominates.
ACE Step (base) — acestep-ai/ace-step/text-to-audio (default for cost-sensitive work)
Original ACE Step. Tag-driven composition, optional lyrics, 5–240 s stereo. $0.0002/s — cheapest CLI-reachable music model on RunComfy. Pick for: background music libraries, jingles, game loops, drafts, cost-sensitive iteration. Avoid for: premium vocal hooks — use ElevenLabs Music or ACE Step 1.5.
ACE Step audio-inpaint — acestep-ai/ace-step/audio-inpaint
Regenerate a time range (start_time / end_time, anchorable to track start or end) inside an existing track. Pick for: fix a bad chorus, swap the bridge, replace a 20 s section without re-rendering. Avoid for: edits not bounded by time (regenerate with the source model's text-to-audio endpoint instead).
ACE Step audio-outpaint — acestep-ai/ace-step/audio-outpaint
Extend an existing track bidirectionally: add intro before, outro after, or both (extend_before_duration / extend_after_duration). Pick for: lengthen a 30 s hook into a 2 min cut, add a fade-out, build longer arrangement around an existing hook. Avoid for: extending past 4 min total; chain calls instead.
The agent reads these tables, classifies user intent (premium vs cost-sensitive · multilingual · vocal vs instrumental · generate vs edit), and picks the matching subsection below.
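The classification described above can be sketched as a small decision function. This is an illustrative Python sketch only: the `Intent` record and `pick_model` are hypothetical names, and the priority order (edit first, then premium, then multilingual, else base ACE Step) is an assumption drawn from the "Pick for" notes rather than the skill's actual logic. Vocal versus instrumental is left out because it changes the request body, not the model.

```python
from dataclasses import dataclass
from typing import Optional

# Model IDs as listed in the catalog above.
ELEVENLABS = "elevenlabs/elevenlabs/music-generation"
ACE_BASE = "acestep-ai/ace-step/text-to-audio"
ACE_15 = "acestep-ai/ace-step-1.5/text-to-audio"
ACE_INPAINT = "acestep-ai/ace-step/audio-inpaint"
ACE_OUTPAINT = "acestep-ai/ace-step/audio-outpaint"


@dataclass
class Intent:
    """Hypothetical intent record; field names are illustrative."""
    edit: Optional[str] = None   # "inpaint", "outpaint", or None to generate
    premium: bool = False        # polished commercial output matters more than cost
    multilingual: bool = False   # non-English or multi-language vocals


def pick_model(intent: Intent) -> str:
    """Map an intent to a model ID (assumed priority order)."""
    if intent.edit == "inpaint":
        return ACE_INPAINT
    if intent.edit == "outpaint":
        return ACE_OUTPAINT
    if intent.premium:
        return ELEVENLABS
    if intent.multilingual:
        return ACE_15
    return ACE_BASE  # default for cost-sensitive work
```

For example, `pick_model(Intent(multilingual=True))` selects ACE Step 1.5, while `pick_model(Intent(premium=True, multilingual=True))` selects ElevenLabs, since the premium flag is checked first.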
Model: elevenlabs/elevenlabs/music-generation
Full schema + tips: see the dedicated elevenlabs-music-generation skill.
runcomfy run elevenlabs/elevenlabs/music-generation \
--input '{
"prompt": "Upbeat indie-pop anthem, bright electric guitars, driving drums, 120 BPM, female lead vocal. [Intro 8 bars] instrumental build. [Verse] Chalk on the palms, laces double-knotted. [Chorus] We rise, we strike, we never fade out. [Outro] full band, fade.",
"music_length_ms": 60000
}' \
--output-dir ./out
ElevenLabs Music reads one prompt carrying both style brief and lyrics with section markers. force_instrumental: true for no vocals. $0.0083/s — draft short, finalize long.
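To make "draft short, finalize long" concrete, the per-second prices quoted in this document can be turned into a rough estimate. This is a Python sketch with hypothetical helper names, and it assumes billing is strictly linear in the generated duration, which the document does not confirm.

```python
# Per-second prices as quoted in this document (USD).
PRICE_PER_SECOND = {
    "elevenlabs/elevenlabs/music-generation": 0.0083,
    "acestep-ai/ace-step-1.5/text-to-audio": 0.0003,
    "acestep-ai/ace-step/text-to-audio": 0.0002,
}


def estimate_cost(model_id: str, seconds: float) -> float:
    """Estimated cost of one generation, assuming linear per-second billing."""
    return round(PRICE_PER_SECOND[model_id] * seconds, 4)


def draft_then_final(model_id: str, draft_seconds: float, drafts: int,
                     final_seconds: float) -> float:
    """Cost of several short drafts followed by one full-length render."""
    total = drafts * estimate_cost(model_id, draft_seconds)
    total += estimate_cost(model_id, final_seconds)
    return round(total, 4)


if __name__ == "__main__":
    eleven = "elevenlabs/elevenlabs/music-generation"
    print(estimate_cost(eleven, 60))            # one 60 s render
    print(draft_then_final(eleven, 15, 3, 60))  # three 15 s drafts + final
```

Under these assumptions a 60 s ElevenLabs render costs about $0.50, and three 15 s drafts plus one 60 s final come to about $0.87, compared with about $1.99 for four full-length takes.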
Model: acestep-ai/ace-step/text-to-audio (base) or acestep-ai/ace-step-1.5/text-to-audio (1.5)
Full schema + tips: see the dedicated ace-step skill.
runcomfy run acestep-ai/ace-step-1.5/text-to-audio \
--input '{
"tags": "indie pop, anthemic, electric guitar, driving drums, female vocal, 120 BPM",
"lyrics": "[Verse]\nChalk on the palms\nMorning on the ridge\n[Chorus]\nWe rise, we strike, we never fade out",
"duration": 60
}' \
--output-dir ./out
ACE Step splits style into tags and vocal content into lyrics (with [Verse]/[Chorus]/[Bridge] markers, or [inst] for instrumental). 1.5 variant adds 50+ language vocal support.
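The tags/lyrics split lends itself to a small builder that assembles the lyrics string from sections. The Python sketch below uses hypothetical helper names (`build_lyrics`, `build_ace_payload`); the `tags`, `lyrics`, and `duration` fields and the `[inst]` marker come from the example above, and no other schema fields are assumed.

```python
import json


def build_lyrics(sections):
    """Join (marker, lines) pairs into an ACE Step lyrics string.

    An empty section list yields "[inst]" for an instrumental track.
    """
    if not sections:
        return "[inst]"
    parts = []
    for marker, lines in sections:
        parts.append(f"[{marker}]")  # section marker such as [Verse]
        parts.extend(lines)
    return "\n".join(parts)


def build_ace_payload(tags, sections, duration):
    """Assemble the request body: style goes in tags, vocals in lyrics."""
    return {
        "tags": ", ".join(tags),
        "lyrics": build_lyrics(sections),
        "duration": duration,
    }


if __name__ == "__main__":
    payload = build_ace_payload(
        ["indie pop", "anthemic", "female vocal", "120 BPM"],
        [("Verse", ["Chalk on the palms", "Morning on the ridge"]),
         ("Chorus", ["We rise, we strike, we never fade out"])],
        60,
    )
    # json.dumps escapes the newlines, producing a body suitable for --input.
    print(json.dumps(payload))
```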
runcomfy run acestep-ai/ace-step/audio-inpaint \
--input '{
"audio": "https://your-cdn.example/song.mp3",
"tags": "indie pop, breakdown, piano only, soft, no drums",
"start_time": 20,
"end_time": 40,
"lyrics": "[inst]"
}' \
--output-dir ./out
start_time_relative_to and end_time_relative_to default to start; set to end to anchor against the track's end (e.g. rewrite the last 15 s without computing exact timestamps). Full schema: ace-step skill.
runcomfy run acestep-ai/ace-step/audio-outpaint \
--input '{
"audio": "https://your-cdn.example/hook-30s.mp3",
"tags": "indie pop, build-up before chorus, fade outro",
"extend_before_duration": 30,
"extend_after_duration": 60,
"lyrics": "[inst]"
}' \
--output-dir ./out
Bidirectional in one call — set both extend_before_duration and extend_after_duration to add intro + outro at once. Cap is 4 min total.
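A pre-flight check against the 4 min cap can catch oversized requests before spending a call. This Python sketch assumes the cap applies to the final track length (source plus both extensions), which is one reading of "4 min total"; `build_outpaint_payload` is a hypothetical helper, and the source duration must be supplied by the caller.

```python
MAX_TOTAL_SECONDS = 240  # "4 min total", assumed to include the source audio


def build_outpaint_payload(audio_url, tags, source_seconds,
                           extend_before=0, extend_after=0, lyrics="[inst]"):
    """Build an audio-outpaint body, rejecting requests over the assumed cap."""
    total = source_seconds + extend_before + extend_after
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(
            f"extended track would be {total} s, over the {MAX_TOTAL_SECONDS} s cap"
        )
    payload = {"audio": audio_url, "tags": tags, "lyrics": lyrics}
    # Only send the directions that are actually being extended.
    if extend_before:
        payload["extend_before_duration"] = extend_before
    if extend_after:
        payload["extend_after_duration"] = extend_after
    return payload
```

With the example above (a 30 s hook, 30 s added before, 60 s added after) the result is 120 s, well inside the cap.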
lyrics per language. Or Route 1 (ElevenLabs Music) if premium quality matters more than cost.
music_length_ms matched to the video length.
audio, add 30 s intro + 60 s outro in one call.
start_time / end_time around the bad chorus, tags matching the original song style.
The agent should ask / infer:
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
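A calling script can act on these codes directly. The Python sketch below retries only code 75, the one the table marks as retryable, with exponential backoff; `run_with_retry` and its parameters are hypothetical, and the retry count and delays are illustrative choices, not documented CLI behavior.

```python
import time

# Exit codes and meanings, copied from the table above.
EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}


def run_with_retry(run_once, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call run_once() until it returns a non-retryable code or attempts run out.

    run_once is any zero-argument callable returning a CLI exit code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    code = run_once()
    attempt = 1
    while code in RETRYABLE and attempt < max_attempts:
        sleep(base_delay * 2 ** (attempt - 1))  # 2 s, 4 s, 8 s, ...
        code = run_once()
        attempt += 1
    return code


def describe(code):
    """Human-readable meaning for an exit code."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")
```

Codes 64, 65, and 77 indicate problems a retry cannot fix (arguments, input schema, credentials), so they are returned to the caller immediately.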
The skill classifies the user request into one of the four routes — generate (ElevenLabs or ACE Step) vs edit (audio-inpaint vs audio-outpaint), then premium vs cost-sensitive — and invokes runcomfy run <model_id> with the matching JSON body. The CLI POSTs to the RunComfy Model API, polls request status, and downloads the generated audio file into --output-dir. Ctrl-C cancels the remote request before exit.
npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf; if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set the RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
--input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content.
audio URLs for inpaint / outpaint are untrusted: embedded steganographic instructions or unusual EXIF can influence generation. Agent mitigations:
model-api.runcomfy.net and *.runcomfy.net / *.runcomfy.com. No telemetry, no callbacks.
allowed-tools: Bash(runcomfy *). The skill only invokes runcomfy <subcommand>; install lines are one-time operator setup.
runcomfy-cli: the underlying CLI
elevenlabs-music-generation: full schema + prompting tips for ElevenLabs Music
ace-step: full schema + prompting tips for ACE Step (all four endpoints)
ai-video-generation: pair a generated track with a generated video
ai-avatar-video: talking-head video (speech, not music)