This is a complete local TTS workflow built around the Qwen3-TTS models, with three distinct modes: CustomVoice for built-in speakers with emotion control, VoiceDesign for describing a voice in natural language (like "high-pitched loli voice"), and VoiceClone for mimicking reference audio. The real utility is batch dubbing: it turns long articles into multi-voice audio with automatic speaker assignment and emotion tagging, then merges everything with FFmpeg. It supports Chinese, English, Japanese, and Korean, with speakers like Vivian and Ryan available out of the box. The documentation is thorough and includes actual command examples, though it leans heavily on Chinese-language content. You'll need a GPU for reasonable performance, and FFmpeg must be installed for the batch features. If you're doing voiceovers for articles, audiobooks, or multi-character dialogue, this handles the whole pipeline from text splitting to the final WAV.
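To make the final merge step more concrete, here is a minimal sketch of how per-segment WAV files could be joined with FFmpeg's concat demuxer. It is not taken from the skill itself: the function names `write_concat_list` and `build_merge_command` and the segment file names are hypothetical, and it assumes every segment shares the same sample rate and channel layout so that stream copy is safe.

```python
import shlex
import tempfile
from pathlib import Path


def write_concat_list(segments, list_path):
    """Write an FFmpeg concat-demuxer list file naming each segment in order.

    Hypothetical helper for illustration; not part of the skill.
    """
    lines = []
    for seg in segments:
        # The concat demuxer expects lines of the form: file 'path'
        # A single quote inside the path is written as '\''
        escaped = str(seg).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def build_merge_command(list_path, output_path):
    """Build (but do not run) the FFmpeg command that joins the listed files."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-f", "concat",      # use the concat demuxer
        "-safe", "0",        # allow arbitrary paths in the list file
        "-i", str(list_path),
        "-c", "copy",        # no re-encoding; assumes identical audio formats
        str(output_path),
    ]


if __name__ == "__main__":
    # Assumed segment names, one per synthesized line of dialogue.
    segments = ["seg_001.wav", "seg_002.wav", "seg_003.wav"]
    list_file = Path(tempfile.mkdtemp()) / "segments.txt"
    write_concat_list(segments, list_file)
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_merge_command(list_file, "article.wav")))
```

Building the command as a list rather than a shell string avoids quoting problems when segment paths contain spaces. If the segments do not share one audio format, replacing `-c copy` with an explicit re-encode is the safer choice.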
npx -y skills add mu-zi-lee/qwen3-tts-skill --skill qwen3-tts-skills --agent claude-code
Installs into .claude/skills of the current project.
davila7/claude-code-templates
orchestra-research/ai-research-skills
agentspace-so/runcomfy-agent-skills
inferen-sh/skills