Generates AI images from text prompts using the ListenHub CLI, supporting both Gemini Pro and Flash models with resolutions up to 4K and various aspect ratios. The interaction flow asks one question at a time (model, resolution, aspect ratio, optional reference images) and requires explicit confirmation before generating. The Flash model also unlocks extreme ratios such as 1:8 (tall portrait) and 8:1 (panoramic). Output can be shown inline, downloaded to a dated artifact directory, or both. The prompt handling is sensible: it passes your text through unchanged by default and only offers to enrich very short prompts if you haven't asked for verbatim generation. Both local files and URLs are supported as style references, up to five per generation.
npx -y skills add marswaveai/skills --skill image-gen --agent claude-code
Installs into .claude/skills of the current project.
Related ListenHub skills: /podcast, /speech, /explainer, /content-parser.
Generate AI images using the ListenHub CLI. Supports text prompts with optional reference images (local files or URLs), multiple resolutions, and aspect ratios. Images are saved as local files.
- shared/cli-authentication.md
- shared/cli-patterns.md for command execution and error handling
- shared/config-pattern.md before any interaction
- Save artifacts to .listenhub/image-gen/YYYY-MM-DD-{jobId}/, never ~/Downloads/
Follow shared/cli-authentication.md § Auth Check. If the CLI is not installed or not logged in, auto-install and auto-login; never ask the user to run commands manually.
Then follow shared/cli-authentication.md § Auth Mode Detection to determine AUTH_MODE and set:
if [ "$AUTH_MODE" = "openapi" ]; then
CMD_PREFIX="listenhub openapi image"
else
CMD_PREFIX="listenhub image"
fi
All subsequent CLI calls use $CMD_PREFIX instead of hardcoded listenhub image.
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If file doesn't exist — silently create with defaults and proceed:
mkdir -p ".listenhub/image-gen"
echo '{"outputDir":".listenhub","outputMode":"inline"}' > ".listenhub/image-gen/config.json"
CONFIG_PATH=".listenhub/image-gen/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If file exists — read config silently and proceed:
CONFIG_PATH=".listenhub/image-gen/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/image-gen/config.json"
CONFIG=$(cat "$CONFIG_PATH")
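The two branches above can be folded into a single function. This is a minimal sketch, not the logic in shared/config-pattern.md: it assumes that "file doesn't exist" means neither the project config nor ~/.listenhub/image-gen/config.json exists, and the name load_image_gen_config is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Zero-Question Boot for image-gen.
# Assumption: the project config wins, then the home config, and a default
# project config is created only when neither exists.
load_image_gen_config() {
  CONFIG_PATH=".listenhub/image-gen/config.json"
  if [ ! -f "$CONFIG_PATH" ] && [ -f "$HOME/.listenhub/image-gen/config.json" ]; then
    CONFIG_PATH="$HOME/.listenhub/image-gen/config.json"
  fi
  if [ ! -f "$CONFIG_PATH" ]; then
    # Silent first run: write the same defaults as the boot step, ask nothing.
    mkdir -p ".listenhub/image-gen"
    printf '%s\n' '{"outputDir":".listenhub","outputMode":"inline"}' > "$CONFIG_PATH"
  fi
  CONFIG=$(cat "$CONFIG_PATH")
}
```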
Only run when the user explicitly asks to reconfigure. Display current settings:
Current configuration (image-gen):
Output mode: {inline / download / both}
Then ask:
Use the question in shared/output-mode.md § Setup Flow Question. Save the answer immediately:
# only overwrite the config if jq succeeded
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}') &&
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
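The same save can be wrapped in a reusable function that also rejects unknown modes and writes through a temp file, so an interrupted write cannot leave a truncated config behind. This is a sketch, assuming CONFIG_PATH is already set and that inline, download and both are the only valid modes (as in the settings display); save_output_mode is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist a new outputMode defensively.
# Expects CONFIG_PATH to be set and jq to be on PATH.
save_output_mode() {
  local mode=$1 new_config
  case "$mode" in
    inline|download|both) ;;
    *) echo "unsupported outputMode: $mode" >&2; return 1 ;;
  esac
  # Stop before touching the file if jq fails (missing file, invalid JSON, no jq).
  new_config=$(jq --arg m "$mode" '. + {"outputMode": $m}' "$CONFIG_PATH") || return 1
  # Write a temp file and rename it, so a partial write never replaces the config.
  printf '%s\n' "$new_config" > "${CONFIG_PATH}.tmp" || return 1
  mv "${CONFIG_PATH}.tmp" "$CONFIG_PATH" || return 1
  CONFIG=$(cat "$CONFIG_PATH")
}
```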
Free text input. Ask the user:
Describe the image you want to generate.
If the prompt is very short (< 10 words) and the user hasn't asked for verbatim generation, offer to help enrich the prompt. Otherwise, use as-is.
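The enrichment rule can be expressed as a small predicate. A sketch, assuming "words" means whitespace-separated tokens as counted by wc -w and that the caller passes yes when the user asked for verbatim generation; should_offer_enrichment is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 means "offer to enrich the prompt".
# $1 = prompt text, $2 = "yes" if the user asked for verbatim generation.
should_offer_enrichment() {
  local prompt=$1 verbatim=$2 words
  # A verbatim request always wins: never offer changes.
  [ "$verbatim" = "yes" ] && return 1
  # tr strips the padding some wc implementations add to the count.
  words=$(printf '%s' "$prompt" | wc -w | tr -d ' ')
  [ "$words" -lt 10 ]
}
```

Because wc -w splits on whitespace, an unspaced Chinese prompt counts as a single word and would always trigger the offer; a real implementation would need a different length measure for such prompts.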
Ask:
Question: "Which model?"
Options:
- "pro (recommended)" — gemini-3-pro-image, higher quality
- "flash" — gemini-3.1-flash-image, faster and cheaper, unlocks extreme aspect ratios (1:4, 4:1, 1:8, 8:1)
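The chosen option has to be translated into the model ID that --model expects. A sketch of that mapping, using only the two IDs listed above; model_id is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the user's model choice to the --model value.
model_id() {
  case "$1" in
    pro|"pro (recommended)") echo "gemini-3-pro-image" ;;
    flash) echo "gemini-3.1-flash-image" ;;
    *) echo "unknown model option: $1" >&2; return 1 ;;
  esac
}
```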
Ask both together (independent parameters):
Question: "What resolution?"
Options:
- "1K" — Standard quality
- "2K (recommended)" — High quality, good balance
- "4K" — Ultra high quality, slower generation
Question: "What aspect ratio?"
Options (all models):
- "16:9" — Landscape, widescreen
- "1:1" — Square
- "9:16" — Portrait, phone screen
- "Other" — 2:3, 3:2, 3:4, 4:3, 21:9
If flash model was selected, also offer: 1:4 (narrow portrait), 4:1 (wide landscape), 1:8 (extreme portrait), 8:1 (panoramic)
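Before building the command, it can help to reject combinations the questions above would never offer, such as 8:1 with the pro model. A sketch, assuming the ratio and resolution lists above are complete; ratio_allowed and size_allowed are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = model (short name or ID), $2 = aspect ratio.
ratio_allowed() {
  case "$2" in
    16:9|1:1|9:16|2:3|3:2|3:4|4:3|21:9) return 0 ;;   # offered for all models
    1:4|4:1|1:8|8:1)                                   # flash-only extremes
      case "$1" in
        flash|gemini-3.1-flash-image) return 0 ;;
      esac ;;
  esac
  return 1
}

# Hypothetical helper: $1 = resolution as passed to --size.
size_allowed() {
  case "$1" in
    1K|2K|4K) return 0 ;;
    *) return 1 ;;
  esac
}
```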
Question: "Any reference images for style guidance?"
Options:
- "Yes" — Provide file paths or URLs
- "No references" — Generate from prompt only
If yes: Collect reference image paths or URLs (comma-separated). The CLI handles both local files and URLs natively — no need to distinguish between them.
Each reference will be passed as a --reference flag to the CLI.
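Turning the comma-separated answer into individual references could look like the sketch below. The trimming behavior and the helper names are assumptions, and the limit of five comes from the overview at the top of this page rather than from CLI documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one reference per output line, surrounding spaces
# trimmed, empty items dropped.
# Caveat: a URL that itself contains a comma would be split incorrectly.
split_references() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Assumed limit, taken from the overview ("up to five per generation").
MAX_REFERENCES=5

# Hypothetical helper: exit status 0 if the list is within the limit.
check_reference_count() {
  local n
  n=$(split_references "$1" | wc -l | tr -d ' ')
  [ "$n" -le "$MAX_REFERENCES" ]
}
```

Each output line then becomes the value of one --reference flag.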
Summarize all choices:
Ready to generate image:
Prompt: {prompt text}
Model: {pro / flash}
Resolution: {1K / 2K / 4K}
Aspect ratio: {ratio}
References: {yes — N image(s) / no}
Proceed?
Wait for explicit confirmation before running the CLI command.
Build CLI command: Construct the $CMD_PREFIX create command with all collected parameters.
Execute: Run the command with run_in_background: true and timeout: 180000:
$CMD_PREFIX create \
--prompt "{description}" \
--model "{model}" \
--lang "{lang}" \
--aspect-ratio {ratio} \
--size {1K|2K|4K} \
--json
If reference images were provided, add --reference for each:
$CMD_PREFIX create \
--prompt "{description}" \
--model "{model}" \
--lang "{lang}" \
--aspect-ratio 16:9 \
--size 2K \
--reference ./sketch.png \
--reference ./photo.jpg \
--json
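Because the prompt is free text, substituting it into a template like --prompt "{description}" breaks as soon as it contains a double quote. A sketch that passes every value as its own argument instead, assuming CMD_PREFIX is set as in the auth step; run_create and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the create call as an argument list so that quotes,
# spaces or $ in the prompt reach the CLI unchanged.
# Usage: run_create PROMPT MODEL LANG RATIO SIZE [REFERENCE...]
run_create() {
  [ "$#" -ge 5 ] || { echo "usage: run_create PROMPT MODEL LANG RATIO SIZE [REF...]" >&2; return 2; }
  local prompt=$1 model=$2 lang=$3 ratio=$4 size=$5 ref
  shift 5
  # Rotate the remaining arguments: each reference becomes "--reference <ref>".
  for ref in "$@"; do
    shift
    set -- "$@" --reference "$ref"
  done
  # CMD_PREFIX is unquoted on purpose so "listenhub image" splits into two words.
  $CMD_PREFIX create --prompt "$prompt" --model "$model" --lang "$lang" \
    --aspect-ratio "$ratio" --size "$size" "$@" --json
}
```

For example, run_create "cyberpunk city at night" gemini-3-pro-image en 16:9 2K ./sketch.png ./photo.jpg produces the same call as the template above with two --reference flags. Leaving CMD_PREFIX unquoted relies on bash word splitting and would not work unchanged in zsh.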
The --lang flag provides a language hint for the prompt. Detect from the user's prompt language (e.g., Chinese prompt → zh, English prompt → en).
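A rough version of that detection is sketched below. It is a heuristic, not part of the CLI: it returns zh whenever the prompt contains a CJK ideograph and en otherwise, so Japanese text containing kanji would also be tagged zh; detect_lang is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: "zh" if the prompt contains a CJK ideograph, else "en".
# In UTF-8, CJK Unified Ideographs (U+4E00..U+9FFF) are 3-byte sequences whose
# lead byte is in 0xE4..0xE9, so a byte-level match under LC_ALL=C is enough
# for a rough check.
detect_lang() {
  local pattern
  pattern=$(printf '[\344-\351]')
  if printf '%s' "$1" | LC_ALL=C grep -q "$pattern"; then
    echo zh
  else
    echo en
  fi
}
```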
Parse result and present
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
Parse the CLI JSON output to extract the image URL:
IMAGE_URL=$(echo "$RESULT" | jq -r '.imageUrl')
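If .imageUrl is missing from the output, jq -r prints the string null, which would then be handed to listenhub download. A defensive variant, assuming the field name shown above; image_url_from_result is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print .imageUrl from the CLI's --json output, or fail
# loudly when it is absent or null instead of returning the string "null".
image_url_from_result() {
  local url
  url=$(printf '%s' "$1" | jq -r '.imageUrl // empty') || return 1
  if [ -z "$url" ]; then
    echo "no imageUrl in CLI output" >&2
    return 1
  fi
  printf '%s\n' "$url"
}
```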
inline or both: Download to a temp file, then use the Read tool.
JOB_ID=$(date +%s)
listenhub download "$IMAGE_URL" -o /tmp/image-gen-${JOB_ID}.jpg
Then use the Read tool on /tmp/image-gen-{jobId}.jpg. The image displays inline in the conversation.
Present:
Image generated!
download or both: Save to the artifact directory.
JOB_ID=$(date +%s)
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/image-gen/${DATE}-${JOB_ID}"
mkdir -p "$JOB_DIR"
listenhub download "$IMAGE_URL" -o "${JOB_DIR}/${JOB_ID}.jpg"
Present:
Image generated!
Saved to .listenhub/image-gen/{YYYY-MM-DD}-{jobId}/:
{jobId}.jpg
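The two delivery branches can be combined into one function that computes JOB_ID once, so in both mode the temp file and the artifact share the same id (the snippets above each call date +%s separately). A sketch, with deliver_image as a hypothetical name; it prints the paths it wrote rather than presenting the message to the user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deliver one generated image according to outputMode.
# $1 = image URL, $2 = inline | download | both
deliver_image() {
  local image_url=$1 output_mode=$2 job_id day job_dir
  case "$output_mode" in
    inline|download|both) ;;
    *) echo "unsupported outputMode: $output_mode" >&2; return 1 ;;
  esac
  # One id and date for this job, shared by both branches.
  job_id=$(date +%s)
  day=$(date +%Y-%m-%d)
  if [ "$output_mode" != download ]; then
    # inline / both: temp copy that the agent then opens with the Read tool
    listenhub download "$image_url" -o "/tmp/image-gen-${job_id}.jpg" || return 1
    echo "inline: /tmp/image-gen-${job_id}.jpg"
  fi
  if [ "$output_mode" != inline ]; then
    # download / both: keep a copy in the dated artifact directory
    job_dir=".listenhub/image-gen/${day}-${job_id}"
    mkdir -p "$job_dir" || return 1
    listenhub download "$image_url" -o "${job_dir}/${job_id}.jpg" || return 1
    echo "saved: ${job_dir}/${job_id}.jpg"
  fi
}
```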
Default: Pass the user's prompt directly without modification.
When to offer optimization:
When to never modify:
Optimization techniques (if user agrees):
Related shared docs:
- shared/cli-authentication.md
- shared/cli-patterns.md
- shared/config-pattern.md
- shared/output-mode.md
User: "Generate an image: cyberpunk city at night"
Agent workflow:
$CMD_PREFIX create \
--prompt "cyberpunk city at night" \
--model "gemini-3-pro-image" \
--lang en \
--aspect-ratio 16:9 \
--size 2K \
--json
Parse CLI JSON output per outputMode (see shared/output-mode.md).
User: "Generate an image in this style" (provides local files and a URL)
Agent workflow:
References provided: /path/to/style-reference.png, https://example.com/photo.jpg
$CMD_PREFIX create \
--prompt "a serene mountain lake at dawn" \
--model "gemini-3-pro-image" \
--lang en \
--aspect-ratio 16:9 \
--size 2K \
--reference /path/to/style-reference.png \
--reference https://example.com/photo.jpg \
--json
Parse CLI JSON output per outputMode (see shared/output-mode.md).