Realistic UGC Video

178 installs, 7 stars

Summary

This skill orchestrates Nano Banana and Kling AI to generate talking-head videos that don't scream "AI generated." The workflow is thorough: generate a base image with intentional imperfections like pores and mixed lighting, chunk your script into 55-60 syllable segments for natural pacing, then generate 10-second video clips with choreographed micro-movements. The documentation is refreshingly honest about the hands problem (AI models can't do realistic hand movement, so crop them out or keep them static). It's a multi-step process that requires post-production, but if you need UGC-style spokesperson videos and care about them looking real, this gives you the specific prompting strategies and pacing math to pull it off.

Install to Claude Code

npx -y skills add dennisonbertram/claude-media-skills --skill realistic-ugc-video --agent claude-code

Installs into .claude/skills of the current project.


SKILL.md

Realistic UGC Video Production

Create long-form AI videos that look and sound authentically human. This skill orchestrates a multi-step workflow using Nano Banana for realistic base images and Kling AI for video generation, with specific techniques to avoid the "AI look".

Why Videos Look AI (And How to Fix It)

Problem	Solution
Too perfect/clean skin	Add imperfections: micro-pores, natural oils, fine lines
Studio lighting	Use available/natural light, mixed color temps
Character too still	Add micro-movements, head tilts, natural sway
Inconsistent pacing	Use 55-60 syllables per clip
Robotic voice	Process through Adobe Podcast or Resemble AI
Obvious jump cuts	Cover with B-roll or animations
Weird AI hands	Crop hands out of frame or keep completely static

Known Limitation: Hands

AI video models (Kling, Veo, etc.) struggle with realistic hand movement. Fingers morph, gestures look unnatural, and hands are often the biggest tell.

Best practice: Keep hands OUT of frame or completely static.

Options:

Head/shoulders framing - Crop base image to exclude hands entirely
Arms crossed - Static pose, no finger movement needed
Hands below frame - Desk edge cuts off at wrists
Cover with B-roll - Cut away during any hand weirdness in post

Complete Workflow

Phase 1: Collect Requirements

Before starting, gather from the user:

Character description - Age, ethnicity, features, clothing
Setting/background - Office, home, studio, outdoor
Script - The full text the character will speak
Tone - Conversational, urgent, professional, friendly
Video length target - Determines how many 10-second clips are needed
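To turn the length target into a production plan, a quick calculation helps. This is a minimal sketch that assumes every clip is the 10-second length used in Phase 4 and carries 55-60 syllables; `plan_clips` is a hypothetical helper, not part of the skill.

```python
import math

CLIP_SECONDS = 10              # clip length generated in Phase 4
SYLLABLES_PER_CLIP = (55, 60)  # target range from the 55-60 syllable rule

def plan_clips(target_seconds: float, clip_seconds: float = CLIP_SECONDS) -> dict:
    """Estimate clip count, total syllable budget, and implied speaking rate."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    clips = math.ceil(target_seconds / clip_seconds)  # round up: a partial clip is still a clip
    low, high = SYLLABLES_PER_CLIP
    return {
        "clips": clips,
        "syllables_min": clips * low,
        "syllables_max": clips * high,
        # Syllables per second inside one clip.
        "rate_min": low / clip_seconds,
        "rate_max": high / clip_seconds,
    }
```

For a 60-second video, `plan_clips(60)` gives 6 clips and a script budget of 330-360 syllables, which works out to 5.5-6 syllables per second of speech.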

Phase 2: Generate Base Image (Nano Banana)

Use the Nano Banana skill to generate the character image. Critical: Apply the imperfection techniques from CHARACTER-PROMPTING.md.

Key elements for realistic UGC:

iPhone capture aesthetic (26mm equivalent lens)
Available/mixed lighting (NOT studio)
Visible skin texture (pores, oils, fine lines)
Minor imperfections (stubble, dark circles)
Computational depth artifacts
ISO noise (500-900 range)

Command:

~/.claude/skills/nano-banana/scripts/generate.sh "[enhanced prompt]" --aspect 9:16 --size 2K

Then optionally upscale through Enhancor AI for additional texture.
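If you drive Phase 2 from a script, passing the long, multi-line prompt as a single argument avoids shell-quoting problems. A minimal Python sketch, assuming `generate.sh` lives at the path and accepts the flags exactly as shown in the Command above; `build_generate_cmd` and `generate_base_image` are hypothetical helpers.

```python
import os
import shlex
import subprocess

# Script path and flags copied from the Command above; adjust if your install differs.
GENERATE_SH = os.path.expanduser("~/.claude/skills/nano-banana/scripts/generate.sh")

def build_generate_cmd(prompt: str, aspect: str = "9:16", size: str = "2K") -> list[str]:
    """Return argv with the whole (possibly multi-line) prompt as one argument."""
    return [GENERATE_SH, prompt, "--aspect", aspect, "--size", size]

def generate_base_image(prompt: str, dry_run: bool = True) -> list[str]:
    """Print the equivalent shell command (dry run) or actually execute it."""
    cmd = build_generate_cmd(prompt)
    if dry_run:
        print(shlex.join(cmd))  # safely quoted, copy-pasteable command line
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
    return cmd
```

Keeping `dry_run=True` as the default lets you inspect the exact prompt before spending a generation.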

Phase 3: Chunk the Script (Critical for Pacing)

This is the most important step for natural pacing. See SCRIPT-CHUNKING.md.

The 55-60 Syllable Rule:

Count syllables, not words
Each video generation = 55-60 syllables
Never cut mid-sentence
Add filler sentences to reach target if needed
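Counting syllables by hand gets tedious for long scripts. A common shortcut is to count vowel groups per word; this is a heuristic sketch, not a dictionary-accurate syllabifier (it over-counts words like "completely", for example), so treat results as estimates and hand-check chunks near the 55 or 60 boundary. The function names are hypothetical.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")

def count_syllables_word(word: str) -> int:
    """Approximate syllables in one word by counting vowel groups."""
    w = re.sub(r"[^a-z]", "", word.lower())  # drop punctuation and apostrophes
    if not w:
        return 0
    n = len(VOWEL_GROUP.findall(w))
    # Silent final "e" ("share"), but "-le" endings are voiced ("table").
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    # Silent "-ed" ("changed"), except after t/d ("wanted").
    if w.endswith("ed") and not w.endswith(("ted", "ded")) and n > 1:
        n -= 1
    return max(n, 1)

def count_syllables(text: str) -> int:
    """Approximate syllable total for a script chunk."""
    return sum(count_syllables_word(tok) for tok in text.split())
```

For instance, `count_syllables("Hey, system!")` returns 3; compare `count_syllables(chunk)` against the 55-60 range for each chunk.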

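Example chunking (both chunks below come to roughly 35 syllables by hand count, so each would still need another sentence or a filler line to reach 55-60):

Chunk 1 (about 35 syllables):
"Hey everyone, I wanted to share something that completely changed how I think about productivity. It's not another app or system."

Chunk 2 (about 35 syllables):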
"It's actually about understanding your own energy patterns throughout the day. Once I figured this out, everything clicked into place."
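The chunking rules (whole sentences only, 55-60 syllables each, filler where short) can be approximated with a greedy packer. This is a sketch under stated assumptions: sentences end in `.`, `!` or `?`, the default counter is the same rough vowel-group heuristic (so final counts deserve a manual check), and `chunk_script` and `needs_filler` are hypothetical names.

```python
import re
from typing import Callable

def rough_syllables(text: str) -> int:
    """Vowel-group estimate of syllables; approximate, so hand-check final chunks."""
    total = 0
    cleaned = text.lower().replace("'", "").replace("\u2019", "")
    for word in re.findall(r"[a-z]+", cleaned):
        n = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and not word.endswith("le") and n > 1:
            n -= 1  # silent final "e"
        if word.endswith("ed") and not word.endswith(("ted", "ded")) and n > 1:
            n -= 1  # silent "-ed"
        total += max(n, 1)
    return total

def split_sentences(script: str) -> list[str]:
    """Split after ., ! or ? followed by whitespace, keeping the punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]

def chunk_script(
    script: str,
    high: int = 60,
    count: Callable[[str], int] = rough_syllables,
) -> list[tuple[str, int]]:
    """Greedily pack whole sentences into chunks of at most `high` syllables.

    Sentences are never split, so one sentence longer than `high` becomes
    its own oversized chunk and should be rewritten by hand.
    """
    chunks: list[tuple[str, int]] = []
    current: list[str] = []
    current_n = 0
    for sentence in split_sentences(script):
        n = count(sentence)
        if current and current_n + n > high:
            chunks.append((" ".join(current), current_n))
            current, current_n = [], 0
        current.append(sentence)
        current_n += n
    if current:
        chunks.append((" ".join(current), current_n))
    return chunks

def needs_filler(chunks: list[tuple[str, int]], low: int = 55) -> list[int]:
    """Indices of chunks below the target range that need a filler sentence."""
    return [i for i, (_, n) in enumerate(chunks) if n < low]
```

Greedy packing keeps sentences in order and never cuts mid-sentence, but it can leave the last chunk short; `needs_filler` flags those so you can add a sentence before generating.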

Phase 4: Generate Video Clips (Kling AI)

For each script chunk, generate a 10-second video clip. Use the Kling AI skill with movement prompts from MOVEMENT-PROMPTING.md.

Key elements:

Hand "Home Base" Protocol
Timestamped movement clusters
Natural blinks, head tilts, micro-sways
Consistent base image reference

Spawn a background agent for each clip:

Task tool:
- subagent_type: "general-purpose"
- run_in_background: true
- prompt: [Include image URL, script chunk, movement prompt, output path]
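Assembling one job per chunk is mostly string work. The sketch below builds the three Task tool fields listed above for every chunk, substituting each chunk into the movement prompt's `[CHUNK TEXT HERE]` placeholder. `ClipJob`, `build_clip_jobs`, the wording of the agent prompt, and the `clip_NN.mp4` naming are all assumptions for illustration.

```python
from dataclasses import dataclass
from pathlib import Path

PLACEHOLDER = "[CHUNK TEXT HERE]"  # placeholder used in the movement prompt template

@dataclass
class ClipJob:
    index: int
    image_url: str
    movement_prompt: str  # template with this clip's chunk already substituted
    output_path: Path

    def task_params(self) -> dict:
        """Parameters mirroring the Task tool fields listed above."""
        prompt = (
            f"Generate a 10-second Kling AI clip (clip {self.index}).\n"
            f"Base image URL: {self.image_url}\n"
            f"Movement prompt:\n{self.movement_prompt}\n"
            f"Save the video to: {self.output_path}"
        )
        return {
            "subagent_type": "general-purpose",
            "run_in_background": True,
            "prompt": prompt,
        }

def build_clip_jobs(image_url: str, chunks: list[str], movement_template: str,
                    out_dir: str = "clips") -> list[ClipJob]:
    """One job per chunk; zero-padded file names keep clips in script order."""
    return [
        ClipJob(
            index=i,
            image_url=image_url,
            movement_prompt=movement_template.replace(PLACEHOLDER, chunk),
            output_path=Path(out_dir) / f"clip_{i:02d}.mp4",
        )
        for i, chunk in enumerate(chunks, start=1)
    ]
```

Reusing the same `image_url` for every job keeps the base image reference consistent across clips, and zero-padded names sort correctly when you assemble them in post.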

Phase 5: Post-Production

See POST-PRODUCTION.md for detailed guidance.

Assemble clips in CapCut or similar editor
Fix audio with Adobe Podcast (minimum) or Resemble AI (for voice swap)
Cover jump cuts with B-roll or animations
Remove filler sentences if they feel awkward
Export at the final resolution

Quick Start: Single Clip Test

Before generating the full video, test the workflow with one clip:

Generate base image with full imperfections
Create a 10-second test video from the first script chunk
Evaluate pacing and movement
Adjust prompts if needed
Proceed with full production

Example Full Prompt for Base Image

A vertical 9:16 UGC-style video frame captured on an iPhone 11 resting on a tripod.
Medium-wide portrait at true eye-level with slightly forward-leaning posture.

[CHARACTER]: A [age] [gender] with [ethnicity] complexion, [eye color] eyes beneath
[eyebrow description], [nose description]. [Jawline/facial hair]. [Hair description].
Expression is [emotion]—[specific expression details].

[CLOTHING]: [Fitted/casual garment], [collar detail].

[SKIN TEXTURE]: Visible pores across T-zone, faint smile lines, natural oils catching
light on forehead and nose. [Age-appropriate details]. No filter, no foundation.

[FOREGROUND]: Hands rest naturally on [surface], fingers relaxed, visible veins and
knuckle texture. Nearby: [everyday objects like water bottle, phone, notebook].

[CAMERA]: Native iPhone 11 lens (26mm equivalent), slightly wide perspective, mild
barrel softness at edges. Only tiny pockets of neural blur around hair edges.

[LIGHTING]: Available light mix—cool overcast daylight from window left, warm tungsten
from desk lamp right. Soft asymmetric shadows, natural falloff. ISO noise 500-900.

[BACKGROUND]: [Realistic home/office elements]—bookshelf, [furniture], clearly visible
not heavily blurred.

[REALITY DETAILS]: Gentle 35mm film grain, light fingerprint smudge on lens, tiny dust
haze in air. No cinematic bloom, no studio finish.

Styling: raw UGC realism, available indoor light, mixed color temperature, minimal
depth blur, visible ISO noise, emphasis on authenticity.
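A template like the one above is easy to send with a placeholder still unfilled. A small sketch that fills bracketed placeholders from a dictionary, leaves the uppercase section labels such as `[CAMERA]` alone, and reports anything left over; `fill_template` and `unfilled` are hypothetical helpers, and the label set is taken from the template above.

```python
import re

# Section labels from the template above; these are kept, not filled.
SECTION_LABELS = {"CHARACTER", "CLOTHING", "SKIN TEXTURE", "FOREGROUND", "CAMERA",
                  "LIGHTING", "BACKGROUND", "REALITY DETAILS"}
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")

def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace [key] placeholders with values; unknown keys stay as-is."""
    def sub(m: re.Match) -> str:
        key = m.group(1)
        if key in SECTION_LABELS:
            return m.group(0)
        return values.get(key, m.group(0))
    return PLACEHOLDER.sub(sub, template)

def unfilled(text: str) -> list[str]:
    """Placeholders still present after filling (excluding section labels)."""
    return [k for k in PLACEHOLDER.findall(text) if k not in SECTION_LABELS]
```

Checking that `unfilled(prompt)` is empty before calling the generator catches forgotten fields such as `[Age-appropriate details]`.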

Example Movement Prompt for Video

Hand "Home Base" Protocol: Hands default to Active Idle. Fingers shift, thumbs rub,
wrists rotate slightly while anchored. Gestures only for key emphasis.

[0.0s-0.5s] Pre-roll: Sharp inhale, eyes lock to lens, head still
[0.5s-3.0s] Hands in Active Idle (fingers interlocked), head tilts slightly right,
           brows furrow in seriousness
[3.0s-6.0s] Hands break clasp for quick open-palm rotation then return, head drifts
           forward, natural blink
[6.0s-8.0s] Hands return to Active Idle (loose clasp, thumbs tapping), head nods
           encouragingly, cheeks lift in natural smile
[8.0s-10.0s] Hands anchored (wrist shifts), chin lifts in quick final nod, natural blink

[Script]: "[CHUNK TEXT HERE]"
[Tone]: [Urgent/Conversational/Professional/etc.]
[Pacing]: Rapid fire delivery, high energy, viral UGC style, confident, 2x speed
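The timestamped clusters in the example run back to back from 0.0s to the end of the 10-second clip. A small checker, sketched here as a hypothetical `format_timeline` helper, confirms a set of segments has no gaps or overlaps and ends at the clip length, then renders them in the same `[start-end] action` style as the example.

```python
def format_timeline(segments: list[tuple[float, float, str]],
                    clip_seconds: float = 10.0) -> str:
    """Validate that segments tile [0, clip_seconds] exactly, then format them."""
    expected = 0.0
    lines = []
    for start, end, action in segments:
        if abs(start - expected) > 1e-9:
            raise ValueError(f"gap or overlap at {start}s (expected {expected}s)")
        if end <= start:
            raise ValueError(f"segment starting at {start}s must end after it starts")
        lines.append(f"[{start:.1f}s-{end:.1f}s] {action}")
        expected = end
    if abs(expected - clip_seconds) > 1e-9:
        raise ValueError(f"timeline ends at {expected}s but the clip is {clip_seconds}s")
    return "\n".join(lines)
```

Building movement prompts from data this way makes it easy to reuse one choreography across clips while varying only the script chunk.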

Reference Files

CHARACTER-PROMPTING.md - Imperfection techniques for realistic characters
SCRIPT-CHUNKING.md - The syllable method for consistent pacing
MOVEMENT-PROMPTING.md - Natural movement choreography
POST-PRODUCTION.md - Audio fixing and editing tips

Alternative: InfiniteTalk

For simpler long-form videos, consider InfiniteTalk (infinitetalk.ai).

Pros: Single generation for longer videos.
Cons: Less control over pacing (no timer/duration control); charged by output length.

Use the syllable method above when precise pacing control is needed.

Checklist Before Generating

Character prompt includes skin imperfections
Lighting is available/natural, NOT studio
Script chunked into 55-60 syllable segments
Each chunk consists of complete sentences
Movement prompt includes hand base, head movements, blinks
Post-production plan for audio and jump cuts


First seen: Jun 3, 2026
