Wraps FFmpeg in 119 structured MCP tools so agents can trim, merge, caption, and export video without shell command guesswork. Includes preflight validation to catch bad parameters before render, AI transcription for subtitle generation, scene detection, audio normalization, and local repurposing workflows that turn one source into platform-ready variants for YouTube Shorts, Reels, and TikTok. Also surfaces Hyperframes 0.5 for code-driven composition and cinematic planning tools that parse style packs and storyboards. Best fit when you want Claude or Cursor to drive a repeatable video pipeline with quality checkpoints instead of hoping raw FFmpeg flags work. Python 3.11+, requires FFmpeg on PATH.
Public tool metadata for what this MCP can expose to an agent.
| Tool | Parameters | Description |
|---|---|---|
| get_job_result | job_id (string), api_key (string), db_job_id (string) | Check job status and result. Poll every 60 seconds; do NOT poll more frequently. Video processing typically takes 3-5 minutes. Progress may stay at 20% during frame analysis for 1-3 minutes; this is completely normal. Do NOT interpret slow progress as failure. Only report fa... |
| get_upload_url | api_key (string), filename (string) | GET A SIGNED UPLOAD URL for uploading a local video to NarrateAI cloud storage. Use this ONLY when running in HTTP/remote mode and the user has a local video file. After getting the URL, upload the file with curl, then pass the returned temp_file_path to any processing tool as... |
| generate_narration_script | api_key (string), language (string), video_source (string), manual_context (string) | NARRATION SCRIPT – generates an AI-written timed script for a SILENT video. No audio output. Use when the user wants a timed narration script, text-only narration, or sync data for a silent video. This does NOT extract existing speech (use transcribe_video for that). This does... |
| narrate_video_full | api_key (string), language (string), voice_type (string), video_source (string), voice_sample (string), manual_context (string) | FULL NARRATED VIDEO – produces a downloadable video with AI voiceover. Use when the user wants: "narrate this video", "add voiceover", "make a narrated video". VOICE OPTIONS (ask the user which they prefer): 1. AI voice: male1 (default, fastest), female1 (default, fastest), fe... |
| abandon_job | job_id (string), api_key (string) | Abandon/cancel a processing job. Call this when the user cancels on the agent side. Stops the backend from continuing audio generation and video assembly. Use after narrate_video_transcript or when continue_to_full_video was started but user cancelled. Returns: JSON with succe... |
| transcribe_video | api_key (string), video_source (string), source_language (string) | TRANSCRIPTION ONLY – video with existing voice -> speech-to-text -> timed transcript. No translation, no narrated video. Returns original speech as-is. Use when the user wants to transcribe a video that already has spoken audio (podcast, interview, meeting recording, etc.). CR... |
| transcribe_and_translate | api_key (string), video_source (string), source_language (string), target_language (string) | TRANSCRIBE & TRANSLATE (new upload) – video with voice -> speech-to-text -> translate -> translated transcript. No TTS, no video output. Returns translated timed transcript only. Use when the user uploads a new video and wants a translated transcript (e.g. Spanish podcast -> E... |
| translate_existing_video | job_id (string), api_key (string), source_language (string), target_language (string) | TRANSLATION (existing video) – Translate transcript of a video already in the user's library. Loads transcript from cloud, translates, returns. No upload. Sync – returns immediately. Use when the user wants to translate a video they already narrated/dubbed with NarrateAI (e.g.... |
| dub_video_full | api_key (string), video_source (string), source_language (string), target_language (string), preserve_background_music (boolean) | FULL AUTO-DUBBING – transcribe -> translate -> extract speaker voice -> TTS with cloned voice -> dubbed video. No refinement screen. Uses the video's own speaker voice for the dubbed audio. Use when the user wants a complete dubbed video (e.g. Spanish video -> English dubbed).... |
| generate_document | api_key (string), language (string), video_source (string), document_type (string), manual_context (string) | DOCUMENT GENERATION – produces a structured markdown document from a silent video. Use when the user wants: a document, article, guide, tutorial, or written content based on a video. NOT for narrated video or voiceover. The agent MUST ask which document type the user wants bef... |
| generate_tts | text (string), api_key (string), language (string), voice_type (string), voice_sample (string) | TEXT-TO-SPEECH – generate audio from text. Returns a downloadable audio URL. Use when the user wants: "read this aloud", "generate speech", "text to speech", "convert text to audio", "make an audio file from this text". VOICE OPTIONS (ask the user which they prefer): 1. AI voi... |
| narrate_batch | api_key (string), language (string), voice_type (string), contexts_json (string), manual_context (string), video_sources_json (string) | BATCH NARRATION – narrate multiple videos in parallel. Each gets a full narrated video with voiceover. Use when the user has multiple videos to narrate (e.g. "narrate these 3 videos"). Maximum 5 videos per batch. Each video is processed independently – one failure does not aff... |
| batch_generate_scripts | api_key (string), language (string), contexts_json (string), manual_context (string), video_sources_json (string) | BATCH SCRIPT GENERATION – generate AI narration scripts for multiple silent videos in parallel. Each video gets a timed narration script (text only, no audio). Maximum 5 videos per batch. One failure does not affect others. CRITICAL – Context handling: Before calling, ask the... |
| batch_transcribe | api_key (string), source_language (string), video_sources_json (string) | BATCH TRANSCRIPTION – transcribe speech from multiple videos in parallel. Each video must have existing spoken audio. Returns timed transcript per video. CRITICAL: source_language is REQUIRED – ask user if not specified. Applies to all videos. Maximum 5 videos per batch. One f... |
| batch_dub | api_key (string), source_language (string), target_language (string), video_sources_json (string), preserve_background_music (boolean) | BATCH DUBBING – dub multiple videos into another language in parallel. Each video gets full auto-dubbing (transcribe -> translate -> voice clone -> dubbed video). CRITICAL: source_language, target_language, preserve_background_music are REQUIRED – ask user. All videos share th... |
| update_transcript | job_id (string), api_key (string), target_language (string), transcript_json (string), reset_for_reprocessing (boolean) | UPDATE TRANSCRIPT – edit the narration script before continuing to full video. Use after generate_narration_script returns a transcript and the user wants to change wording, timing, or content of specific segments. The user describes changes naturally; you apply them and call... |
| list_videos | page (integer), api_key (string), per_page (integer) | LIST VIDEOS – get the user's video library (previously processed videos). Use when the user wants to see their existing videos, re-translate a previously narrated video, or work with videos they already processed. Returns paginated list with job IDs, filenames, status, and tim... |
| continue_to_full_video | job_id (string), api_key (string), db_job_id (string), voice_type (string), voice_sample (string) | Continue from transcript to full narrated video. Use after generate_narration_script returns a transcript and the user is satisfied with it. VOICE OPTIONS (ask the user which they prefer): 1. AI voice: male1 (default, fastest), female1 (default, fastest), female2, female3, fem... |
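The get_job_result description asks agents to poll no more often than every 60 seconds and to treat a stalled progress value as normal. A minimal sketch of that polling discipline follows. The `fetch_status` callable and the `status` field with its `completed`/`failed` values are hypothetical stand-ins, since the metadata above does not show the response shape.

```python
import time
from typing import Any, Callable, Dict

POLL_INTERVAL_S = 60  # the tool description says not to poll more often than this


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    max_polls: int = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reports a terminal status or max_polls is reached.

    fetch_status is a hypothetical wrapper around a get_job_result call.
    """
    status: Dict[str, Any] = {}
    for attempt in range(max_polls):
        status = fetch_status()
        # "status", "completed" and "failed" are assumed names, not documented ones.
        if status.get("status") in ("completed", "failed"):
            return status
        # Unchanged progress is not treated as failure; wait and poll again.
        if attempt < max_polls - 1:
            sleep(POLL_INTERVAL_S)
    return status
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in normal use the default `time.sleep` applies.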
Guardrailed video editing MCP server for AI agents.
Structured tools for FFmpeg video editing, cinematic prompt planning, media analysis, subtitles, audio, effects, Hyperframes video creation, local repurposing packages, and preflight validation that helps prevent silent bad media output.
mcp-video is a free, open-source Model Context Protocol (MCP) server, Python library, and CLI that gives AI agents a real video-editing surface. It wraps FFmpeg, PUSHING CREATION-style planning, media analysis, quality checks, subtitles, audio generation, effects, Hyperframes rendering, local repurposing packages, and guardrails for risky edit parameters behind structured tool schemas.
AI agents can write FFmpeg commands, but they should not have to guess flags, parse brittle stderr, or silently publish broken media. mcp-video gives agents typed operations, inspectable tool metadata, structured results, preflight guardrails, and quality checkpoints so a video workflow can be automated and reviewed without turning into shell-command roulette.
Use it when you want an AI assistant to:
Prerequisite: FFmpeg must be installed and available on PATH.
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
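Before launching the server from a script, it can help to confirm the prerequisite programmatically. This is a minimal sketch using only the standard library; it checks that a binary is resolvable on PATH and is not a replacement for mcp-video doctor. The helper names are illustrative.

```python
import shutil
from typing import Optional


def find_ffmpeg(binary: str = "ffmpeg") -> Optional[str]:
    """Return the resolved path of the binary, or None if it is not on PATH."""
    return shutil.which(binary)


def ffmpeg_status(binary: str = "ffmpeg") -> str:
    """Build a one-line, human-readable status message."""
    path = find_ffmpeg(binary)
    if path is None:
        return f"{binary} not found on PATH"
    return f"{binary} found at {path}"


if __name__ == "__main__":
    print(ffmpeg_status())
```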
Run without a global install:
uvx --from mcp-video mcp-video doctor
Or install with pip:
pip install mcp-video
mcp-video doctor
Hyperframes tools additionally need Node.js 22+ and a resolvable Hyperframes CLI. Install/pin Hyperframes in the active Node package layout, add hyperframes to PATH, or set MCP_VIDEO_HYPERFRAMES_COMMAND.
The core install covers all FFmpeg editing tools. Optional features ship as extras — install only what you use:
| You want | Install | Approx. extra size |
|---|---|---|
| Speech-to-text subtitles (Whisper) | pip install "mcp-video[transcribe]" | ~1 GB (torch) |
| Image analysis (colors, layout, contrast) | pip install "mcp-video[image]" | ~50 MB |
| Vocal/instrument stem separation | pip install "mcp-video[stems]" | ~2 GB (torch + demucs) |
| AI upscaling | pip install "mcp-video[upscale]" | ~2 GB (Python ≤3.12) |
| Procedural audio/music tools | pip install "mcp-video[audio]" | ~30 MB (numpy) |
| Everything AI | pip install "mcp-video[ai]" | several GB |
Mix freely, e.g. pip install "mcp-video[transcribe,image]". Run mcp-video doctor afterward — it reports exactly which features are available and what is missing.
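As a rough complement to mcp-video doctor, a script can check whether the heavy dependencies named in the table are importable. The sketch below is an assumption-laden illustration: the mapping from extras to modules uses only the package names the table mentions (torch, demucs, numpy) and may not match what each extra actually installs.

```python
from importlib.util import find_spec
from typing import Dict, List

# Hypothetical mapping built from the package names in the table above.
EXTRA_MODULES: Dict[str, List[str]] = {
    "transcribe": ["torch"],
    "stems": ["torch", "demucs"],
    "audio": ["numpy"],
}


def available_extras(mapping: Dict[str, List[str]]) -> Dict[str, bool]:
    """Report, per extra, whether every listed top-level module can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in mapping.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras(EXTRA_MODULES).items():
        print(f"{extra}: {'available' if ok else 'missing'}")
```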
mcp-video is an MCP video-editing server for AI agents: 119 structured tools on top of FFmpeg for trimming, merging, captioning, mixing audio, applying effects, and repurposing content (Shorts, Reels, TikTok), with guardrails that detect risky parameters before rendering.
Requirement: FFmpeg installed and available on PATH.
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Install and diagnose
pip install mcp-video
mcp-video doctor
For Claude Code:
claude mcp add mcp-video -- uvx --from mcp-video mcp-video
mcp-video doctor reports which features are available and what still needs to be installed. The full documentation is in English; the main error messages are bilingual.
From a clone of this repo, run the smallest confidence workflow before wiring an agent host:
uv run --no-project --with mcp-video python workflows/05-confidence-baseline/workflow.py
uv run --no-project --with mcp-video python workflows/benchmarks/run_confidence_benchmark.py
The workflow generates a tiny source clip, creates a checked vertical video, runs quality/release checkpoint steps, and writes workflows/05-confidence-baseline/output/video_receipt.json.
Proof notes live in docs/proofs/.
claude mcp add mcp-video -- uvx --from mcp-video mcp-video
{
  "mcpServers": {
    "mcp-video": {
      "command": "uvx",
      "args": ["--from", "mcp-video", "mcp-video"]
    }
  }
}
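If the host already has other servers configured, the entry above has to be merged rather than pasted over the file. The following sketch shows one way to do that in memory; the `other-server` entry is a made-up placeholder, and where the host stores its config file is not covered here.

```python
import json
from typing import Any, Dict


def add_mcp_video(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with the mcp-video server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same command and args as the JSON snippet above.
    servers["mcp-video"] = {
        "command": "uvx",
        "args": ["--from", "mcp-video", "mcp-video"],
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other-server" is a hypothetical pre-existing entry.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_mcp_video(existing), indent=2))
```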
Then ask your agent:
Trim this interview into a 45-second vertical clip, add burned captions, normalize the audio, make a thumbnail, and create a release checkpoint before export.
mcp-video includes a public agent skill at skills/mcp-video/SKILL.md. Use $mcp-video in compatible agent hosts when you want the agent to choose between the MCP server, CLI, and Python client while preserving the inspect, edit, verify, and human-review workflow.
from mcp_video import Client

editor = Client()

# Cut a 45-second clip starting at 2:15.
clip = editor.trim("interview.mp4", start="00:02:15", duration="00:00:45")

# Transcribe the clip to an SRT file, then add the captions to the clip.
caption_file = "captions.srt"
editor.ai_transcribe(clip.output_path, output_srt=caption_file)
captioned = editor.subtitles(clip.output_path, subtitle_file=caption_file)

# Reframe to vertical and build review artifacts before export.
vertical = editor.resize(captioned.output_path, aspect_ratio="9:16")
checkpoint = editor.release_checkpoint(vertical.output_path)
print(checkpoint["thumbnail"])
print(checkpoint["storyboard"])
mcp-video info interview.mp4
mcp-video trim interview.mp4 -s 00:02:15 -d 45
mcp-video video-ai-transcribe clip.mp4 --output captions.srt
mcp-video subtitles clip.mp4 captions.srt
mcp-video resize clip.mp4 --aspect-ratio 9:16
mcp-video video-quality-check clip.mp4
mcp-video repurpose clip.mp4 --platforms youtube-shorts instagram-reel tiktok
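The CLI examples above write the start as a timecode (00:02:15) and the duration as plain seconds (45). When an agent needs to reason about where a clip ends, converting both to seconds first avoids mistakes. This is a standalone sketch of that arithmetic; it does not reflect how mcp-video parses these values internally.

```python
def timecode_to_seconds(value: str) -> float:
    """Convert "SS", "MM:SS" or "HH:MM:SS" into seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timecode: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each field to the left is worth 60x the field to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def seconds_to_timecode(total: float) -> str:
    """Format whole seconds as HH:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(total), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    start = timecode_to_seconds("00:02:15")
    end = start + timecode_to_seconds("45")
    print(seconds_to_timecode(end))
```

For the trim example, the start is 135 seconds and the clip ends at 180 seconds, which formats as 00:03:00.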
| Workflow | Example prompt |
|---|---|
| Social clips | "Turn this landscape recording into a captioned TikTok and YouTube Short." |
| Podcast production | "Find the strongest segment, trim it, normalize audio, add chapters, and export." |
| Product demos | "Create a short launch video from screenshots, title cards, and voiceover." |
| Cinematic planning | "Create a style pack and storyboard, then render shot prompts for generation." |
| Quality review | "Compare these two exports, make thumbnails, and flag visual or audio problems." |
| Batch automation | "Convert this folder of clips to web-ready MP4 with consistent loudness." |
| Code-created video | "Scaffold a Hyperframes composition, inspect it, render it, then add subtitles and a watermark." |
| Local repurposing | "Turn this master clip into Shorts, Reels, TikTok, and YouTube assets with thumbnails and a manifest." |
mcp-video currently registers 119 MCP tools. The table below summarizes the documented core categories; search_tools lets agents discover the exact operation they need without loading every tool description into context.
| Category | Count | Highlights |
|---|---|---|
| Core video editing | 32 | trim, merge, resize, crop, rotate, convert, overlays, subtitles, export, cleanup, templates, merge-compatibility guardrails |
| Cinematic creation | 4 | project scaffold, style-pack parsing, storyboard parsing, shot prompt expansion |
| AI-assisted media | 11 | transcription, scene detection, upscaling, stem separation, silence removal, color grading |
| Hyperframes | 18 | init, preview, render, snapshots, inspect, catalog, website capture, local TTS, transcription, background removal, diagnostics, benchmark, post-process |
| Repurposing | 2 | dry-run manifests, platform-ready variants, thumbnails, storyboards, release checkpoints |
| Procedural audio | 7 | synthesize, compose, presets, effects, sequences, generated audio, spatial audio, mix-parameter guardrails |
| Visual effects | 8 | vignette, glow, noise, scanlines, chromatic aberration, luma key, mask, shape mask, bounded filter parameters |
| Transitions | 3 | glitch, morph, pixelate |
| Layout and motion | 6 | grid, picture-in-picture, split-screen, animated text, counters, progress bars, auto-chapters, layout mismatch warnings |
| Analysis | 8 | scene detection, thumbnail, preview, storyboard, quality compare, metadata, waveform, release checkpoint |
| Image analysis | 3 | extract colors, generate palettes, analyze product images |
| Discovery | 1 | search_tools |
from mcp_video import Client
editor = Client()
matches = editor.search_tools("subtitle")
print(matches["tools"])
Full reference: docs/TOOLS.md
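The category table lists fewer tools than the 119 the server registers, which matches its description as a summary of the documented core categories. A quick tally makes the gap explicit. The counts are copied from the table; how the remaining tools are grouped is not stated in this document.

```python
# Counts copied from the category table above.
CATEGORY_COUNTS = {
    "Core video editing": 32,
    "Cinematic creation": 4,
    "AI-assisted media": 11,
    "Hyperframes": 18,
    "Repurposing": 2,
    "Procedural audio": 7,
    "Visual effects": 8,
    "Transitions": 3,
    "Layout and motion": 6,
    "Analysis": 8,
    "Image analysis": 3,
    "Discovery": 1,
}

REGISTERED_TOOLS = 119  # figure stated in the text

documented = sum(CATEGORY_COUNTS.values())
outside_table = REGISTERED_TOOLS - documented

if __name__ == "__main__":
    print(f"in table: {documented}, outside table: {outside_table}")
```

The table accounts for 103 tools, leaving 16 that an agent would find through search_tools or docs/TOOLS.md rather than through this summary.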
For autonomous agents, the intended path is inspect, edit, verify, then ask a human to review release artifacts:
from mcp_video import Client

client = Client()

# Inspect the tool schema before building the pipeline.
print(client.inspect("trim"))

# Run the edit steps in order and write a single output file.
result = client.pipeline(
    [
        {"op": "trim", "input": "source.mp4", "start": "00:01:00", "duration": "00:00:45"},
        {"op": "add_text", "text": "Launch clip", "position": "top-center"},
        {"op": "normalize_audio"},
        {"op": "resize", "aspect_ratio": "9:16"},
        {"op": "export", "quality": "high"},
        {"op": "release_checkpoint"},
    ],
    output_path="final-short.mp4",
)
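An agent can also sanity-check a step list locally before submitting it. The sketch below is a hypothetical client-side check, not the server's preflight validation: the set of known operations is limited to those shown in the pipeline example, and the rule that the first step names an `input` is inferred from that example.

```python
from typing import Any, Dict, List

# Operation names taken from the pipeline example above; the server may accept more.
KNOWN_OPS = {"trim", "add_text", "normalize_audio", "resize", "export", "release_checkpoint"}


def check_steps(steps: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if not steps:
        problems.append("pipeline has no steps")
    for index, step in enumerate(steps):
        op = step.get("op")
        if op is None:
            problems.append(f"step {index}: missing 'op'")
        elif op not in KNOWN_OPS:
            problems.append(f"step {index}: unknown op {op!r}")
    # Assumption: the first step carries the source file, as in the example.
    if steps and "input" not in steps[0]:
        problems.append("step 0: first step should name an 'input' file")
    return problems
```

Returning a list of problems instead of raising on the first one lets the agent report every issue in a single pass.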
Safety contract:
Inspect operations first with search_tools() and Client.inspect().
Follow MCPVideoError guidance when a call fails.
Verify output with video_quality_check, video_release_checkpoint, and human visual/audio inspection.
Development verification lives in docs/TESTING.md. Keep public-surface, media workflow, and security checks current when changing tool behavior.
git clone https://github.com/KyaniteLabs/mcp-video.git
cd mcp-video
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v -m "not slow and not hyperframes"
Apache 2.0. See LICENSE.
Built with FFmpeg, Hyperframes, and the Model Context Protocol.
More from KyaniteLabs at kyanitelabs.tech.