A remote MCP server that runs full multimodal analysis on video URLs from YouTube, TikTok, Instagram, Vimeo, Twitter, and direct links. It exposes four tools: quick_transcribe for timestamped audio with speaker ID, deep_analyze for the full pipeline (transcript plus keyframe vision plus OCR in one structured output), clip_context for analyzing specific timestamp ranges, and batch_analyze for processing up to 10 videos in parallel. Connects over streamable HTTP with OAuth, no local installation. Built on yt-dlp, Groq Whisper, Tesseract OCR, and Claude Vision. Reach for it when you need an agent to reason over what's shown on screen, not just what's said in the audio.
Public tool metadata for what this MCP can expose to an agent.
quick_transcribeTRANSCRIPTION ONLY (no visual analysis). For vision, OCR, charts, or keyframe analysis, use deep_analyze instead. Returns timestamped transcript with speaker labels. Supports YouTube, Instagram Reels, Vimeo, Twitter/X, TikTok, and direct video URLs. Costs 1 credit.2 paramsTRANSCRIPTION ONLY (no visual analysis). For vision, OCR, charts, or keyframe analysis, use deep_analyze instead. Returns timestamped transcript with speaker labels. Supports YouTube, Instagram Reels, Vimeo, Twitter/X, TikTok, and direct video URLs. Costs 1 credit.
languagestringvideo_urlstringdeep_analyzePRIMARY tool for video understanding. Full multimodal pipeline: transcript + visual keyframe analysis + OCR + chart/diagram extraction. Returns unified analysis with Summary, Key Claims, Visual Assets, Data Extracted, Entities Mentioned. Costs 5 credits.2 paramsPRIMARY tool for video understanding. Full multimodal pipeline: transcript + visual keyframe analysis + OCR + chart/diagram extraction. Returns unified analysis with Summary, Key Claims, Visual Assets, Data Extracted, Entities Mentioned. Costs 5 credits.
focusstringgeneral · technical · cryptodefault: generalvideo_urlstringclip_contextAnalyze a specific segment of a video by timestamp range. Default is full multimodal analysis (transcript + vision + OCR, 3 credits). Pass mode='quick' for transcript-only (1 credit). Use when you only need a section, not the full video.4 paramsAnalyze a specific segment of a video by timestamp range. Default is full multimodal analysis (transcript + vision + OCR, 3 credits). Pass mode='quick' for transcript-only (1 credit). Use when you only need a section, not the full video.
modestringquick · deepdefault: deepend_timestringvideo_urlstringstart_timestringbatch_analyzeProcess multiple videos and get cross-video synthesis. Max 10 URLs. Returns individual results plus common themes, entity overlap, and contradictions. Credit cost is per-video rate with 10% discount on 5+ videos.2 paramsProcess multiple videos and get cross-video synthesis. Max 10 URLs. Returns individual results plus common themes, entity overlap, and contradictions. Credit cost is per-video rate with 10% discount on 5+ videos.
modestringquick · deepdefault: quickvideo_urlsarrayGive your agent eyes.
Contendeo is the multimodal layer that lets your AI actually see video — not just read its transcript.
Contendeo is a remote MCP server that gives LLMs frame-level context from video — transcription, keyframe vision analysis, OCR, and structured output, unified into a single response your agent can reason over.
Paste a YouTube, Instagram Reels, Vimeo, Twitter/X, TikTok, or direct video URL into Claude (or any MCP client). Contendeo downloads, transcribes, extracts keyframes, runs OCR, analyzes visuals, and returns a structured document.
Live at contendeo.app. MCP endpoint at contendeo.app/mcp/.
Transcripts capture what was said. They don't capture what was shown — chart values, UI states, overlays, code on screen, circled regions, dashboard numbers. For any video where the visual layer carries information (trading tutorials, product demos, technical walkthroughs, data dashboards, design reviews), transcript-only analysis misses half the signal.
Contendeo closes that gap.
See the side-by-side comparison at contendeo.app/demo.
Four tools exposed over MCP. Credit costs charged per successful call. Cache hits are free.
| Tool | Credits | What it does |
|---|---|---|
quick_transcribe | 1 | Timestamped transcript with speaker identification. Audio-only. Fast. |
deep_analyze | 5 | Full multimodal pipeline — transcript + keyframe vision + OCR, unified output. |
clip_context | 1 (quick) / 3 (deep) | Analyze a specific timestamp range without paying for the full video. |
batch_analyze | per-video, −10% at 5+ | Process up to 10 videos in parallel with cross-video synthesis. |
Full schemas and response formats: docs/tools.md.
Contendeo is a remote MCP server — no local install, no package download. Connect via URL from your MCP client.
Add to your claude_desktop_config.json:
{
"mcpServers": {
"contendeo": {
"url": "https://contendeo.app/mcp/"
}
}
}
Claude will walk you through OAuth on first use.
claude mcp add --transport http contendeo https://contendeo.app/mcp/
Any client that supports remote MCP servers over streamable HTTP. Point it at https://contendeo.app/mcp/.
Full walkthrough with screenshots: docs/installation.md.
Contendeo uses OAuth 2.0. First use flow:
New accounts get 10 free credits. No card required.
Details: docs/authentication.md.
| Plan | Price | Credits | Notes |
|---|---|---|---|
| Free | $0 | 10 on signup | All 4 tools, no card |
| Pro | $12/mo (₹999) | 100/mo | Priority queue, rollover to 200 |
| Power | $39/mo (₹3,299) | 500/mo | Batch, webhooks, rollover to 1000 |
| PAYG | $0.15/credit | on demand | No subscription |
Cache hits are free. Failed analyses auto-refund.
Details: docs/pricing.md.
Under the hood:
docs/Questions, bug reports, feature requests: open an issue.
Commercial/partnership inquiries: @0xKaroshi on X.
MIT — covers this wrapper repo (documentation, examples, manifest). The production server source is not open-sourced.
See LICENSE.
Contendeo — every frame, every word, every edge.