Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Multimodal

rsmdt/multimodal-mcp
1 · auth · STDIO · registry active
Summary

Connects Claude to OpenAI, xAI, Gemini, ElevenLabs, and BFL APIs for generating and editing images, videos, audio, and transcriptions through a single interface. Exposes tools like generate_image, generate_video, generate_audio, and transcribe_audio with automatic provider selection based on which API keys you've configured. You can explicitly choose a provider per request or let it auto-select from what's available. All generated media saves to disk with descriptive filenames. Reach for this when you want Claude to generate visual or audio content without writing provider-specific code for each API, or when you're working across multiple media generation services and want consistent tool parameters.

multimodal-mcp

Multi-provider media generation MCP server. Generate images, videos, audio, and transcriptions from text prompts using OpenAI, xAI, Gemini, ElevenLabs, and BFL (FLUX) through a single unified interface.

Features

  • 🎨 Image Generation: Generate images via OpenAI (gpt-image-1), xAI (grok-imagine-image), Gemini (imagen-4), or BFL (FLUX Pro 1.1)
  • ✏️ Image Editing: Edit images via OpenAI, xAI, Gemini, or BFL (FLUX Kontext)
  • 🎬 Video Generation: Generate videos via OpenAI (sora-2), xAI (grok-imagine-video), or Gemini (veo-3.1)
  • 🔊 Audio Generation: Text-to-speech via OpenAI (tts-1), Gemini, or ElevenLabs (Flash v2.5). Sound effects via ElevenLabs
  • 🎙️ Audio Transcription: Speech-to-text via OpenAI (Whisper) or ElevenLabs (Scribe)
  • 🔄 Auto-Discovery: Automatically detects configured providers from environment variables
  • 🎯 Provider Selection: Let the server auto-select a provider, or explicitly choose one per request
  • 📁 File Output: Saves all generated media to disk with descriptive filenames

Quick Start

Set the API key for at least one provider. Most users only need one — add more to access additional providers.

# Using OpenAI
claude mcp add multimodal-mcp -e OPENAI_API_KEY=sk-... -- npx -y @r16t/multimodal-mcp@latest

# Or using xAI
# claude mcp add multimodal-mcp -e XAI_API_KEY=xai-... -- npx -y @r16t/multimodal-mcp@latest

# Or using Gemini
# claude mcp add multimodal-mcp -e GEMINI_API_KEY=AIza... -- npx -y @r16t/multimodal-mcp@latest

# Or using ElevenLabs (audio + transcription)
# claude mcp add multimodal-mcp -e ELEVENLABS_API_KEY=xi-... -- npx -y @r16t/multimodal-mcp@latest

# Or using BFL/FLUX (images)
# claude mcp add multimodal-mcp -e BFL_API_KEY=... -- npx -y @r16t/multimodal-mcp@latest

Using a different editor? See Editor Setup below for Claude Desktop, Cursor, VS Code, Windsurf, and Cline.

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | At least one provider key | OpenAI API key. Enables image, video, audio generation, and transcription via gpt-image-1, sora-2, tts-1, and whisper-1 |
| XAI_API_KEY | At least one provider key | xAI API key. Enables image and video generation via grok-imagine-image and grok-imagine-video |
| GEMINI_API_KEY | At least one provider key | Gemini API key. Enables image, video, and audio generation via imagen-4, veo-3.1, and gemini-2.5-flash-preview-tts |
| GOOGLE_API_KEY | - | Alias for GEMINI_API_KEY; either name is accepted |
| ELEVENLABS_API_KEY | At least one provider key | ElevenLabs API key. Enables audio generation (TTS, sound effects) and transcription via Flash v2.5 and Scribe v1 |
| BFL_API_KEY | At least one provider key | BFL API key. Enables image generation and editing via FLUX Pro 1.1 and FLUX Kontext |
| MEDIA_OUTPUT_DIR | No | Directory for saved media files. Defaults to the current working directory |
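
To make the auto-discovery rule concrete, here is a minimal sketch of how configured providers could be derived from the variables in the table above. It is not the server's actual code: the function name `detectProviders`, the `Env` type, and the choice to treat blank values as unset are all assumptions made for illustration. The provider identifiers match the values accepted by the `provider` tool parameter.

```typescript
// Hypothetical sketch; the real server's detection logic may differ.
type ProviderId = "openai" | "xai" | "google" | "elevenlabs" | "bfl";

// Variable names come from the Environment Variables table.
// GOOGLE_API_KEY is listed as an alias for GEMINI_API_KEY.
const PROVIDER_ENV_KEYS: Record<ProviderId, readonly string[]> = {
  openai: ["OPENAI_API_KEY"],
  xai: ["XAI_API_KEY"],
  google: ["GEMINI_API_KEY", "GOOGLE_API_KEY"],
  elevenlabs: ["ELEVENLABS_API_KEY"],
  bfl: ["BFL_API_KEY"],
};

type Env = Record<string, string | undefined>;

// A provider counts as configured when any of its variables holds a
// non-blank value (treating blank as unset is an assumption).
function detectProviders(env: Env): ProviderId[] {
  const ids = Object.keys(PROVIDER_ENV_KEYS) as ProviderId[];
  return ids.filter((id) =>
    PROVIDER_ENV_KEYS[id].some((name) => (env[name] ?? "").trim() !== ""),
  );
}

// Only the Gemini alias is set here, so only "google" is detected.
const found = detectProviders({ GOOGLE_API_KEY: "AIza...", BFL_API_KEY: "" });
console.log(found);
```

In a Node.js process you would pass `process.env` as the argument; taking the environment as a parameter keeps the function easy to test.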

Available Tools

generate_image

Generate an image from a text prompt.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Text description of the image to generate |
| provider | string | No | Provider to use: openai, xai, google, bfl. Auto-selects if omitted |
| aspectRatio | string | No | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4 |
| quality | string | No | Quality level: low, standard, high |
| outputDirectory | string | No | Directory to save the generated file. Absolute or relative path. Defaults to MEDIA_OUTPUT_DIR or cwd |
| providerOptions | object | No | Provider-specific parameters passed through directly |
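
For orientation, the sketch below shows what a `generate_image` call might look like on the wire. MCP clients normally build this message for you; the `tools/call` method and JSON-RPC 2.0 envelope follow the MCP specification, while the `GenerateImageArgs` interface and the `imageRequest` helper are illustrative names transcribed from the parameter table, not part of the package.

```typescript
// Arguments for generate_image, transcribed from the parameter table.
interface GenerateImageArgs {
  prompt: string;
  provider?: "openai" | "xai" | "google" | "bfl";
  aspectRatio?: "1:1" | "16:9" | "9:16" | "4:3" | "3:4";
  quality?: "low" | "standard" | "high";
  outputDirectory?: string;
  providerOptions?: Record<string, unknown>;
}

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: wraps the arguments in a tools/call envelope.
function imageRequest(id: number, args: GenerateImageArgs): ToolCallRequest<GenerateImageArgs> {
  if (args.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "generate_image", arguments: args },
  };
}

// provider is omitted, so the server would auto-select one.
const request = imageRequest(1, {
  prompt: "A lighthouse at dusk, watercolor",
  aspectRatio: "16:9",
  quality: "high",
});
console.log(JSON.stringify(request, null, 2));
```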

generate_video

Generate a video from a text prompt. Video generation is asynchronous and may take several minutes.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Text description of the video to generate |
| provider | string | No | Provider to use: openai, xai, google. Auto-selects if omitted |
| duration | number | No | Video duration in seconds (provider limits apply) |
| aspectRatio | string | No | Aspect ratio: 16:9, 9:16, 1:1 |
| resolution | string | No | Resolution: 480p, 720p, 1080p |
| outputDirectory | string | No | Directory to save the generated file. Absolute or relative path. Defaults to MEDIA_OUTPUT_DIR or cwd |
| providerOptions | object | No | Provider-specific parameters passed through directly |

generate_audio

Generate audio from text. Supports text-to-speech and sound effects. Audio generation is synchronous.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text to convert to speech, or a description of the sound effect to generate |
| provider | string | No | Provider to use: openai, google, elevenlabs. Auto-selects if omitted |
| voice | string | No | Voice name (provider-specific). OpenAI: alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer. Google: Kore, Charon, Fenrir, Aoede, Puck, etc. ElevenLabs: voice ID |
| speed | number | No | Speech speed multiplier (OpenAI only): 0.25 to 4.0 |
| format | string | No | Output format (OpenAI only): mp3, opus, aac, flac, wav, pcm |
| outputDirectory | string | No | Directory to save the generated file. Absolute or relative path. Defaults to MEDIA_OUTPUT_DIR or cwd |
| providerOptions | object | No | Provider-specific parameters passed through directly. ElevenLabs: set mode: "sound-effect" for sound effects, model for TTS model selection |
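
Several `generate_audio` parameters only apply to one provider, which is easy to get wrong. The following is a sketch of a client-side pre-check built from the constraints in the table above. The `checkAudioArgs` name is hypothetical, and whether the server rejects or silently ignores mismatched parameters is not stated in the source, so treat the checks as a convenience rather than a description of server behavior.

```typescript
type AudioProvider = "openai" | "google" | "elevenlabs";

// Arguments for generate_audio, transcribed from the parameter table.
interface GenerateAudioArgs {
  text: string;
  provider?: AudioProvider;
  voice?: string;
  speed?: number;
  format?: "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";
  outputDirectory?: string;
  providerOptions?: Record<string, unknown>;
}

// Hypothetical pre-check: returns a list of problems, empty when the
// arguments are consistent with the documented constraints.
function checkAudioArgs(args: GenerateAudioArgs): string[] {
  const problems: string[] = [];
  const explicit = args.provider;
  if (args.text.trim() === "") problems.push("text is required");
  if (args.speed !== undefined) {
    if (explicit !== undefined && explicit !== "openai") {
      problems.push("speed applies to OpenAI only");
    }
    if (args.speed < 0.25 || args.speed > 4.0) {
      problems.push("speed must be between 0.25 and 4.0");
    }
  }
  if (args.format !== undefined && explicit !== undefined && explicit !== "openai") {
    problems.push("format applies to OpenAI only");
  }
  const mode = args.providerOptions?.["mode"];
  if (mode === "sound-effect" && explicit !== undefined && explicit !== "elevenlabs") {
    problems.push('mode "sound-effect" is an ElevenLabs option');
  }
  return problems;
}

// A sound-effect request routed to ElevenLabs passes the pre-check.
const soundEffect: GenerateAudioArgs = {
  text: "Heavy rain on a tin roof",
  provider: "elevenlabs",
  providerOptions: { mode: "sound-effect" },
};
console.log(checkAudioArgs(soundEffect));

// speed is out of range and is also paired with a non-OpenAI provider.
console.log(checkAudioArgs({ text: "Hello", provider: "google", speed: 5 }));
```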

transcribe_audio

Transcribe audio to text (speech-to-text).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| audioPath | string | Yes | Absolute path to the audio file to transcribe |
| provider | string | No | Provider to use: openai, elevenlabs. Auto-selects if omitted |
| language | string | No | Language code (e.g., en, fr, es) to hint the transcription language |
| providerOptions | object | No | Provider-specific parameters passed through directly |

list_providers

List all configured media generation providers and their capabilities. Takes no parameters.

Provider Capabilities

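| Provider | Image | Image Editing | Video | Audio | Transcription | Key Models |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | gpt-image-1, sora-2, tts-1, whisper-1 |
| xAI | ✅ | ✅ | ✅ | - | - | grok-imagine-image, grok-imagine-video |
| Gemini | ✅ | ✅ | ✅ | ✅ | - | imagen-4, veo-3.1, gemini-2.5-flash-preview-tts |
| ElevenLabs | - | - | - | ✅ | ✅ | eleven_flash_v2_5, scribe_v1 |
| BFL | ✅ | ✅ | - | - | - | flux-pro-1.1, flux-kontext-pro |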
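
The capability matrix above is what drives provider selection. As a rough illustration, the sketch below encodes the matrix and picks a provider for a requested media type. The source does not say in what order the server prefers providers when several qualify, so the "first configured match wins" rule here is an assumption, as are the `selectProvider` and `MediaType` names. The Gemini row uses the identifier `google`, matching the `provider` parameter values.

```typescript
type ProviderId = "openai" | "xai" | "google" | "elevenlabs" | "bfl";
type MediaType = "image" | "imageEdit" | "video" | "audio" | "transcription";

// Transcribed from the Provider Capabilities table.
const CAPABILITIES: Record<ProviderId, readonly MediaType[]> = {
  openai: ["image", "imageEdit", "video", "audio", "transcription"],
  xai: ["image", "imageEdit", "video"],
  google: ["image", "imageEdit", "video", "audio"],
  elevenlabs: ["audio", "transcription"],
  bfl: ["image", "imageEdit"],
};

// Hypothetical selection rule: honor an explicit provider if it is both
// configured and capable; otherwise take the first configured provider
// that supports the media type (the ordering is an assumption).
function selectProvider(
  media: MediaType,
  configured: readonly ProviderId[],
  requested?: ProviderId,
): ProviderId {
  if (requested !== undefined) {
    if (!configured.includes(requested)) {
      throw new Error(`Provider "${requested}" is not configured`);
    }
    if (!CAPABILITIES[requested].includes(media)) {
      throw new Error(`Provider "${requested}" does not support ${media}`);
    }
    return requested;
  }
  const match = configured.find((id) => CAPABILITIES[id].includes(media));
  if (match === undefined) {
    throw new Error(`No configured provider supports ${media}`);
  }
  return match;
}

// xAI has no transcription support, so ElevenLabs is chosen.
console.log(selectProvider("transcription", ["xai", "elevenlabs"]));
```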

Image Aspect Ratios

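| Provider | 1:1 | 16:9 | 9:16 | 4:3 | 3:4 |
| --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| xAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini | ✅ | ✅ | ✅ | ✅ | ✅ |
| BFL | ✅ | ✅ | ✅ | ✅ | ✅ |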

Video Aspect Ratios & Resolutions

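| Provider | 16:9 | 9:16 | 1:1 | 480p | 720p | 1080p |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| xAI | ✅ | ✅ | ✅ | - | ✅ | ✅ |
| Gemini | ✅ | ✅ | - | - | ✅ | ✅ |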
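
Because video support varies by provider, it can help to check a combination before sending a request. The sketch below encodes the table above as a lookup and lists the providers that accept a given aspect ratio and resolution. The `providersFor` helper is illustrative only and is not part of the package; Gemini again appears under its `google` identifier.

```typescript
type VideoProvider = "openai" | "xai" | "google";
type VideoAspect = "16:9" | "9:16" | "1:1";
type VideoResolution = "480p" | "720p" | "1080p";

interface VideoSupport {
  aspects: readonly VideoAspect[];
  resolutions: readonly VideoResolution[];
}

// Transcribed from the Video Aspect Ratios & Resolutions table.
const VIDEO_SUPPORT: Record<VideoProvider, VideoSupport> = {
  openai: { aspects: ["16:9", "9:16", "1:1"], resolutions: ["480p", "720p", "1080p"] },
  xai: { aspects: ["16:9", "9:16", "1:1"], resolutions: ["720p", "1080p"] },
  google: { aspects: ["16:9", "9:16"], resolutions: ["720p", "1080p"] },
};

// Hypothetical helper: which providers accept this combination?
function providersFor(aspect: VideoAspect, resolution: VideoResolution): VideoProvider[] {
  const ids = Object.keys(VIDEO_SUPPORT) as VideoProvider[];
  return ids.filter(
    (id) =>
      VIDEO_SUPPORT[id].aspects.includes(aspect) &&
      VIDEO_SUPPORT[id].resolutions.includes(resolution),
  );
}

// Square video at 720p rules out Gemini; 480p is OpenAI only.
console.log(providersFor("1:1", "720p"));
console.log(providersFor("16:9", "480p"));
```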

Audio Formats

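| Provider | mp3 | opus | aac | flac | wav | pcm |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini | - | - | - | - | ✅ | - |
| ElevenLabs | ✅ | ✅ | - | - | - | ✅ |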

Troubleshooting

No providers configured

[config] No provider API keys detected

Set at least one of OPENAI_API_KEY, XAI_API_KEY, GEMINI_API_KEY, ELEVENLABS_API_KEY, or BFL_API_KEY in the MCP server's env block.

Provider not available for requested media type

Each provider supports different media types (see Provider Capabilities). If you specify a provider that isn't configured (no API key) or doesn't support the requested media type, you'll receive an error. Omit the provider parameter to auto-select from configured providers.

Video generation timeout

Video generation polls for up to 10 minutes. If your video hasn't completed in that window, the request will fail with a timeout error. Try a shorter duration or a simpler prompt.
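
The polling behavior described above can be pictured as a loop with a deadline. This is a generic sketch of that pattern, not the server's implementation: `pollUntilDone`, `PollResult`, and the polling interval are hypothetical, and only the 10-minute ceiling comes from the text. The clock and sleep function are injectable so the loop can be exercised without waiting.

```typescript
type PollResult<T> = { done: true; value: T } | { done: false };

interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
  now?: () => number; // injectable clock, defaults to Date.now
  sleep?: (ms: number) => Promise<void>; // injectable delay
}

// Calls check() until it reports completion, or fails once the next
// wait would run past the deadline.
async function pollUntilDone<T>(
  check: () => Promise<PollResult<T>>,
  opts: PollOptions,
): Promise<T> {
  const now = opts.now ?? Date.now;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = now() + opts.timeoutMs;
  for (;;) {
    const result = await check();
    if (result.done) return result.value;
    if (now() + opts.intervalMs > deadline) {
      throw new Error(`Timed out after ${opts.timeoutMs} ms`);
    }
    await sleep(opts.intervalMs);
  }
}

// Simulated job (hypothetical) that finishes on the third status check.
let checks = 0;
pollUntilDone(
  async (): Promise<PollResult<string>> =>
    ++checks >= 3 ? { done: true, value: "video.mp4" } : { done: false },
  { timeoutMs: 10 * 60 * 1000, intervalMs: 10 },
).then((file) => console.log(`finished: ${file} after ${checks} checks`));
```

A shorter `duration` or simpler prompt reduces the chance that a real job is still running when the deadline passes.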

xAI image generation returned no data

This indicates the xAI API returned an empty response. Check that your XAI_API_KEY is valid and that your prompt does not violate xAI content policies.

Gemini image/video generation failed: 403

Verify your GEMINI_API_KEY has the Generative Language API enabled in Google Cloud Console.

Development

npm run build      # Compile TypeScript to build/
npm test           # Run tests with Vitest
npm run lint       # Lint and auto-fix with ESLint
npm run typecheck  # Type-check without emitting
npm run dev        # Watch mode for TypeScript compilation

Editor Setup

Replace OPENAI_API_KEY with the key variable for your provider of choice (XAI_API_KEY, GEMINI_API_KEY, ELEVENLABS_API_KEY, or BFL_API_KEY). You can set multiple keys to enable multiple providers.

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location):

{
  "mcpServers": {
    "multimodal-mcp": {
      "command": "npx",
      "args": ["@r16t/multimodal-mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Cursor

Add to .cursor/mcp.json in your project root (or ~/.cursor/mcp.json globally):

{
  "mcpServers": {
    "multimodal-mcp": {
      "command": "npx",
      "args": ["@r16t/multimodal-mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

VS Code (GitHub Copilot)

Add to .vscode/mcp.json in your project root:

{
  "servers": {
    "multimodal-mcp": {
      "command": "npx",
      "args": ["@r16t/multimodal-mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "multimodal-mcp": {
      "command": "npx",
      "args": ["@r16t/multimodal-mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Cline

Add to ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json (the macOS location):

{
  "mcpServers": {
    "multimodal-mcp": {
      "command": "npx",
      "args": ["@r16t/multimodal-mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
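
The five editor configurations above are identical except for the top-level key: VS Code uses `servers`, the others use `mcpServers`. If you generate these files from a script, a small helper along these lines captures that difference. The `Editor` labels and the `editorConfig` function are hypothetical names for illustration; the command, args, and key names are copied from the snippets above.

```typescript
// Hypothetical labels for the editors covered in this section.
type Editor = "claude-desktop" | "cursor" | "vscode" | "windsurf" | "cline";

interface ServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

type EditorConfig = Record<string, Record<string, ServerEntry>>;

// Builds the JSON structure shown above for the chosen editor.
function editorConfig(editor: Editor, env: Record<string, string>): EditorConfig {
  // VS Code's .vscode/mcp.json uses "servers"; the rest use "mcpServers".
  const rootKey = editor === "vscode" ? "servers" : "mcpServers";
  const entry: ServerEntry = {
    command: "npx",
    args: ["@r16t/multimodal-mcp@latest"],
    env,
  };
  return { [rootKey]: { "multimodal-mcp": entry } };
}

// Two provider keys enable two providers in one server entry.
const vscodeConfig = editorConfig("vscode", {
  OPENAI_API_KEY: "sk-...",
  ELEVENLABS_API_KEY: "xi-...",
});
console.log(JSON.stringify(vscodeConfig, null, 2));
```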

License

MIT

Configuration

  • OPENAI_API_KEY (secret): OpenAI API key for image, video, audio generation and transcription
  • XAI_API_KEY (secret): xAI API key for image and video generation
  • GEMINI_API_KEY (secret): Google Gemini API key for image, video, and audio generation
  • ELEVENLABS_API_KEY (secret): ElevenLabs API key for audio generation and transcription
  • BFL_API_KEY (secret): BFL API key for FLUX image generation and editing
  • MEDIA_OUTPUT_DIR: Directory for saved media files (defaults to cwd)

Categories: Media & Entertainment
Registry: active
Package: @r16t/multimodal-mcp
Transport: STDIO
Auth: Required
Updated: Mar 3, 2026
