Connects Claude to OpenAI, xAI, Gemini, ElevenLabs, and BFL APIs for generating and editing images, videos, audio, and transcriptions through a single interface. Exposes tools like generate_image, generate_video, generate_audio, and transcribe_audio with automatic provider selection based on which API keys you've configured. You can explicitly choose a provider per request or let it auto-select from what's available. All generated media saves to disk with descriptive filenames. Reach for this when you want Claude to generate visual or audio content without writing provider-specific code for each API, or when you're working across multiple media generation services and want consistent tool parameters.
Multi-provider media generation MCP server. Generate images, videos, and audio from text prompts, and transcribe audio to text, using OpenAI, xAI, Gemini, ElevenLabs, and BFL (FLUX) through a single unified interface.
Set the API key for at least one provider. Most users only need one; add more to access additional providers.

    # Using OpenAI
    claude mcp add multimodal-mcp -e OPENAI_API_KEY=sk-... -- npx -y @r16t/multimodal-mcp@latest

    # Or using xAI
    # claude mcp add multimodal-mcp -e XAI_API_KEY=xai-... -- npx -y @r16t/multimodal-mcp@latest

    # Or using Gemini
    # claude mcp add multimodal-mcp -e GEMINI_API_KEY=AIza... -- npx -y @r16t/multimodal-mcp@latest

    # Or using ElevenLabs (audio + transcription)
    # claude mcp add multimodal-mcp -e ELEVENLABS_API_KEY=xi-... -- npx -y @r16t/multimodal-mcp@latest

    # Or using BFL/FLUX (images)
    # claude mcp add multimodal-mcp -e BFL_API_KEY=... -- npx -y @r16t/multimodal-mcp@latest

Using a different editor? See setup instructions for Claude Desktop, Cursor, VS Code, Windsurf, and Cline.
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | At least one provider key | OpenAI API key: enables image, video, and audio generation plus transcription via gpt-image-1, sora-2, tts-1, and whisper-1 |
| `XAI_API_KEY` | At least one provider key | xAI API key: enables image and video generation via grok-imagine-image and grok-imagine-video |
| `GEMINI_API_KEY` | At least one provider key | Gemini API key: enables image, video, and audio generation via imagen-4, veo-3.1, and gemini-2.5-flash-preview-tts |
| `GOOGLE_API_KEY` | - | Alias for `GEMINI_API_KEY`; either name is accepted |
| `ELEVENLABS_API_KEY` | At least one provider key | ElevenLabs API key: enables audio generation (TTS, sound effects) and transcription via Flash v2.5 and Scribe v1 |
| `BFL_API_KEY` | At least one provider key | BFL API key: enables image generation and editing via FLUX Pro 1.1 and FLUX Kontext |
| `MEDIA_OUTPUT_DIR` | No | Directory for saved media files. Defaults to the current working directory |
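
The table above implies a simple rule: a provider is enabled when its key variable is set, and Gemini accepts either of two names. The following is a minimal sketch of that rule, not the server's actual code. The helper `configuredProviders`, the `KEY_VARS` map, and the choice to treat blank values as unset are all assumptions made for illustration; the provider identifiers are the ones the tools accept in their `provider` parameter.

```typescript
// Hypothetical helper: derive the enabled providers from an environment map.
type Provider = "openai" | "xai" | "google" | "elevenlabs" | "bfl";
type Env = Record<string, string | undefined>;

// Environment variables that enable each provider, per the table above.
const KEY_VARS: Record<Provider, string[]> = {
  openai: ["OPENAI_API_KEY"],
  xai: ["XAI_API_KEY"],
  google: ["GEMINI_API_KEY", "GOOGLE_API_KEY"], // either name is accepted
  elevenlabs: ["ELEVENLABS_API_KEY"],
  bfl: ["BFL_API_KEY"],
};

function configuredProviders(env: Env): Provider[] {
  // Assumption: a blank or whitespace-only value counts as "not set".
  return (Object.keys(KEY_VARS) as Provider[]).filter((provider) =>
    KEY_VARS[provider].some((name) => (env[name] ?? "").trim() !== ""),
  );
}

// Example with placeholder key values.
console.log(configuredProviders({ GOOGLE_API_KEY: "AIza-example", BFL_API_KEY: "example" }));
```

Passing the environment in as a plain object, rather than reading a global, keeps the sketch easy to try with different key combinations.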
`generate_image`

Generate an image from a text prompt.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Text description of the image to generate |
| `provider` | string | No | Provider to use: openai, xai, google, bfl. Auto-selects if omitted |
| `aspectRatio` | string | No | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4 |
| `quality` | string | No | Quality level: low, standard, high |
| `outputDirectory` | string | No | Directory to save the generated file. Absolute or relative path. Defaults to `MEDIA_OUTPUT_DIR` or cwd |
| `providerOptions` | object | No | Provider-specific parameters passed through directly |
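
To make the parameter table concrete, here is a hedged sketch of a `generate_image` argument object together with the documented fallback for the output directory (the `outputDirectory` argument, then `MEDIA_OUTPUT_DIR`, then the working directory). The `GenerateImageArgs` interface and the `resolveOutputDir` helper are hypothetical names introduced for this example. The server's own types may differ, and the sketch does not resolve relative paths.

```typescript
// Hypothetical typing of the generate_image arguments listed in the table above.
interface GenerateImageArgs {
  prompt: string;
  provider?: "openai" | "xai" | "google" | "bfl";
  aspectRatio?: "1:1" | "16:9" | "9:16" | "4:3" | "3:4";
  quality?: "low" | "standard" | "high";
  outputDirectory?: string;
  providerOptions?: Record<string, unknown>;
}

type Env = Record<string, string | undefined>;

// Documented precedence: explicit argument, then MEDIA_OUTPUT_DIR, then cwd.
function resolveOutputDir(outputDirectory: string | undefined, env: Env, cwd: string): string {
  return outputDirectory || env["MEDIA_OUTPUT_DIR"] || cwd;
}

const args: GenerateImageArgs = {
  prompt: "A lighthouse at dusk, watercolor style",
  aspectRatio: "16:9",
  quality: "high",
  outputDirectory: "./renders",
};

// The explicit argument wins over the environment variable.
console.log(resolveOutputDir(args.outputDirectory, { MEDIA_OUTPUT_DIR: "/srv/media" }, "/home/user/project"));
// With no argument, MEDIA_OUTPUT_DIR is used.
console.log(resolveOutputDir(undefined, { MEDIA_OUTPUT_DIR: "/srv/media" }, "/home/user/project"));
```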
`generate_video`

Generate a video from a text prompt. Video generation is asynchronous and may take several minutes.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Yes | Text description of the video to generate |
| `provider` | string | No | Provider to use: openai, xai, google. Auto-selects if omitted |
| `duration` | number | No | Video duration in seconds (provider limits apply) |
| `aspectRatio` | string | No | Aspect ratio: 16:9, 9:16, 1:1 |
| `resolution` | string | No | Resolution: 480p, 720p, 1080p |
| `outputDirectory` | string | No | Directory to save the generated file. Absolute or relative path. Defaults to `MEDIA_OUTPUT_DIR` or cwd |
| `providerOptions` | object | No | Provider-specific parameters passed through directly |
`generate_audio`

Generate audio from text. Supports text-to-speech and sound effects. Audio generation is synchronous.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | string | Yes | Text to convert to speech, or a description of the sound effect to generate |
| `provider` | string | No | Provider to use: openai, google, elevenlabs. Auto-selects if omitted |
| `voice` | string | No | Voice name (provider-specific). OpenAI: alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer. Google: Kore, Charon, Fenrir, Aoede, Puck, etc. ElevenLabs: voice ID |
| `speed` | number | No | Speech speed multiplier (OpenAI only): 0.25 to 4.0 |
| `format` | string | No | Output format (OpenAI only): mp3, opus, aac, flac, wav, pcm |
| `outputDirectory` | string | No | Directory to save the generated file. Absolute or relative path. Defaults to `MEDIA_OUTPUT_DIR` or cwd |
| `providerOptions` | object | No | Provider-specific parameters passed through directly. ElevenLabs: set `mode: "sound-effect"` for sound effects, `model` for TTS model selection |
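
The two request shapes described above (text-to-speech and an ElevenLabs sound effect) might look like the objects below. This is a sketch under stated assumptions: `GenerateAudioArgs` and `checkAudioArgs` are hypothetical names, and the source does not say whether the server rejects or silently ignores `speed` for non-OpenAI providers, so the check simply flags it.

```typescript
// Hypothetical typing of the generate_audio arguments listed in the table above.
interface GenerateAudioArgs {
  text: string;
  provider?: "openai" | "google" | "elevenlabs";
  voice?: string;
  speed?: number;
  format?: "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";
  outputDirectory?: string;
  providerOptions?: Record<string, unknown>;
}

// Hypothetical pre-flight check based only on the documented constraints.
function checkAudioArgs(args: GenerateAudioArgs): string[] {
  const problems: string[] = [];
  if (args.text.trim() === "") problems.push("text must not be empty");
  if (args.speed !== undefined) {
    if (args.provider !== undefined && args.provider !== "openai") {
      problems.push("speed is documented as OpenAI only");
    }
    if (args.speed < 0.25 || args.speed > 4.0) {
      problems.push("speed must be between 0.25 and 4.0");
    }
  }
  return problems;
}

// Text-to-speech with an OpenAI voice.
const speech: GenerateAudioArgs = {
  text: "Welcome back. Your build finished successfully.",
  provider: "openai",
  voice: "nova",
  speed: 1.25,
  format: "mp3",
};

// Sound effect through ElevenLabs, using the documented providerOptions switch.
const effect: GenerateAudioArgs = {
  text: "Heavy rain on a tin roof",
  provider: "elevenlabs",
  providerOptions: { mode: "sound-effect" },
};

console.log(checkAudioArgs(speech), checkAudioArgs(effect));
```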
`transcribe_audio`

Transcribe audio to text (speech-to-text).

| Parameter | Type | Required | Description |
|---|---|---|---|
| `audioPath` | string | Yes | Absolute path to the audio file to transcribe |
| `provider` | string | No | Provider to use: openai, elevenlabs. Auto-selects if omitted |
| `language` | string | No | Language code (e.g., en, fr, es) to hint the transcription language |
| `providerOptions` | object | No | Provider-specific parameters passed through directly |
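
Because `audioPath` must be absolute, a caller may want to check the path before sending a request. The sketch below uses a simplified, hypothetical `looksAbsolute` test that only recognizes POSIX paths and Windows drive paths. In a Node.js program, `path.isAbsolute` from `node:path` would be the more robust choice. `TranscribeAudioArgs` is likewise an illustrative name, not the server's type.

```typescript
// Hypothetical typing of the transcribe_audio arguments listed in the table above.
interface TranscribeAudioArgs {
  audioPath: string;
  provider?: "openai" | "elevenlabs";
  language?: string;
  providerOptions?: Record<string, unknown>;
}

// Simplified check: POSIX paths ("/...") and Windows drive paths ("C:\..." or "C:/...").
function looksAbsolute(p: string): boolean {
  return p.charAt(0) === "/" || /^[A-Za-z]:[\\/]/.test(p);
}

const request: TranscribeAudioArgs = {
  audioPath: "/home/user/recordings/interview.mp3",
  language: "en",
};

if (!looksAbsolute(request.audioPath)) {
  throw new Error("audioPath must be an absolute path");
}
console.log("ready to transcribe", request.audioPath);
```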
`list_providers`

List all configured media generation providers and their capabilities. Takes no parameters.

Provider Capabilities

| Provider | Image | Image Editing | Video | Audio | Transcription | Key Models |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | gpt-image-1, sora-2, tts-1, whisper-1 |
| xAI | ✅ | ✅ | ✅ | — | — | grok-imagine-image, grok-imagine-video |
| Gemini | ✅ | ✅ | ✅ | ✅ | — | imagen-4, veo-3.1, gemini-2.5-flash-preview-tts |
| ElevenLabs | — | — | — | ✅ | ✅ | eleven_flash_v2_5, scribe_v1 |
| BFL | ✅ | ✅ | — | — | — | flux-pro-1.1, flux-kontext-pro |

Image aspect ratios by provider:

| Provider | 1:1 | 16:9 | 9:16 | 4:3 | 3:4 |
|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| xAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini | ✅ | ✅ | ✅ | ✅ | ✅ |
| BFL | ✅ | ✅ | ✅ | ✅ | ✅ |

Video aspect ratios and resolutions by provider:

| Provider | 16:9 | 9:16 | 1:1 | 480p | 720p | 1080p |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| xAI | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| Gemini | ✅ | ✅ | — | — | ✅ | ✅ |
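
The video support table can be encoded as data to find out which providers accept a given combination before a request is sent. This is an illustrative sketch: `VIDEO_SUPPORT` and `videoProvidersFor` are hypothetical names, the values are transcribed from the table above, and `google` is the `provider` value that corresponds to the Gemini row.

```typescript
type VideoProvider = "openai" | "xai" | "google";

// Transcribed from the table above; "google" is the Gemini row.
const VIDEO_SUPPORT: Record<VideoProvider, { aspectRatios: string[]; resolutions: string[] }> = {
  openai: { aspectRatios: ["16:9", "9:16", "1:1"], resolutions: ["480p", "720p", "1080p"] },
  xai: { aspectRatios: ["16:9", "9:16", "1:1"], resolutions: ["720p", "1080p"] },
  google: { aspectRatios: ["16:9", "9:16"], resolutions: ["720p", "1080p"] },
};

// Hypothetical lookup: providers that support both the aspect ratio and the resolution.
function videoProvidersFor(aspectRatio: string, resolution: string): VideoProvider[] {
  return (Object.keys(VIDEO_SUPPORT) as VideoProvider[]).filter(
    (provider) =>
      VIDEO_SUPPORT[provider].aspectRatios.indexOf(aspectRatio) !== -1 &&
      VIDEO_SUPPORT[provider].resolutions.indexOf(resolution) !== -1,
  );
}

console.log(videoProvidersFor("1:1", "480p"));   // only OpenAI lists both
console.log(videoProvidersFor("16:9", "1080p")); // all three providers
```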

Audio output formats by provider:

| Provider | mp3 | opus | aac | flac | wav | pcm |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini | — | — | — | — | ✅ | — |
| ElevenLabs | ✅ | ✅ | — | — | — | ✅ |
`[config] No provider API keys detected`

Set at least one of `OPENAI_API_KEY`, `XAI_API_KEY`, `GEMINI_API_KEY`, `ELEVENLABS_API_KEY`, or `BFL_API_KEY` in the MCP server's `env` block.
Each provider supports different media types (see Provider Capabilities). If you specify a provider that isn't configured (no API key) or doesn't support the requested media type, you'll receive an error. Omit the provider parameter to auto-select from configured providers.
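
The selection behavior described above can be sketched as a small function: an explicit provider must be both configured and capable, and otherwise some configured provider that supports the media type is chosen. The capability data comes from the Provider Capabilities table. The function name `pickProvider`, the error wording, and the "first configured match wins" order are assumptions, since the server's real priority order is not documented here.

```typescript
type Provider = "openai" | "xai" | "google" | "elevenlabs" | "bfl";
type MediaType = "image" | "video" | "audio" | "transcription";

// Transcribed from the Provider Capabilities table ("google" is the Gemini row).
const CAPABILITIES: Record<Provider, MediaType[]> = {
  openai: ["image", "video", "audio", "transcription"],
  xai: ["image", "video"],
  google: ["image", "video", "audio"],
  elevenlabs: ["audio", "transcription"],
  bfl: ["image"],
};

function supports(provider: Provider, media: MediaType): boolean {
  return CAPABILITIES[provider].indexOf(media) !== -1;
}

// Hypothetical selection logic; the error messages are illustrative only.
function pickProvider(media: MediaType, configured: Provider[], requested?: Provider): Provider {
  if (requested !== undefined) {
    if (configured.indexOf(requested) === -1) {
      throw new Error(`Provider "${requested}" is not configured (no API key)`);
    }
    if (!supports(requested, media)) {
      throw new Error(`Provider "${requested}" does not support ${media}`);
    }
    return requested;
  }
  // Assumption: the first configured provider that supports the media type wins.
  const candidates = configured.filter((provider) => supports(provider, media));
  if (candidates.length === 0) {
    throw new Error(`No configured provider supports ${media}`);
  }
  return candidates[0];
}

console.log(pickProvider("audio", ["xai", "elevenlabs"])); // xai has no audio, so elevenlabs
```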
Video generation polls for up to 10 minutes. If your video hasn't completed in that window, the request will fail with a timeout error. Try a shorter duration or a simpler prompt.
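
A polling loop with a deadline, of the kind described above, can be sketched as follows. Only the 10-minute limit comes from this document. The `pollUntilDone` function, the `JobStatus` shape, the 5-second default interval, and the error text are assumptions made for illustration, and the real server may poll differently.

```typescript
// Hypothetical job status returned by a provider's "check job" call.
type JobStatus = { done: boolean; url?: string };

async function pollUntilDone(
  check: () => Promise<JobStatus>,
  timeoutMs: number = 10 * 60 * 1000, // documented limit: 10 minutes
  intervalMs: number = 5000,          // assumed polling interval
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await check();
    if (status.done) return status;
    // Give up if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Video generation timed out after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage with a fake job that completes on the third check.
let calls = 0;
const fakeCheck = async (): Promise<JobStatus> => {
  calls += 1;
  return calls >= 3 ? { done: true, url: "video.mp4" } : { done: false };
};

pollUntilDone(fakeCheck, 1000, 10).then((status) => console.log("finished:", status.url));
```

Injecting the `check` function keeps the loop independent of any particular provider API, which also makes the timeout path easy to exercise with short intervals.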
If the xAI API returns an empty response, check that your `XAI_API_KEY` is valid and that your prompt does not violate xAI content policies.
For Gemini errors, verify in Google Cloud Console that the Generative Language API is enabled for the project your `GEMINI_API_KEY` belongs to.
npm run build # Compile TypeScript to build/
npm test # Run tests with Vitest
npm run lint # Lint and auto-fix with ESLint
npm run typecheck # Type-check without emitting
npm run dev # Watch mode for TypeScript compilation
Replace `OPENAI_API_KEY` with the key variable for your provider of choice (`XAI_API_KEY`, `GEMINI_API_KEY`, `ELEVENLABS_API_KEY`, or `BFL_API_KEY`). You can set multiple keys to enable multiple providers.
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

    {
      "mcpServers": {
        "multimodal-mcp": {
          "command": "npx",
          "args": ["@r16t/multimodal-mcp@latest"],
          "env": {
            "OPENAI_API_KEY": "sk-..."
          }
        }
      }
    }

Add to `.cursor/mcp.json` in your project root (or `~/.cursor/mcp.json` globally):

    {
      "mcpServers": {
        "multimodal-mcp": {
          "command": "npx",
          "args": ["@r16t/multimodal-mcp@latest"],
          "env": {
            "OPENAI_API_KEY": "sk-..."
          }
        }
      }
    }

Add to `.vscode/mcp.json` in your project root:

    {
      "servers": {
        "multimodal-mcp": {
          "command": "npx",
          "args": ["@r16t/multimodal-mcp@latest"],
          "env": {
            "OPENAI_API_KEY": "sk-..."
          }
        }
      }
    }

Add to `~/.codeium/windsurf/mcp_config.json`:

    {
      "mcpServers": {
        "multimodal-mcp": {
          "command": "npx",
          "args": ["@r16t/multimodal-mcp@latest"],
          "env": {
            "OPENAI_API_KEY": "sk-..."
          }
        }
      }
    }

Add to `~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json`:

    {
      "mcpServers": {
        "multimodal-mcp": {
          "command": "npx",
          "args": ["@r16t/multimodal-mcp@latest"],
          "env": {
            "OPENAI_API_KEY": "sk-..."
          }
        }
      }
    }

License: MIT