Wraps Google's Gemini 2.5 Live API and DeepMind's Lyria 3 models for audio synthesis tasks. Exposes four main tools: generate_soundscape for ambient textures, generate_music for structured compositions with BPM and key control, generate_voice for narration using native audio, and transition_soundscape for crossfading between environments. Uses a hybrid architecture with WebSocket connections for voice and REST calls for music generation, plus an internal Rust pipeline that handles PCM looping to extend short clips into seamless audio. Requires FFmpeg for transcoding and a Google AI Studio API key with Lyria access. Reach for this when you need programmatic audio generation in Claude workflows, like building soundscapes for games, generating background music, or creating dynamic narration.
Gemini Audio MCP is a high-performance Model Context Protocol (MCP) server for audio synthesis. It uses the Gemini 2.5 Multimodal Live API and Google DeepMind's Lyria 3 models to deliver high-fidelity environmental soundscapes, musical compositions, and expressive narration on demand.
Before deploying the server, ensure your environment meets the following technical requirements:
FFmpeg: Required for high-performance audio encoding, decoding, and transcoding.
macOS: brew install ffmpeg
Windows: winget install ffmpeg, or download from ffmpeg.org.
Linux (Debian/Ubuntu): sudo apt update && sudo apt install ffmpeg

Rust: Required to build the server from source.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Node.js: Required if using the pre-compiled NPM package.
node >= 18.0.0

The fastest way to integrate the server into your MCP client (e.g., Claude Desktop).
{
  "mcpServers": {
    "gemini-audio": {
      "command": "npx",
      "args": ["-y", "gemini-audio-mcp"],
      "env": {
        "GEMINI_API_KEY": "YOUR_SECURE_API_KEY"
      }
    }
  }
}
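If your client already has other servers configured, the entry above has to be merged into the existing file rather than pasted over it. The following is a minimal sketch of that merge. The function name `add_gemini_audio_server` and the config path are hypothetical; check your MCP client's documentation for where its config file actually lives.

```python
import json
from pathlib import Path


def add_gemini_audio_server(config_path: Path, api_key: str) -> dict:
    """Merge the gemini-audio entry into an MCP client config file.

    Existing servers are preserved. The file location is an assumption
    supplied by the caller, not something this README specifies.
    """
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command, args, and env layout as the JSON snippet above.
    servers["gemini-audio"] = {
        "command": "npx",
        "args": ["-y", "gemini-audio-mcp"],
        "env": {"GEMINI_API_KEY": api_key},
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Avoid committing the resulting file to version control, since it contains the API key in plain text.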
For maximum performance, build the Rust binary locally:
git clone https://github.com/mcp-servers/gemini-audio-mcp.git
cd gemini-audio-mcp
cargo build --release
The release binary is written to ./target/release/gemini-audio-mcp.
The server requires a valid Google AI Studio API key.
Provide it through the GEMINI_API_KEY environment variable.
generate_soundscape: Synthesizes immersive, vocal-free ambient textures.
{
  "name": "generate_soundscape",
  "arguments": {
    "prompt": "Deep underwater abyss, low-frequency whale songs, rhythmic air bubbles rising, muffled aquatic pressure.",
    "duration": 60,
    "quality": "high",
    "auto_play": true
  }
}
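MCP clients normally send a payload like the one above for you, wrapped in a JSON-RPC 2.0 request whose method is `tools/call`. For anyone scripting against the server directly, here is a rough sketch of building that envelope. The helper name `build_tool_call` and the ID counter are illustrative, and transport details (stdio framing, initialization handshake) are left out.

```python
import json
from itertools import count

# Monotonic request IDs; JSON-RPC only requires that each one be unique per session.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


message = build_tool_call(
    "generate_soundscape",
    {"prompt": "Deep underwater abyss", "duration": 60, "quality": "high"},
)
```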
generate_music: Generates structured compositions with optional vocal control.
{
  "name": "generate_music",
  "arguments": {
    "prompt": "Melancholic solo cello in a vast cathedral with 5-second decay reverb.",
    "bpm": 72,
    "song_key": "D minor",
    "intensity": 4
  }
}
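It can help to check arguments like these on the client side before spending an API call. The sketch below applies the ranges documented in the parameter table (intensity 1-10, guidance 0.0-6.0). Treating `prompt` as required and `bpm` as strictly positive are assumptions made for illustration; the server's own validation rules are not documented here.

```python
def validate_music_arguments(arguments: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Assumption: a non-empty prompt is required.
    if not str(arguments.get("prompt", "")).strip():
        problems.append("prompt must be a non-empty string")

    # Assumption: tempo must be a positive number when supplied.
    bpm = arguments.get("bpm")
    if bpm is not None and (not isinstance(bpm, (int, float)) or bpm <= 0):
        problems.append("bpm must be a positive number")

    # Documented range: 1-10.
    intensity = arguments.get("intensity")
    if intensity is not None and not 1 <= intensity <= 10:
        problems.append("intensity must be between 1 and 10")

    # Documented range: 0.0-6.0.
    guidance = arguments.get("guidance")
    if guidance is not None and not 0.0 <= guidance <= 6.0:
        problems.append("guidance must be between 0.0 and 6.0")

    return problems
```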
generate_voice: Produces narration and character dialogue using Gemini 2.5 Native Audio.
{
  "name": "generate_voice",
  "arguments": {
    "text": "The artifacts are stable, but the rift remains open.",
    "voice_direction": "Gravelly, urgent, whispered"
  }
}
transition_soundscape: Crossfades two distinct environments for seamless scene transitions.
{
  "name": "transition_soundscape",
  "arguments": {
    "from_prompt": "Quiet library silence.",
    "to_prompt": "Sudden heavy rain on a tin roof.",
    "transition_duration": 8
  }
}
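To make the idea of a crossfade concrete, here is a small sketch of an equal-power crossfade between two mono clips held as lists of float samples. This is an illustration of the general technique, not the server's actual implementation, and the function name `crossfade` is hypothetical. Assuming `transition_duration` is in seconds, the overlap would be that value multiplied by the sample rate.

```python
import math


def crossfade(outgoing: list[float], incoming: list[float], overlap: int) -> list[float]:
    """Join two clips, blending the last `overlap` samples of `outgoing`
    with the first `overlap` samples of `incoming`."""
    if overlap < 0 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit inside both clips")

    start = len(outgoing) - overlap
    blended = []
    for i in range(overlap):
        # Progress through the fade, sampled at the midpoint of each step.
        t = (i + 0.5) / overlap
        # Equal-power curves: the squared gains always sum to 1.
        fade_out = math.cos(t * math.pi / 2)
        fade_in = math.sin(t * math.pi / 2)
        blended.append(outgoing[start + i] * fade_out + incoming[i] * fade_in)

    return outgoing[:start] + blended + incoming[overlap:]
```

Equal-power curves are a common choice when the two sources are unrelated (a library and a rainstorm, say), because a plain linear fade tends to dip in perceived loudness at the midpoint.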
| Parameter | Type | Description |
|---|---|---|
| seed | Integer | Ensures deterministic, reproducible audio outputs. |
| image_path | String | Multimodal: Uses a local image to guide the acoustic mood (e.g., resonance). |
| bpm | Number | Explicitly sets the rhythmic tempo (essential for music). |
| intensity | Number | 1-10 scale controlling dynamic range and complexity. |
| guidance | Number | 0.0-6.0 scale for prompt adherence (Lyria models). |
| duration | Number | Target length in seconds. Triggers the Seamless Looping Engine. |
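
The looping behaviour that `duration` triggers can be pictured with a short sketch: blend the clip's tail into its head so the seam is continuous, then repeat the result until the target length is reached. The code below is a simplified illustration using a linear fade on mono float samples. It is not the server's Rust pipeline, and the function names, sample rate, and overlap length are all assumptions.

```python
def make_loopable(clip: list[float], overlap: int) -> list[float]:
    """Return a segment that can be repeated back to back without a jump."""
    n = len(clip)
    if not 0 < overlap <= n // 2:
        raise ValueError("overlap must be positive and at most half the clip")

    # Blend the fading-out tail with the fading-in head (linear fade).
    seam = []
    for i in range(overlap):
        t = (i + 0.5) / overlap
        seam.append(clip[n - overlap + i] * (1.0 - t) + clip[i] * t)

    # The seam replaces both the head and the tail of the original clip.
    return seam + clip[overlap:n - overlap]


def extend_to_duration(clip: list[float], sample_rate: int,
                       duration_seconds: float, overlap: int) -> list[float]:
    """Tile the loopable segment and trim to the requested length."""
    target = int(round(duration_seconds * sample_rate))
    segment = make_loopable(clip, overlap)
    repeats = -(-target // len(segment))  # ceiling division
    return (segment * repeats)[:target]
```

Each repeat is shorter than the source clip by `overlap` samples, which is the cost of hiding the seam. A real pipeline would wrap this step with decoding to PCM beforehand and encoding afterwards.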
Gemini Audio MCP employs a unique Hybrid Engine Strategy:
Voice generation runs over WebSocket connections, and music generation over REST calls.
An internal Rust pipeline (decode -> crossfade -> loop -> encode) ensures that short clips are transformed into seamless, infinite soundscapes without audible clicks.

Ensure ffmpeg is in your system PATH. Run ffmpeg -version in your terminal to verify.
Check that your FFmpeg build includes the required codecs (e.g., libmp3lame for MP3). Most standard FFmpeg installations include these.
Verify that GEMINI_API_KEY is correct and that your account has access to the requested model (especially lyria-3-pro-preview).

Licensed under the MIT License. Engineered with precision by the MCP community.
GEMINI_API_KEY (required, secret): Your Google AI Studio API key.