Gives Claude microphone access with three tools: list audio devices, capture raw WAV files to disk, and run a full offline voice pipeline. The voice_query tool chains local whisper.cpp transcription with Ollama, so you can speak a question and get an LLM response without anything leaving your machine. Built on decibri for cross-platform audio capture with no ffmpeg dependencies. You specify recording duration up front since there's no VAD stop detection. Useful if you want voice input in Claude Desktop or need to prototype voice workflows that stay local.
Give your AI agents the ability to listen
Microphone capture and speech-to-text tools for MCP-compatible agents.
| Tool | Description |
|---|---|
list_audio_devices | List available microphone input devices |
capture_audio | Record audio from the microphone and save as WAV |
voice_query | Capture, transcribe (whisper.cpp), and query a local LLM (Ollama) |
claude mcp add mcp-listen npx mcp-listen
Add to your MCP configuration:
{
"mcpServers": {
"mcp-listen": {
"command": "npx",
"args": ["-y", "mcp-listen"]
}
}
}
Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.
npm install -g mcp-listen
For list_audio_devices and capture_audio:
For voice_query (optional):
Returns a JSON array of available audio input devices.
Parameters: None
Example response:
[
{ "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
{ "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]
Records audio from the microphone and saves as a WAV file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
duration_ms | number | 5000 | Recording duration in milliseconds (100-30000) |
device | number | system default | Device index from list_audio_devices |
Example response:
{
"path": "/tmp/mcp-listen-1712345678901.wav",
"duration_ms": 5000,
"sample_rate": 16000,
"channels": 1,
"size_bytes": 160044
}
Full voice pipeline: capture audio, transcribe with whisper.cpp, send to Ollama, return the response. Entirely offline.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
duration_ms | number | 5000 | Recording duration in milliseconds (100-30000) |
device | number | system default | Device index from list_audio_devices |
whisper_model | string | ggml-base.en.bin | Path or filename of Whisper GGML model |
language | string | en | Language code for transcription |
model | string | llama3.2 | Ollama model name |
prompt | string | You are a helpful assistant. | System prompt for the LLM |
Example response:
{
"transcription": "What is the default port for PostgreSQL?",
"response": "PostgreSQL runs on port 5432 by default.",
"model": "llama3.2"
}
mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.
Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.
The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.
The voice_query tool requires a Whisper GGML model file. Download one:
Linux / macOS:
mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
Windows (PowerShell):
mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"
The model is ~150MB and downloads once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.
ollama pull llama3.2ollama servevoice_query requires Ollama running. If Ollama isn't running, the tool returns a clear error message.voice_query call requires a pre-downloaded model (~150MB).capture_audio writes WAV files to the system temp directory. They are not automatically cleaned up. voice_query cleans up after itself.Windows: "Error opening microphone" Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.
Ollama: "Ollama is not running"
Some Ollama installations start as a background service automatically. If you see this error, run ollama serve manually or check that the Ollama service is running.
Whisper: "model not found" The whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.
Apache-2.0. See LICENSE for details.
Copyright 2026 Decibri