Mcp Listen

5STDIOregistry active

Summary

Gives Claude microphone access with three tools: list audio devices, capture raw WAV files to disk, and run a full offline voice pipeline. The voice_query tool chains local whisper.cpp transcription with Ollama, so you can speak a question and get an LLM response without anything leaving your machine. Built on decibri for cross-platform audio capture with no ffmpeg dependencies. You specify recording duration up front since there's no VAD stop detection. Useful if you want voice input in Claude Desktop or need to prototype voice workflows that stay local.

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

mcp-listen

Give your AI agents the ability to listen

Microphone capture and speech-to-text tools for MCP-compatible agents.

Meta
Powered by

Tools

Tool	Description
`list_audio_devices`	List available microphone input devices
`capture_audio`	Record audio from the microphone and save as WAV
`voice_query`	Capture, transcribe (whisper.cpp), and query a local LLM (Ollama)

Quick Start

Claude Code

claude mcp add mcp-listen npx mcp-listen

Claude Desktop / ChatGPT Desktop / Cursor / Windsurf / VS Code

Add to your MCP configuration:

{
  "mcpServers": {
    "mcp-listen": {
      "command": "npx",
      "args": ["-y", "mcp-listen"]
    }
  }
}

Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.

Global Install

npm install -g mcp-listen

Requirements

For list_audio_devices and capture_audio:

Node.js 18+
A microphone

For voice_query (optional):

Ollama running locally
Whisper GGML model (see Whisper Model Setup)

Tool Reference

list_audio_devices

Returns a JSON array of available audio input devices.

Parameters: None

Example response:

[
  { "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
  { "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]

capture_audio

Records audio from the microphone and saves as a WAV file.

Parameters:

Parameter	Type	Default	Description
`duration_ms`	number	5000	Recording duration in milliseconds (100-30000)
`device`	number	system default	Device index from `list_audio_devices`

Example response:

{
  "path": "/tmp/mcp-listen-1712345678901.wav",
  "duration_ms": 5000,
  "sample_rate": 16000,
  "channels": 1,
  "size_bytes": 160044
}

voice_query

Full voice pipeline: capture audio, transcribe with whisper.cpp, send to Ollama, return the response. Entirely offline.

Parameters:

Parameter	Type	Default	Description
`duration_ms`	number	5000	Recording duration in milliseconds (100-30000)
`device`	number	system default	Device index from `list_audio_devices`
`whisper_model`	string	ggml-base.en.bin	Path or filename of Whisper GGML model
`language`	string	en	Language code for transcription
`model`	string	llama3.2	Ollama model name
`prompt`	string	You are a helpful assistant.	System prompt for the LLM

Example response:

{
  "transcription": "What is the default port for PostgreSQL?",
  "response": "PostgreSQL runs on port 5432 by default.",
  "model": "llama3.2"
}

How It Works

mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.

Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.

The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.

Whisper Model Setup

The voice_query tool requires a Whisper GGML model file. Download one:

Linux / macOS:

mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

Windows (PowerShell):

mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"

The model is ~150MB and downloads once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.

Ollama Setup

Install Ollama from https://ollama.com
Pull a model: ollama pull llama3.2
Ensure Ollama is running: ollama serve

Known Limitations

Fixed recording duration. You specify how long to record. There is no "stop when I stop talking" mode yet.
voice_query requires Ollama running. If Ollama isn't running, the tool returns a clear error message.
Whisper model downloads on first use. The first voice_query call requires a pre-downloaded model (~150MB).
No streaming. MCP's request/response pattern means the entire recording is captured, then transcribed, then sent to the LLM. No real-time partial results.
Temp files. capture_audio writes WAV files to the system temp directory. They are not automatically cleaned up. voice_query cleans up after itself.

Troubleshooting

Windows: "Error opening microphone" Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.

Ollama: "Ollama is not running" Some Ollama installations start as a background service automatically. If you see this error, run ollama serve manually or check that the Ollama service is running.

Whisper: "model not found" The whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.

Powered By

decibri: Cross-platform microphone capture for Node.js
voxagent: Voice-powered terminal agent (inspiration for the voice_query pipeline)

License

Apache-2.0. See LICENSE for details.

Featured

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Registryactive

Packagemcp-listen

TransportSTDIO

UpdatedApr 5, 2026

View on GitHub

mcp-listen

Give your AI agents the ability to listen

Microphone capture and speech-to-text tools for MCP-compatible agents.

Meta
Powered by

Tools

Tool	Description
`list_audio_devices`	List available microphone input devices
`capture_audio`	Record audio from the microphone and save as WAV
`voice_query`	Capture, transcribe (whisper.cpp), and query a local LLM (Ollama)

Quick Start

Claude Code

claude mcp add mcp-listen npx mcp-listen

Claude Desktop / ChatGPT Desktop / Cursor / Windsurf / VS Code

Add to your MCP configuration:

{
  "mcpServers": {
    "mcp-listen": {
      "command": "npx",
      "args": ["-y", "mcp-listen"]
    }
  }
}

Compatible with Claude Desktop, ChatGPT Desktop, Cursor, GitHub Copilot, Windsurf, VS Code, Gemini, Zed, and any MCP-compatible client.

Global Install

npm install -g mcp-listen

Requirements

For list_audio_devices and capture_audio:

Node.js 18+
A microphone

For voice_query (optional):

Ollama running locally
Whisper GGML model (see Whisper Model Setup)

Tool Reference

list_audio_devices

Returns a JSON array of available audio input devices.

Parameters: None

Example response:

[
  { "index": 3, "name": "Microphone (Creative Live! Cam)", "isDefault": true, "maxInputChannels": 2, "defaultSampleRate": 48000 },
  { "index": 4, "name": "Microphone Array (Intel)", "isDefault": false, "maxInputChannels": 2, "defaultSampleRate": 48000 }
]

capture_audio

Records audio from the microphone and saves as a WAV file.

Parameters:

Parameter	Type	Default	Description
`duration_ms`	number	5000	Recording duration in milliseconds (100-30000)
`device`	number	system default	Device index from `list_audio_devices`

Example response:

{
  "path": "/tmp/mcp-listen-1712345678901.wav",
  "duration_ms": 5000,
  "sample_rate": 16000,
  "channels": 1,
  "size_bytes": 160044
}

voice_query

Full voice pipeline: capture audio, transcribe with whisper.cpp, send to Ollama, return the response. Entirely offline.

Parameters:

Parameter	Type	Default	Description
`duration_ms`	number	5000	Recording duration in milliseconds (100-30000)
`device`	number	system default	Device index from `list_audio_devices`
`whisper_model`	string	ggml-base.en.bin	Path or filename of Whisper GGML model
`language`	string	en	Language code for transcription
`model`	string	llama3.2	Ollama model name
`prompt`	string	You are a helpful assistant.	System prompt for the LLM

Example response:

{
  "transcription": "What is the default port for PostgreSQL?",
  "response": "PostgreSQL runs on port 5432 by default.",
  "model": "llama3.2"
}

How It Works

mcp-listen uses decibri for cross-platform microphone capture. No ffmpeg, no SoX, no system audio tools required. Pre-built native binaries with zero setup.

Audio is captured as 16-bit PCM at 16kHz mono, the standard format for speech-to-text engines.

The voice_query tool replicates the pipeline from voxagent: capture audio, transcribe locally with whisper.cpp, and send to a local Ollama LLM. Fully offline, nothing leaves your machine.

Whisper Model Setup

The voice_query tool requires a Whisper GGML model file. Download one:

Linux / macOS:

mkdir -p ~/.mcp-listen/models
curl -L -o ~/.mcp-listen/models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

Windows (PowerShell):

mkdir "$env:USERPROFILE\.mcp-listen\models" -Force
Invoke-WebRequest -Uri "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin" -OutFile "$env:USERPROFILE\.mcp-listen\models\ggml-base.en.bin"

The model is ~150MB and downloads once. You can also set the WHISPER_MODEL_PATH environment variable to a custom directory.

Ollama Setup

Install Ollama from https://ollama.com
Pull a model: ollama pull llama3.2
Ensure Ollama is running: ollama serve

Known Limitations

Fixed recording duration. You specify how long to record. There is no "stop when I stop talking" mode yet.
voice_query requires Ollama running. If Ollama isn't running, the tool returns a clear error message.
Whisper model downloads on first use. The first voice_query call requires a pre-downloaded model (~150MB).
No streaming. MCP's request/response pattern means the entire recording is captured, then transcribed, then sent to the LLM. No real-time partial results.
Temp files. capture_audio writes WAV files to the system temp directory. They are not automatically cleaned up. voice_query cleans up after itself.

Troubleshooting

Windows: "Error opening microphone" Windows may block microphone access by default. Go to Settings > Privacy & security > Microphone and ensure microphone access is enabled for desktop apps.

Whisper: "model not found" The whisper model file must be downloaded before first use. See Whisper Model Setup for instructions.

Powered By

decibri: Cross-platform microphone capture for Node.js
voxagent: Voice-powered terminal agent (inspiration for the voice_query pipeline)

License

Apache-2.0. See LICENSE for details.