Connects Claude to Sber's SaluteSpeech API for Russian speech recognition and synthesis. You get five tools: recognize_speech for transcribing base64 audio, synthesize_speech for generating TTS audio, list_models to see available voices and recognition models, get_task_status for async jobs, and recognize_file for local audio files. Handles OAuth token management automatically once you provide client credentials. Part of a larger Russian API MCP collection with 50 servers. Reach for this when you need Claude to transcribe Russian audio files or generate speech output using Sber's voices. Includes guided workflows for common transcription and synthesis tasks. Works over stdio or as an HTTP endpoint.
MCP server for Sber SaluteSpeech API — speech recognition (STT) and synthesis (TTS). 5 tools.
Part of Russian API MCP (50 servers) by @theYahia.
Add to your MCP client configuration:

```json
{
  "mcpServers": {
    "salutespeech": {
      "command": "npx",
      "args": ["-y", "@theyahia/salutespeech-mcp"],
      "env": { "SALUTESPEECH_API_KEY": "your-base64-key" }
    }
  }
}
```
With the Claude Code CLI:

```bash
claude mcp add salutespeech -e SALUTESPEECH_API_KEY=your-key -- npx -y @theyahia/salutespeech-mcp
```
To run as an HTTP endpoint instead of stdio:

```bash
SALUTESPEECH_API_KEY=your-key npx @theyahia/salutespeech-mcp --http --port=3000
# POST http://localhost:3000/mcp
# GET  http://localhost:3000/health
```
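A quick way to confirm the HTTP transport is up is to probe the health route listed above. The sketch below assumes the server answers `GET /health` with HTTP 200 when it is ready; the response body is not described here, so it is ignored. `is_healthy` is a hypothetical helper name, not part of the package.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200.

    Assumption: a 200 status means the server is ready; the body is ignored.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False
```

For the command above, `is_healthy("http://localhost:3000")` would be the call to make once the server has started.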
Credentials can be supplied in three ways (checked in order):
| Env var | Format |
|---|---|
| SALUTESPEECH_API_KEY | Base64-encoded client_id:client_secret |
| SALUTE_AUTH_KEY | Same (legacy alias) |
| SALUTE_SPEECH_CLIENT_ID + SALUTE_SPEECH_CLIENT_SECRET | Raw credentials (auto-encoded) |
OAuth tokens are obtained and refreshed automatically. The scope defaults to
SALUTE_SPEECH_PERS (individuals); set SALUTE_SPEECH_SCOPE for corporate accounts
(SALUTE_SPEECH_CORP — postpaid, SALUTE_SPEECH_B2B — prepaid).
Get credentials at developers.sber.ru.
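To make the key format and lookup order concrete, here is a minimal sketch of how the three options could be resolved. It mirrors the behaviour described above and is not the server's actual code; the function names `encode_credentials`, `resolve_auth_key`, and `resolve_scope` are hypothetical.

```python
import base64
from typing import Mapping, Optional


def encode_credentials(client_id: str, client_secret: str) -> str:
    """Base64-encode "client_id:client_secret", the format the table describes."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def resolve_auth_key(env: Mapping[str, str]) -> Optional[str]:
    """Follow the documented order: API key, legacy alias, then raw credentials."""
    for name in ("SALUTESPEECH_API_KEY", "SALUTE_AUTH_KEY"):
        if env.get(name):
            return env[name]
    client_id = env.get("SALUTE_SPEECH_CLIENT_ID")
    client_secret = env.get("SALUTE_SPEECH_CLIENT_SECRET")
    if client_id and client_secret:
        return encode_credentials(client_id, client_secret)
    return None


def resolve_scope(env: Mapping[str, str]) -> str:
    """Fall back to the personal scope when SALUTE_SPEECH_SCOPE is unset."""
    return env.get("SALUTE_SPEECH_SCOPE") or "SALUTE_SPEECH_PERS"
```

`encode_credentials` is also a convenient way to produce the value for `SALUTESPEECH_API_KEY` from a raw client ID and secret.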
| Tool | Description |
|---|---|
| recognize_speech | STT from Base64 audio |
| synthesize_speech | TTS, returns Base64 audio |
| list_models | List recognition models and synthesis voices |
| get_task_status | Check async recognition task status |
| recognize_file | STT from a local file path |
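
MCP clients reach these tools through JSON-RPC 2.0 messages, using the `tools/list` and `tools/call` methods from the MCP specification. The sketch below only builds request bodies. It assumes `list_models` accepts an empty argument object, and it does not guess argument names for the other tools, since their input schemas are not given here; the schemas returned by `tools/list` are the authority. A real session also starts with an `initialize` handshake, which is omitted.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # simple increasing JSON-RPC request ids


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request body for an MCP server."""
    body: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body)


# Ask the server which tools it exposes and what arguments they take.
list_tools = jsonrpc_request("tools/list")

# Call a tool by name. Assumption: list_models takes no arguments.
call_list_models = jsonrpc_request(
    "tools/call", {"name": "list_models", "arguments": {}}
)
```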
| Skill | Description |
|---|---|
| skill-transcribe | Guided workflow for audio transcription |
| skill-synthesize | Guided workflow for speech synthesis |

Example prompts:

```
Transcribe the audio file /tmp/meeting.wav
Synthesize "Hello world" with voice Bys_24000 in wav16 format
List available voices
```
recognize_speech and recognize_file use the synchronous endpoint, which is capped at
2 MB and 1 minute of audio (larger input returns HTTP 413). For multi-channel audio, only
the first channel is recognized. Longer recordings need the asynchronous flow
(data:upload → speech:async_recognize → task:get → data:download), which is not yet
exposed as tools; get_task_status covers the polling step.
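Because oversized input only fails once it reaches the API, it can be worth checking a file locally first. The following is a rough pre-flight sketch for WAV (PCM) files only. It assumes "2 MB" means 2 MiB, which is not stated above, and `fits_sync_endpoint` is a hypothetical helper rather than something the server provides.

```python
import os
import wave

MAX_SYNC_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" is read as 2 MiB
MAX_SYNC_SECONDS = 60.0           # the 1-minute cap on the synchronous endpoint


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_endpoint(path: str) -> bool:
    """True if the file is within both the size and the duration cap."""
    if os.path.getsize(path) > MAX_SYNC_BYTES:
        return False
    return wav_duration_seconds(path) <= MAX_SYNC_SECONDS
```

Other formats would need a different duration probe; the size check still applies to them.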
Synthesis input is capped at 4000 characters (incl. spaces and SSML markup).
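Longer texts therefore have to be split before they are sent to synthesize_speech. A minimal sketch for plain text is shown below; it breaks at the last space inside each 4000-character window. It is not SSML-aware (splitting marked-up input would need to respect tag boundaries), and `split_for_synthesis` is a hypothetical helper name.

```python
from typing import List

MAX_SYNTHESIS_CHARS = 4000


def split_for_synthesis(text: str, limit: int = MAX_SYNTHESIS_CHARS) -> List[str]:
    """Split plain text into chunks of at most `limit` characters.

    Breaks at the last space inside the window when there is one,
    otherwise cuts hard at the limit. Not SSML-aware.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    rest = text.strip()
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space in the window: hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio concatenated by the caller.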
self-signed certificate in certificate chain / UNABLE_TO_VERIFY_LEAF_SIGNATURE

Sber's endpoints use the Russian Trusted Root CA (НУЦ Минцифры), which is not in Node.js's default trust store, so the very first OAuth call fails until you trust it.
Fix: download the root CA (russian_trusted_root_ca_pem.crt) from
gosuslugi.ru/crt and point Node at it:

```bash
export NODE_EXTRA_CA_CERTS=/path/to/russian_trusted_root_ca_pem.crt
```
In an MCP client, add it to the server's env block. Do not set
NODE_TLS_REJECT_UNAUTHORIZED=0 in production — it disables TLS verification entirely.
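As an illustration of what "add it to the server's env block" means, the sketch below patches an in-memory copy of the client configuration shown at the top of this README. The helper `with_extra_ca` is hypothetical, and the certificate path is the same placeholder used above.

```python
import copy
import json
from typing import Any, Dict


def with_extra_ca(config: Dict[str, Any], server: str, ca_path: str) -> Dict[str, Any]:
    """Return a copy of an MCP client config with NODE_EXTRA_CA_CERTS set."""
    patched = copy.deepcopy(config)
    env = patched["mcpServers"][server].setdefault("env", {})
    env["NODE_EXTRA_CA_CERTS"] = ca_path
    return patched


config = {
    "mcpServers": {
        "salutespeech": {
            "command": "npx",
            "args": ["-y", "@theyahia/salutespeech-mcp"],
            "env": {"SALUTESPEECH_API_KEY": "your-base64-key"},
        }
    }
}

patched = with_extra_ca(
    config, "salutespeech", "/path/to/russian_trusted_root_ca_pem.crt"
)
print(json.dumps(patched, indent=2))
```

The printed JSON keeps the existing `SALUTESPEECH_API_KEY` entry and adds `NODE_EXTRA_CA_CERTS` next to it in the same `env` object.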
Official guide: SaluteSpeech certificates.
License: MIT