Straightforward speech-to-text that pipes audio through Groq's Whisper API and spits out clean text. You point it at an audio file (m4a, mp3, wav, whatever), and it returns transcription with proper punctuation and capitalization. Needs a Groq API key, which is free to get. The 25MB file limit is the main constraint, so this works for voice memos, meeting recordings, and podcast clips, but you'll need to chunk longer files. Does one thing well without any configuration fussing. If you're already using Groq for LLM calls, this slots right into your workflow since you're using the same API key.
npx -y skills add badlogic/pi-skills --skill transcribe --agent claude-code
Installs into .claude/skills of the current project.
Local speech-to-text using parakeet-cpp-transcribe on Apple Silicon macOS.
{baseDir}/transcribe.sh <audio-file>
The first run downloads the macOS arm64 binary from the latest badlogic/pibot GitHub release into the extension's ignored bin directory:
{extensionDir}/bin/parakeet-cpp-transcribe
({extensionDir} is the parent directory of this skill's {baseDir}.) The binary downloads its GGUF model automatically if missing.
Plain text, timestamped in 15-second chunks, is written to stdout:
[00:00-00:15] transcript text
[00:15-00:30] more transcript text
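If only the spoken text is needed, the timestamp prefix can be stripped with a small filter. The following is a minimal sketch that assumes every line follows the `[MM:SS-MM:SS] text` shape shown above; `strip_timestamps` is a hypothetical helper name, not part of the skill.

```shell
# Hypothetical helper: remove a leading "[start-end] " prefix from each line.
# Assumes the timestamp format shown in the sample output above.
strip_timestamps() {
  sed -E 's/^\[[0-9:]+-[0-9:]+\] ?//'
}

# Demo on the sample lines from this document.
printf '%s\n' \
  '[00:00-00:15] transcript text' \
  '[00:15-00:30] more transcript text' | strip_timestamps
```

In practice the filter would sit at the end of a pipe, after the transcribe.sh invocation.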
Model/GGML diagnostic logs are written to stderr. Redirect stderr to hide them:
{baseDir}/transcribe.sh <audio-file> 2>/dev/null
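Because the transcript and the diagnostics arrive on separate streams, they can also be saved to separate files instead of discarding the logs. A minimal sketch, assuming only the stdout/stderr split described above; `save_transcript` and the file names are hypothetical.

```shell
# Hypothetical helper: run a command, writing stdout (the transcript) and
# stderr (diagnostic logs) to separate files.
# Usage: save_transcript <transcript-file> <log-file> <command> [args...]
save_transcript() {
  local out_file=$1 log_file=$2
  shift 2
  "$@" >"$out_file" 2>"$log_file"
}

# Example invocation (file names are placeholders, not run here):
#   save_transcript memo.txt memo.log {baseDir}/transcribe.sh memo.m4a
```

Keeping the log file around can help when a run fails, since errors from the binary would presumably land there rather than in the transcript.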
Requires curl and tar.
Non-WAV input also requires ffmpeg: brew install ffmpeg
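A quick pre-flight check can confirm that these tools are on the PATH before the first run. This is a sketch only; `missing_tools` is a hypothetical helper, and the skill's own script may already perform a similar check.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools this document lists; ffmpeg matters only for non-WAV input.
missing=$(missing_tools curl tar ffmpeg)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```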