This skill handles local speech-to-text with Faster Whisper and is built specifically for the JARVIS voice assistant. It is a complete, TDD-based implementation that prioritizes privacy (audio is processed locally and deleted immediately) and real-time performance through VAD filtering and streaming transcription. The guide walks through model selection from tiny to large-v3, sets concrete latency targets (under 300 ms for short audio), and shows optimization patterns such as int8 quantization on CPU and chunked processing, which avoids waiting for a complete recording. The medium risk rating reflects audio-processing and privacy concerns, but the approach is solid if you need offline voice recognition without cloud dependencies.
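The patterns named above (int8 on CPU, VAD filtering, chunked input, immediate deletion of audio) might look roughly like the following. This is a minimal sketch, not the skill's actual code: `pick_compute_type`, `chunk_samples`, and `transcribe_and_delete` are hypothetical names introduced here for illustration, and the float16-on-GPU choice is an assumption. The only faster-whisper calls used are `WhisperModel(...)` and `model.transcribe(..., vad_filter=True)`.

```python
import os
from typing import Iterator, List, Sequence


def pick_compute_type(device: str) -> str:
    """Hypothetical helper: int8 on CPU, float16 otherwise (an assumption)."""
    return "int8" if device == "cpu" else "float16"


def chunk_samples(samples: Sequence[float], sample_rate: int,
                  chunk_seconds: float) -> Iterator[List[float]]:
    """Yield fixed-length chunks so work can start before recording ends."""
    step = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, len(samples), step):
        yield list(samples[start:start + step])


def transcribe_and_delete(path: str, model_size: str = "tiny",
                          device: str = "cpu") -> str:
    """Transcribe one audio file locally, then remove it from disk."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device,
                         compute_type=pick_compute_type(device))
    try:
        # vad_filter=True skips non-speech audio; segments is a lazy generator,
        # so the join below is what actually drives the transcription.
        segments, _info = model.transcribe(path, vad_filter=True)
        return " ".join(segment.text.strip() for segment in segments)
    finally:
        # Privacy: delete the audio as soon as processing finishes or fails.
        if os.path.exists(path):
            os.remove(path)
```

In a long-running assistant the model would normally be loaded once and reused across calls rather than constructed per file; it is built inside the function here only to keep the sketch self-contained.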
npx -y skills add martinholovsky/claude-skills-generator --skill speech-to-text --agent claude-code
Installs into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
rohitg00/pro-workflow
supercent-io/skills-template