Audio feedback for Claude Code using pocket-tts.
When the Claude Code agent completes a task, it provides a spoken summary of what was accomplished.
For a complete voice workflow, pair this TTS plugin with Handy (open-source) using the Parakeet V3 model for speech-to-text. It's stunningly fast with near-instant transcription.
The slight accuracy drop compared to larger models is immaterial when talking to an AI. Pro tip: Ask the agent to restate what it understood. This confirms that your speech was transcribed and interpreted correctly, and helps keep the CLI agent on track.
- uv (for running pocket-tts via uvx)
- macOS (with afplay) or Linux (with aplay or paplay)
- Recommended: FFmpeg (provides ffplay for lower-latency streaming audio)
Install from the cctools-plugins marketplace:
claude plugin add voice
The plugin uses a multi-hook strategy to get fast, reliable voice summaries:
UserPromptSubmit hook → Injects full voice instructions each turn
↓
PostToolUse hook → Short reminder after each tool call
↓
Agent generates 📢 marker → "📢 Done, fixed the auth bug!"
↓
Stop hook extracts it → Instant playback (no API call!)
↓
[Fallback: headless Claude if agent forgets the marker]
UserPromptSubmit hook — Silently injects voice instructions at the start of
each turn, telling Claude to end longer responses with a 📢 spoken summary.
Uses additionalContext for silent injection (no terminal noise).
PostToolUse hook — Injects a brief reminder after each tool call to keep the voice instructions fresh during long tool chains where Claude might forget.
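The silent injection used by these two hooks can be sketched as follows. This is a minimal illustration, assuming the hook prints a JSON object whose hookSpecificOutput field carries the additionalContext string; check the Claude Code hooks documentation for the exact schema. The instruction texts and the build_hook_output name are hypothetical, not the plugin's actual wording or code.

```python
import json

# Hypothetical instruction texts; the plugin's real wording may differ.
FULL_INSTRUCTIONS = (
    "End longer responses with a line starting with 📢 "
    "that summarizes the result in one short spoken sentence."
)
SHORT_REMINDER = "Reminder: finish with a 📢 spoken summary."


def build_hook_output(event_name: str) -> str:
    """Return the JSON a hook could print to stdout to add context silently."""
    # Full instructions at the start of a turn, a brief reminder after tool calls.
    text = FULL_INSTRUCTIONS if event_name == "UserPromptSubmit" else SHORT_REMINDER
    payload = {
        "hookSpecificOutput": {
            "hookEventName": event_name,
            "additionalContext": text,  # assumed field, as named in the text above
        }
    }
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(build_hook_output("UserPromptSubmit"))
```

Keeping the PostToolUse text short limits how much context is spent on reminders during long tool chains.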
Stop hook — When the agent stops, this hook:
- Checks if voice is enabled (via ~/.claude/voice.local.md)
- Looks for a 📢 marker in the last assistant message (instant extraction)
- If no marker but response is short (≤25 words), speaks it directly
- Falls back to headless Claude summarization only if needed
- Plays the audio via pocket-tts
- Short responses (≤25 words): Spoken directly, no summary needed
- Explicit summaries (📢 marker or headless Claude): Flexible 1.5× limit (37 words)
- Last resort truncation: Strict limit (25 words)
The limit is configurable via MAX_SPOKEN_WORDS in hooks/voice_common.py.
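The Stop hook's decision order and the word limits above could look roughly like this. It is a sketch under stated assumptions, not the code in hooks/voice_common.py: only MAX_SPOKEN_WORDS comes from the plugin, while FLEX_FACTOR, the function names, and the summarize callback (standing in for the headless Claude fallback) are hypothetical.

```python
import re

MAX_SPOKEN_WORDS = 25  # name from the plugin; value matches the stated limit
FLEX_FACTOR = 1.5      # hypothetical name for the 1.5x allowance


def truncate_words(text: str, limit: int) -> str:
    """Keep at most `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])


def extract_marker(message: str):
    """Return the text after the last 📢 marker, or None if there is none."""
    matches = re.findall(r"📢\s*(.+)", message)
    return matches[-1].strip() if matches else None


def choose_spoken_text(message: str, summarize=None) -> str:
    flexible_limit = int(MAX_SPOKEN_WORDS * FLEX_FACTOR)  # 37 when the limit is 25
    marked = extract_marker(message)
    if marked:
        # Explicit summary from the agent: flexible limit, no extra API call.
        return truncate_words(marked, flexible_limit)
    if len(message.split()) <= MAX_SPOKEN_WORDS:
        # Short response: speak it as-is.
        return message.strip()
    if summarize is not None:
        # Fallback summarizer (e.g. headless Claude), also under the flexible limit.
        return truncate_words(summarize(message), flexible_limit)
    # Last resort: strict truncation of the raw response.
    return truncate_words(message, MAX_SPOKEN_WORDS)
```

For example, choose_spoken_text("Patched login.\n📢 Done, fixed the auth bug!") returns only the marker text, so the common path needs no summarization call.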
Control voice feedback with the slash command:
- /voice:speak - Enable voice feedback
- /voice:speak <voice> - Set voice (e.g., azure, alba) and enable
- /voice:speak stop - Disable voice feedback
- /voice:speak prompt <text> - Set custom instruction for summaries
- /voice:speak prompt - Clear custom prompt
Config is stored in ~/.claude/voice.local.md.
Use custom prompts to personalize how summaries are delivered:
# Be more enthusiastic
/voice:speak prompt "be upbeat and encouraging"
# Keep it ultra-brief
/voice:speak prompt "use 5 words or less"
# Add a sign-off
/voice:speak prompt "always end with 'back to you, boss'"
The custom prompt is appended as an additional instruction to the summarizer.
The scripts/say script is a standalone TTS utility that:
- Checks if the pocket-tts server is running
- Starts the server if needed (first run may take ~30-60 seconds)
- Sends text to the TTS endpoint
- Plays the generated audio
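The first two steps (check, then start and wait) might be sketched like this. It is only an illustration of the pattern, not the say script itself: the function names are hypothetical, a plain TCP connection stands in for whatever health check the script really performs, and the start callable is a placeholder for launching the pocket-tts server. The host and port defaults follow the TTS_HOST and TTS_PORT environment variables the script documents.

```python
import os
import socket
import time


def server_address(env=None):
    """Resolve host and port from TTS_HOST / TTS_PORT, with the documented defaults."""
    env = os.environ if env is None else env
    return env.get("TTS_HOST", "localhost"), int(env.get("TTS_PORT", "8000"))


def is_server_running(host, port, timeout=0.5):
    """Treat a successful TCP connection as 'the server is up' (an assumption)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_server(host, port, start, wait_seconds=60.0, poll=0.5):
    """Start the server if nothing is listening, then poll until it answers."""
    if is_server_running(host, port):
        return True
    start()  # hypothetical callable that launches the TTS server in the background
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if is_server_running(host, port):
            return True
        time.sleep(poll)
    return False
```

The 60-second default wait mirrors the first-run startup time mentioned above; later runs should return from the first check almost immediately.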
You can use the say script directly from the command line:
# Basic usage
./scripts/say "Hello, world!"
# With a specific voice
./scripts/say --voice azure "Hello, world!"
# Show help
./scripts/say --help
- TTS_HOST: TTS server host (default: localhost)
- TTS_PORT: TTS server port (default: 8000)
Disable voice feedback temporarily:
/voice:speak stop
Or uninstall the plugin entirely:
claude plugin remove voice
Check the server log:
cat /tmp/pocket-tts-server.log
- macOS: Ensure afplay is available (built-in)
- Linux: Ensure aplay or paplay is installed
If there's a noticeable delay before audio starts, install FFmpeg to enable streaming mode:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
With FFmpeg installed, audio streams directly to ffplay as it's generated,
reducing latency. Without it, the script waits for the full audio file before
playing.
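A rough sketch of that player selection follows, assuming the script prefers ffplay whenever it is on the PATH and otherwise falls back to a file-based player. The pick_player name and the fallback order are hypothetical. The ffplay options shown (-nodisp, -autoexit, -loglevel, and - for standard input) are standard FFmpeg flags, but the plugin may invoke it differently.

```python
import shutil


def pick_player(which=shutil.which):
    """Return (command, streaming) for the first available audio player."""
    if which("ffplay"):
        # Read audio from stdin, show no window, exit when the stream ends.
        cmd = ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", "-i", "-"]
        return cmd, True
    for name in ("afplay", "aplay", "paplay"):
        if which(name):
            # File-based players: the caller appends the finished audio file path.
            return [name], False
    raise RuntimeError("no audio player found; install FFmpeg, aplay, or paplay")
```

When streaming is True the caller can pipe audio chunks into the player's stdin as they arrive; otherwise it has to write the complete file first, which is where the extra delay comes from.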