Pipecat TTS Cache is a lightweight caching layer for the Pipecat ecosystem. It transparently wraps existing TTS services to eliminate API costs for repeated phrases and reduce response latency to <5ms.
See it in action: Watch the Demo Video
- **Ultra-Low Latency**: Delivers cached audio in ~0.1ms (Memory) or ~1-5ms (Redis).
- **Cost Reduction**: Stop paying your TTS provider for common phrases like "Hello," "One moment," or "I didn't catch that."
- **Universal Compatibility**: Works as a Mixin with all Pipecat TTS services (Cartesia, ElevenLabs, Deepgram, Google, etc.).
- **Smart Interruption**: Automatically clears pending cache tasks and resets state when users interrupt the bot.
- **Precision Alignment**: Preserves word-level timestamps for perfect lip-syncing and subtitles, even on cached replays.
```shell
# Standard installation (Memory backend only)
pip install pipecat-tts-cache

# Production installation (with Redis support)
pip install "pipecat-tts-cache[redis]"
```
The caching layer intelligently handles different TTS architectures to ensure smooth playback regardless of the provider.
| Service Type | Caching Strategy | Supported Providers (Examples) |
|---|---|---|
| AudioContextWordTTS | **Batch Caching**: splits audio at word boundaries and caches individual sentences. | Cartesia, Rime |
| WordTTSService | **Full Caching w/ Timestamps**: caches the full response and preserves alignment data. | ElevenLabs, Hume |
| TTSService | **Standard Caching**: caches the full audio response (no alignment data). | Google, OpenAI, Deepgram (HTTP) |
| InterruptibleTTS | **Sentence Caching**: caches single-sentence responses only. | Sarvam, Deepgram (WebSocket) |
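Whatever the service type, the mixin hooks in the same way: because `TTSCacheMixin` comes first in the base-class list, Python's method resolution order lets it intercept calls before the provider class sees them. A minimal sketch of that pattern with stand-in classes (these are not the real Pipecat interfaces):

```python
class BaseTTSService:
    """Stand-in for a Pipecat TTS provider class."""

    def run_tts(self, text: str) -> str:
        return f"synthesized:{text}"


class CacheMixin:
    """Stand-in cache mixin: checks the cache before deferring to the provider."""

    def __init__(self):
        self._cache: dict[str, str] = {}

    def run_tts(self, text: str) -> str:
        if text in self._cache:        # Hit: return cached audio immediately
            return self._cache[text]
        audio = super().run_tts(text)  # Miss: call the parent TTS service
        self._cache[text] = audio      # Store the result for next time
        return audio


# Mixin first, provider second -- the same ordering as TTSCacheMixin
class CachedTTS(CacheMixin, BaseTTSService):
    pass


tts = CachedTTS()
tts.run_tts("Hello")            # first call reaches the provider
assert "Hello" in tts._cache    # later calls are served from the cache
```

Because the interception happens via `super()`, the same mixin composes with any provider class that exposes the expected methods, which is why one mixin covers every service type in the table above.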
The MemoryCacheBackend is perfect for local development or single-process bots. It uses an LRU (Least Recently Used) eviction policy.
```python
from pipecat_tts_cache import TTSCacheMixin, MemoryCacheBackend
from pipecat.services.google.tts import GoogleHttpTTSService

# 1. Create a cached class using the Mixin
class CachedGoogleTTS(TTSCacheMixin, GoogleHttpTTSService):
    pass

# 2. Initialize with memory backend
tts = CachedGoogleTTS(
    voice_id="en-US-Chirp3-HD-Charon",
    cache_backend=MemoryCacheBackend(max_size=1000),
    cache_ttl=86400,  # Cache for 24 hours
)
```

For production deployments, use RedisCacheBackend. This allows the cache to persist across restarts and be shared among multiple bot instances.
```python
from pipecat_tts_cache.backends import RedisCacheBackend

tts = CachedGoogleTTS(
    voice_id="en-US-Chirp3-HD-Charon",
    cache_backend=RedisCacheBackend(
        redis_url="redis://localhost:6379/0",
        key_prefix="pipecat:tts:",
    ),
    cache_ttl=604800,  # Cache for 1 week
)
```

The system uses a Frame Interception Architecture to integrate seamlessly with the Pipecat pipeline:
- **Deterministic Key Gen**: Before requesting audio, a unique key is generated based on the normalized text, voice ID, model, speed, and pitch. Sensitive data (API keys) is excluded.
- **Cache Check** (`run_tts`):
  - Hit: The system immediately pushes cached audio frames and timestamps to the pipeline.
  - Miss: The system calls the parent TTS service.
- **Collection** (`push_frame`): As the parent service generates audio, the Mixin intercepts the frames, aggregates them, and stores them in the backend for future use.
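The key-generation step can be sketched roughly as follows. The function name, normalization rules, and key format here are illustrative, not the library's actual implementation:

```python
import hashlib

def make_cache_key(text: str, voice_id: str, model: str,
                   speed: float = 1.0, pitch: float = 0.0,
                   prefix: str = "pipecat:tts:") -> str:
    """Derive a deterministic cache key from synthesis parameters.

    Normalizing the text lets trivially different inputs (extra
    whitespace, casing) share one cache entry. API keys and other
    secrets are never part of the key.
    """
    normalized = " ".join(text.split()).lower()
    payload = f"{normalized}|{voice_id}|{model}|{speed}|{pitch}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return prefix + digest

# Identical synthesis parameters always yield the same key,
# so a repeated phrase maps to the same cached audio.
assert make_cache_key("Hello!", "charon", "chirp3") == \
       make_cache_key("  hello!  ", "charon", "chirp3")
```

Hashing rather than embedding the raw text keeps keys fixed-length and safe to use as Redis keys regardless of phrase length or content.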
When an `InterruptionFrame` is received, the cache mixin immediately:
- Clears all pending cache write tasks.
- Resets the internal batch state.
- Ensures no partial or cut-off audio is committed to the pipeline.
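A minimal sketch of that interruption path, assuming pending cache writes are tracked as asyncio tasks (the class and method names here are illustrative, not the library's actual API):

```python
import asyncio

class CacheInterruptionHandler:
    """Illustrative: cancel in-flight cache writes when the user barges in."""

    def __init__(self):
        self._pending_writes: set[asyncio.Task] = set()
        self._batch: list[bytes] = []  # audio chunks collected so far

    def track_write(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._pending_writes.add(task)
        task.add_done_callback(self._pending_writes.discard)
        return task

    def on_interruption(self) -> None:
        # 1. Clear all pending cache write tasks.
        for task in self._pending_writes:
            task.cancel()
        self._pending_writes.clear()
        # 2. Reset the internal batch state so no partial,
        #    cut-off audio is ever committed to the backend.
        self._batch.clear()

async def main():
    handler = CacheInterruptionHandler()
    handler._batch.append(b"partial-audio")
    handler.track_write(asyncio.sleep(10))  # stand-in for a slow backend write
    handler.on_interruption()
    await asyncio.sleep(0)  # let the cancellation propagate
    assert not handler._pending_writes and not handler._batch

asyncio.run(main())
```

Cancelling the write tasks (rather than letting them finish) is what guarantees a half-generated utterance never poisons the cache.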
You can monitor cache performance or clear entries programmatically.
```python
# Check performance
stats = await tts.get_cache_stats()
print(f"Hit Rate: {stats['hit_rate']:.1%}")
print(f"Total Saved Calls: {stats['hits']}")

# Maintenance
await tts.clear_cache()  # Clear all
await tts.clear_cache(namespace="user_123")  # Clear specific namespace
```

| Metric | Direct API | Memory Cache | Redis Cache |
|---|---|---|---|
| Latency | 200ms - 1500ms | ~0.1ms | ~2ms |
| Cost | $ per character | $0 | $0 |
| Consistency | Variable | Deterministic | Deterministic |
```shell
# Install with example dependencies
pip install "pipecat-tts-cache[examples]"

# Optional: Install with Redis support
pip install "pipecat-tts-cache[examples,redis]"

# Set environment variables
export DEEPGRAM_API_KEY=your_key
export CARTESIA_API_KEY=your_key
export GOOGLE_API_KEY=your_key

# Optional: For Redis backend
export USE_REDIS_CACHE=true
export REDIS_URL=redis://localhost:6379/0
```

```shell
# Start the bot server
python examples/basic_caching.py --host 0.0.0.0 --port 7860
# Connect via Daily Bots or your Daily room
```

```shell
# Run with local WebRTC transport
python examples/basic_caching.py -t webrtc --host localhost --port 8765
```

| Pipecat Version | Status |
|---|---|
| v0.0.91+ | ✅ Tested |
➡️ Reach out via mail
➡️ Connect on LinkedIn
