This page introduces the Pipecat AI framework, covering its core purpose, architectural philosophy, and key capabilities for building real-time conversational AI agents. It provides a conceptual map of the framework's major components and their relationships.
For installation instructions and running your first bot, see Getting Started. For detailed explanations of the frame-based pipeline architecture, see Core Architecture. For information about specific AI service integrations, see AI Service Integrations. For transport layer details, see Transport Layer.
Sources: README.md1-140 pyproject.toml1-47
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates the complex interactions between audio/video processing, AI services (LLMs, STT, TTS), transport protocols, and conversation state management through a composable pipeline architecture. README.md7-21
The framework enables developers to combine real-time transports, speech services, and LLMs into responsive voice agents without hand-writing the low-level streaming and orchestration plumbing.
Core Design Principle: Pipecat treats everything as a stream of typed frames flowing through a pipeline of processors, where each processor performs a specific transformation or side-effect before passing frames to the next processor. src/pipecat/processors/frame_processor.py9-12
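The design principle above can be sketched in plain Python. This is a conceptual illustration only, not Pipecat's actual classes: typed frames flow through a chain of processors, each of which transforms a frame or passes it along unchanged.

```python
# Illustrative toy model of the frames-through-processors idea.
from dataclasses import dataclass


@dataclass
class Frame:
    """Base class for all frame types in this sketch."""


@dataclass
class TextFrame(Frame):
    text: str


class Processor:
    def __init__(self, nxt=None):
        self.nxt = nxt

    def process(self, frame: Frame):
        # Default behavior: forward the frame to the next processor.
        if self.nxt:
            self.nxt.process(frame)


class Uppercaser(Processor):
    def process(self, frame: Frame):
        # Transform TextFrames; pass all other frame types through untouched.
        if isinstance(frame, TextFrame):
            frame = TextFrame(frame.text.upper())
        super().process(frame)


class Collector(Processor):
    def __init__(self):
        super().__init__()
        self.frames = []

    def process(self, frame: Frame):
        self.frames.append(frame)


sink = Collector()
pipeline = Uppercaser(sink)
pipeline.process(TextFrame("hello"))
```

Because every component speaks the same frame vocabulary, processors can be recombined freely; a processor that only cares about audio simply forwards text frames unmodified.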
Sources: README.md7-57 pyproject.toml6-8 src/pipecat/processors/frame_processor.py7-12
Pipecat's architecture centers on three foundational concepts:
| Concept | Description | Primary Classes |
|---|---|---|
| Frame | Immutable data packets flowing through the pipeline. | Frame, SystemFrame, DataFrame, ControlFrame |
| FrameProcessor | Linked components that process frames in order. | FrameProcessor, Pipeline, PipelineSource, PipelineSink |
| PipelineTask | Lifecycle manager that orchestrates execution. | PipelineTask, PipelineRunner |
All data and control signals—audio chunks, video frames, text tokens, LLM messages, start/stop signals, interruptions—are represented as Frame objects. src/pipecat/frames/frames.py7-12 This uniform abstraction enables consistent handling across diverse components.
*Diagram: Frame Hierarchy and Code Entities*
Frame Priority: SystemFrame objects (start, end, interruption signals) have higher processing priority than DataFrame and ControlFrame objects. This is enforced by FrameProcessorQueue, a specialized asyncio.PriorityQueue that separates system frames to ensure critical events are handled immediately. src/pipecat/processors/frame_processor.py85-133
Sources: src/pipecat/frames/frames.py59-134 src/pipecat/processors/frame_processor.py55-133
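The priority behavior described above can be illustrated with a small stdlib-only sketch. This is not the real FrameProcessorQueue; it just shows the idea of a queue that always yields system frames ahead of buffered data frames, so interruptions and shutdown signals are never stuck behind queued audio or text.

```python
# Conceptual sketch of a two-class priority queue for frames.
from collections import deque


class SystemFrame: ...
class DataFrame: ...


class PriorityFrameQueue:
    def __init__(self):
        self._system = deque()
        self._other = deque()

    def put(self, frame):
        target = self._system if isinstance(frame, SystemFrame) else self._other
        target.append(frame)

    def get(self):
        # System frames jump the line; FIFO order is kept within each class.
        return self._system.popleft() if self._system else self._other.popleft()


q = PriorityFrameQueue()
audio, stop = DataFrame(), SystemFrame()
q.put(audio)
q.put(stop)
first = q.get()  # the system frame is dequeued before the earlier data frame
```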
*Diagram: Pipeline Execution Model and Component Relationships*
Each FrameProcessor has a process_frame(frame, direction) method that handles both downstream (input to output) and upstream (output to input) frame flow. src/pipecat/processors/frame_processor.py286-302 Processors are linked via _next and _prev references, creating a doubly-linked chain. src/pipecat/processors/frame_processor.py177-178
Sources: src/pipecat/processors/frame_processor.py141-376 src/pipecat/pipeline/task.py140-187
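The doubly linked chain and the direction argument can be modeled with a short stdlib sketch (illustrative names, not Pipecat's implementation): pushing a frame walks the `_next` references downstream and the `_prev` references upstream.

```python
# Toy model of a doubly linked processor chain with bidirectional flow.
from enum import Enum, auto


class Direction(Enum):
    DOWNSTREAM = auto()
    UPSTREAM = auto()


class Proc:
    def __init__(self, name):
        self.name = name
        self._next = self._prev = None
        self.seen = []

    def link(self, nxt):
        # Wire up both directions of the chain; return nxt to allow chaining.
        self._next, nxt._prev = nxt, self
        return nxt

    def process_frame(self, frame, direction):
        self.seen.append(frame)
        self.push_frame(frame, direction)

    def push_frame(self, frame, direction):
        target = self._next if direction is Direction.DOWNSTREAM else self._prev
        if target:
            target.process_frame(frame, direction)


a, b, c = Proc("stt"), Proc("llm"), Proc("tts")
a.link(b).link(c)
a.process_frame("audio", Direction.DOWNSTREAM)  # visits stt -> llm -> tts
c.process_frame("error", Direction.UPSTREAM)    # visits tts -> llm -> stt
```

Upstream flow is what lets a processor near the output (for example, a transport reporting an error) notify everything earlier in the chain without a separate signaling channel.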
Pipecat provides a universal LLMContext that works across all LLM providers. The context is managed by aggregator pairs:
- LLMUserAggregator: collects user input (transcriptions, images) and updates the context.
- LLMAssistantAggregator: collects LLM output (text, function calls) and updates the context.
- LLMFullResponseAggregator: collects complete responses between LLMFullResponseStartFrame and LLMFullResponseEndFrame. src/pipecat/processors/aggregators/llm_response.py20-38
- LLMMessagesTransformFrame: enables programmatic context edits in a frame-based way, avoiding race conditions. CHANGELOG.md64-73

This separation enables independent configuration of user turn detection and assistant response processing. src/pipecat/processors/aggregators/llm_response.py7-12
Sources: src/pipecat/processors/aggregators/llm_response.py1-87 src/pipecat/frames/frames.py40-46 CHANGELOG.md64-73
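The full-response aggregation pattern can be sketched as follows. The class and frame names here are illustrative stand-ins, not Pipecat's actual aggregator: text arriving between a start marker and an end marker is buffered and then emitted as one complete response.

```python
# Hedged sketch of aggregating streamed tokens between start/end frames.
class StartFrame: ...
class EndFrame: ...


class ResponseAggregator:
    def __init__(self):
        self._parts = None   # None means "not currently inside a response"
        self.responses = []

    def process(self, frame):
        if isinstance(frame, StartFrame):
            self._parts = []
        elif isinstance(frame, EndFrame):
            self.responses.append("".join(self._parts))
            self._parts = None
        elif self._parts is not None and isinstance(frame, str):
            self._parts.append(frame)


agg = ResponseAggregator()
for f in [StartFrame(), "Hello", ", ", "world!", EndFrame()]:
    agg.process(f)
```

Buffering between explicit boundary frames is what allows downstream consumers (context storage, logging) to see whole utterances even though the LLM streams token by token.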
Pipecat integrates 60+ third-party AI services through a consistent base class hierarchy. README.md86-137 Recent updates include the addition of Inworld Realtime for cascade STT/LLM/TTS CHANGELOG.md25-27 and MistralTTSService for streaming text-to-speech. CHANGELOG.md100-105
| Service Category | Base Class | Example Providers |
|---|---|---|
| Speech-to-Text | STTService | Deepgram, AssemblyAI, Gladia, Whisper, Azure, Google |
| Text-to-Speech | TTSService | ElevenLabs, Cartesia, OpenAI, Azure, Deepgram, xAI, Mistral |
| Large Language Models | LLMService | OpenAI, Anthropic, Google Gemini, Groq, xAI (Grok), Inworld |
| Speech-to-Speech | LLMService (multimodal) | OpenAI Realtime, Gemini Live, AWS Nova Sonic |
| Vision & Image | Various | Moondream (vision), fal, Google Image |
Each service category provides runtime-configurable settings, allowing dynamic parameter updates through ServiceUpdateSettingsFrame or specific service frames like STTUpdateSettingsFrame. CHANGELOG.md113-118
Sources: README.md86-137 pyproject.toml54-128 CHANGELOG.md10-105
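The runtime-settings mechanism can be illustrated with a minimal stdlib sketch. The names below (UpdateSettingsFrame, FakeSTTService) are hypothetical, not Pipecat's exact classes: a service applies only the setting keys it recognizes when an update frame flows through it.

```python
# Conceptual sketch of runtime-configurable service settings.
class UpdateSettingsFrame:
    def __init__(self, settings):
        self.settings = settings


class FakeSTTService:
    def __init__(self):
        # Defaults chosen purely for illustration.
        self.settings = {"language": "en", "model": "base"}

    def process(self, frame):
        if isinstance(frame, UpdateSettingsFrame):
            # Apply only keys this service knows about; ignore the rest.
            for key, value in frame.settings.items():
                if key in self.settings:
                    self.settings[key] = value


stt = FakeSTTService()
stt.process(UpdateSettingsFrame({"language": "fr", "unknown": 1}))
```

Carrying settings changes as frames (rather than out-of-band method calls) keeps them ordered relative to the audio and text already in flight.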
*Diagram: Transport Architecture and Data Flow*
Transports handle protocol-specific communication while presenting a uniform frame-based interface. BaseInputTransport handles Voice Activity Detection (VAD) and turn analysis, src/pipecat/transports/base_input.py34-40 while BaseOutputTransport manages audio mixing and chunking to ensure smooth streaming. src/pipecat/transports/base_output.py59-65
Sources: src/pipecat/transports/base_input.py34-159 src/pipecat/transports/base_output.py59-108 src/pipecat/transports/base_transport.py86-127
| Feature | Implementation | Key Components |
|---|---|---|
| Voice Activity Detection | VADAnalyzer (Silero, AIC, Krisp VIVA) integrated with transports. | VADAnalyzer, VADParams, KrispVivaVadAnalyzer |
| Interruption Handling | InterruptionFrame broadcast with task cancellation. | InterruptionFrame, UninterruptibleFrame, InterruptionTaskFrame |
| Turn Detection | Strategies for detecting user turn completion. | VADParams.stop_secs, ExternalUserTurnStrategies |
| Audio Filtering | Noise reduction and enhancement filters. | AICFilter, RNNoiseFilter, KrispFilter |
Sources: src/pipecat/transports/base_input.py17-27 src/pipecat/frames/frames.py142-152 CHANGELOG.md42-48 CHANGELOG.md86-88
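To make the VAD and turn-detection rows above concrete, here is a deliberately crude stdlib sketch — not Silero or Pipecat's analyzers: a chunk counts as speech when its mean absolute amplitude crosses a threshold, and a turn ends after enough consecutive silent chunks (the role played by VADParams.stop_secs).

```python
# Minimal energy-threshold VAD and turn-end detector (illustrative only).
def is_speech(chunk, threshold=0.1):
    """Classify a chunk of samples as speech by mean absolute amplitude."""
    return sum(abs(s) for s in chunk) / len(chunk) >= threshold


class TurnDetector:
    def __init__(self, stop_frames=3):
        self.stop_frames = stop_frames  # silence needed to end a turn
        self._silence = 0
        self._speaking = False

    def feed(self, chunk):
        """Return 'turn_end' once the user has been silent long enough."""
        if is_speech(chunk):
            self._speaking, self._silence = True, 0
        elif self._speaking:
            self._silence += 1
            if self._silence >= self.stop_frames:
                self._speaking = False
                return "turn_end"
        return None


det = TurnDetector(stop_frames=2)
events = [det.feed(c) for c in [[0.5], [0.4], [0.0], [0.0]]]
```

Real analyzers use trained models rather than an energy threshold, but the turn-end logic — requiring sustained silence before declaring the user done — follows the same shape.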
The framework provides comprehensive metrics and tracing via the observer pattern:
- BaseObserver: base class for monitoring frame events such as on_push_frame and on_process_frame. src/pipecat/observers/base_observer.py15
- TurnTrackingObserver: maintains the conversation turn state machine. src/pipecat/pipeline/task.py44
- UserBotLatencyObserver: measures end-to-end latency per service. src/pipecat/pipeline/task.py45
- TurnTraceObserver: generates OpenTelemetry traces for performance analysis. src/pipecat/pipeline/task.py55

Sources: src/pipecat/pipeline/task.py43-56 src/pipecat/processors/frame_processor.py48-49
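The observer pattern behind these classes can be sketched in a few lines of plain Python (illustrative, not Pipecat's observer API): the pipeline notifies every registered observer when a frame is pushed, keeping metrics and tracing out of the processing path itself.

```python
# Toy observer sketch: side-channel monitoring of frame pushes.
class Observer:
    def on_push_frame(self, src, frame):
        pass  # subclasses override to record metrics, traces, etc.


class CountingObserver(Observer):
    def __init__(self):
        self.count = 0

    def on_push_frame(self, src, frame):
        self.count += 1


class ObservablePipeline:
    def __init__(self, observers):
        self.observers = observers

    def push(self, frame):
        # Notify observers; the frame itself is unaffected by observation.
        for obs in self.observers:
            obs.on_push_frame(self, frame)


counter = CountingObserver()
pipe = ObservablePipeline([counter])
for frame in ("audio", "text", "audio"):
    pipe.push(frame)
```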
The codebase is organized into logical modules: pyproject.toml154-155
| Module Path | Purpose |
|---|---|
| src/pipecat/frames/ | Core frame definitions (System, Data, Control). |
| src/pipecat/pipeline/ | Pipeline execution and task management. |
| src/pipecat/processors/ | Base FrameProcessor and common aggregators/filters. |
| src/pipecat/services/ | 60+ AI service integrations (LLM, TTS, STT, Vision). |
| src/pipecat/transports/ | Network and local transport implementations (WebRTC, WebSocket). |
| src/pipecat/audio/ | Audio processing, VAD, and turn detection logic. |
| src/pipecat/observers/ | Monitoring, latency tracking, and tracing. |
Optional Dependencies: Pipecat uses optional dependency groups to keep the core lightweight. pyproject.toml54-128 Install only what you need, e.g., pip install "pipecat-ai[daily,openai,deepgram]".
Sources: pyproject.toml54-155 README.md86-137