This page introduces the Pipecat AI framework, covering its core purpose, architectural philosophy, and key capabilities for building real-time conversational AI agents. It provides a conceptual map of the framework's major components and their relationships.
For installation instructions and running your first bot, see Getting Started. For detailed explanations of the frame-based pipeline architecture, see Core Architecture. For information about specific AI service integrations, see AI Service Integrations. For transport layer details, see Transport Layer.
Sources: README.md1-140 pyproject.toml1-47
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates the complex interactions between audio/video processing, AI services (LLMs, STT, TTS), transport protocols, and conversation state management through a composable pipeline architecture. README.md7-21
The framework enables developers to combine real-time transports, speech services, and LLMs into responsive voice agents without hand-writing the low-level streaming and orchestration plumbing.
Core Design Principle: Pipecat treats everything as a stream of typed frames flowing through a pipeline of processors, where each processor performs a specific transformation or side-effect before passing frames to the next processor. src/pipecat/processors/frame_processor.py9-12
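The design principle above can be sketched in plain Python. This is a conceptual illustration only, not Pipecat's actual classes: typed frames flow through a chain of processors, each of which transforms a frame or passes it along unchanged.

```python
# Illustrative toy model of the frames-through-processors idea.
from dataclasses import dataclass


@dataclass
class Frame:
    """Base class for all frame types in this sketch."""


@dataclass
class TextFrame(Frame):
    text: str


class Processor:
    def __init__(self, nxt=None):
        self.nxt = nxt

    def process(self, frame: Frame):
        # Default behavior: forward the frame to the next processor.
        if self.nxt:
            self.nxt.process(frame)


class Uppercaser(Processor):
    def process(self, frame: Frame):
        # Transform TextFrames; pass all other frame types through untouched.
        if isinstance(frame, TextFrame):
            frame = TextFrame(frame.text.upper())
        super().process(frame)


class Collector(Processor):
    def __init__(self):
        super().__init__()
        self.frames = []

    def process(self, frame: Frame):
        self.frames.append(frame)


sink = Collector()
pipeline = Uppercaser(sink)
pipeline.process(TextFrame("hello"))
```

Because every component speaks the same frame vocabulary, processors can be recombined freely; a processor that only cares about audio simply forwards text frames unmodified.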
Sources: README.md7-57 pyproject.toml6-8 src/pipecat/processors/frame_processor.py7-12
Pipecat's architecture centers on three foundational concepts:
| Concept | Description | Primary Classes |
|---|---|---|
| Frame | Immutable data packets flowing through the pipeline. | Frame, SystemFrame, DataFrame, ControlFrame |
| FrameProcessor | Linked components that process frames in order. | FrameProcessor, Pipeline, PipelineSource, PipelineSink |
| PipelineTask | Lifecycle manager that orchestrates execution. | PipelineTask, PipelineRunner |
All data and control signals—audio chunks, video frames, text tokens, LLM messages, start/stop signals, interruptions—are represented as Frame objects. src/pipecat/frames/frames.py7-12 This uniform abstraction enables consistent handling across diverse components.
*Diagram: Frame Hierarchy and Code Entities*
Frame Priority: SystemFrame objects (start, end, interruption signals) have higher processing priority than DataFrame and ControlFrame objects. This is enforced by FrameProcessorQueue, a specialized asyncio.PriorityQueue that separates system frames to ensure critical events are handled immediately. src/pipecat/processors/frame_processor.py85-133
Sources: src/pipecat/frames/frames.py59-134 src/pipecat/processors/frame_processor.py55-133
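The priority behavior described above can be illustrated with a small stdlib-only sketch. This is not the real FrameProcessorQueue; it just shows the idea of a queue that always yields system frames ahead of buffered data frames, so interruptions and shutdown signals are never stuck behind queued audio or text.

```python
# Conceptual sketch of a two-class priority queue for frames.
from collections import deque


class SystemFrame: ...
class DataFrame: ...


class PriorityFrameQueue:
    def __init__(self):
        self._system = deque()
        self._other = deque()

    def put(self, frame):
        target = self._system if isinstance(frame, SystemFrame) else self._other
        target.append(frame)

    def get(self):
        # System frames jump the line; FIFO order is kept within each class.
        return self._system.popleft() if self._system else self._other.popleft()


q = PriorityFrameQueue()
audio, stop = DataFrame(), SystemFrame()
q.put(audio)
q.put(stop)
first = q.get()  # the system frame is dequeued before the earlier data frame
```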
*Diagram: Pipeline Execution Model and Component Relationships*
Each FrameProcessor has a process_frame(frame, direction) method that handles both downstream (input to output) and upstream (output to input) frame flow. src/pipecat/processors/frame_processor.py286-302 Processors are linked via _next and _prev references, creating a doubly-linked chain. src/pipecat/processors/frame_processor.py177-178
Sources: src/pipecat/processors/frame_processor.py141-376 src/pipecat/pipeline/task.py140-187
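The doubly linked chain and the direction argument can be modeled with a short stdlib sketch (illustrative names, not Pipecat's implementation): pushing a frame walks the `_next` references downstream and the `_prev` references upstream.

```python
# Toy model of a doubly linked processor chain with bidirectional flow.
from enum import Enum, auto


class Direction(Enum):
    DOWNSTREAM = auto()
    UPSTREAM = auto()


class Proc:
    def __init__(self, name):
        self.name = name
        self._next = self._prev = None
        self.seen = []

    def link(self, nxt):
        # Wire up both directions of the chain; return nxt to allow chaining.
        self._next, nxt._prev = nxt, self
        return nxt

    def process_frame(self, frame, direction):
        self.seen.append(frame)
        self.push_frame(frame, direction)

    def push_frame(self, frame, direction):
        target = self._next if direction is Direction.DOWNSTREAM else self._prev
        if target:
            target.process_frame(frame, direction)


a, b, c = Proc("stt"), Proc("llm"), Proc("tts")
a.link(b).link(c)
a.process_frame("audio", Direction.DOWNSTREAM)  # visits stt -> llm -> tts
c.process_frame("error", Direction.UPSTREAM)    # visits tts -> llm -> stt
```

Upstream flow is what lets a processor near the output (for example, a transport reporting an error) notify everything earlier in the chain without a separate signaling channel.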
Pipecat provides a universal LLMContext that works across all LLM providers. The context is managed by aggregator pairs:
- LLMUserAggregator: collects user input (transcriptions, images) and updates the context.
- LLMAssistantAggregator: collects LLM output (text, function calls) and updates the context.
- LLMFullResponseAggregator: collects complete responses between LLMFullResponseStartFrame and LLMFullResponseEndFrame. src/pipecat/processors/aggregators/llm_response.py20-38
- LLMMessagesTransformFrame: enables programmatic context edits in a frame-based way, avoiding race conditions. CHANGELOG.md64-73

This separation enables independent configuration of user turn detection and assistant response processing. src/pipecat/processors/aggregators/llm_response.py7-12
Sources: src/pipecat/processors/aggregators/llm_response.py1-87 src/pipecat/frames/frames.py40-46 CHANGELOG.md64-73
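The full-response aggregation pattern can be sketched as follows. The class and frame names here are illustrative stand-ins, not Pipecat's actual aggregator: text arriving between a start marker and an end marker is buffered and then emitted as one complete response.

```python
# Hedged sketch of aggregating streamed tokens between start/end frames.
class StartFrame: ...
class EndFrame: ...


class ResponseAggregator:
    def __init__(self):
        self._parts = None   # None means "not currently inside a response"
        self.responses = []

    def process(self, frame):
        if isinstance(frame, StartFrame):
            self._parts = []
        elif isinstance(frame, EndFrame):
            self.responses.append("".join(self._parts))
            self._parts = None
        elif self._parts is not None and isinstance(frame, str):
            self._parts.append(frame)


agg = ResponseAggregator()
for f in [StartFrame(), "Hello", ", ", "world!", EndFrame()]:
    agg.process(f)
```

Buffering between explicit boundary frames is what allows downstream consumers (context storage, logging) to see whole utterances even though the LLM streams token by token.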
Pipecat integrates 60+ third-party AI services through a consistent base class hierarchy. README.md86-137 Recent updates include the addition of Inworld Realtime for cascade STT/LLM/TTS CHANGELOG.md25-27 and MistralTTSService for streaming text-to-speech. CHANGELOG.md100-105
| Service Category | Base Class | Example Providers |
|---|---|---|
| Speech-to-Text | STTService | Deepgram, AssemblyAI, Gladia, Whisper, Azure, Google |
| Text-to-Speech | TTSService | ElevenLabs, Cartesia, OpenAI, Azure, Deepgram, xAI, Mistral |
| Large Language Models | LLMService | OpenAI, Anthropic, Google Gemini, Groq, xAI (Grok), Inworld |
| Speech-to-Speech | LLMService (multimodal) | OpenAI Realtime, Gemini Live, AWS Nova Sonic |
| Vision & Image | Various | Moondream (vision), fal, Google Image |
Each service category provides runtime-configurable settings, allowing dynamic parameter updates through ServiceUpdateSettingsFrame or specific service frames like STTUpdateSettingsFrame. CHANGELOG.md113-118
Sources: README.md86-137 pyproject.toml54-128 CHANGELOG.md10-105
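The runtime-settings mechanism can be illustrated with a minimal stdlib sketch. The names below (UpdateSettingsFrame, FakeSTTService) are hypothetical, not Pipecat's exact classes: a service applies only the setting keys it recognizes when an update frame flows through it.

```python
# Conceptual sketch of runtime-configurable service settings.
class UpdateSettingsFrame:
    def __init__(self, settings):
        self.settings = settings


class FakeSTTService:
    def __init__(self):
        # Defaults chosen purely for illustration.
        self.settings = {"language": "en", "model": "base"}

    def process(self, frame):
        if isinstance(frame, UpdateSettingsFrame):
            # Apply only keys this service knows about; ignore the rest.
            for key, value in frame.settings.items():
                if key in self.settings:
                    self.settings[key] = value


stt = FakeSTTService()
stt.process(UpdateSettingsFrame({"language": "fr", "unknown": 1}))
```

Carrying settings changes as frames (rather than out-of-band method calls) keeps them ordered relative to the audio and text already in flight.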
*Diagram: Transport Architecture and Data Flow*
Transports handle protocol-specific communication while presenting a uniform frame-based interface. BaseInputTransport handles Voice Activity Detection (VAD) and turn analysis, src/pipecat/transports/base_input.py34-40 while BaseOutputTransport manages audio mixing and chunking to ensure smooth streaming. src/pipecat/transports/base_output.py59-65
Sources: src/pipecat/transports/base_input.py34-159 src/pipecat/transports/base_output.py59-108 src/pipecat/transports/base_transport.py86-127
| Feature | Implementation | Key Components |
|---|---|---|
| Voice Activity Detection | VADAnalyzer (Silero, AIC, Krisp VIVA) integrated with transports. | VADAnalyzer, VADParams, KrispVivaVadAnalyzer |
| Interruption Handling | InterruptionFrame broadcast with task cancellation. | InterruptionFrame, UninterruptibleFrame, InterruptionTaskFrame |
| Turn Detection | Strategies for detecting user turn completion. | VADParams.stop_secs, ExternalUserTurnStrategies |
| Audio Filtering | Noise reduction and enhancement filters. | AICFilter, RNNoiseFilter, KrispFilter |
Sources: src/pipecat/transports/base_input.py17-27 src/pipecat/frames/frames.py142-152 CHANGELOG.md42-48 CHANGELOG.md86-88
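To make the VAD and turn-detection rows above concrete, here is a deliberately crude stdlib sketch — not Silero or Pipecat's analyzers: a chunk counts as speech when its mean absolute amplitude crosses a threshold, and a turn ends after enough consecutive silent chunks (the role played by VADParams.stop_secs).

```python
# Minimal energy-threshold VAD and turn-end detector (illustrative only).
def is_speech(chunk, threshold=0.1):
    """Classify a chunk of samples as speech by mean absolute amplitude."""
    return sum(abs(s) for s in chunk) / len(chunk) >= threshold


class TurnDetector:
    def __init__(self, stop_frames=3):
        self.stop_frames = stop_frames  # silence needed to end a turn
        self._silence = 0
        self._speaking = False

    def feed(self, chunk):
        """Return 'turn_end' once the user has been silent long enough."""
        if is_speech(chunk):
            self._speaking, self._silence = True, 0
        elif self._speaking:
            self._silence += 1
            if self._silence >= self.stop_frames:
                self._speaking = False
                return "turn_end"
        return None


det = TurnDetector(stop_frames=2)
events = [det.feed(c) for c in [[0.5], [0.4], [0.0], [0.0]]]
```

Real analyzers use trained models rather than an energy threshold, but the turn-end logic — requiring sustained silence before declaring the user done — follows the same shape.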
The framework provides comprehensive metrics and tracing via the observer pattern:
- BaseObserver: base class for monitoring frame events such as on_push_frame and on_process_frame. src/pipecat/observers/base_observer.py15
- TurnTrackingObserver: maintains the conversation turn state machine. src/pipecat/pipeline/task.py44
- UserBotLatencyObserver: measures end-to-end latency per service. src/pipecat/pipeline/task.py45
- TurnTraceObserver: generates OpenTelemetry traces for performance analysis. src/pipecat/pipeline/task.py55

Sources: src/pipecat/pipeline/task.py43-56 src/pipecat/processors/frame_processor.py48-49
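The observer pattern behind these classes can be sketched in a few lines of plain Python (illustrative, not Pipecat's observer API): the pipeline notifies every registered observer when a frame is pushed, keeping metrics and tracing out of the processing path itself.

```python
# Toy observer sketch: side-channel monitoring of frame pushes.
class Observer:
    def on_push_frame(self, src, frame):
        pass  # subclasses override to record metrics, traces, etc.


class CountingObserver(Observer):
    def __init__(self):
        self.count = 0

    def on_push_frame(self, src, frame):
        self.count += 1


class ObservablePipeline:
    def __init__(self, observers):
        self.observers = observers

    def push(self, frame):
        # Notify observers; the frame itself is unaffected by observation.
        for obs in self.observers:
            obs.on_push_frame(self, frame)


counter = CountingObserver()
pipe = ObservablePipeline([counter])
for frame in ("audio", "text", "audio"):
    pipe.push(frame)
```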
The codebase is organized into logical modules: pyproject.toml154-155
| Module Path | Purpose |
|---|---|
| src/pipecat/frames/ | Core frame definitions (System, Data, Control). |
| src/pipecat/pipeline/ | Pipeline execution and task management. |
| src/pipecat/processors/ | Base FrameProcessor and common aggregators/filters. |
| src/pipecat/services/ | 60+ AI service integrations (LLM, TTS, STT, Vision). |
| src/pipecat/transports/ | Network and local transport implementations (WebRTC, WebSocket). |
| src/pipecat/audio/ | Audio processing, VAD, and turn detection logic. |
| src/pipecat/observers/ | Monitoring, latency tracking, and tracing. |
Optional Dependencies: Pipecat uses optional dependency groups to keep the core lightweight. pyproject.toml54-128 Install only what you need, e.g., pip install "pipecat-ai[daily,openai,deepgram]".
Sources: pyproject.toml54-155 README.md86-137