Streaming
Jamie Grier | @jgrier
1
Agenda
• Goals of Lyft’s Streaming Platform
• Streaming Platform Overview
• Why Flink
• Why Kafka
• Open problems
2
Goals of Lyft’s Streaming Platform
• Make it easy to build real-time, event-driven, stateful,
microservices
• Solve the hard parts of stream processing ONCE for the entire
company
• Be a force multiplier for other teams within Lyft
3
Streaming Platform Overview
Stream Compute
Pub/Sub Streaming Pub/Sub
Service One
Streaming
Service Two
Streaming
Service Three
Stream / Schema Deployment Metrics &
Alerts Logging
Registry Tooling Dashboards
Amazon Salt
Amazon S3 Wavefront Docker
EC2 (Conifg / Orca) 4
Lyft Streaming Platform - Streaming Compute Criteria
API Considerations: Operational Considerations
● Stateful Computation and Exactly-
● Functional / Fluent API
once Processing Semantics
● Flexible Windowing API ● Robust State Management
● Event Time Support ● Data Reprocessing (backfill)
● Asynchronous Checkpoints
● Apache Beam Support
● Back-pressure
● Stream SQL
● High throughput and low-latency
● Powerful Direct API ● Deployment Architecture
● Late Data Handling
The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
5
Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling
6
Why Flink? Operational Considerations
• Stateful Computation and Exactly-once Processing Semantics
• Robust State Management
• Stateful Data Reprocessing (backfill)
• Asynchronous Checkpoints
• Back-pressure
• High throughput and low-latency
• Deployment Architecture
7
Lyft Streaming Platform - Pub/Sub Criteria
Semantics / Features Operational Considerations
● Write Latency
● Durability
● Read Latency
● Consumer Fanout
● Project Maturity
● Transactions / Idempotent Writes ● Vendor Support
● Per-Key Ordering Guarantees
● Long-Term Data Storage
● Auto-Scaling
The contenders: Apache Kafka, Amazon Kinesis, Pravega
8
Why Kafka?
Pros
• Durability & Write Latency
• Read Latency & Consumer Fanout
• Transactions & Idempotent Writes
• Operational Concerns & Vendor Support
Cons
• No ordering by key, only partition
• Long term data storage still an issue
• Auto-Scaling still an issue
9
Open Problems
• Rescaling Kafka while preserving per-key ordering
• Efficient Dynamic Computations over streams
• Long term storage for events: real-time and historical reads
• Zero Downtime deployments for streaming services
10
Rescaling Kafka
• Rescaling Kafka while preserving per-key ordering
• Kafka only provides partition ordering guarantees!
• We want per-key ordering guarantees
• Guarantees should hold across re-partitioning events
• Basic approach: Read old partitions completely before reading
new
• Achieve this using something akin to Flink’s checkpoint 11
Rescaling Kafka while preserving per-key ordering
12
Rescaling Kafka while preserving per-key ordering
13
Efficient Dynamic Computation Over Streams
• Enable many users to dynamically submit small streaming
computations
• Share bandwidth amongst multiple computations
• Share computed sub-results amongst multiple computations
• Correctly handle bootstrapping of computations which
depend on historical data
• Basic approach: Map any computation into a fixed/general
14
Efficient Dynamic Computations over streams
15
Efficient Dynamic Computations over streams
16
Long term storage for events: Real-time and historical reads
17
Zero Downtime deployments for streaming services
18
Summary
• Lyft is building a next generation streaming platform based
on Apache Flink and Apache Kafka
• Stateful stream processing is not a “solved problem”
• There are many hard / open problems left to solve
• If these sort of problems interest you please come join us!
We’re Hiring!
19
Thank you!
Jamie Grier
20