Stream Processing at Lyft

Lyft has built a streaming platform using Apache Flink for stream processing and Apache Kafka for messaging. The goals were to make building real-time microservices easy and to solve the hard stream processing problems once for the whole company. Open problems discussed include efficiently rescaling Kafka while preserving per-key ordering, enabling dynamic computations over streams, long-term storage for real-time and historical data access, and zero-downtime deployments for streaming services. Lyft is still working on these problems and is hiring engineers interested in streaming systems.

Uploaded by

Jamie Grier


Streaming

Jamie Grier | @jgrier

1
Agenda

• Goals of Lyft’s Streaming Platform

• Streaming Platform Overview

• Why Flink

• Why Kafka

• Open problems

2
Goals of Lyft’s Streaming Platform

• Make it easy to build real-time, event-driven, stateful microservices

• Solve the hard parts of stream processing ONCE for the entire company

• Be a force multiplier for other teams within Lyft

3
Streaming Platform Overview

[Architecture diagram; labels recovered from the slide]
Stream Compute: Pub/Sub, Streaming Service One, Streaming Service Two, Streaming Service Three, Pub/Sub
Stream / Schema Registry, Deployment Tooling, Metrics & Dashboards, Alerts, Logging
Amazon EC2, Amazon S3, Wavefront, Salt (Config / Orca), Docker

4
Lyft Streaming Platform - Streaming Compute Criteria

API Considerations:

● Functional / Fluent API
● Flexible Windowing API
● Event Time Support
● Apache Beam Support
● Stream SQL
● Powerful Direct API
● Late Data Handling

Operational Considerations:

● Stateful Computation and Exactly-once Processing Semantics
● Robust State Management
● Data Reprocessing (backfill)
● Asynchronous Checkpoints
● Back-pressure
● High throughput and low-latency
● Deployment Architecture

The contenders: Apache Flink, Apache Spark Streaming, Apache Kafka Streams
5
Why Flink? API Considerations
• Functional / Fluent API
• Flexible Windowing API
• Event Time Support
• Apache Beam Support
• Stream SQL
• Powerful Direct API
• Late Data Handling
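
To make "Event Time Support", "Flexible Windowing" and "Late Data Handling" concrete, here is a minimal sketch in plain Python. It is not the Flink API. It assumes tumbling windows, a simplified watermark equal to the highest event time seen so far, and illustrative values for the window size and allowed lateness; all names are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 10        # seconds per tumbling window (illustrative value)
ALLOWED_LATENESS = 5    # seconds a window stays open past its end (illustrative value)


def window_start(event_time):
    """Start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SIZE)


def process(events):
    """Count events per (key, window) using event time, not arrival time.

    events: iterable of (key, event_time) in arrival order.
    Returns (counts, dropped), where dropped lists the events that arrived
    after their window had already closed.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")  # simplified: highest event time seen so far
    for key, event_time in events:
        watermark = max(watermark, event_time)
        start = window_start(event_time)
        window_closes_at = start + WINDOW_SIZE + ALLOWED_LATENESS
        if watermark >= window_closes_at:
            dropped.append((key, event_time))  # too late: window is closed
        else:
            counts[(key, start)] += 1
    return dict(counts), dropped


# Events arrive out of order: time 3 arrives after 12, time 4 arrives after 27.
events = [("a", 1), ("a", 12), ("a", 3), ("a", 27), ("a", 4)]
counts, dropped = process(events)
print(counts)
print(dropped)
```

The event at time 3 still lands in its window because the watermark has not yet passed the window end plus the allowed lateness; the event at time 4 arrives too late and is set aside. A real engine would typically let the watermark lag behind the maximum event time and route late events to a separate output.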

6
Why Flink? Operational Considerations

• Stateful Computation and Exactly-once Processing Semantics

• Robust State Management
• Stateful Data Reprocessing (backfill)
• Asynchronous Checkpoints
• Back-pressure
• High throughput and low-latency
• Deployment Architecture

7
Lyft Streaming Platform - Pub/Sub Criteria

Semantics / Features:

● Durability
● Consumer Fanout
● Transactions / Idempotent Writes
● Per-Key Ordering Guarantees
● Long-Term Data Storage
● Auto-Scaling

Operational Considerations:

● Write Latency
● Read Latency
● Project Maturity
● Vendor Support

The contenders: Apache Kafka, Amazon Kinesis, Pravega

8
Why Kafka?
Pros
• Durability & Write Latency
• Read Latency & Consumer Fanout
• Transactions & Idempotent Writes
• Operational Concerns & Vendor Support
Cons
• No ordering by key, only partition
• Long term data storage still an issue
• Auto-Scaling still an issue
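
The "no ordering by key, only partition" point can be illustrated with a small sketch. Records with the same key are ordered only because they hash to the same partition, so changing the partition count moves keys. The sketch assumes a CRC32-based partitioner and made-up key names purely for illustration; Kafka's default partitioner uses a murmur2 hash of the key modulo the partition count.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash partitioner (CRC32 here as a stand-in, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys, e.g. one per ride.
keys = [f"ride-{i}" for i in range(100)]

before = {k: partition_for(k, 4) for k in keys}   # topic with 4 partitions
after = {k: partition_for(k, 6) for k in keys}    # same topic grown to 6

# Keys whose partition changed: their older records sit in one partition
# and their newer records in another, with no ordering between the two.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed partition")
```

Any key in `moved` has its history split across two partitions, which is why partition-level ordering alone does not give per-key ordering across a resize.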

9
Open Problems

• Rescaling Kafka while preserving per-key ordering

• Efficient Dynamic Computations over streams

• Long term storage for events: real-time and historical reads

• Zero Downtime deployments for streaming services

10
Rescaling Kafka

• Rescaling Kafka while preserving per-key ordering

• Kafka only provides partition ordering guarantees!

• We want per-key ordering guarantees

• Guarantees should hold across re-partitioning events

• Basic approach: Read old partitions completely before reading new

• Achieve this using something akin to Flink’s checkpoint

11
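
The "read old partitions completely before reading new" idea can be sketched as a small simulation. This is only an illustration of the ordering principle under simplifying assumptions: partitions are in-memory lists, the reader already knows where each old partition ends, and the checkpoint-like coordination mentioned on the slide is not modeled. All names are hypothetical.

```python
def interleave(partitions):
    """Read the given partitions round-robin, mimicking concurrent consumers.

    Each partition is a list of (key, seq) records in append order.
    """
    cursors = [iter(p) for p in partitions]
    while cursors:
        still_open = []
        for cursor in cursors:
            record = next(cursor, None)
            if record is not None:
                yield record
                still_open.append(cursor)
        cursors = still_open


def per_key_ordered(records):
    """True if sequence numbers never go backwards for any key."""
    last = {}
    for key, seq in records:
        if seq < last.get(key, -1):
            return False
        last[key] = seq
    return True


# Key "a" lived in old partition 0 and moved to new partition 2 after the rescale.
old = [[("a", 0), ("a", 1), ("a", 2)], [("b", 0)]]
new = [[("b", 1)], [], [("a", 3), ("a", 4)]]

# Naive: read old and new partitions at the same time.
naive = list(interleave(old + new))

# Basic approach: drain every old partition before touching a new one.
drained_first = list(interleave(old)) + list(interleave(new))

print(per_key_ordered(naive))          # False
print(per_key_ordered(drained_first))  # True
```

In the naive read, record `("a", 3)` from the new partition is seen before `("a", 1)` from the old one. Draining the old partitions first removes that interleaving at the cost of a barrier between the two layouts.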

Rescaling Kafka while preserving per-key ordering

12
Rescaling Kafka while preserving per-key ordering

13
Efficient Dynamic Computation Over Streams

• Enable many users to dynamically submit small streaming computations

• Share bandwidth amongst multiple computations

• Share computed sub-results amongst multiple computations

• Correctly handle bootstrapping of computations which depend on historical data

• Basic approach: Map any computation into a fixed/general
14
Efficient Dynamic Computations over streams

15
Efficient Dynamic Computations over streams

16
Long term storage for events: Real-time and historical reads

17
Zero Downtime deployments for streaming services

18
Summary

• Lyft is building a next generation streaming platform based on Apache Flink and Apache Kafka

• Stateful stream processing is not a “solved problem”

• There are many hard / open problems left to solve

• If these sorts of problems interest you, please come join us!

We’re Hiring!
19
Thank you!
Jamie Grier
