011 - Streaming Data System Architecture Components

A streaming data system comprises components for real-time collection, flow, processing, storage, and delivery of data. Key tools include Apache Kafka for data collection, Apache Flink for processing, and various storage solutions like HDFS and Redis. The architecture ensures low latency, scalability, and reliability, enabling effective data analysis and insights delivery.

Uploaded by Samrat


### Streaming Data System Architecture Components

In a streaming data system, various components work together to enable the
continuous processing and analysis of real-time data. These components ensure that
data flows smoothly from its source to its final destination while being processed
along the way.

---

#### 1. **Collection**
This component is responsible for gathering real-time data from different sources.
The data could be coming from IoT devices, social media platforms, log files,
sensors, or user interactions.

- **Sources**:
  - IoT devices, mobile applications, servers, databases, user clicks, financial transactions, etc.
- **Tools for Data Collection**:
  - **Apache Kafka**: A distributed event streaming platform for high-throughput, low-latency data collection.
  - **Amazon Kinesis**: A fully managed service for real-time data collection and processing.
  - **Fluentd/Logstash**: Used for collecting and unifying log and event data.
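The following Python sketch makes the collection tier concrete and needs no dependencies. A collector wraps events from different sources in a common envelope. It then assigns each event to a partition by hashing its key. This is similar in spirit to how a keyed Kafka producer keeps all records for one key in the same partition.

- The `Event`, `Collector`, and `partition_for` names are hypothetical.
- The in-memory lists stand in for a real broker.
- CRC32 is used only because it is deterministic. It is not Kafka's partitioner.

```python
import json
import time
import zlib
from dataclasses import dataclass, field


@dataclass
class Event:
    """Common envelope for events arriving from heterogeneous sources."""
    source: str   # e.g. "iot", "clickstream", "app-log"
    key: str      # entity the event is about (device id, user id, ...)
    payload: dict
    ts: float = field(default_factory=time.time)  # collection time if not given


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic hash-of-key partitioning: the same key always maps to the
    same partition, so events for one entity stay ordered within it."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Collector:
    """Hypothetical in-memory collector standing in for a Kafka or Kinesis producer."""

    def __init__(self, num_partitions: int = 3):
        self.partitions = [[] for _ in range(num_partitions)]

    def collect(self, event: Event) -> int:
        """Serialize the event, append it to its partition, return the partition index."""
        p = partition_for(event.key, len(self.partitions))
        # Serialize once at the edge so downstream tiers see one uniform format.
        self.partitions[p].append(json.dumps({
            "source": event.source,
            "key": event.key,
            "ts": event.ts,
            "payload": event.payload,
        }))
        return p


if __name__ == "__main__":
    c = Collector(num_partitions=3)
    for temp in (21.5, 21.7):
        c.collect(Event("iot", "sensor-42", {"temp_c": temp}))
    c.collect(Event("clickstream", "user-7", {"page": "/checkout"}))
    print([len(p) for p in c.partitions])
```

Keying by entity is the design choice that matters here. Per-key ordering is preserved within a partition, while different keys can still be spread across partitions for parallelism.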

---

#### 2. **Data Flow**

Data flow involves the movement of collected data through the streaming system.
This can involve message queues, brokers, and channels that allow seamless and
real-time transport of data from one component to another.

- **Data Flow Systems**:
  - **Message Queues**: Decouple producers from consumers so that data can flow asynchronously.
    - Examples: **Apache Kafka**, **RabbitMQ**, **Google Pub/Sub**
  - **Stream Brokers**: Coordinate and transport streams of data between producers and consumers.
    - Examples: **Apache Pulsar**, **Amazon Kinesis Data Streams**

- **Responsibilities**:
  - Ensuring that data reaches the processing layer in the expected format.
  - Handling backpressure (slowing producers down when consumers fall behind) and reliability (guaranteeing that messages are delivered without loss).
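Backpressure is easiest to see with a bounded buffer between a producer and a slow consumer. This sketch uses Python's standard `queue.Queue` as the buffer, not any particular broker. `put()` blocks the producer whenever the buffer is full, so the producer is throttled to the consumer's pace and no messages are dropped. `run_pipeline` and its parameters are hypothetical names chosen for illustration.

```python
import queue
import threading
import time

_END = object()  # sentinel marking the end of the stream


def run_pipeline(messages, capacity=4, consumer_delay=0.001):
    """Pass messages from a producer thread to a slow consumer thread through
    a bounded buffer. Returns (delivered messages, peak buffer depth seen)."""
    buf = queue.Queue(maxsize=capacity)
    delivered, depths = [], []

    def producer():
        for m in messages:
            buf.put(m)                  # blocks while the buffer is full: backpressure
            depths.append(buf.qsize())  # buffer depth observed right after the put
        buf.put(_END)

    def consumer():
        while True:
            item = buf.get()
            if item is _END:
                break
            time.sleep(consumer_delay)  # simulate a slow downstream stage
            delivered.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered, max(depths, default=0)


if __name__ == "__main__":
    out, peak = run_pipeline(list(range(50)), capacity=4)
    print(len(out), "delivered; peak buffer depth", peak)
```

The alternative to blocking is to drop or sample messages when the buffer fills. That trades the "without loss" guarantee for bounded producer latency.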

---

#### 3. **Processing**
Processing is the core function of a streaming data system where real-time data is
filtered, aggregated, analyzed, and transformed into meaningful insights. Data is
continuously processed as it flows through the system.

- **Types of Processing**:
  - **Stateless Processing**: Simple operations on a single data point, such as transformations or filtering.
  - **Stateful Processing**: Operations that rely on the context or history of the data, such as aggregations over a sliding window.
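The difference between the two kinds of processing shows up in what a function has to remember. This illustrative sketch contains two kinds of function:

- **Stateless:** `drop_invalid` and `to_fahrenheit` handle each record on its own.
- **Stateful:** `sliding_avg` keeps the readings from the last `window_s` seconds to compute a running average.

The record layout and all function names are assumptions. The window logic also assumes that records arrive in event-time order and that `window_s > 0`.

```python
from collections import deque
from typing import Iterable, Iterator, Optional, Tuple

# Assumed record layout: (event_time_s, sensor_id, celsius_or_None)
Reading = Tuple[float, str, Optional[float]]


def drop_invalid(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Stateless filter: each record is kept or dropped on its own merits."""
    for r in stream:
        if r[2] is not None:
            yield r


def to_fahrenheit(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Stateless transformation: converts each reading independently."""
    for t, sensor, celsius in stream:
        yield (t, sensor, celsius * 9 / 5 + 32)


def sliding_avg(stream: Iterable[Reading], window_s: float) -> Iterator[Tuple[float, float]]:
    """Stateful aggregation: remembers readings whose event time lies in
    (t - window_s, t] and emits the window average after every record."""
    window: deque = deque()
    total = 0.0
    for t, _sensor, value in stream:
        window.append((t, value))
        total += value
        # Evict readings that have slid out of the window.
        while window and window[0][0] <= t - window_s:
            _, old = window.popleft()
            total -= old
        yield (t, total / len(window))


if __name__ == "__main__":
    readings = [(0, "s1", 10.0), (1, "s1", None), (1, "s1", 20.0), (2, "s1", 30.0)]
    for t, avg in sliding_avg(to_fahrenheit(drop_invalid(readings)), window_s=2):
        print(t, avg)
```

The stateless stages can be scaled out freely because no record depends on another. The stateful stage must keep its `deque` somewhere, which is why frameworks treat state management as a first-class concern.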

- **Processing Frameworks**:
  - **Apache Flink**: Provides low-latency, event-time-driven processing of streams with stateful computation.
  - **Apache Storm**: Designed for distributed real-time processing of data streams.
  - **Apache Spark Streaming**: A micro-batch processing framework that handles streaming data as a sequence of small batches.
  - **Google Cloud Dataflow**: A fully managed cloud service that runs Apache Beam pipelines in streaming or batch mode.
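Event-time processing, as offered by Flink, groups records by when they happened rather than when they arrived. It uses a watermark to decide when a window is complete. The plain-Python sketch below illustrates that idea with tumbling windows and a watermark equal to the largest event time seen minus an assumed maximum delay.

- It does not use Flink's API.
- `tumbling_counts`, `size`, and `max_delay` are hypothetical names.
- Real engines also offer options such as allowed lateness or side outputs for late records. This sketch simply counts late records and drops them.

```python
from collections import defaultdict


def tumbling_counts(events, size, max_delay):
    """Count events per key in tumbling event-time windows [start, start + size).

    events:    iterable of (event_time, key) in arrival order, possibly out of order.
    max_delay: assumed bound on how far behind the newest event a straggler can be.

    The watermark is (largest event time seen) - max_delay. A window is emitted
    once its end is <= the watermark; events for windows that are already closed
    are counted as late and dropped.
    """
    open_windows = defaultdict(lambda: defaultdict(int))  # start -> key -> count
    emitted, late = [], 0
    max_seen = float("-inf")
    for t, key in events:
        start = t - (t % size)
        if start + size <= max_seen - max_delay:  # window already closed
            late += 1
            continue
        open_windows[start][key] += 1
        max_seen = max(max_seen, t)
        watermark = max_seen - max_delay
        # Emit, in event-time order, every window the watermark has passed.
        for s in sorted(open_windows):
            if s + size > watermark:
                break
            emitted.append((s, dict(open_windows.pop(s))))
    # End of input: flush windows that are still open.
    for s in sorted(open_windows):
        emitted.append((s, dict(open_windows.pop(s))))
    return emitted, late
```

Take `size=10` and `max_delay=5` as an example:

- An event at time 7 that arrives after an event at time 12 is still counted in window `[0, 10)`.
- An event at time 3 that arrives after the watermark has reached 11 is counted as late.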

---

#### 4. **Storage**
In streaming architectures, processed or raw data needs to be stored for further
analysis, auditing, or future use. Depending on the use case, storage could be
transient (just to enable real-time processing) or more permanent.

- **Storage Types**:
  - **Distributed Storage**:
    - Examples: **HDFS (Hadoop Distributed File System)**, **Amazon S3**
    - Used for long-term storage of processed or raw data.
  - **In-memory Databases**:
    - Examples: **Redis**, **Apache Ignite**
    - Used for low-latency access to recently processed or high-priority data.
  - **Data Lakes**:
    - Examples: **AWS Lake Formation**, **Azure Data Lake**
    - Store raw, semi-structured, and unstructured data for future analysis.
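The split between long-term and low-latency storage can be sketched as two stores behind one write path. Every record is appended to a durable archive for later analysis. The latest value per key is kept in memory with an expiry time for fast reads.

- `HotStore` and `ColdArchive` are hypothetical in-memory stand-ins, for a Redis-style cache and for files on HDFS or S3 respectively.
- The injectable `clock` exists only to make expiry testable.

```python
import json
import time


class HotStore:
    """In-memory key-value store with per-key expiry (Redis-style TTL), hypothetical."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_s):
        self._data[key] = (value, self._clock() + ttl_s)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


class ColdArchive:
    """Append-only JSON-lines log standing in for files on HDFS or objects in S3."""

    def __init__(self):
        self.lines = []

    def append(self, record):
        self.lines.append(json.dumps(record, sort_keys=True))


def store(record, hot, cold, ttl_s=60):
    """Archive every record; keep only the latest value per key in the hot tier."""
    cold.append(record)
    hot.set(record["key"], record["value"], ttl_s)
```

The archive keeps full history for auditing and reprocessing. The hot tier deliberately forgets: expiry bounds memory use and keeps it focused on recent data.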

---

#### 5. **Delivery**
Once data is processed, the results need to be delivered to end-users,
applications, or other systems that will act upon this information. This could
involve sending data to real-time dashboards, triggering alerts, or feeding results
into machine learning models.

- **Delivery Mechanisms**:
  - **Real-Time Dashboards**:
    - Tools: **Apache Superset**, **Tableau**, **Grafana**
    - Present real-time metrics, insights, and visualizations.
  - **Alerts and Notifications**:
    - Channels: **Slack**, **Email/SMS**, **PagerDuty**
    - Trigger notifications when predefined conditions are met.
  - **Downstream Applications**:
    - Processed data may be pushed to other systems, such as data stores (e.g., PostgreSQL, Elasticsearch), or back to APIs that control other operations.
  - **Machine Learning**:
    - Processed data can be fed into predictive models in real time to support quick decisions.
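An alerting rule is typically a threshold check plus some suppression, so that a sustained breach does not flood the channel. This sketch assumes a simple rule with a cooldown period.

- `ThresholdAlert` is hypothetical.
- The `notify` callback stands in for whatever actually posts to Slack, email/SMS, or PagerDuty.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ThresholdAlert:
    """Fires when a metric exceeds `threshold`, then stays quiet for
    `cooldown_s` seconds so a sustained breach produces one notification."""
    metric: str
    threshold: float
    cooldown_s: float
    notify: Callable[[str], None]     # stand-in for a Slack/email/PagerDuty sender
    _last_fired: Optional[float] = None

    def observe(self, ts: float, value: float) -> bool:
        """Check one data point; return True if a notification was sent."""
        if value <= self.threshold:
            return False
        if self._last_fired is not None and ts - self._last_fired < self.cooldown_s:
            return False  # still cooling down from the previous alert
        self._last_fired = ts
        self.notify(f"[{ts:.0f}s] {self.metric}={value} exceeded {self.threshold}")
        return True


if __name__ == "__main__":
    rule = ThresholdAlert("error_rate", 0.05, cooldown_s=60, notify=print)
    for t, v in [(0, 0.01), (10, 0.08), (20, 0.09), (90, 0.07)]:
        rule.observe(t, v)
```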

---

### Conclusion
A well-architected streaming data system includes all the components necessary for
real-time collection, flow, processing, storage, and delivery of data. Each of
these layers plays a critical role in enabling the system to operate smoothly,
ensuring low latency, scalability, and reliability. Tools such as Apache Kafka,
Apache Flink, and real-time dashboards support these operations across the system.
