7 Data and Analytics for IoT

Yuemin Ding
Tecnun School of Engineering
University of Navarra

Outline
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics

Data Analytics for IoT
• In IoT, the creation of massive amounts of data
from sensors is one of the biggest challenges.
▪ Modern jet engines are fitted with thousands of sensors
that generate 10 GB of data per second.
▪ A twin-engine commercial aircraft with these engines
operating on average 8 hours a day will generate over
500 TB of data daily.
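The daily figure follows from simple arithmetic. A quick check, assuming decimal units (1 TB = 1000 GB) and that both engines produce data for the full 8 hours:

```python
# Back-of-the-envelope check of the aircraft data volume on the slide.
# Assumes decimal units (1 TB = 1000 GB) and both engines active all 8 hours.
GB_PER_SECOND_PER_ENGINE = 10
ENGINES = 2
HOURS_PER_DAY = 8

seconds_per_day = HOURS_PER_DAY * 3600          # 28,800 s of operation
daily_gb = GB_PER_SECOND_PER_ENGINE * ENGINES * seconds_per_day
daily_tb = daily_gb / 1000

print(f"{daily_tb:.0f} TB per day")             # → 576 TB per day
```

576 TB per day is consistent with the "over 500 TB" claim.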

Structured vs. Unstructured Data
• Structured data means that the data follows a
model or schema that defines how the data is
represented or organized.
• Unstructured data lacks a logical schema for
understanding and decoding the data through
traditional programming means.

Structured vs. Unstructured Data
• Structured data and unstructured data require
different toolsets for analysis.
• Around 80% of a business’s data is unstructured.
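To illustrate why the toolsets differ, the sketch below reads a value from a structured record through its schema, while the same kind of value in a free-text note has to be dug out with a heuristic. The field names, the note text, and the regex are all hypothetical.

```python
import json
import re

# Structured: a JSON record whose fields follow a known schema (hypothetical fields).
structured = '{"sensor_id": "t-17", "temperature_c": 21.5, "ts": 1700000000}'
record = json.loads(structured)
temperature = record["temperature_c"]      # direct field access thanks to the schema

# Unstructured: a free-text maintenance note with no schema.
note = "Tech visited pump 4; noticed vibration, temp around 80 C, will recheck Friday."

# Extracting meaning needs heuristics; this hypothetical regex looks for "<number> C".
match = re.search(r"(\d+(?:\.\d+)?)\s*C\b", note)
extracted = float(match.group(1)) if match else None
```

The regex only works for notes phrased this way, which is why unstructured data typically calls for text-mining or machine-learning tools rather than schema-driven queries.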

Data in Motion Versus Data at Rest
• Data in IoT networks is either in transit (“data in
motion”) or being held or stored (“data at rest”).
▪ Data in motion includes traditional client/server
exchanges, such as web browsing, file transfers, and
email.
▪ Data saved to a hard drive, storage array, or USB drive
is data at rest.

• Data in motion → real-time processing and
analysis → Spark, Storm, Flink, etc.
• Data at rest → huge volume → Hadoop.
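The difference in processing style can be sketched in plain Python (not Spark or Hadoop): data in motion is handled incrementally as each reading arrives, while data at rest is processed once over the stored set. The readings are hypothetical.

```python
from typing import Iterable, Iterator, Sequence

def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Data in motion: update the result as each reading arrives (constant state)."""
    count, total = 0, 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count          # an up-to-date answer after every event

def batch_mean(stored: Sequence[float]) -> float:
    """Data at rest: compute once over the complete stored dataset."""
    return sum(stored) / len(stored)

readings = [20.0, 22.0, 24.0, 26.0]
live = list(streaming_mean(iter(readings)))   # [20.0, 21.0, 22.0, 23.0]
final = batch_mean(readings)                  # 23.0
```

Both approaches reach the same final answer; the streaming version also provides intermediate results while the data is still arriving.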

IoT Data Analytics Overview
• Data analysis is typically broken down by the
types of results that are produced:
▪ Descriptive: tells what is happening, either now or in the
past.
▪ Diagnostic: answers "why" something is happening.
▪ Predictive: foretells problems or issues before they occur.
▪ Prescriptive: recommends solutions for upcoming
problems.
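A toy example can show how the four result types build on each other for one hypothetical motor. The readings, the 75 °C limit, the 8-hour maintenance rule, and the naive linear extrapolation are all illustrative assumptions, not a recommended model.

```python
from statistics import mean

# Hypothetical hourly readings from one motor.
temps = [60.0, 62.0, 64.0, 66.0, 68.0]    # °C
loads = [50.0, 55.0, 60.0, 65.0, 70.0]    # % of rated load
THRESHOLD = 75.0                           # assumed safe temperature limit (°C)

# Descriptive: what is happening, or has happened?
descriptive = {"mean": mean(temps), "latest": temps[-1]}

# Diagnostic: why? Check whether temperature rises together with load.
rises_with_load = all(
    (t2 - t1) * (l2 - l1) > 0
    for (t1, t2), (l1, l2) in zip(zip(temps, temps[1:]), zip(loads, loads[1:]))
)
diagnostic = "rising load" if rises_with_load else "unknown cause"

# Predictive: when will the limit be crossed? Naive linear extrapolation.
deltas = [b - a for a, b in zip(temps, temps[1:])]
rate = mean(deltas)                                   # °C per hour
hours_to_limit = (THRESHOLD - temps[-1]) / rate

# Prescriptive: what should be done about it?
prescriptive = "schedule maintenance" if hours_to_limit < 8 else "no action"
```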

Outline
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics

Big Data Analytics Tools and Technology
• ‘Three Vs’ to categorize big data:
▪ Velocity refers to how quickly data is being collected
and analyzed.
▪ Variety refers to different types of data.
▪ Volume refers to the scale of the data.
• Over time, additional Vs (such as veracity and value) have
been proposed to describe big data

Massively Parallel Processing Databases
• Massively parallel processing (MPP) databases
were built on the concept of relational data
warehouses
• MPP databases are designed to be much faster and
more efficient, with reduced query times.
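The core MPP idea, where each node aggregates only its own slice of a table and a coordinator merges the partial results, can be sketched as follows. Threads stand in for separate database servers, the round-robin distribution is an assumption, and the rows are hypothetical; real MPP engines are far more sophisticated.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table of (device_id, kWh) rows.
rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0), ("a", 6.0)]
NODES = 3

# Distribute rows round-robin across the nodes' local storage.
partitions = [rows[i::NODES] for i in range(NODES)]

def local_sum(partition):
    """Each node runs the equivalent of SELECT device, SUM(kwh) GROUP BY device
    over its own rows only."""
    partial = Counter()
    for device, kwh in partition:
        partial[device] += kwh
    return partial

# All nodes work in parallel on their own partitions.
with ThreadPoolExecutor(max_workers=NODES) as pool:
    partials = list(pool.map(local_sum, partitions))

# The coordinator merges the small partial results into the final answer.
totals = Counter()
for partial in partials:
    totals.update(partial)      # Counter.update adds values per key
```

Only the small per-node partial sums travel to the coordinator, which is what keeps query times low as the table grows.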

Hadoop
• Hadoop was originally developed as a result of
projects at Google and Yahoo!
• The project had two key elements:
▪ Hadoop Distributed File System (HDFS): A system
for storing data across multiple nodes
▪ MapReduce: A distributed processing engine that splits
a large task into smaller ones that can be run in parallel
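The MapReduce model can be illustrated with the classic word-count example, written here as a single-process Python sketch rather than Hadoop's Java API. The log lines and their split into two chunks are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines, split into chunks as if stored on different nodes.
chunks = [
    ["door open", "door closed"],
    ["door open", "alarm"],
]

def map_phase(lines):
    """Map: emit a (key, 1) pair for every word in this node's chunk."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key so each key reaches a single reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# In Hadoop the map tasks would run in parallel on the nodes holding each chunk.
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
```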

Hadoop
• Hadoop takes advantage of a distributed
architecture to store and process massive
amounts of data and can leverage resources
from all nodes in the cluster.

Hadoop
• NameNodes: They coordinate where the data is stored
and maintain a map of where each block of data is stored
and where it is replicated.
• DataNodes: These are the servers where the data are
stored.
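The division of labor between NameNode and DataNodes can be sketched as a block map plus per-node storage. The tiny 4-byte block size and the round-robin placement are simplifications (real HDFS uses large blocks and rack-aware placement). The replication factor of 3 matches the HDFS default.

```python
import itertools

BLOCK_SIZE = 4          # bytes per block; real HDFS blocks are far larger
REPLICATION = 3         # HDFS default replication factor
datanodes = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_blocks(num_blocks, nodes, replication):
    """NameNode-style metadata: block id -> DataNodes holding a replica.
    Round-robin placement is a simplification of HDFS's rack-aware policy."""
    ring = itertools.cycle(range(len(nodes)))
    block_map = {}
    for block_id in range(num_blocks):
        start = next(ring)
        block_map[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return block_map

def read_file(block_map, storage, failed=()):
    """Rebuild the file, reading each block from any DataNode that is still up."""
    out = b""
    for block_id in sorted(block_map):
        holder = next(n for n in block_map[block_id] if n not in failed)
        out += storage[holder][block_id]
    return out

storage = {node: {} for node in datanodes}     # what each DataNode physically stores
blocks = split_into_blocks(b"sensor-log-0001", BLOCK_SIZE)
block_map = place_blocks(len(blocks), datanodes, REPLICATION)
for block_id, holders in block_map.items():
    for node in holders:
        storage[node][block_id] = blocks[block_id]
```

Because every block has three replicas, `read_file(block_map, storage, failed={"dn3"})` still returns the complete file.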

Hadoop
• Drawback:
▪ MapReduce is a batch engine: it breaks a query down into
smaller tasks, which suits the analysis of historical data.
▪ Depending on how much data is being queried and the
complexity of the query, results can take seconds
or minutes to return.
▪ MapReduce is therefore not the right data processing
engine for real-time processes.

Apache Spark
• Apache Spark is an in-memory distributed data analytics
platform designed to accelerate processes in the Hadoop
ecosystem.
• At each stage of a MapReduce operation, the data is read
from and written back to disk → high latency, slow processing
• With Spark, the processing of this data is moved into
high-speed memory → near-real-time processing of events

[Figure: Hadoop vs. Spark]
Source: analyticsvidhya.com
Apache Spark
• Real-time processing is done by a component of the
Apache Spark project called Spark Streaming.
• Spark Streaming is responsible for taking live-streamed
data from a messaging system and dividing it into smaller
micro-batches.
• The Spark processing engine operates on these smaller
pieces of data, allowing rapid insights into the data and
subsequent actions.
• Similar platforms include Apache Storm and Flink.
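The micro-batch idea can be sketched in plain Python, without Spark: an ordered event stream is cut into consecutive time-based batches, and each small batch is processed like an ordinary batch job. The 2-second batch interval and the events are assumptions for illustration.

```python
from collections import defaultdict

BATCH_INTERVAL = 2.0   # seconds; an assumed value for illustration

# Hypothetical live events in arrival order: (arrival_time_s, sensor_id, value)
events = [(0.1, "s1", 5), (0.9, "s2", 7), (1.5, "s1", 3),
          (2.2, "s1", 4), (3.8, "s2", 1), (4.0, "s1", 9)]

def micro_batches(stream, interval):
    """Cut an ordered stream into consecutive micro-batches as events arrive."""
    current_key, batch = None, []
    for ts, sensor, value in stream:
        key = int(ts // interval)
        if current_key is not None and key != current_key:
            yield batch                 # the interval has closed: hand the batch over
            batch = []
        current_key = key
        batch.append((sensor, value))
    if batch:
        yield batch

def process(batch):
    """Run an ordinary batch computation on one micro-batch: sum per sensor."""
    totals = defaultdict(int)
    for sensor, value in batch:
        totals[sensor] += value
    return dict(totals)

results = [process(b) for b in micro_batches(events, BATCH_INTERVAL)]
```

Smaller intervals give fresher results at the cost of more per-batch overhead.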

Source: analyticsvidhya.com
Apache Kafka
• Apache Kafka is a distributed messaging system designed
to accept data, or messages, from where the
data is generated and deliver the data to
stream-processing engines such as Spark
Streaming or Storm.
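Kafka decouples producers from consumers by storing messages in an append-only log that each consumer reads at its own offset. The `ToyTopic` class below is a hypothetical in-memory stand-in that shows this idea. It is not Kafka's client API.

```python
class ToyTopic:
    """Hypothetical stand-in for one Kafka topic partition:
    an append-only log that consumers read by offset."""

    def __init__(self):
        self.log = []

    def publish(self, message):
        """Producer side: append a message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, offset, max_messages=10):
        """Consumer side: read from `offset`; reading does not remove messages."""
        return self.log[offset:offset + max_messages]


topic = ToyTopic()
for reading in ["t=21.0", "t=21.4", "t=22.1"]:
    topic.publish(reading)            # sensors or gateways produce messages

# Two independent consumers (e.g. a stream processor and an archiver)
# each track their own offset, so both see every message.
stream_offset, archive_offset = 0, 0
first = topic.poll(stream_offset, max_messages=2)
stream_offset += len(first)
rest = topic.poll(stream_offset)
everything = topic.poll(archive_offset)
```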

Lambda Architecture
• Ultimately, the key elements of IoT use cases
involve the collection, processing, and storage of
data using multiple technologies.
• Querying both data in motion (streaming) and
data at rest (batch processing) requires a
combination of different projects.
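The slide does not name the parts, but the Lambda architecture is usually described as having three layers. A batch layer recomputes complete views from data at rest, a speed layer keeps a small real-time view of data in motion, and a serving layer merges both at query time. A minimal sketch with hypothetical alarm events:

```python
# Hypothetical alarm events per device.
historical_events = ["pump1", "pump2", "pump1", "pump1"]   # data at rest
recent_events = ["pump2", "pump1"]                         # data in motion, not yet batched

def batch_layer(events):
    """Periodically recompute a complete view from all stored data (e.g. with Hadoop)."""
    view = {}
    for device in events:
        view[device] = view.get(device, 0) + 1
    return view

class SpeedLayer:
    """Incrementally update a small real-time view (e.g. with Spark Streaming)."""

    def __init__(self):
        self.view = {}

    def on_event(self, device):
        self.view[device] = self.view.get(device, 0) + 1

def serving_query(device, batch_view, realtime_view):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view.get(device, 0) + realtime_view.get(device, 0)

batch_view = batch_layer(historical_events)
speed = SpeedLayer()
for device in recent_events:
    speed.on_event(device)
```

For example, `serving_query("pump1", batch_view, speed.view)` combines 3 historical events with 1 recent event.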

Outline
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics

Edge Streaming Analytics
• Key benefits of edge streaming analytics:
▪ Reducing data at the edge → Passing all the
IoT data to the cloud is inefficient and is
expensive in terms of bandwidth and network
infrastructure
▪ Analysis and response at the edge →
Some data is useful only at the edge (such as
the control within a local factory)
▪ Time sensitivity → Edge analytics allows
timely analysis and immediate responses to
changing conditions
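The bandwidth argument for reducing data at the edge can be made concrete by comparing the size of raw samples with per-window summaries. The sampling rate, the 10-second window, the mean/max summary, and JSON as the wire format are all assumptions.

```python
import json
from statistics import mean

# Hypothetical vibration sensor sampled 10 times per second for one minute.
raw = [{"ts": i / 10, "vib": 0.5 + (i % 7) * 0.01} for i in range(600)]

def summarize(samples, window_s=10.0):
    """Edge node: replace each window of raw samples with one compact summary."""
    windows = {}
    for s in samples:
        windows.setdefault(int(s["ts"] // window_s), []).append(s["vib"])
    return [{"window": k, "mean": round(mean(v), 3), "max": round(max(v), 3)}
            for k, v in sorted(windows.items())]

summary = summarize(raw)
raw_bytes = len(json.dumps(raw).encode())          # cost of forwarding everything
summary_bytes = len(json.dumps(summary).encode())  # cost of forwarding summaries only
print(f"forwarded {summary_bytes} B instead of {raw_bytes} B")
```

Which statistics to keep is an application decision. A summary like this is only safe when the cloud does not need the individual samples.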

Edge Streaming Analytics
• Three stages of streaming analytics at the
edge:
▪ Raw input data → data coming from the sensors into
the analytics processing unit
▪ Analytics processing unit (APU) → filters and
combines data streams, organizes them by time
windows, and performs various analytical functions
▪ Output streams → The data that is organized into
insightful streams and passed on for storage and further
processing in the cloud.

Edge Streaming Analytics
• Core functions of the Analytics Processing
Unit (APU):
▪ Filter → identifies the information that is considered
important to be processed on the edge
▪ Transform → manipulates the data structure into the form
required for further processing
▪ Time → establishes a timing context for real-time
streaming data flows
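The three stages and the first three APU functions can be chained as a small Python pipeline: raw input, then filter → transform → time inside the APU, then an output stream. The sensor names, the negative-value error code, the ×0.1 unit scaling, and the 2-second tumbling window are hypothetical choices.

```python
from collections import defaultdict

# Stage 1, raw input: hypothetical (timestamp_s, sensor_id, raw_count) tuples.
raw_input = [
    (0.0, "press-1", 1021), (0.4, "temp-1", 215), (1.2, "press-1", -1),
    (1.7, "temp-1", 219), (2.5, "press-1", 1030), (3.1, "temp-1", 224),
]

# Stage 2, the APU: filter -> transform -> time.
def apu_filter(stream):
    """Filter: keep only readings worth processing (drop negative error codes here)."""
    return (r for r in stream if r[2] >= 0)

def apu_transform(stream):
    """Transform: convert raw counts to engineering units (assumed x0.1 scaling)."""
    for ts, sensor, count in stream:
        yield ts, sensor, count / 10

def apu_time(stream, window_s=2.0):
    """Time: assign each reading to a tumbling window keyed by its start time."""
    windows = defaultdict(lambda: defaultdict(list))
    for ts, sensor, value in stream:
        windows[window_s * int(ts // window_s)][sensor].append(value)
    return windows

windows = apu_time(apu_transform(apu_filter(raw_input)))

# Stage 3, output stream: one averaged record per window and sensor, ready for the cloud.
output_stream = [
    {"window_start": start, "sensor": sensor, "avg": round(sum(vals) / len(vals), 2)}
    for start, per_sensor in sorted(windows.items())
    for sensor, vals in sorted(per_sensor.items())
]
```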

Edge Streaming Analytics
• Core functions of the Analytics Processing
Unit (APU):
▪ Correlate → combines multiple data streams from
different types of sensors, such as a patient's body
temperature, heart rate, and blood pressure
▪ Match patterns → gains deeper insights into the data,
such as detecting a sudden change in heart rate
▪ Improve business intelligence → enables quicker, more
timely responses
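Using the patient example, correlation and pattern matching might look like this. The per-second vital-sign streams and the 15 bpm jump threshold are hypothetical, and joining streams on exact timestamps is one simple way to correlate them.

```python
# Hypothetical per-second vital-sign streams from three different sensors.
heart_rate = {0: 72, 1: 74, 2: 73, 3: 98, 4: 101}             # beats per minute
body_temp = {0: 36.8, 1: 36.8, 2: 36.9, 3: 36.9, 4: 37.0}      # °C
blood_pressure = {0: 118, 1: 119, 2: 118, 3: 135, 4: 138}      # systolic, mmHg

HR_JUMP = 15   # assumed alert threshold: bpm change between consecutive samples

def correlate(*streams):
    """Correlate: join the streams on timestamp into one combined record per second."""
    common = sorted(set.intersection(*(set(s) for s in streams)))
    return [(t, *(s[t] for s in streams)) for t in common]

def match_sudden_hr_change(records, jump):
    """Match patterns: flag timestamps where heart rate jumps by more than `jump` bpm,
    attaching the correlated readings as context."""
    alerts = []
    for (t0, hr0, *_), (t1, hr1, temp1, bp1) in zip(records, records[1:]):
        if abs(hr1 - hr0) > jump:
            alerts.append({"t": t1, "hr_change": hr1 - hr0, "temp": temp1, "bp": bp1})
    return alerts

combined = correlate(heart_rate, body_temp, blood_pressure)
alerts = match_sudden_hr_change(combined, HR_JUMP)
```

The alert carries temperature and blood pressure alongside the heart-rate jump, which is the benefit of correlating streams before matching patterns.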

Edge Streaming Analytics
• Depending on the application, analytics can
happen at any point throughout the IoT system
• An example of pressure and temperature
measurement on an oil rig:
▪ Analytics are performed directly on the edge devices
▪ A fog node located on the same oil rig performs
streaming analytics on data from several edge devices
▪ After fog analysis, the results are forwarded to the cloud
for deeper historical analysis
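The three tiers in the oil rig example can be sketched as three functions, one per tier. The pressure limit, the well names, the readings, and the choice of mean/max summaries are hypothetical.

```python
from statistics import mean

PRESSURE_LIMIT = 200.0   # assumed local safety limit (bar)

def edge_device(readings):
    """Edge: analyze one sensor locally and raise an alarm immediately if needed."""
    return {"mean": mean(readings), "max": max(readings),
            "alarm": any(p > PRESSURE_LIMIT for p in readings)}

def fog_node(edge_summaries):
    """Fog: combine the summaries from several edge devices on the same rig."""
    return {
        "rig_mean": mean(s["mean"] for s in edge_summaries.values()),
        "devices_in_alarm": sorted(d for d, s in edge_summaries.items() if s["alarm"]),
    }

cloud_history = []   # Cloud: long-term store for deeper historical analysis

# Hypothetical pressure readings (bar) from three wellhead sensors.
rig_readings = {
    "well-1": [180.0, 182.0, 181.0],
    "well-2": [190.0, 205.0, 195.0],
    "well-3": [170.0, 171.0, 172.0],
}
edge_results = {dev: edge_device(r) for dev, r in rig_readings.items()}
cloud_history.append(fog_node(edge_results))   # only the fog result leaves the rig
```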

Outline
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics

Network Analytics
• Data analytics → finding patterns in the data
generated by endpoints
• Network analytics → discovering patterns in the
communication flows from a network perspective

Network Analytics
• For wireless IoT networks, packet sniffers can be
used for flow analytics

[Figures: CatSniffer and Wireshark]
Source: test-and-measurement-world.com
Network Analytics
• The benefits of network flow analytics:
▪ Network traffic monitoring and profiling →
IPv4/IPv6 network-wide traffic volume and pattern
analysis
▪ Application traffic monitoring and profiling → gain
a detailed time-based view of IoT access services, such
as MQTT and CoAP
▪ Capacity planning → track and anticipate IoT traffic
growth and help in the planning of upgrades
▪ Security analysis → a change in network traffic behavior
may indicate a cybersecurity event, such as a
denial-of-service (DoS) attack.
▪ Accounting → analyze and optimize billing
▪ Data warehousing and data mining → Flow data can
be warehoused for later retrieval and analysis

Summary
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics

Thank you!
Q&A
