7 Data and Analytics for IoT
Yuemin Ding
Tecnun School of Engineering
University of Navarra
Outline
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics
Data Analytics for IoT
• In IoT, handling the massive amounts of data
generated by sensors is one of the biggest challenges.
▪ Modern jet engines are fitted with thousands of sensors
that generate 10 GB of data per second.
▪ A twin-engine commercial aircraft with these engines
operating on average 8 hours a day will generate over
500 TB of data daily.
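The 500 TB figure follows from simple arithmetic. This Python sketch reproduces it, assuming decimal units (1 TB = 1000 GB) and that both engines produce data for the full 8 hours:

```python
# Back-of-the-envelope check of the jet-engine data volume on this slide.
GB_PER_SECOND_PER_ENGINE = 10   # data rate per engine (from the slide)
ENGINES = 2                     # twin-engine aircraft
HOURS_PER_DAY = 8               # average daily operation

seconds_per_day = HOURS_PER_DAY * 3600
daily_gb = GB_PER_SECOND_PER_ENGINE * ENGINES * seconds_per_day
daily_tb = daily_gb / 1000      # assumption: decimal units, 1 TB = 1000 GB

print(f"{daily_tb:.0f} TB per day")  # 576 TB per day
```

576 TB is consistent with the slide's "over 500 TB".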
Structured vs. Unstructured Data
• Structured data means that the data follows a
model or schema that defines how the data is
represented or organized.
• Unstructured data lacks a logical schema for
understanding and decoding the data through
traditional programming means.
Structured vs. Unstructured Data
• Structured data and unstructured data require
different toolsets for analysis.
• By commonly cited estimates, around 80% of a business’s data is unstructured.
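The toolset difference shows up even in a few lines of Python. In this sketch the sensor record, its field names, and the maintenance note are hypothetical. The structured record can be read by field name because its schema is known. The free-text note has to be mined heuristically, here with a regular expression that is fragile by nature.

```python
import json
import re

# Structured: a JSON sensor reading whose fields follow a known schema (hypothetical fields).
structured = '{"sensor_id": "t-17", "temperature_c": 21.5}'
reading = json.loads(structured)
print(reading["temperature_c"])  # direct field access, because the schema is known

# Unstructured: a free-text maintenance note with no schema (hypothetical text).
note = "Tech visit: pump 3 running hot, approx 85 C, bearing noise reported."
# Without a schema we must guess how values appear, e.g. "a number followed by C".
match = re.search(r"(\d+(?:\.\d+)?)\s*C\b", note)
temperature = float(match.group(1)) if match else None
print(temperature)
```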
Data in Motion Versus Data at Rest
• Data in IoT networks is either in transit (“data in
motion”) or being held or stored (“data at rest”).
▪ Data in motion includes traditional client/server
exchanges, such as web browsing, file transfers, and
email.
▪ Data saved to a hard drive, storage array, or USB drive
is data at rest.
• Data in motion → real-time processing and
analysis → Spark, Storm, Flink, etc.
• Data at rest → huge volume → Hadoop.
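This plain-Python sketch contrasts the two processing styles. It is not Spark or Hadoop code. The readings and the 30.0 limit are hypothetical. The streaming function reacts to each value as soon as it arrives, while the batch function works only on the complete stored dataset.

```python
from typing import Iterable, Iterator, List

def stream_alerts(readings: Iterable[float], limit: float) -> Iterator[float]:
    """Data in motion: inspect each reading as it arrives and react immediately."""
    for value in readings:
        if value > limit:
            yield value  # emitted at once, without waiting for the rest of the data

def batch_average(stored: List[float]) -> float:
    """Data at rest: analyze the complete stored dataset after the fact."""
    return sum(stored) / len(stored)

readings = [20.1, 20.4, 35.2, 20.3, 36.0]  # hypothetical temperature readings
print(list(stream_alerts(readings, limit=30.0)))  # [35.2, 36.0]
print(batch_average(readings))
```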
IoT Data Analytics Overview
• Data analysis is typically broken down by the
types of results that are produced:
▪ Descriptive: tells what is happening, either now or in the
past.
▪ Diagnostic: answers the “why”, explaining the cause of
what is happening or has happened.
▪ Predictive: forecasts problems or issues before they occur.
▪ Prescriptive: recommends solutions for upcoming
problems.
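This sketch runs all four types of analysis on one small dataset. The machine, its readings, and the 70 °C limit are hypothetical. The predictive step uses a simple least-squares line, which is only one of many possible forecasting techniques. A high correlation in the diagnostic step suggests a cause but does not prove it.

```python
from statistics import mean

# Hypothetical hourly readings from one machine (illustrative data only).
temps = [60.0, 61.0, 62.5, 64.0, 65.5, 67.0]  # temperature, °C
load = [50, 52, 55, 58, 61, 64]               # load, %, over the same hours
LIMIT = 70.0                                  # hypothetical safe temperature limit, °C

# Descriptive: what is happening now, and what has happened so far.
print("current:", temps[-1], "average:", round(mean(temps), 2))

# Diagnostic: why is it heating up? Check whether temperature tracks load.
def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print("temperature/load correlation:", round(pearson(temps, load), 3))

# Predictive: fit a least-squares line and extrapolate `steps_ahead` hours.
def forecast(series, steps_ahead):
    xs = list(range(len(series)))
    mx, my = mean(xs), mean(series)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, series))
             / sum((x - mx) ** 2 for x in xs))
    return my + slope * (len(series) - 1 + steps_ahead - mx)

predicted = forecast(temps, steps_ahead=3)

# Prescriptive: recommend an action based on the prediction.
action = "reduce load and schedule maintenance" if predicted > LIMIT else "no action needed"
print("predicted in 3 h:", round(predicted, 1), "->", action)
```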
Big Data Analytics Tools and Technology
• The ‘three Vs’ used to categorize big data:
▪ Velocity refers to how quickly data is being collected
and analyzed.
▪ Variety refers to different types of data.
▪ Volume refers to the scale of the data.
• Over time, other Vs, such as veracity and value, have
been added to big data.
Massively Parallel Processing Databases
• Massively parallel processing (MPP) databases
were built on the concept of the relational data
warehouse.
• MPP databases spread the data and the query processing
across many nodes, each with its own processors and
storage, so queries run in parallel and return much faster.
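The core MPP idea is that every node scans only its own slice of the data, and a coordinator combines the small partial results. This sketch simulates it in Python. Threads stand in for the separate servers of a real MPP database, and the meter data is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Spread rows across n simulated nodes (round-robin), as an MPP system distributes a table."""
    return [rows[i::n] for i in range(n)]

def local_aggregate(rows):
    """Each node scans only its own slice and returns a partial (sum, count)."""
    return sum(r["kwh"] for r in rows), len(rows)

def parallel_average(rows, nodes=4):
    parts = partition(rows, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(local_aggregate, parts))
    # The coordinator merges the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Hypothetical smart-meter readings.
meters = [{"meter": i, "kwh": float(i % 10)} for i in range(1000)]
print(parallel_average(meters))  # 4.5
```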
Hadoop
• Hadoop grew out of projects at Google and Yahoo!: it is
based on Google's published designs for the Google File
System and MapReduce, and was developed extensively at Yahoo!
• The project had two key elements:
▪ Hadoop Distributed File System (HDFS): A system
for storing data across multiple nodes
▪ MapReduce: A distributed processing engine that splits
a large task into smaller ones that can be run in parallel
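The three phases of a MapReduce job can be shown on a single machine. This sketch counts failure records per sensor. The log format and sensor IDs are hypothetical, and in Hadoop the map and reduce calls would run as parallel tasks on different nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log lines in the form "<sensor_id> <status>".
logs = ["s1 OK", "s2 FAIL", "s1 OK", "s3 FAIL", "s2 FAIL"]

def map_phase(line):
    """Map: emit (key, 1) for every failure record."""
    sensor, status = line.split()
    return [(sensor, 1)] if status == "FAIL" else []

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

mapped = chain.from_iterable(map(map_phase, logs))  # each map call could run on another node
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'s2': 2, 's3': 1}
```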
Hadoop
• Hadoop takes advantage of a distributed
architecture to store and process massive
amounts of data and can leverage resources
from all nodes in the cluster.
Hadoop
• NameNodes: coordinate where the data is stored
and maintain a map of where each block of data is stored
and where it is replicated.
• DataNodes: the servers where the data blocks are
stored.
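To make the NameNode's "map of blocks" concrete, this toy sketch splits a file into fixed-size blocks and assigns each block to several DataNodes. It uses HDFS's default block size (128 MB) and replication factor (3). The round-robin placement, the function name `plan_blocks`, and the node names are hypothetical simplifications; real HDFS placement is rack-aware.

```python
import itertools

BLOCK_SIZE_MB = 128  # HDFS default block size in current Hadoop versions
REPLICATION = 3      # HDFS default replication factor

def plan_blocks(file_size_mb, datanodes):
    """Toy NameNode: split a file into blocks and map each block to DataNodes.

    Round-robin placement is a simplification of HDFS's rack-aware policy.
    """
    n_blocks = -(-file_size_mb // BLOCK_SIZE_MB)  # ceiling division
    ring = itertools.cycle(datanodes)
    block_map = {}
    for block_id in range(n_blocks):
        block_map[block_id] = [next(ring) for _ in range(REPLICATION)]
    return block_map

block_map = plan_blocks(500, ["dn1", "dn2", "dn3", "dn4"])
for block, nodes in block_map.items():
    print(block, nodes)
```

A 500 MB file becomes four blocks, each stored on three different DataNodes, so losing one server does not lose data.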
Hadoop
• Drawback:
▪ MapReduce breaks a query into smaller tasks that run as
batch jobs, which is useful for the analysis of historical data.
▪ Depending on how much data is being queried and the
complexity of the query, results can take seconds
or minutes to return.
▪ MapReduce is therefore not the right data processing
engine for real-time processing.
Apache Spark
• Apache Spark is an in-memory distributed data analytics
platform designed to accelerate processes in the Hadoop
ecosystem.
• At each stage of a MapReduce operation, the data is read
from and written back to disk → high latency and slow processing
• With Spark, this processing is moved into high-speed
memory → near-real-time processing of events
[Figure: Hadoop vs. Spark. Source: analyticsvidhya.com]
Apache Spark
• Real-time processing is done by a component of the
Apache Spark project called Spark Streaming.
• Spark Streaming is responsible for taking live-streamed
data from a messaging system and dividing it into smaller
micro-batches.
• The Spark processing engine operates on these smaller
pieces of data, allowing rapid insights into the data and
subsequent actions.
• Similar platforms include Apache Storm and Apache Flink.
[Figure source: analyticsvidhya.com]
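The micro-batching idea can be shown in plain Python. This is not Spark Streaming's API. The sketch groups a time-ordered stream of (timestamp, value) events into consecutive one-second windows and hands each window on as one small batch. The stream contents and the function name `micro_batches` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

def micro_batches(events: Iterable[Tuple[float, float]], interval: float) -> Iterator[List[float]]:
    """Group time-ordered (timestamp, value) events into windows of `interval` seconds."""
    batch: List[float] = []
    window_end = None
    for ts, value in events:
        if window_end is None:
            window_end = (ts // interval + 1) * interval  # end of the first window
        while ts >= window_end:   # the event belongs to a later window:
            yield batch           # hand the finished micro-batch to the engine
            batch = []
            window_end += interval
        batch.append(value)
    if batch:
        yield batch

# Hypothetical live stream of (seconds, reading) pairs.
stream = [(0.1, 5), (0.4, 7), (1.2, 6), (1.9, 8), (2.5, 30)]
for b in micro_batches(stream, interval=1.0):
    print(len(b), sum(b))  # each small batch is processed as one unit
```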
Apache Kafka
• Apache Kafka is a messaging system designed
to accept data, or messages, from where the
data is generated and deliver it to
stream-processing engines such as Spark
Streaming or Storm.
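Kafka stores each topic as an append-only log, and each consumer group tracks its own read offset. That is why several engines can read the same stream independently. This toy class only illustrates the idea and is not the Kafka client API. `ToyBroker`, the topic name, and the group names are hypothetical.

```python
from collections import defaultdict

class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka-style broker (not the Kafka API).

    Each topic is an append-only log; each consumer group keeps its own offset.
    """

    def __init__(self):
        self.topics = defaultdict(list)
        self.offsets = defaultdict(int)  # (topic, group) -> next offset to read

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, group, max_messages=10):
        start = self.offsets[(topic, group)]
        batch = self.topics[topic][start:start + max_messages]
        self.offsets[(topic, group)] = start + len(batch)
        return batch

broker = ToyBroker()
for reading in ({"id": "s1", "t": 21.0}, {"id": "s2", "t": 22.5}):
    broker.produce("sensor-readings", reading)

print(broker.consume("sensor-readings", group="spark-streaming"))  # both messages
print(broker.consume("sensor-readings", group="storm"))            # both again, own offset
print(broker.consume("sensor-readings", group="spark-streaming"))  # [] (already read)
```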
Lambda Architecture
• Ultimately, the key elements of IoT use cases
involve the collection, processing, and storage of
data using multiple technologies.
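• Querying both data in motion (streaming) and
data at rest (batch processing) requires a
combination of different projects.
• The Lambda Architecture combines them in three layers:
▪ Batch layer → processes data at rest (e.g., Hadoop)
▪ Stream (speed) layer → processes data in motion
(e.g., Spark Streaming, Storm)
▪ Serving layer → merges the outputs of both layers to
answer queries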
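In a Lambda-style design, the serving layer answers a query by merging the precomputed batch view with the incremental real-time view. This sketch shows that merge. The door-sensor counts are hypothetical, and plain dictionaries stand in for the real batch and streaming stores.

```python
# Batch view: counts precomputed over historical data at rest (hypothetical values).
batch_view = {"door-1": 120, "door-2": 75}

# Real-time view: counts of events that arrived since the last batch run.
realtime_view = {}

def on_event(device):
    """Speed layer: update the incremental view as each event arrives."""
    realtime_view[device] = realtime_view.get(device, 0) + 1

def query(device):
    """Serving layer: answer queries by merging both views."""
    return batch_view.get(device, 0) + realtime_view.get(device, 0)

for d in ["door-1", "door-3", "door-1"]:
    on_event(d)

print(query("door-1"), query("door-2"), query("door-3"))  # 122 75 1
```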
Edge Streaming Analytics
• Key values of edge streaming analytics
▪ Reducing data at the edge → Passing all the
IoT data to the cloud is inefficient and is
expensive in terms of bandwidth and network
infrastructure
▪ Analysis and response at the edge →
Some data is useful only at the edge (such as
control loops within a local factory)
▪ Time sensitivity → Edge analytics allows
timely analysis and immediate responses to
changing conditions
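One common way to reduce data at the edge is report-by-exception: a new reading is forwarded only when it differs enough from the last one sent. This is a "deadband" filter. The sketch is one possible technique, not the only one, and the readings and the 0.5 threshold are hypothetical.

```python
def deadband_filter(readings, threshold):
    """Forward a reading only if it differs from the last forwarded value by more than `threshold`."""
    forwarded = []
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            forwarded.append(value)
            last_sent = value
    return forwarded

raw = [20.0, 20.1, 20.05, 20.2, 23.0, 23.1, 22.9, 18.0]  # hypothetical sensor readings
sent = deadband_filter(raw, threshold=0.5)
print(sent)  # [20.0, 23.0, 18.0]
print(f"sent {len(sent)} of {len(raw)} readings to the cloud")
```

Here only 3 of 8 readings cross the network, yet every significant change still reaches the cloud.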
Edge Streaming Analytics
• Three stages of streaming analytics at the
edge:
▪ Raw input data → data coming from the sensors into
the analytics processing unit
▪ Analytics processing unit (APU) → filters and
combines data streams, organizes them by time
windows, and performs various analytical functions
▪ Output streams → the data organized into
insightful streams and passed on for storage and further
processing in the cloud
Edge Streaming Analytics
• Core functions of the Analytics Processing
Unit (APU):
▪ Filter → identifies the information that is considered
important enough to be processed at the edge
▪ Transform → manipulates the data structure into the form
required for further processing
▪ Time → establishes a timing context for real-time
streaming data flows
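The first three APU functions can be chained as a small pipeline. In this sketch the raw event format (timestamp, sensor ID, value in tenths of a degree), the "t" prefix for temperature sensors, and the use of 999 as an error code are all hypothetical. The time function uses fixed 10-second tumbling windows, which is one of several possible windowing schemes.

```python
from collections import defaultdict

# Hypothetical raw input: (epoch_seconds, sensor_id, raw value in tenths of °C).
raw = [(100, "t1", 215), (103, "h1", 40), (107, "t1", 218), (112, "t1", 999), (118, "t1", 221)]

def apu_filter(events):
    """Filter: keep only temperature sensors and drop the (hypothetical) 999 error code."""
    return [e for e in events if e[1].startswith("t") and e[2] < 900]

def apu_transform(events):
    """Transform: restructure into records with engineering units."""
    return [{"ts": ts, "sensor": sid, "temp_c": value / 10} for ts, sid, value in events]

def apu_time_windows(records, width_s):
    """Time: group records into tumbling windows of `width_s` seconds."""
    windows = defaultdict(list)
    for r in records:
        windows[r["ts"] // width_s * width_s].append(r["temp_c"])
    return dict(windows)

windows = apu_time_windows(apu_transform(apu_filter(raw)), width_s=10)
for start, temps in sorted(windows.items()):
    print(start, sum(temps) / len(temps))  # window start, average temperature
```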
Edge Streaming Analytics
• Core functions of the Analytics Processing
Unit (APU):
▪ Correlate → combines multiple data streams from
different types of sensors, such as a patient's body
temperature, heart rate, and blood pressure
▪ Match patterns → detects patterns that give deeper insights
into the data, such as a sudden change in heart rate
▪ Improve business intelligence → enables quicker, more
timely responses
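Correlation and pattern matching can be sketched on the slide's patient example. The vitals and the 20 bpm jump threshold below are hypothetical. Here "correlate" means joining the streams on a shared timestamp, and "match patterns" is a simple step-change rule. Real systems use richer models.

```python
# Hypothetical per-second vitals from two sensors on the same patient.
heart_rate = {0: 72, 1: 74, 2: 73, 3: 110, 4: 112}  # beats per minute
body_temp = {0: 36.8, 1: 36.8, 2: 36.9, 3: 36.9}    # °C (no sample at t=4)

def correlate(*streams):
    """Correlate: join streams on timestamp, keeping only times present in all of them."""
    common = set.intersection(*(set(s) for s in streams))
    return {t: tuple(s[t] for s in streams) for t in sorted(common)}

def sudden_jumps(series, max_step):
    """Match patterns: flag timestamps where a value jumps by more than `max_step`."""
    times = sorted(series)
    return [t for prev, t in zip(times, times[1:]) if abs(series[t] - series[prev]) > max_step]

joined = correlate(heart_rate, body_temp)
alerts = sudden_jumps(heart_rate, max_step=20)
print(joined[3])  # (110, 36.9)
print(alerts)     # [3]
```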
Edge Streaming Analytics
• Depending on the application, analytics can
happen at any point throughout the IoT system
• Example: pressure and temperature
measurement on an oil rig:
▪ Analytics run directly on the edge devices
▪ A fog node located on the same oil rig performs
streaming analytics on data from several edge devices
▪ After fog analysis, results are forwarded to the cloud for
deeper historical analysis
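The oil-rig example can be sketched as three tiers of functions. The device IDs, pressure samples, and 12-bar alarm level are hypothetical. Each tier passes on less, but more meaningful, data to the next one.

```python
from statistics import mean

def edge_summary(device_id, pressures_bar):
    """Edge: each device reduces its raw samples to a small summary."""
    return {"device": device_id, "avg": mean(pressures_bar), "max": max(pressures_bar)}

def fog_analyze(summaries, alarm_bar):
    """Fog node on the rig: combine several devices and raise local alarms."""
    alarms = [s["device"] for s in summaries if s["max"] > alarm_bar]
    return {"rig_avg": mean(s["avg"] for s in summaries), "alarms": alarms}

cloud_history = []  # Cloud: keeps fog results for long-term historical analysis

edge = [edge_summary("p1", [10.1, 10.3, 10.2]), edge_summary("p2", [10.0, 14.8, 10.1])]
fog_result = fog_analyze(edge, alarm_bar=12.0)
cloud_history.append(fog_result)
print(fog_result["alarms"])  # ['p2']
```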
Network Analytics
• Data analytics → finding patterns in the data
generated by endpoints
• Network analytics → discovering patterns in the
communication flows from a network perspective
Network Analytics
• For wireless IoT networks, a packet sniffer can be
used for flow analytics
[Figure: CatSniffer and Wireshark. Source: test-and-measurement-world.com]
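Flow analytics starts by grouping captured packets into flows. This sketch keys each flow on the classic 5-tuple (source, destination, source port, destination port, protocol) and counts packets and bytes per flow. The captured packets are hypothetical; ports 1883/TCP and 5683/UDP are the standard MQTT and CoAP ports.

```python
from collections import Counter, namedtuple

Packet = namedtuple("Packet", "src dst sport dport proto size")

# Hypothetical captured packets (e.g. exported from a sniffer trace).
packets = [
    Packet("10.0.0.5", "10.0.0.1", 50000, 1883, "TCP", 120),  # MQTT
    Packet("10.0.0.5", "10.0.0.1", 50000, 1883, "TCP", 80),   # MQTT
    Packet("10.0.0.7", "10.0.0.1", 40000, 5683, "UDP", 60),   # CoAP
]

def to_flows(pkts):
    """Aggregate packets into flows keyed by the 5-tuple: (packets, bytes) per flow."""
    pkt_count, byte_count = Counter(), Counter()
    for p in pkts:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        pkt_count[key] += 1
        byte_count[key] += p.size
    return {k: (pkt_count[k], byte_count[k]) for k in pkt_count}

flows = to_flows(packets)
for key, (n, nbytes) in flows.items():
    print(key, n, "packets", nbytes, "bytes")
```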
Network Analytics
• The benefits of network flow analytics:
▪ Network traffic monitoring and profiling →
IPv4/IPv6 network-wide traffic volume and pattern
analysis
▪ Application traffic monitoring and profiling → gain
a detailed time-based view of IoT access services, such
as MQTT and CoAP
▪ Capacity planning → track and anticipate IoT traffic
growth and help in the planning of upgrades
▪ Security analysis → a change in network traffic behavior
may indicate a cyber security event, such as a denial of
service (DoS) attack
▪ Accounting → analyze and optimize billing
▪ Data warehousing and data mining → flow data can
be warehoused for later retrieval and analysis
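For security analysis, a very simple check flags any source whose current packet rate is far above its normal baseline. This sketch uses a "mean plus k standard deviations" threshold. The baseline rates, addresses, and k = 3 are hypothetical, and production DoS detection is considerably more elaborate.

```python
from statistics import mean, pstdev

def traffic_anomalies(baseline_pps, current_pps, k=3.0):
    """Flag sources whose packet rate exceeds baseline mean + k * standard deviation."""
    limit = mean(baseline_pps) + k * pstdev(baseline_pps)
    return {src: rate for src, rate in current_pps.items() if rate > limit}, limit

baseline = [40, 42, 38, 41, 39, 40]  # hypothetical packets/s per device in normal operation
current = {"10.0.0.5": 41, "10.0.0.7": 39, "10.0.0.9": 900}
suspects, limit = traffic_anomalies(baseline, current)
print(suspects)  # {'10.0.0.9': 900}
```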
Summary
• An Introduction to Data Analytics for IoT
• Big Data Analytics Tools and Technology
• Edge Streaming Analytics
• Network Analytics
Thank you!
Q&A