
Chapter 4 - Data Storage and Processing in IoT


Fundamentals and Design of IoT Systems

(IoT System Design)

Chapter 4: Data storage and processing in IoT

Dr. Nguyễn Đắc Cử

Faculty of Electrical and Electronic Engineering
Phenikaa University

Contents

1. The data flow in an IoT system
2. Big Data and IoT
3. IoT data formats
4. IoT data storage
5. IoT data processing and analytics

The data flow in an IoT system

The IoT system architecture processes data in three tiers:

Data sources:
• IoT collects data from smart devices, environmental sensors,
smartphones, intelligent vehicles, and all kinds of other sensors.
• The data is then sent over the network with common
standard protocols such as MQTT, CoAP, and HTTP to the edge
gateway and then to the cloud.


Data storage:
• This layer stores data collected from sensors and devices at the
edge or cloud for long-term or short-term applications. The edge
gateway provides functionalities, such as sensor data aggregation,
pre-processing of the data, and securing connectivity to the cloud.
• In the cloud, there are various database management systems
built for IoT applications. The systems can store and manage
those enormous amounts of data for further applications.
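
The aggregation role of the edge gateway can be illustrated with a minimal Python sketch. It assumes readings arrive as (device ID, value) pairs; the function name `aggregate` and the summary fields are hypothetical and not taken from any particular gateway product.

```python
from collections import defaultdict
from statistics import mean


def aggregate(readings):
    """Group (device_id, value) readings and summarize each device.

    Sending one summary per device instead of every raw reading is one
    simple way a gateway could reduce the traffic it forwards to the cloud.
    """
    by_device = defaultdict(list)
    for device_id, value in readings:
        by_device[device_id].append(value)
    return {
        device_id: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for device_id, values in by_device.items()
    }


if __name__ == "__main__":
    sample = [("a", 20), ("a", 22), ("b", 30)]  # illustrative values only
    print(aggregate(sample))
```

A real gateway would typically also timestamp each summary and buffer it while the cloud connection is unavailable; those parts are left out here.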


Data analytics & applications:


• Most organizations can use the cloud to run the applications
needed to process device-generated data.
• This layer analyzes the data with AI, machine learning, and basic
computing techniques to generate useful information.
• The resulting insights support data-driven business intelligence:
they help to optimize operations, engage more customers, control
processes automatically, and let enterprises make better decisions
based on the results of the data analytics layer.

[Figure: The architecture of IoT data flow]

[Figure: How IoT data collection works]

Big Data

"From the dawn of civilization to 2003, five exabytes of data were
created. The same amount was created in the last two days."

- Eric Schmidt (former Google CEO), 2010

Big Data Definition

There is no single standard definition.

• "Big data" is data whose scale, diversity, and complexity require
new architectures, techniques, algorithms, and analytics to manage
it and extract value and hidden knowledge from it.
• Big Data describes very large sets of information. It is estimated
that the digital universe now holds more than 44 zettabytes
(44 trillion GB, or 352 trillion gigabits) of data, with much of that
data having been generated in the past two years via the Internet of
Things.
• With approximately 1.145 trillion MB of information generated
every day, businesses and other organizations are searching
through mountains of data to create actionable insights that will
shape the future.
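
As a unit check on the 44-zettabyte figure, the short sketch below converts zettabytes to gigabytes and gigabits. It assumes decimal (SI) prefixes, so 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes; the function names are illustrative.

```python
BYTES_PER_ZB = 10**21  # decimal (SI) zettabyte, an assumption of this sketch
BYTES_PER_GB = 10**9   # decimal (SI) gigabyte
BITS_PER_BYTE = 8


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


def zettabytes_to_gigabits(zb):
    """Convert a whole number of zettabytes to gigabits."""
    return zettabytes_to_gigabytes(zb) * BITS_PER_BYTE


if __name__ == "__main__":
    print(zettabytes_to_gigabytes(44))  # gigabytes in 44 ZB
    print(zettabytes_to_gigabits(44))   # gigabits in 44 ZB
```

Under these assumptions 44 ZB corresponds to 44 trillion gigabytes, and the value 352 trillion is reached only when counting gigabits.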


5 V's of Big Data

• Volume: the size and amount of big data that companies manage
and analyze.
• Value: the benefit an organization can extract from its data;
usually regarded as the most important characteristic of big data.
• Variety: the diversity and range of different data types, including
unstructured data, semi-structured data, and raw data.
• Velocity: the speed at which companies receive, store, and manage
data, e.g., the number of social media posts or search
queries received within a day, hour, or other unit of time.
• Veracity: the "truth" or accuracy of data and information assets,
which often determines executive-level confidence.

Variability can be considered as an additional characteristic:

• Variability: the changing nature of the data companies seek to
capture, manage, and analyze, e.g., in sentiment or text analytics,
changes in the meaning of key words or phrases.

How big data works

There are three key actions:

• Integrate: big data brings together data from many disparate
sources and applications.
• Manage: big data requires storage.
• Analyze: the data is analyzed and acted upon.

Big Data and IoT


Big Data in IoT

Big Data should enable real-time analysis of the data generated
by IoT and thus optimize the use of this technology. To do this, Big
Data proceeds in 4 steps:
• IoT devices generate a large amount of unstructured data, which
is collected in the big data system. This IoT-generated big data is
characterized by its 5 V factors: volume, velocity, value, veracity,
and variety.
• In the big data system, which is basically a shared distributed
database, the huge amount of data is stored in big data files.
• The stored IoT big data is analyzed using analytic tools such as
Hadoop MapReduce or Spark.
• Reports are generated from the analyzed data.
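
The analysis step can be pictured with a small pure-Python imitation of the MapReduce pattern. This is only a single-process sketch of the idea, not Hadoop or Spark code; the record fields `room` and `temp` and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per sensor record."""
    for record in records:
        yield record["room"], record["temp"]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group to a single result (here the maximum)."""
    return {room: max(temps) for room, temps in groups.items()}


def max_temp_per_room(records):
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    records = [  # illustrative readings
        {"room": 205, "temp": 21},
        {"room": 205, "temp": 25},
        {"room": 206, "temp": 19},
    ]
    print(max_temp_per_room(records))
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle; the report in the last step would be built from the reduced output.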


The interaction between IoT and Big Data

• The interaction between IoT and Big Data is not one-way: IoT
could also bring a lot to Big Data. The more important IoT becomes
in daily life and in cities, the more capacity developers will demand
from big data systems, and the more this business will grow.

• It will therefore be important to improve data storage technologies
and to develop systems capable of processing even more data. This
interaction could thus drive technological growth in both areas
simultaneously.

Benefits of IoT and Big Data in Different Sectors

• Increased ROI for businesses
• A reshaped future e-health system
• Advantages for manufacturing companies
• Benefits for the transportation industry
• More benefits in the Industrial Internet of Things (IIoT)
• High demand for edge computing

IoT data

[Figure source: International Data Corporation (IDC)]

• The number of IoT devices is growing at a very high pace and is
expected to reach more than 75 billion in this decade.
• According to an IDC forecast, the data generated by these
devices is expected to reach around 80 zettabytes (1 ZB = 10^21
bytes) by 2025.
• IoT data is the output of a device or a process associated with an
application; it represents a physical quantity from its environment.

Sensors → Data → IoT services

* Evolutionary Computing and Mobile Sustainable Networks, pp. 503-515.
DOI: 10.1007/978-981-15-5258-8_47

[Figure: IoT sensor]

The heterogeneous data generated includes discrete sensor readings,
metadata about a device, and image and video files.
The top 3 challenges in data preparation are a high volume of data,
complexity, and interoperability.
Because IoT devices have low storage capabilities, the high volume of
data needs to be transmitted using communication protocols for further
processing and storage. Besides dealing with a high volume of data,
IoT should also address the following issues:
• Handling heterogeneous data,
• Preparing data for analysis by transforming it,
• Aggregating, integrating, and keeping track of data origin,
• Preserving the integrity and privacy of the data,
• Choosing storage that balances performance, reliability,
flexibility, and cost.

IoT data structure

Data sensed by an IoT device is a mixture of structured,
semi-structured, and unstructured data.
Structured data is represented according to some model or
schema and maps easily onto a traditional RDBMS
(Relational Database Management System). It is laid out in
tabular form, like a spreadsheet in which each cell is explicitly
defined and can be referenced.
Most computing systems, such as bank transaction systems and
computer logs, make use of structured data. IoT sensors represent
readings such as temperature, humidity, and pressure as structured
data. Structured data can be easily formatted, queried, and
processed for use in decision-making.

Unstructured data does not follow any logical schema or
predefined data model, so the traditional methods
used for understanding and processing data do not work for it.
Examples are text, speech, images, and video.
Semi-structured data is a hybrid of structured and
unstructured data and shares the characteristics of both. Email is a
good example of semi-structured data: its fields are
predefined, but the content of the body and attachments is
unstructured.

IoT data format

The major data formats generated by IoT sensors and applications
are Text, Binary, XML, CSV, JSON, and RFID.
The data format in IoT depends on the type of sensor and the
developer's interest. When a sensor is connected to an application
that demands less detailed data, IoT uses simple data formats like
text and binary. For sensors connected to smart devices and
applications that require greater detail, IoT tends to choose
encoded data formats like XML, JSON, and CSV, for example in PTC
ThingWorx, Arrowhead, and OpenIoT *.
IoT data includes device status, metadata about the device, and
captured data. The data generated by IoT is not uniform, so a single
representation of data for all applications is difficult.
* Kenda K, Kažič B, Novak E, Mladenić D (2019) Streaming data fusion for the internet of things:
taxonomies and open challenges, pp 796–809

Text

Text data is a human-readable sequence of characters, as opposed to
non-character encoded data such as graphic images, audio,
and video. An IoT sensor captures data from its environment
and represents it in text format. For example, temperature sensors
on the floor, ceiling, and bedside of a hotel room each output a
single line of text with the device identifier, the location of the
device, the environment (room), and the temperature reading.

"deviceID": "aee62681aa9b", "location": "floor", "room": 205, "temp": 21
"deviceID": "792d3a3ef366", "location": "ceiling", "room": 205, "temp": 25
"deviceID": "b7c96bd32435", "location": "bedside", "room": 205, "temp": 24
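
A consumer of such single-line text readings has to split each line back into fields. The sketch below is one minimal way to do that, assuming that values contain no commas or colons and that unquoted values are integers; the function name `parse_reading` is hypothetical.

```python
def parse_reading(line):
    """Parse one comma-separated 'key: value' text line into a dict.

    Assumptions of this sketch: no commas or colons inside values, quoted
    values are strings, and unquoted values are integers.
    """
    record = {}
    for pair in line.split(","):
        key, _, value = pair.partition(":")
        key = key.strip().strip('"')
        value = value.strip()
        if value.startswith('"'):
            record[key] = value.strip('"')
        else:
            record[key] = int(value)
    return record


if __name__ == "__main__":
    line = 'deviceID: "aee62681aa9b", "location": "floor", "room": 205, "temp": 21'
    print(parse_reading(line))
```

The fragility of this kind of hand-written parsing is one reason structured formats such as CSV, XML, and JSON are preferred when more detail is needed.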


XML

Extensible Markup Language (XML) is a meta markup language and
one of the preferred data formats on the World Wide Web. Cross-domain
IoT deployments are constrained by the need for an inter-domain
data format, and XML solves this issue to some extent. XML gives a
human-readable representation of device information and sensed
data. Sensors and measurement processes can be described and
encoded in XML using SensorML.
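
To make the XML representation concrete, the sketch below writes one sensor reading as XML and reads it back with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`reading`, `deviceID`, `location`, `temp`) are hypothetical and do not follow the SensorML schema.

```python
import xml.etree.ElementTree as ET


def reading_to_xml(device_id, location, temp):
    """Serialize one sensor reading as a small XML document (ad hoc schema)."""
    root = ET.Element("reading", attrib={"deviceID": device_id})
    ET.SubElement(root, "location").text = location
    ET.SubElement(root, "temp").text = str(temp)
    return ET.tostring(root, encoding="unicode")


def xml_to_reading(xml_text):
    """Parse the XML produced by reading_to_xml back into a dict."""
    root = ET.fromstring(xml_text)
    return {
        "deviceID": root.get("deviceID"),
        "location": root.findtext("location"),
        "temp": float(root.findtext("temp")),
    }


if __name__ == "__main__":
    xml_text = reading_to_xml("aee62681aa9b", "floor", 21)
    print(xml_text)
    print(xml_to_reading(xml_text))
```

Every value is wrapped in an opening and a closing tag, which is why XML payloads tend to be larger than the equivalent text or JSON.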


CSV

A comma-separated values (CSV) file is a text file in which data
values are delimited by commas; it can also be viewed as spreadsheet
values for easy access to the data for processing. Each line in the
CSV file is one record, which holds the sensed data for one sample.
The values within a record are separated by the comma delimiter.
Many IoT and other applications support this file format. For example,
a tree measurement data file with 4 records has the following fields:
index, circumference (in), height (ft), volume (ft^3).

"Index", "Girth (in)", "Height (ft)", "Volume (ft^3)"
1, 8.3, 70, 10.3
2, 8.6, 65, 10.3
3, 8.8, 63, 10.2
4, 10.5, 72, 16.4
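
The tree measurement file can be read with Python's standard `csv` module, as in the minimal sketch below. The data is embedded as a string so the example is self-contained; `skipinitialspace=True` is needed because each comma in this file is followed by a space. The function names are illustrative.

```python
import csv
import io

CSV_TEXT = '''"Index", "Girth (in)", "Height (ft)", "Volume (ft^3)"
1, 8.3, 70, 10.3
2, 8.6, 65, 10.3
3, 8.8, 63, 10.2
4, 10.5, 72, 16.4
'''


def load_records(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key: float(value) for key, value in row.items()} for row in reader]


def mean_height(records):
    """Average of the 'Height (ft)' column."""
    return sum(r["Height (ft)"] for r in records) / len(records)


if __name__ == "__main__":
    records = load_records(CSV_TEXT)
    print(len(records), "records, mean height:", mean_height(records))
```

Each row becomes one record keyed by the header line, which matches the description of one line per sample.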

[Figure: a CSV file from the Valarm website]

JSON

JavaScript Object Notation (JSON) is a lightweight data interchange
format. JSON is a hierarchical data format supported
by many modern applications. Even though it can represent complex
data as an object, it remains human-readable.
JSON is often preferred over XML in IoT: JSON does not require a
schema and supports strings, numbers, booleans, objects, arrays, and
a null value, whereas XML increases file size through its markup and
header information [16–18]. Below is an example of JSON data
for a device, with its attribute names and captured values.

{"deviceid": "iot123", "temp": 54.98, "humidity": 32.43, "coords":
{"latitude": 47.615694, "longitude": -122.3359976}}
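
Reading such a payload takes only a few lines with Python's standard `json` module. In the sketch below the braces around the document and around the nested `coords` object are written out explicitly, which is an assumption about the intended structure of the example; the function name is hypothetical.

```python
import json

PAYLOAD = (
    '{"deviceid": "iot123", "temp": 54.98, "humidity": 32.43, '
    '"coords": {"latitude": 47.615694, "longitude": -122.3359976}}'
)


def parse_reading(payload):
    """Decode a JSON reading and pull out the ID, temperature, and position."""
    reading = json.loads(payload)
    coords = reading["coords"]  # nested object inside the reading
    return (
        reading["deviceid"],
        reading["temp"],
        (coords["latitude"], coords["longitude"]),
    )


if __name__ == "__main__":
    device_id, temp, position = parse_reading(PAYLOAD)
    print(device_id, temp, position)
```

Numbers, strings, and nested objects arrive as native Python types, with no manual splitting or type conversion as in the plain-text case.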


RFID
Radio Frequency Identification (RFID) helps to identify tagged
objects automatically. The following example of RFID tag data comes
from the defense sector, with these tag field sizes: Header (8 bits),
Filter (4 bits), CAGE Code as ASCII* (48 bits), Serial Number (36 bits).
The data is
b00811001111 b0040000 t048 2S194 n03612345678901.
The final hexadecimal representation of the 96-bit RFID tag, with its
prefix and suffix, is:
{XA^ RFW,H ^ FDCF02032533139342DFDC1C35 ^ FS^}
RFID systems are adopted by large companies, which have contributed to
publishing nID standards and industrial open standard specifications.
An RFID data stream includes data on RF tags (transponders), RF tag
readers (transceivers), and the electronic product code, which can
contain product information and a manufacturer number.
RFID data consists of tag-ID, reader-ID, and timestamp; this
information is insufficient, incomplete, and high in volume.
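
The way the four fields combine into the 96-bit value can be checked with a short bit-packing sketch. It assumes the fields are concatenated most significant first in the order listed, and that the five-character CAGE code is padded with a leading space to six ASCII characters (48 bits); the padding is inferred from the 0x20 byte in the hexadecimal string. The function name is hypothetical.

```python
def encode_tag(header, filter_value, cage_code, serial):
    """Pack header (8 bits), filter (4), CAGE code (48), serial (36) into hex."""
    if len(cage_code) != 6:
        raise ValueError("CAGE code must be padded to 6 ASCII characters")
    cage_bits = int.from_bytes(cage_code.encode("ascii"), "big")
    value = (
        (header << 88)          # top 8 bits
        | (filter_value << 84)  # next 4 bits
        | (cage_bits << 36)     # next 48 bits
        | serial                # low 36 bits
    )
    return f"{value:024X}"      # 96 bits = 24 hexadecimal digits


if __name__ == "__main__":
    print(encode_tag(0b11001111, 0b0000, " 2S194", 12345678901))
```

With the example values from the text this yields the same 24 hexadecimal digits that appear between the prefix and suffix of the tag string.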

O-DF
Open Data Format (O-DF) is a newer interoperable format.
The O-DF format has an object hierarchy in which each object can
have sub-objects, and a sub-object can be a device ID or other
information about the device.
The hierarchy can have many levels, depending on the level of detail
of the information.
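
The idea of an object hierarchy with an arbitrary number of levels can be sketched with nested Python dictionaries. This is only an illustration of the tree structure, not the actual O-DF schema; the refrigerator fields and their values are made up for the example.

```python
def leaf_paths(node, prefix=""):
    """Yield (path, value) for every leaf in a nested object hierarchy."""
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # A sub-object: descend one level deeper.
            yield from leaf_paths(value, path)
        else:
            yield path, value


# Hypothetical device description with two sub-objects.
REFRIGERATOR = {
    "id": "fridge-01",
    "Freezer": {"temperature": -18.0},
    "Door": {"status": "closed"},
}

if __name__ == "__main__":
    for path, value in leaf_paths(REFRIGERATOR):
        print(path, "=", value)
```

Because the traversal is recursive, it handles a hierarchy of any depth without changes.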

[Figure: O-DF code using XML and O-DF code using JSON, both showing sensor data from a refrigerator]

IoT Data Taxonomy


IoT data falls into 3 categories: data generation, data quality, and
data interoperability. These categories and their specific
characteristics are represented as a data taxonomy.
• Data generation: depends on factors such as the rate at which
samples are generated, coping with the high amount of data generated,
the dynamism of the data, and the wide variety of data arriving at a
very high rate.
• Data quality: depends on uncertainty due to
different sources, missing readings, device identification problems,
and accuracy.
• Data interoperability: for the IoT system to respond well to an
event sensed by a sensor, it requires data from multiple
sources in that environment. Combining data needs cooperation
between devices; a failure in this cooperation can result in
incomplete data.

Data Objects and Data Stream


A data object is a multidimensional attribute vector within a
continuous, categorical, or mixed attribute space.
A data stream is a huge sequence of data objects.
Data stream processing: sensor data processing techniques
involve data aggregation, data compression, modeling, and online
querying. Queries can be aggregated to avoid high power
consumption. An initial query is executed to produce an
intermediate result that can be processed further. Many queries can
be joined to get accurate results. The validity of the top received
values is ensured by applying mathematical constraints. The quality
of the data can be improved by adopting error-tolerant methods.
Stream mining is performed to extract useful information
by means of clustering, classification, outlier detection, and
frequent itemset mining.
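
As an illustration of outlier detection on a stream, the sketch below keeps a small sliding window of recent values and flags a new value that lies far from the window mean. The window size and threshold are arbitrary assumptions, and this simple rule is only one of many possible stream mining methods.

```python
from collections import deque
from statistics import mean, pstdev


def detect_outliers(stream, window=5, threshold=3.0):
    """Yield values that deviate strongly from the recent window.

    A value is flagged when it is more than `threshold` standard deviations
    away from the mean of the last `window` accepted values. Flagged values
    are not added to the window, so they do not distort later checks.
    """
    recent = deque(maxlen=window)
    for value in stream:
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield value
                continue
        recent.append(value)


if __name__ == "__main__":
    temps = [21, 22, 21, 23, 22, 80, 22, 21]  # illustrative readings
    print(list(detect_outliers(temps)))
```

Only the window is kept in memory, which is what makes this kind of processing suitable for an unbounded stream. A perfectly constant window has zero deviation and flags nothing in this sketch.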

Data Reduction
The growth of diverse data sources and of transmitted data leads to
redundancy at the storage and analysis stages, which causes network
bandwidth, storage, and throughput problems at the cloud level.
Data reduction is one solution to these problems.
Data reduction techniques:
➢ single-tier, where data is reduced at the gateway;
➢ two-tier, where reduction methods are employed at the gateway
and the cloud, or at the sensor and the base station.
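
A simple single-tier reduction at the gateway can be sketched as a send-on-delta filter: a reading is forwarded only when it differs enough from the last forwarded value. This specific technique and the threshold value are assumptions chosen for illustration; the slides do not name a particular reduction method.

```python
def send_on_delta(readings, delta=0.5):
    """Keep a reading only if it differs from the last kept one by >= delta."""
    kept = []
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) >= delta:
            kept.append(value)
            last_sent = value
    return kept


if __name__ == "__main__":
    raw = [21.0, 21.1, 21.2, 21.9, 22.0, 22.0, 23.0]  # illustrative readings
    reduced = send_on_delta(raw)
    print(f"forwarded {len(reduced)} of {len(raw)} readings: {reduced}")
```

In a two-tier setup the same kind of filter could run on the sensor, with a second reduction step such as aggregation applied at the base station or in the cloud.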


IoT data storage


Hot Storage
Hot storage: optimized for IoT data with real-time query support.
A hot storage database backs the user-facing side of your IoT
application.
Hot storage databases are optimized for performance, so data can
be instantly queried and displayed on dashboards or custom user
interfaces.
Because of these performance requirements, storing data in a hot
storage database can be costly:
→ data is only kept in the hot storage database for a
certain amount of time;
→ before data reaches the retention limit and is deleted, it needs to
be copied to warm or cold storage.

Warm Storage
Warm storage: optimized for large data volumes with generic
query support.
Warm storage databases are optimized for scale, meaning they
can potentially store an indefinite amount of data. The main
difference between warm storage and cold storage is the ability to
easily query the data.
Warm storage databases, often called data lakes or data
warehouses, typically provide some kind of generic query support to
explore the data.
These databases, however, are not optimized for IoT data and
usually don't offer the powerful aggregations required for time-series
queries.
One of the primary use cases for warm storage is offline
analytics and AI/ML.

Cold Storage
Cold storage: optimized for cost.
Whereas hot storage and warm storage are typically provided
through databases, cold storage is usually implemented as cloud
buckets or file storage.
Removing the database engine drastically reduces the cost of
storage, but sacrifices the ability to quickly and easily query and
explore the data.
The primary purpose of cold storage is archiving and backups.
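
One possible age-based policy for moving data between the three tiers is sketched below. The retention periods are hypothetical values chosen for illustration; real limits depend on the platform and on cost, and warm storage may in practice keep data indefinitely.

```python
from datetime import datetime, timedelta

# Hypothetical retention limits, not taken from any specific platform.
HOT_RETENTION = timedelta(days=30)
WARM_RETENTION = timedelta(days=365)


def storage_tier(recorded_at, now):
    """Return the tier a record of the given age would live in."""
    age = now - recorded_at
    if age <= HOT_RETENTION:
        return "hot"    # recent data, served to dashboards in real time
    if age <= WARM_RETENTION:
        return "warm"   # still queryable, used for offline analytics
    return "cold"       # archived in buckets or files


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    for days in (1, 100, 1000):
        print(days, "days old ->", storage_tier(now - timedelta(days=days), now))
```

A scheduled job could apply such a rule to copy records out of hot storage before they reach the retention limit and are deleted.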

IoT Platforms and Operating Systems


There is a vast number of IoT platforms and operating systems
that can integrate many of the abovementioned technologies to
provide IoT services.
C-based IoT operating systems:
• RIOT
• Contiki

Most IoT platforms are cloud-based and provide IoT technologies:
• AWS IoT
• IBM Watson
• ThingWorx
• Bosch IoT Suite
• Xively
• EVRYTHNG
• Kaa

IoT technologies for semantics

Semantic interoperability relies on a collection of technologies that
enable computer systems to interact unambiguously: Sensor Model
Language, Media Types for Sensor Markup Language, RESTful API
Modeling Language, and Wolfram Data Drop.
Sensor Model Language (SensorML): a standard model based
on an XML schema to describe sensors and measurement
procedures. SensorML is useful in IoT systems for creating electronic
description sheets for sensor modules and collecting metadata to be
used to discover sensor systems and observe processes. It also
enables sensor networks to be autonomous because of the
self-describing features of SensorML-supported sensors.

Media Types for Sensor Markup Language (SenML): a simple model
for acquiring sensed data and controlling actuators. It provides
semantics for the data and allows for
additional metadata with links and extensions. This simple model can
be used in many IoT applications. For example, a sensor, such as a
humidity sensor, could use this media type in CoAP to transport the
sensor's measurements.
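
A humidity measurement in SenML's JSON representation could be built as in the sketch below. The field names `bn` (base name), `n` (name), `u` (unit), and `v` (value) and the unit `%RH` follow the SenML specification (RFC 8428); the device name and the helper function are hypothetical.

```python
import json


def senml_pack(base_name, measurements):
    """Build a SenML JSON pack from (name, unit, value) tuples.

    The base name is attached to the first record only and applies to the
    records that follow it.
    """
    records = []
    for index, (name, unit, value) in enumerate(measurements):
        record = {"n": name, "u": unit, "v": value}
        if index == 0:
            record = {"bn": base_name, **record}
        records.append(record)
    return json.dumps(records)


if __name__ == "__main__":
    # "urn:dev:example:hum-01:" is a made-up device identifier.
    print(senml_pack("urn:dev:example:hum-01:", [("humidity", "%RH", 32.43)]))
```

The resulting text would be sent as the payload of a CoAP or HTTP message with the media type `application/senml+json`.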


IoT Database (IOTDB): a new, highly extensible technology
that supports semantics by providing formal definitions
of all necessary items. Unlike the aforementioned technologies,
IOTDB uses JSON dictionaries to manipulate and monitor nodes,
which makes it relatively fast, since JSON parsing is typically more
efficient than XML parsing. IOTDB is compatible with protocols such
as CoAP and MQTT.

RESTful API Modeling Language (RAML): a language
used to define HTTP-based APIs that follow most of the
principles of Representational State Transfer (REST). Since it is
RESTful, it is more likely to be used in IoT scenarios that are suitable
for CoAP, where the network overhead is negligible.

Wolfram Data Drop: an open service that allows accumulating
data of any type from anywhere (including IoT nodes) to prepare it
semantically for instant computation, querying, analysis,
visualization, or other operations. Computable data (i.e., collections
and time series) are saved in named data bins in the Wolfram Cloud
and are immediately accessible from all other systems/applications.

[Figure: Real-time (streaming) data analytics technologies in IoT]

Thank you for your attention!