Foundations and Design of IoT Systems
(IoT System design)
Chapter 4 : Data storage and processing in IoT
Dr. Nguyễn Đắc Cử
Faculty of Electrical and Electronic Engineering
Phenikaa University
Contents
1. The data flow in an IoT system
2. Big Data and IoT
3. IoT data formats
4. IoT data storage
5. IoT data processing and analytics
The data flow in an IoT system
Data processing in an IoT system architecture is organized in three tiers:
Data sources:
• An IoT system collects data from smart devices, environmental sensors,
smartphones, intelligent vehicles, and all kinds of other sensors.
• The data is then sent over the network using common standard
protocols such as MQTT, CoAP, and HTTP, first to the edge gateway
and then to the cloud.
Data storage:
• This layer stores data collected from sensors and devices at the
edge or in the cloud for long-term or short-term use. The edge
gateway provides functions such as sensor data aggregation,
pre-processing of the data, and secure connectivity to the cloud.
• In the cloud, various database management systems are built for
IoT applications. These systems store and manage the enormous
amounts of data for further use.
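As a rough illustration of the aggregation and pre-processing an edge gateway might perform before forwarding data to the cloud, the sketch below drops invalid readings and summarizes each sensor's batch into one compact record. The reading format, the valid range of -40 to 85 °C, and the helper name `aggregate_batch` are hypothetical; a real gateway would then publish the summary over MQTT, CoAP, or HTTP.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings buffered at the gateway: (sensor_id, temperature in °C).
# None models a failed read; 999.0 models an out-of-range glitch.
raw_readings = [
    ("t1", 21.0), ("t1", 21.4), ("t1", None),
    ("t2", 25.0), ("t2", 999.0), ("t2", 24.6),
]

def aggregate_batch(readings, valid_range=(-40.0, 85.0)):
    """Pre-process (drop missing/out-of-range values), then aggregate per sensor."""
    lo, hi = valid_range
    grouped = defaultdict(list)
    for sensor_id, value in readings:
        if value is None or not (lo <= value <= hi):
            continue  # pre-processing: discard invalid samples
        grouped[sensor_id].append(value)
    # aggregation: one summary record per sensor instead of every raw sample
    return {
        sid: {"count": len(vals), "min": min(vals), "max": max(vals),
              "mean": round(mean(vals), 2)}
        for sid, vals in grouped.items()
    }

summary = aggregate_batch(raw_readings)
print(summary)
```

Sending one summary per sensor instead of every sample is what reduces the load on the link to the cloud; the trade-off is that individual samples are no longer available upstream.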
Data analytics & applications:
• Most organizations can use the cloud to run the applications
needed to process device-generated data.
• This layer analyzes the data with AI, machine learning, and basic
computing techniques to generate useful information.
• This data is used to create actionable insights to unlock data-
driven business intelligence, optimize operations, engage more
customers, control processes automatically, and help enterprises
make the best decision based on the results extracted from the
data analytics layer.
Figure: The architecture of IoT data flow
Figure: How IoT data collection works
Big Data
"From the dawn of civilization to 2003, five
exabytes of data were created. The same amount
was created in the last two days."
- Eric Schmidt (former Google CEO), 2010
Big Data Definition
There is no single standard definition.
• “Big data” is data whose scale, diversity, and complexity require
new architecture, techniques, algorithms, and analytics to manage
it and extract value and hidden knowledge from it.
• Big Data describes very large sets of information. It is estimated
that there are now more than 44 zettabytes (44 trillion GB) of
data covering the digital universe, with much of that data having
been generated in the past two years via the Internet of Things.
• With approximately 1,145 trillion MB of information generated
every day, businesses and other organizations are searching
through mountains of data to create actionable insights that will
shape the future.
5 V's of Big Data
• Volume: the size and amounts of big data that companies manage
and analyze
• Value: the useful insight and business benefit that can be extracted
from the data; value is the most important characteristic of big data
• Variety: the diversity and range of different data types, including
unstructured data, semi-structured data and raw data
• Velocity: the speed at which companies receive, store and manage
data – e.g., the specific number of social media posts or search
queries received within a day, hour or other unit of time
• Veracity: the “truth” or accuracy of data and information assets,
which often determines executive-level confidence
The additional characteristic of variability can also be considered:
• Variability: the changing nature of the data companies seek to
capture, manage and analyze – e.g., in sentiment or text analytics,
changes in the meaning of key words or phrases
How big data works
There are three key actions:
• Integrate: Big data brings together data from many disparate
sources and applications.
• Manage: Big data requires storage.
• Analyze: analyze and act on data.
Big Data and IoT
Big Data in IoT
Big Data should enable real-time analysis of the data generated
by IoT and thus optimize the use of this technology. To do this, Big
Data proceeds in 4 steps:
• A large amount of unstructured data is generated by IoT devices
and collected in the big data system. This IoT-generated big data is
characterized by its 5 V's: volume, velocity, value, veracity and
variety.
• The big data system, which is essentially a shared distributed
database, stores this huge amount of data in big data files.
• The stored IoT big data is analyzed using analytic tools such as
Hadoop MapReduce or Spark.
• Reports are generated from the analyzed data.
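The map, shuffle and reduce phases that tools such as Hadoop MapReduce distribute across a cluster can be imitated on a single machine to show the idea. The sketch below computes the maximum temperature per device from raw records and turns it into a small report. The record format and the function names are hypothetical, and no Hadoop or Spark API is used.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical IoT records stored as "device_id,timestamp,temperature" lines.
records = [
    "dev-a,1700000000,21.5",
    "dev-b,1700000000,30.1",
    "dev-a,1700000060,23.0",
    "dev-b,1700000060,29.4",
]

def map_phase(line):
    """Map: emit one (key, value) pair per input record."""
    device_id, _ts, temp = line.split(",")
    return device_id, float(temp)

def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values into one result per key."""
    return key, reduce(max, values)

pairs = map(map_phase, records)
report = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
print(report)
```

In a real cluster each phase runs in parallel on different nodes and the shuffle moves data over the network; the single-process version only shows how the data is keyed and combined.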
The interaction between IoT and Big Data
• The interaction between IoT and Big Data is not one-way: IoT
also brings a lot to Big Data. The more important IoT becomes in
daily life and in cities, the more developers will demand greater
big data capacity, and the more this business will grow.
• It will therefore be important to improve data storage technologies
to develop systems capable of processing even more data. This
interaction could thus enable technological growth in both areas
simultaneously.
Benefits of IoT and Big Data in Different Sectors
• Helps increase ROI for businesses
• Reshapes the future e-health system
• Brings advantages to manufacturing companies
• Brings benefits to the transportation industry
• Brings more benefits to the Industrial Internet of Things (IIoT)
• Edge computing will be in high demand
IoT data
Figure source: International Data Corporation (IDC)
• The number of IoT devices is growing at a very high pace and is
expected to exceed 75 billion this decade.
• According to an IDC forecast, the data generated by these
devices is expected to reach around 80 zettabytes (1 ZB = 10^21 bytes) by 2025.
• IoT data is the output of a device or a process associated with an
application, typically a physical quantity measured in its environment.
Sensors → Data → IoT services
Source: Evolutionary Computing and Mobile Sustainable Networks, pp. 503-515,
doi:10.1007/978-981-15-5258-8_47
Figure: IoT sensors
Data such as discrete sensor readings, device metadata, and image and
video files are all part of the heterogeneous data generated.
The top 3 priority challenges in data preparation are a high volume of data,
complexity, and interoperability.
IoT devices have low storage capabilities, so the high volume of data needs
to be transmitted using communication protocols for further processing and
storage. IoT must deal with this high volume of data and also give
importance to the following issues:
• Handling heterogeneous data,
• Preparing data for analysis by transforming it,
• Aggregating, integrating, and keeping track of data origin,
• Preserving the integrity and privacy of the data,
• Choosing storage that balances performance, reliability,
flexibility, and cost.
IoT data structure
Data sensed by an IoT device is a mixture of structured,
semistructured, and unstructured data.
Structured data is represented according to some model or
schema, and it maps easily onto a traditional RDBMS
(Relational Database Management System). Structured data has a
tabular representation, like a spreadsheet in which
each cell is explicitly defined and referenced.
Many computing systems, such as bank transaction systems and
computer logs, use structured data. IoT sensors represent
readings such as temperature, humidity, and pressure as structured
data. Structured data can easily be formatted, queried, and
processed for use in decision-making.
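To make the spreadsheet analogy concrete, the sketch below stores a few structured sensor readings in a relational table, using Python's built-in `sqlite3` module as a stand-in for a traditional RDBMS, and then queries them with SQL. The table name, columns, units, and values are illustrative assumptions.

```python
import sqlite3

# In-memory database standing in for a traditional RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE readings (
           device_id   TEXT NOT NULL,
           ts          INTEGER NOT NULL,  -- Unix time in seconds
           temperature REAL,              -- degrees Celsius
           humidity    REAL,              -- percent relative humidity
           pressure    REAL               -- hPa
       )"""
)
rows = [
    ("dev-1", 1700000000, 21.5, 40.2, 1013.1),
    ("dev-1", 1700000060, 22.0, 39.8, 1013.0),
    ("dev-2", 1700000000, 25.3, 35.0, 1012.7),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)

# Because every cell has a defined column and type, filtering and
# aggregation are plain SQL queries.
avg_temp = conn.execute(
    "SELECT device_id, AVG(temperature) FROM readings "
    "GROUP BY device_id ORDER BY device_id"
).fetchall()
print(avg_temp)
```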
Unstructured data does not follow any logical schema or any
predefined data model, so the traditional methods used to understand
and process data cannot be applied to it; examples are text, speech,
images, and video.
Semi-structured data is a hybrid of structured and
unstructured data and shares characteristics of both. Email is a
good example of semi-structured data: its fields are
predefined, but the content of the body and attachments is
unstructured.
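The email example can be seen directly with Python's standard `email` package: the header fields are parsed into named values, while the body comes back as free text. The alert message and addresses below are made up for illustration.

```python
from email import message_from_string

# A hypothetical alert e-mail: header fields follow a fixed, predefined
# structure, while the body is free-form text.
raw = """From: gateway@example.com
To: operator@example.com
Subject: Temperature alert room 205

The ceiling sensor reported 25 degrees, which is above the comfort range.
Please check the air conditioning.
"""

msg = message_from_string(raw)

# Structured part: fields can be read by name, like table columns.
structured = {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}

# Unstructured part: the body is plain text that needs other techniques
# (keyword search, NLP, ...) before it can be interpreted.
body = msg.get_payload()
print(structured)
print(body.strip())
```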
IoT data format
The major data formats generated by IoT sensors and applications
are text, binary, XML, CSV, JSON, and RFID.
The data format depends on the type of sensor and the developer's
interest. When a sensor is connected to an application that needs
less detailed data, IoT uses simple data formats such as text and binary.
For sensors connected to smart devices and applications that require
more detailed data, IoT tends to use encoded data formats such as
XML, JSON, and CSV, for example in PTC ThingWorx, Arrowhead, and OpenIoT *.
IoT data includes device status, metadata about the device, and
captured data. The data generated by IoT is not uniform, so a single
data representation for all applications is difficult.
*Kenda K, Kažič B, Novak E, Mladenić D (2019) Streaming data fusion for the internet of things:
taxonomies and open challenges, pp 796–809
Text
Text data is a human-readable sequence of characters, as opposed
to non-character encoded data such as graphic images, audio,
and video. An IoT sensor captures data from its environment
and represents it in text format. For example, temperature sensors
on the floor, on the ceiling, and at the bedside of a hotel room
each output a single line of text containing the device
identifier, the device location, the room, and the temperature reading:
"deviceID": "aee62681aa9b", "location": "floor", "room": 205, "temp": 21
"deviceID": "792d3a3ef366", "location": "ceiling", "room": 205, "temp": 25
"deviceID": "b7c96bd32435", "location": "bedside", "room": 205, "temp": 24
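Because such text output has no formal schema, the consumer has to parse it itself. A minimal sketch, assuming the lines use straight quotes and contain only quoted strings or plain numbers as values, extracts the key-value pairs of the three hotel-room lines with a regular expression. `parse_line` is a hypothetical helper.

```python
import re

# The three hotel-room lines (straight quotes assumed).
lines = [
    '"deviceID": "aee62681aa9b", "location": "floor", "room": 205, "temp": 21',
    '"deviceID": "792d3a3ef366", "location": "ceiling", "room": 205, "temp": 25',
    '"deviceID": "b7c96bd32435", "location": "bedside", "room": 205, "temp": 24',
]

# key (optionally quoted) : value (quoted string or bare number)
PAIR = re.compile(r'"?(\w+)"?\s*:\s*("([^"]*)"|-?\d+(?:\.\d+)?)')

def parse_line(line):
    """Turn one line of text sensor output into a dict of typed fields."""
    record = {}
    for key, raw_value, quoted in PAIR.findall(line):
        if raw_value.startswith('"'):
            record[key] = quoted  # string field, quotes stripped
        else:
            record[key] = float(raw_value) if "." in raw_value else int(raw_value)
    return record

readings = [parse_line(line) for line in lines]
print(max(readings, key=lambda r: r["temp"]))
```

The regular expression is fragile by design; this is exactly the cost of a schema-less text format that structured encodings such as JSON or CSV avoid.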
XML
Extensible Markup Language (XML) is a meta-markup language and
one of the preferred data formats on the World Wide Web. IoT
deployments that span several application domains are constrained
by differing data formats between domains, and XML solves this issue
to some extent. XML is a human-readable representation of device
information and sensed data. SensorML can be used for the XML-based
description and encoding of sensors and measurement processes.
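A small XML sensor message and how it can be read with Python's standard `xml.etree.ElementTree` module are sketched below. The element and attribute names (`observation`, `device`, `measurement`, `unit`) are illustrative assumptions and do not follow the SensorML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML message from a temperature sensor (not SensorML).
xml_doc = """
<observation>
  <device id="aee62681aa9b" type="temperature">
    <location room="205">ceiling</location>
  </device>
  <measurement unit="Cel" time="2024-01-01T10:00:00Z">25.0</measurement>
</observation>
"""

root = ET.fromstring(xml_doc.strip())
device = root.find("device")
location = device.find("location")
measurement = root.find("measurement")

# The self-describing tags make the message readable by people and programs,
# at the cost of repeating every tag name around each value.
record = {
    "device_id": device.get("id"),
    "room": int(location.get("room")),
    "location": location.text,
    "value": float(measurement.text),
    "unit": measurement.get("unit"),
}
print(record)
```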
CSV
A comma-separated values (CSV) file is a text file in which data values are
delimited by commas; it can also be opened as a spreadsheet (e.g., in Excel)
for easy access to the data for processing. Each line in a CSV file is
one record, which holds one sample of sensed data, with the values
separated by commas.
Many IoT and other applications support this file format. For example,
a data file of tree measurements with 4 records has the fields
index, girth (circumference, in), height (ft), and volume (ft^3):
"Index", "Girth (in)", "Height (ft)", "Volume (ft^3)"
1, 8.3, 70, 10.3
2, 8.6, 65, 10.3
3, 8.8, 63, 10.2
4, 10.5, 72, 16.4
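The tree-measurement file can be loaded with Python's standard `csv` module. This sketch assumes straight quotes in the header and converts every field to a number; `skipinitialspace=True` is needed because each comma in the sample is followed by a blank.

```python
import csv
import io

# The tree-measurement file, with straight quotes in the header.
data = '''"Index", "Girth (in)", "Height (ft)", "Volume (ft^3)"
1, 8.3, 70, 10.3
2, 8.6, 65, 10.3
3, 8.8, 63, 10.2
4, 10.5, 72, 16.4
'''

# skipinitialspace=True ignores the blank after each comma, so quoted header
# names such as "Girth (in)" are still recognized as quoted fields.
reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)
rows = [{k: float(v) for k, v in row.items()} for row in reader]

# Each record is now a dict keyed by column name.
mean_height = sum(r["Height (ft)"] for r in rows) / len(rows)
print(len(rows), mean_height)
```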
Figure: a CSV file from the Valarm website
JSON
JavaScript Object Notation (JSON) is a lightweight data interchange
format. JSON is a comprehensive hierarchical data format supported
by many modern applications. Although it can represent complex
data as objects, it remains human-readable.
JSON is preferred over XML in IoT because JSON is schemaless and
supports strings, numbers, booleans, objects, arrays, and null, whereas
XML's header information increases file size [16–18]. Below is an example
of JSON data from a device, with its attribute names and captured values.
{"deviceid": "iot123", "temp": 54.98, "humidity": 32.43,
 "coords": {"latitude": 47.615694, "longitude": -122.3359976}}
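The device message is a nested object (`coords` is itself an object). The sketch below parses it with Python's standard `json` module, reads a nested field, and re-serializes it without optional whitespace, as a device might before transmission. The exact size saving depends on the message.

```python
import json

# The device message as a valid JSON document.
payload = '''{"deviceid": "iot123", "temp": 54.98, "humidity": 32.43,
              "coords": {"latitude": 47.615694, "longitude": -122.3359976}}'''

message = json.loads(payload)        # JSON text -> nested Python dict
lat = message["coords"]["latitude"]  # hierarchical access into the nested object

# Compact serialization for transmission: no spaces after separators.
compact = json.dumps(message, separators=(",", ":"))
print(lat, len(compact), len(payload))
```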
RFID
Radio Frequency Identification (RFID) systems identify objects
automatically by means of tags. The following is an example of RFID tag
data from the defense domain, with field sizes: Header (8 bits), Filter (4 bits),
CAGE code as ASCII (48 bits), Serial Number (36 bits). The data is:
b00811001111 b0040000 t048 2S194 n03612345678901
With its prefix and suffix, the final hexadecimal representation of the
96-bit tag data is:
^XA^RFW,H^FDCF02032533139342DFDC1C35^FS^XZ
RFID systems have been adopted by large companies, which have contributed to
publishing ID standards and open industrial standard specifications.
An RFID data stream includes data on RF tags (transponders), RF tag
readers (transceivers), and the electronic product code, which can contain
product information and the manufacturer number.
Raw RFID data consists of a tag ID, a reader ID, and a timestamp; on its own
this information is insufficient and incomplete, and it arrives in high volume.
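Assuming the leading blank in the CAGE field is part of the 6-character code (which is what makes the four fields add up to 96 bits), the 24-digit hexadecimal payload `CF02032533139342DFDC1C35` in the tag data can be reproduced by shifting each field into place, most significant field first. `encode_tag` is a hypothetical helper, not part of any RFID SDK.

```python
# Field widths of the 96-bit tag layout, in bits.
HEADER_BITS, FILTER_BITS, CAGE_BITS, SERIAL_BITS = 8, 4, 48, 36

def encode_tag(header, filter_value, cage, serial):
    """Pack the four fields into one 96-bit value (header first) and return it as hex."""
    cage_bytes = cage.encode("ascii")
    if len(cage_bytes) * 8 != CAGE_BITS:
        raise ValueError("CAGE field must be exactly 6 ASCII characters")
    if header >= 1 << HEADER_BITS or filter_value >= 1 << FILTER_BITS or serial >= 1 << SERIAL_BITS:
        raise ValueError("field value does not fit its bit width")
    value = header
    value = (value << FILTER_BITS) | filter_value
    value = (value << CAGE_BITS) | int.from_bytes(cage_bytes, "big")
    value = (value << SERIAL_BITS) | serial
    total_bits = HEADER_BITS + FILTER_BITS + CAGE_BITS + SERIAL_BITS  # 96
    return f"{value:0{total_bits // 4}X}"  # 96 bits -> 24 hex digits

# b008 11001111 | b004 0000 | t048 " 2S194" (leading blank) | n036 12345678901
tag_hex = encode_tag(0b11001111, 0b0000, " 2S194", 12345678901)
print(tag_hex)  # CF02032533139342DFDC1C35
```

Reading the result back field by field shows the layout: `CF` is the header, `0` the filter, `203253313934` the ASCII bytes of " 2S194", and `2DFDC1C35` the serial number 12345678901.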
O-DF
The Open Data Format (O-DF) is a newer interoperable data format.
An O-DF document has an object hierarchy: each object can have
sub-objects, and a sub-object can be the device ID or other
information about the device.
The hierarchy can have many levels, depending on how detailed
the information is.
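A simplified, O-DF-style object hierarchy can be built with Python's `xml.etree.ElementTree`. The element names `Objects`, `Object`, `id`, `InfoItem` and `value` are modeled on the O-DF object/sub-object idea, but this sketch omits the namespace, version and timestamp attributes of the real standard, and the refrigerator IDs and values are invented. `make_object` and `depth` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

def make_object(parent, object_id, items=None):
    """Add an <Object> with an <id> and optional <InfoItem name=...><value> children."""
    obj = ET.SubElement(parent, "Object")
    ET.SubElement(obj, "id").text = object_id
    for name, value in (items or {}).items():
        info = ET.SubElement(obj, "InfoItem", name=name)
        ET.SubElement(info, "value").text = str(value)
    return obj

# Hypothetical hierarchy: Objects -> refrigerator -> freezer sub-object.
root = ET.Element("Objects")
fridge = make_object(root, "Refrigerator-01", {"DoorState": "closed"})
make_object(fridge, "Freezer", {"Temperature": -18.5})

def depth(elem):
    """Number of nested <Object> levels at and below elem."""
    children = [depth(child) for child in elem.findall("Object")]
    return (1 if elem.tag == "Object" else 0) + max(children, default=0)

print(ET.tostring(root, encoding="unicode"))
print("object levels:", depth(root))
```

Adding more detail means adding deeper `Object` levels, which is how the hierarchy grows with the level of detail.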
Figure: O-DF code using XML sensor data from a refrigerator (left) and O-DF code using JSON sensor data from a refrigerator (right)
IoT Data Taxonomy
IoT data falls into 3 categories: data generation, data quality, and data
interoperability. These categories and their specific characteristics
form an IoT data taxonomy.
• Data generation: depends on factors such as the rate at which samples are
generated, coping with the high amount of data generated, the
dynamism of the data, and the wide variety of data arriving at a very high rate.
• Data quality: the quality of data depends on uncertainty due to
different sources, missing readings, device identification problems, and
accuracy.
• Data interoperability: for the IoT system to respond well to an event
sensed by a sensor, it needs data from multiple sources in that
environment. Combining data requires cooperation between devices;
a failure here can result in incomplete data.
Data Objects and Data Stream
Data objects are multidimensional attribute vectors within a
continuous, categorical, or mixed attribute space.
A data stream is a huge sequence of data objects.
Data stream processing: sensor data processing techniques
involve data aggregation, data compression, modeling, and online
querying. Queries can be aggregated to avoid high power
consumption: an initial query is executed to produce an
intermediate result that can be processed further, and many queries
can be joined to get accurate results. The validity of the top received
values is ensured by mathematical constraints, and the quality of the data
can be improved by adopting error-tolerant methods.
Stream mining is performed to extract useful information
using clustering, classification, outlier detection, and frequent itemset
mining.
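Online aggregation and outlier detection on a data stream can be sketched with a fixed-size sliding window, so that memory stays constant however long the stream runs. This is a generic illustration under assumed parameters (a window of 5 samples and a 3-sigma rule), not a specific algorithm from the cited literature; `windowed_stream` is a hypothetical helper.

```python
from collections import deque
from statistics import mean, pstdev

def windowed_stream(samples, size=5, k=3.0):
    """Yield (value, window_mean, is_outlier) for each sample of a data stream.

    Only the last `size` accepted values are kept. Once the window is full,
    a sample is flagged as an outlier when it lies more than k population
    standard deviations from the mean of the current window.
    """
    window = deque(maxlen=size)
    for value in samples:
        is_outlier = False
        if len(window) == size:
            mu, sigma = mean(window), pstdev(window)
            is_outlier = sigma > 0 and abs(value - mu) > k * sigma
        if not is_outlier:
            window.append(value)  # outliers are kept out of the window statistics
        yield value, round(mean(window), 2), is_outlier

stream = [20.1, 20.3, 20.2, 20.4, 20.2, 20.3, 35.0, 20.4]
for value, avg, flagged in windowed_stream(stream):
    print(value, avg, "OUTLIER" if flagged else "")
```

Because the function is a generator, results are produced one sample at a time, which matches the online nature of stream processing.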
Data Reduction
The growth of diverse data sources and transmitted data leads to
redundancy in storage and analysis, which in turn causes network bandwidth,
storage, and throughput problems at the cloud level.
Data reduction is one solution to this problem.
Data reduction techniques:
➢ single-tier where data is reduced at the gateway,
➢ two-tier where reduction methods are employed at gateway
and cloud or sensor and base station.
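As one concrete example of single-tier reduction at the gateway, a dead-band (send-on-delta) filter forwards a reading only when it has changed by at least a threshold since the last forwarded value. The dead-band filter is a common technique named here as an illustration, not necessarily the method the slide refers to; the 0.5 °C threshold, the trace, and the helper name `deadband_filter` are assumptions.

```python
def deadband_filter(samples, threshold):
    """Forward a sample only when it differs from the last forwarded value
    by at least `threshold` (send-on-delta); all other samples are dropped."""
    forwarded = []
    last = None
    for t, value in samples:
        if last is None or abs(value - last) >= threshold:
            forwarded.append((t, value))
            last = value
    return forwarded

# Hypothetical one-sample-per-second temperature trace: (time in s, value in °C).
trace = [(0, 21.0), (1, 21.1), (2, 21.05), (3, 21.6), (4, 21.62), (5, 22.3), (6, 22.25)]
sent = deadband_filter(trace, threshold=0.5)
print(f"forwarded {len(sent)} of {len(trace)} samples: {sent}")
```

The threshold sets the trade-off: a larger dead-band saves more bandwidth and storage, but small changes never reach the cloud.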
IoT data storage
Hot Storage
Hot Storage: optimized for IoT data with real-time query support.
A hot storage database backs the user-facing side of your IoT
application.
Hot storage databases are optimized for performance, so data can
be instantly queried and displayed on dashboards or custom user
interfaces.
Due to the performance requirements of a hot storage database,
storing this type of data can be costly
→ data is only kept in the hot storage database for a
limited amount of time
→ before data reaches the retention limit and is deleted, it needs to
be copied to warm or cold storage.
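The retention rule (copy first, then delete) can be sketched with an in-memory toy model. `TieredStore`, the one-hour retention, and the timestamps are hypothetical; in practice the hot tier would be a database optimized for real-time queries and the cold tier a cloud bucket or file store.

```python
from dataclasses import dataclass, field

@dataclass
class TieredStore:
    """Toy model of hot/cold tiering: the hot tier keeps recent data for fast
    queries, the cold tier keeps older data for archiving. Both are lists here."""
    retention_s: int
    hot: list = field(default_factory=list)   # (timestamp, value), fast to query
    cold: list = field(default_factory=list)  # archived copies, cheap to keep

    def write(self, ts, value):
        self.hot.append((ts, value))

    def enforce_retention(self, now):
        """Copy expiring rows to cold storage before deleting them from hot."""
        cutoff = now - self.retention_s
        expiring = [row for row in self.hot if row[0] < cutoff]
        self.cold.extend(expiring)                                 # 1. archive
        self.hot = [row for row in self.hot if row[0] >= cutoff]  # 2. then delete

store = TieredStore(retention_s=3600)  # assume a one-hour hot retention window
for ts in range(0, 7200, 600):         # one reading every 10 minutes for 2 hours
    store.write(ts, 20.0 + ts / 3600)
store.enforce_retention(now=7200)
print(len(store.hot), len(store.cold))
```

Reversing the two steps would risk losing data if the process failed between deleting and archiving, which is why the copy happens first.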
Warm Storage
Warm Storage: optimized for large data volumes with generic
query support.
Warm storage databases are optimized for scale, meaning they
can potentially store an indefinite amount of data. The main
difference between warm storage and cold storage is the ability to
easily query the data.
Warm storage databases, often called data lakes or data
warehouses, typically provide some kind of generic query support to
explore the data.
These databases, however, are not optimized for IoT data and
usually don’t offer the powerful aggregations required for time-series
queries.
One of the primary use cases for warm storage is offline
analytics and AI/ML.
Cold Storage
Cold Storage: optimized for cost.
Whereas hot storage and warm storage are typically provided
through databases, cold storage is usually implemented as cloud
buckets or file storage.
Removing the database engine drastically reduces the cost of
storage, but sacrifices the ability to quickly and easily query and
explore the data.
The primary purpose of cold storage is for archiving and backups.
IoT Platforms and Operating Systems
There is a vast number of IoT platforms and operating systems
that can integrate many of the above-mentioned technologies to
provide IoT services.
C-based IoT operating systems:
• RIOT
• Contiki
Most IoT platforms are cloud-based and provide IoT technologies, for example:
• AWS IoT
• IBM Watson
• ThingWorx
• Bosch IoT Suite
• Xively
• EVRYTHNG
• Kaa
IoT technologies for semantics
Semantic interoperability relies on a collection of technologies that
enable computer systems to interact unambiguously, such as the Sensor Model
Language, Media Types for Sensor Markup Language, the RESTful API
Modeling Language, and Wolfram Data Drop.
Sensor Model Language (SensorML). A standard model based
on an XML schema to describe sensors and measurement
procedures. SensorML is useful in IoT systems for creating electronic
description sheets for sensor modules and collecting metadata to be
used to discover sensor systems and observe processes. It also
enables sensor networks to be autonomous because of the self-
describing features of SensorML-supported sensors.
Media Types for Sensor Markup Language (SenML). SenML is a
new, simple model for representing sensed data and controlling
actuators. It provides semantics for the data and allows for
additional metadata with links and extensions. This simple model can
be used in many IoT applications. For example, a sensor such as a
humidity sensor could use this media type in CoAP to transport its
measurements.
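To show what a SenML payload can look like, the sketch below parses a small pack in SenML's JSON representation and resolves the base name (`bn`), base time (`bt`) and base unit (`bu`) into each record, following the resolution rules of RFC 8428 in simplified form (only these base fields and numeric `v` values are handled). The device URN, times and humidity values are made up, and no CoAP library is used; a CoAP client would simply carry this JSON as its payload.

```python
import json

# A SenML pack (JSON representation) with two humidity readings.
# Base fields (bn, bt, bu) apply to this record and all following ones;
# t is relative to bt.
pack = json.loads("""
[
  {"bn": "urn:dev:ow:10e2073a01080063:", "bt": 1700000000, "bu": "%RH",
   "n": "humidity", "v": 56.2},
  {"n": "humidity", "t": 60, "v": 57.0}
]
""")

def resolve(pack):
    """Expand base values so that each record carries its full name, time and unit."""
    base = {"bn": "", "bt": 0, "bu": None}
    resolved = []
    for rec in pack:
        for key in base:
            if key in rec:
                base[key] = rec[key]  # base values stay in effect for later records
        resolved.append({
            "n": base["bn"] + rec.get("n", ""),
            "t": base["bt"] + rec.get("t", 0),
            "u": rec.get("u", base["bu"]),
            "v": rec["v"],
        })
    return resolved

for r in resolve(pack):
    print(r)
```

Sending base values once per pack rather than in every record is what keeps SenML payloads small enough for constrained CoAP devices.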
IoT Database (IOTDB). A new technology with unlimited
expandability that supports semantics by providing formal definitions
of all necessary items. Unlike the aforementioned technologies,
IOTDB uses JSON dictionaries to manipulate and monitor nodes,
which makes it relatively fast, since JSON parsing is generally more
efficient than XML parsing. IOTDB is compatible with protocols such
as CoAP and MQTT.
RESTful API Modeling Language (RAML): This language is
used to define HTTP-based APIs that follow most of the
principles of Representational State Transfer (REST). Since it is
RESTful, it is more likely to be used in IoT scenarios that are suitable
for CoAP, where the network overhead is negligible.
Wolfram Data Drop: An open service that allows accumulating
data of any type from anywhere (including IoT nodes) to prepare it
semantically for instant computation, querying, analysis,
visualization, or other operations. Computable data (i.e., collections
and time series) are saved in named data bins in the Wolfram Cloud
and are immediately accessible from all other systems/applications.
Figure: Real-time (streaming) data analytic technologies in IoT
Thank you for your attention!