
UNIT 4
DATA ANALYTICS AND SUPPORTING SERVICES
Structured vs Unstructured Data and Data in Motion vs Data at Rest – Role of Machine Learning – NoSQL Databases – Hadoop Ecosystem – Apache Kafka, Apache Spark – Edge Streaming Analytics and Network Analytics – Xively Cloud for IoT, Python Web Application Framework – Django – AWS for IoT – System Management with NETCONF-YANG
4. Introduction to Data Analytics for IoT:
Data management is an important concept in IoT because sensors generate massive amounts of data. Jet engines, for example, are embedded with sensors that generate around 10 GB of data per second; over an eight-hour flying day a modern jet engine can generate some 500 TB, and a single airplane can produce petabytes of data. Handling this massive amount of data is the task of data analytics, which supports the IoT industries.

4.1 STRUCTURED AND UNSTRUCTURED DATA:


Structured Data:
Structured data is data that is organized in a well-defined way; all relational databases hold structured data. Structured data is categorized as quantitative data: it fits neatly into fixed fields and columns, as in a spreadsheet. Examples of structured data include names, dates, addresses, credit card numbers, stock information, geolocation, and more. In relational databases we can input, search, and manipulate structured data quickly. The query language used with structured data is Structured Query Language, also known as SQL. Readings from IoT sensors such as temperature, pressure, and humidity are structured data.
Unstructured Data:
Unstructured data is qualitative data that cannot be analyzed with conventional tools and methods. Examples of unstructured data include text, video, audio, mobile activity, social media activity, satellite imagery, and surveillance footage. No pre-defined model is available for unstructured data, and it is not organized like a relational database. Most of the data acquired by businesses is unstructured; more than 80 percent of all data generated by business processes today is considered unstructured. Non-relational (NoSQL) databases are used to manage unstructured data, and advanced analytics is needed to manipulate it: data mining, machine learning, and natural language processing techniques are used to analyze unstructured text, video, and images. For example, data from sensors attached to industrial machinery can alert manufacturers to abnormal activity ahead of time; with this information, a repair can be made before the machine suffers a costly breakdown.

Semi-structured Data:


Semi-structured data is hybrid data that shares attributes of both structured and unstructured data. It carries a certain schema and consistency; email messages and JSON documents are examples of semi-structured data.


The sensors in IoT generate both structured and unstructured data. Structured data is managed through a well-defined schema, while unstructured data is handled by analytical tools.

Fig 4.1 Comparison between Structured and Unstructured Data

Fig 4.2 Structured data vs unstructured data


4.2 DATA IN MOTION VS DATA AT REST

Data in IoT is treated either as data in transit (data in motion) or as data at rest. The data acquired from IoT sensor objects is data in motion; it is used by fog and edge computing and then sent on to the data center.
Data in motion:
Data in motion is data that is actively moving from one location to another, for example data being transferred between two networks.
Data at rest: Data at rest is data that is not actively moving from device to device or network to network, such as data stored on a hard drive, laptop, or flash (USB) drive.

Protecting sensitive data both in transit and at rest is essential for modern systems, as intruders find ever more sophisticated ways to steal data. Spark, Storm, and Flink are tools used for analysing data in motion, while a myriad of tools exist for processing structured data at rest; Hadoop supports both data processing and data storage.

Fig 4.3: Digital data examples


4.3 IoT Data Analytics Overview:


IoT data from smart devices is realized and analysed in many ways. Most IoT systems deploy descriptive analysis and diagnostic analysis; prescriptive analysis and predictive analysis are more complex to implement, but modern businesses are trending towards them. There are four types of data analysis, namely
 Descriptive analysis
 Diagnostic analysis
 Predictive analysis
 Prescriptive analysis
Descriptive analysis: This analysis explains what is happening now, or what happened in the past, giving you data about the current working condition, e.g. a thermometer reading in a truck engine.
Diagnostic analysis: This analysis provides the details of why it happened; it answers the "why" question. If the engine is hot, the analysis may give an answer as to why the engine became hot.
Predictive analysis: This analysis helps to forecast or predict future outcomes (what is likely to happen). The recorded data is analysed and the outcome is predicted; for example, the temperature recorded in the engine determines the remaining life of the engine's parts.
Prescriptive analysis: This analysis goes beyond predictive analysis: it gives solutions for the predicted problems (what should I do about it). If the engine heats up a lot, it may provide the solution of adding a cooling system to the engine.

Fig 4.4: Types of data analysis results (data from geolocation, sensors, video, and social media is collected, integrated, processed, aggregated, and visualized by big data technologies, feeding descriptive, diagnostic, predictive, and prescriptive analysis).
Data from IoT sensors poses challenges for relational databases. The challenges include
 Scaling problems
 Volatility of data
NoSQL databases are used to address these two challenges. IoT also faces further challenges because it involves huge live data streams from sensors. Companies like Google and Microsoft provide cloud services for handling the huge volume of data generated by the sensors and for performing analytics on it. Flexible NetFlow and IPFIX are the network analytics tools used to monitor the flow of data in the network.
4.4 Machine learning:
The data generated by IoT sensors is processed by a set of algorithms and tools to uncover relationships within the data; this data processing is carried out by machine learning. Data obtained from the sensors must be analyzed so that proper decisions can be taken.


Machine learning is an important tool for IoT and data analytics. Machine learning, deep learning, neural networks, and convolutional networks are terms closely related to the field of IoT. Self-driving vehicles, for example, are embedded with the self-learning capacity to make intelligent decisions while driving, thanks to advancements in machine learning.

4.4.1 ROLE OF MACHINE LEARNING


The role of ML is to provide
1. Predictions
2. Forecasting
3. High (over 90%) accuracy
Both Amazon and Netflix make use of machine learning to learn our preferences and deliver a superior experience to the customer. The figure below depicts the roles and responsibilities of ML in IoT and data analytics across various industries.

Fig 4.5: Various fields integrated with deep intelligence.


Machine learning overview:
Machine learning falls under the umbrella of artificial intelligence. Artificial intelligence is framed in such a way that it exhibits characteristics of human intelligence; a simple application that finds where a car is parked in an area is an example of this. Machine learning deals with the concept of recording data and then processing it to arrive at important decisions. Machine learning is a broad concept applied in various fields to analyze data, and it can be categorized into supervised learning and unsupervised learning.

Supervised learning:
Supervised learning involves a set of inputs and their corresponding outputs. The system is trained on a set of labeled inputs called the training set; the algorithm works on the training set and learns to separate the inputs into different classes. This process of assigning independent classes to a given set of inputs is called classification. After training, testing is done with unlabeled data sets, and the classifier should produce the correct value. Classification and regression are considered the two important approaches of supervised learning: classification predicts a discrete value, and regression predicts a continuous value. A greater number of inputs, i.e. a larger dataset, results in better training and better accuracy.
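
As an illustration, the train/test workflow can be sketched in a few lines of Python. This is a minimal sketch, not part of the syllabus material; it assumes the scikit-learn library and its bundled iris dataset.

import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Labeled inputs: features X with known classes y form the training set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train the classifier on the labeled training set.
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Test on held-out data; classification predicts a discrete class value.
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))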

Unsupervised Learning:
When the given data is unlabeled and we are still able to find different categories within it, the learning is said to be unsupervised. The algorithm finds the different groups present in the given unlabeled data. This grouping can be performed by K-means clustering: the mean of each cluster is calculated and all data of a similar kind are grouped together. The following figure depicts three clusters formed from a given set of unlabeled data.

Fig 4.6: unsupervised learning (clustering)
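
A minimal clustering sketch in Python (again assuming scikit-learn; the sensor-style readings are hypothetical) shows how K-means assigns unlabeled values to groups:

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled, sensor-style readings (hypothetical values).
data = np.array([[20.1], [20.4], [19.8], [45.2], [44.9], [45.5], [70.3], [69.8]])

# Ask for three clusters; each point is assigned to the nearest cluster mean.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print("Cluster labels:", kmeans.labels_)
print("Cluster means:", kmeans.cluster_centers_.ravel())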

Neural Networks:

Neural networks are an extension of the machine learning approach in which the system is able to recognize and differentiate inputs by mimicking the human brain. The network is formed of different layers, namely an input layer, a first layer, higher layers, a top layer, and an output layer. The following figure explains how a system trained on a set of labeled animal images learns to find a dog. At the input layer, an unlabeled image is fed into the pretrained network. The first layer finds simple shapes; the higher layers identify more complex structures (features like a face or an arm); and the top layer identifies highly complex structures (differentiating animal categories). The final output layer predicts the animal based on the training, giving the final output with high accuracy. Neural networks receive much research focus and have been used in various image processing applications. There are different kinds of neural networks, namely artificial neural networks, convolutional neural networks, and recurrent neural networks. The deep learning concept was developed further from this, with a larger number of layers: the result of one layer is fed into the next, and processing at the intermediate layers is fast. Numerous applications nowadays rely on deep learning and neural network approaches.


Fig: How neural networks recognize a dog in a photo.
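
As a rough illustration of a layered network, here is a sketch only, assuming scikit-learn and its small bundled digits images (real photo classification would normally use a convolutional network):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)            # 8x8 digit images, flattened
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers: earlier layers pick up simpler patterns, later layers
# combine them into more complex features, as described above.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("Test accuracy:", net.score(X_test, y_test))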

Machine learning and getting intelligence from Big Data

For every use case, it is necessary to determine the proper algorithm to obtain good results when integrated with the IoT application. ML operations can be handled in two ways, namely local learning and remote learning.
Local learning: the data is processed at the sensor node or fog node.
Remote learning: the data is collected and processed at the central cloud server.
ML for IoT in major domains: A weather sensor can report the pollution level in a city; a light embedded in a street lamp can change its luminosity based on the local light conditions of the environment. ML integrated with IoT is deployed in various applications. The following actions are performed with the sensors embedded in various places.
Monitoring: Sensors are used to monitor the environment, for example a temperature sensor; ML integrated with such a sensor can detect failure conditions (a small sketch follows this list).
Behavior control: For example, if the system detects a hot atmosphere in the environment, ML may be used to control the behavior of the system and induce it to supply fresh cool air to the environment.
Operations optimization: While behavior control focuses on corrective operation, operations optimization aims at increased efficiency and optimized solutions.
Self-healing, self-optimizing: The system identifies faults by itself and can find a corrective action for the fault identified.
Predictive analytics: This kind of analytics predicts issues that are going to arise from faults in the system; it is done to improve the safety and maintenance of the system. Sensors embedded in machines can predict faults before they occur with the help of big data analytics.
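
A small, hedged sketch of the monitoring and predictive idea in plain Python (the readings and the three-sigma threshold are hypothetical):

import statistics

readings = [71.0, 71.4, 70.9, 71.2, 88.5, 71.1]   # hypothetical engine temperatures

window = readings[:4]                  # recent "normal" behavior
mean = statistics.mean(window)
stdev = statistics.stdev(window)

for value in readings[4:]:
    # A reading far outside recent behavior is flagged; it could trigger a
    # corrective action (behavior control) or a maintenance alert
    # (predictive analytics).
    if abs(value - mean) > 3 * stdev:
        print("Anomaly detected:", value)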

Big Data Analytics Tools and Technology:


Data management at this scale is done with big data technology and Hadoop; Hadoop is the backbone of various big data applications. The data is collected, stored, manipulated, and analyzed. Big data has three Vs:
Velocity: deals with how fast the data is collected and processed. The Hadoop file system is used to quickly process the data collected by the sensor objects.
Variety: deals with the different kinds of data, structured, unstructured, and semi-structured, stored in Hadoop. Data from sensors is an example of structured data; data from social media is unstructured data.
Volume: deals with the huge volume of data, ranging from gigabytes to exabytes. Clusters of servers are used for big deployments.
Types of Data Sources:
Machine data: data generated by the sensors embedded in IoT systems.
Transaction data: data obtained from transactions.
Social data: data obtained from social media like Facebook and Twitter (a huge amount of data is generated by social media).
Enterprise data: data from enterprises, structured in nature.
Industrial automation and control systems feed their data into relational databases and historians. Examples of relational databases include Oracle and Microsoft SQL Server; historian databases hold the time series data recorded from the sensors. There are newer technologies for handling data management:
 Massively Parallel Processing (MPP) databases
 NoSQL databases
 Hadoop
Massively Parallel Processing Databases:
Enterprise data is structured and stored in relational databases; groups of these relational databases together constitute data warehouses. MPP is a concept built on top of relational data warehouses for faster access and reduced query time. These systems process data in parallel, resulting in faster query processing; MPP databases are also termed analytic databases. Refer to the following figure for the MPP shared-nothing architecture: it has a master node to which all other nodes are connected, and each node contains its own processor, memory, and storage. The whole process is optimized with SQL; fast processing is the key aspect of MPP.

Fig 4.7: MPP Shared Nothing Architecture


4.5 NOSQL DATABASE

1. NoSQL ("non SQL" or "not only SQL") databases store data in a format other than relational tables. Semi-structured and unstructured data are processed by NoSQL. NoSQL databases come in many types, including document stores, key-value stores, wide-column stores, and graph stores.
Document stores: hold semi-structured documents (XML and JSON).
Key-value stores: store data as associative arrays, where each key is paired with a value.
Wide-column stores: store key-value pairs, organized into column families rather than fixed rows.
Graph stores: describe the relationships between elements; well suited for natural language processing and social media.

2. A common misconception is that NoSQL (non-relational) databases don't store relationship data well. NoSQL databases can store relationship data; they just store it differently than relational databases do.

3. The cost of storage has decreased with the advent of NoSQL databases.


4. NoSQL databases are used in real-time web applications.
5. The data structures used by NoSQL databases differ from those used by default in relational databases, which makes some operations faster in NoSQL.
6. Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.

Fig 4.8: SQL databases and NoSQL databases.


Features of NoSQL

Non-relational

 NoSQL databases do not follow the relational model

 Data is not stored in tables of flat, fixed-column records
 They work with self-contained aggregates or BLOBs

Schema-free

 NoSQL databases are either schema-free or have relaxed schemas

 They don't require any definition of the schema of the data
 They allow heterogeneous structures of data in the same domain

Fig 4.9: Difference between RDBMS and NoSQL DB

Simple API

 Easy-to-use interfaces are provided for storing and querying data

 Text-based protocols are used, e.g. HTTP REST with JSON
 Web-enabled databases run as internet-facing services

Distributed

 Multiple NoSQL databases can be executed in a distributed fashion

 Offers auto-scaling and fail-over capabilities
 Often the ACID concept is sacrificed for scalability and throughput
 Mostly no synchronous replication between distributed nodes; instead asynchronous multi-master replication, peer-to-peer replication, or HDFS replication is used

TYPES OF NOSQL DATABASE


Fig 4.10: NoSQL databases

 Key-value pair based

 Column-oriented
 Graph based
 Document-oriented

Key-Value Pair Based

1. Data is stored in key/value pairs; the design is meant to handle lots of data.

2. Key-value stores keep data in a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
3. For example, a key-value pair may contain a key like "Website" associated with a value like "Guru99".

Table 4.1: Key-value pair based

Column-based

1. Column-oriented databases work on columns and are based on Google's BigTable paper. Every column is treated separately.
2. They deliver high performance on aggregation queries like SUM, COUNT, AVG, and MIN, since the data is readily available within a column.
3. Column-based NoSQL databases are widely used for managing data warehouses and business intelligence.
4. HBase and Hypertable are examples of column-based databases.


Table 4.2: Column family

Document-Oriented:

1. A document-oriented NoSQL DB stores and retrieves data as a key-value pair, but the value part is stored as a document in JSON or XML format.
2. In the diagram, the left side shows rows and columns, and the right side shows a document database whose structure is similar to JSON.
3. The document type is mostly used for CMS systems, blogging platforms, real-time analytics, and e-commerce applications.
4. Amazon SimpleDB and MongoDB are popular document-oriented DBMS systems.

Relational Vs. Document

Table 4.3: Document-oriented
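
A minimal document-store sketch (this assumes the pymongo driver and a MongoDB server on localhost; the database and collection names are hypothetical):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["iot"]["sensor_readings"]

# Each record is a JSON-like document; no fixed schema is required.
collection.insert_one({"device": "thermo-01", "temp_c": 22.5})

# Query by field, in the key-value-with-document style described above.
for doc in collection.find({"device": "thermo-01"}):
    print(doc)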

Graph-Based

1. A graph-type database stores entities as well as the relations amongst those entities. Each entity is stored as a node, with the relationships as edges; an edge gives the relationship between nodes, and every node and edge has a unique identifier.
2. Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature.
3. Graph databases are mostly used for social networks, logistics, and spatial data.


Fig 4.11: Graph

WHEN TO USE NOSQL

Some specific cases when NoSQL databases are a better choice than an RDBMS include the following:

 When there is a need to store large amounts of unstructured data with changing schemas.
 When the application is hosted on cloud computing infrastructure.
 When you need to develop rapidly.
 When a hybrid data environment is in use.

4.6 HADOOP ECOSYSTEM – Apache Kafka, Apache Spark

Hadoop:
Hadoop is a recent data management framework for processing data. The Hadoop system was initially developed to index millions of websites and enable fast search. Hadoop has two key elements (HDFS and MapReduce):
Hadoop Distributed File System (HDFS): a system for storing data across different nodes.
MapReduce: a processing engine that divides a big task into small ones and runs them in parallel for speed.

• Hadoop is an open-source framework.


• It helps to process big data and store data in a distributed environment.
• It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
• Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes.
• Applications built on the Hadoop framework run on clusters of computers, and Hadoop can perform statistical analysis on large volumes of data.


Fig 4.12: Hadoop ecosystem.

Fig 4.13: Distributed Hadoop cluster

The above figure depicts a Hadoop cluster; it includes the NameNode and the DataNodes.

NameNode: This node coordinates data adds, deletes, and reads on the HDFS system. The NameNode takes requests from clients and maps each requested block to the available nodes. It also instructs the DataNodes when to perform replication.
DataNodes: These nodes store the data. The various blocks are distributed across the DataNodes, and the same block is copied to one or more nodes as per the replication policy. This is done to ensure data redundancy.


Fig 4.14: Datanode and Namenode

Fig 4.15: Writing a file to HDFS

The steps involved in writing a file to HDFS:


1. The client creates the file through the distributed file system (DFS) client.
2. The DFS client asks the NameNode to create the file entry.
3. The client writes to the data output stream.
4. Packets are written to the DataNodes.
5. Each packet is acknowledged back to the stream.
6. The stream is closed.
7. The write is completed with the NameNode.

Hadoop Architecture
• Hadoop framework includes following four modules:
• Hadoop Common: These are Java libraries and utilities required by other Hadoop
modules.
• Hadoop YARN: This is a framework for job scheduling and cluster resource
management.
• Hadoop Distributed File System (HDFS™): A distributed file system that provides
high-throughput access to application data.
• Hadoop MapReduce: This is a YARN-based system for parallel processing of large data sets.


Fig 4.16 Hadoop framework

Hadoop-related Apache projects:


• Pig: Provides a high-level data-flow programming language
• Hive: Provides SQL-like access
• Mahout: Provides analytical tools
• HBase: Provides real-time reads and writes

MAPREDUCE:
• Hadoop divides a job into two types of tasks:
1. Map tasks (splits and mapping)
2. Reduce tasks (shuffling and reducing)
The execution process is controlled by two types of entities:
Job Tracker: acts as the master (responsible for complete execution of the submitted job).
Multiple Task Trackers: act as slaves, each performing part of the job.
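
The classic word-count example illustrates the two task types. The following is a small local Python simulation of the pattern only; in a real Hadoop job, the map and reduce functions run as separate tasks across the cluster.

from itertools import groupby

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word, 1)                    # Map task: emit (key, value)

def reduce_phase(pairs):
    # Shuffle: sort so equal keys are adjacent, then reduce per key.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

lines = ["hadoop splits the job", "the job runs in parallel"]
print(dict(reduce_phase(map_phase(lines))))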

YARN (Yet Another Resource Negotiator) was developed to improve the working principle of MapReduce. YARN separates the resource management of the cluster from the scheduling and monitoring of jobs running on the cluster, and it has replaced the work done by the JobTracker and TaskTracker daemons. YARN is the basic requirement for Enterprise Hadoop: it provides resource management and delivers consistent operations, security, and data governance for Hadoop. YARN also extends the power of Hadoop to newer technologies found within the data center, offers cost-effective, linear-scale storage and processing, and provides ISVs and developers a consistent framework for writing data access applications that run in Hadoop.


Fig 4.17: YARN


YARN FEATURES:
 Multi-tenancy: YARN allows multiple access engines; it provides a common standard for batch, interactive, and real-time engines that can simultaneously access the same data set.
 Cluster utilization: YARN provides dynamic allocation of cluster resources.
 Scalability: Data center processing power can be expanded rapidly.
 Compatibility: Existing MapReduce applications developed for Hadoop 1 can run on YARN without any disruption to processes that already work.

4.7 APACHE KAFKA:

APACHE KAFKA is a messaging system based on the distributed publish-subscribe model. It is a real-time event streaming system that delivers messages to stream processing engines like Spark Streaming or Storm. Numerous producers and consumers connect to the Kafka cluster and exchange information through it: the producers generate the data and the consumers read the data.

Fig 4.18: Apache Kafka data flow

Fig 4.19: Kafka Architecture

 Kafka is used for analysis of real-time streams of data.


 Kafka is utilized in real-time streaming data architectures to provide real-time analytics.
 Kafka has high throughput, reliability, and replication characteristics.
 Kafka can work with Flume/Flafka, Spark Streaming, Storm, HBase, Flink, and Spark for real-time ingestion, analysis, and processing of streaming data.
 Many companies that handle a lot of data use Kafka. LinkedIn and Twitter use it as part of Storm to provide a stream processing infrastructure. It is also used by other companies like Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, and Netflix.

 Kafka has operational simplicity.

Fig 4.20: Apache Kafka (applications at various companies)
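
A minimal producer/consumer sketch (this assumes the kafka-python client, a broker at localhost:9092, and a hypothetical topic name):

from kafka import KafkaProducer, KafkaConsumer

# Producer: generates data and publishes it to the Kafka cluster.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("sensor-events", b'{"device": "pump-7", "temp_c": 81.3}')
producer.flush()

# Consumer: reads the same stream, e.g. for a stream processing engine.
consumer = KafkaConsumer("sensor-events",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)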

4.8 APACHE SPARK: Spark was introduced by the Apache Software Foundation to speed up Hadoop's computational software process.
Apache Spark is an open-source distributed processing system used for big data workloads. It utilizes in-memory caching: tasks execute at high speed because the data is held in fast memory for read and write operations. Spark provides development APIs in Java, Scala, Python, and R. Data is processed in real time; this real-time processing, performed by the Apache Spark project, is also termed Spark Streaming. Live streaming and messaging activities are performed on the Spark core, which can take its data from Kafka; the data collected from Kafka is further divided into small batches, or micro-batches, for processing.


Spark uses Hadoop in two ways: one is storage and the second is processing. Since Spark has its own cluster management computation, it uses Hadoop for storage purposes only.

Fig 4.21: Spark


There are three ways of deploying Spark, as explained below.
 Standalone – Spark runs on top of HDFS.
 Hadoop YARN – Spark runs on YARN.
 Spark in MapReduce (SIMR) – used to launch Spark jobs in addition to the standalone deployment.

COMPONENTS OF SPARK

Fig 4.22: Apache Spark core


Apache Spark Core
Spark Core is the underlying general execution engine of Spark. It provides in-memory computing and the referencing of datasets in external storage systems.


Spark SQL
Spark SQL is a component that provides a data abstraction called SchemaRDD.
Spark Streaming
Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming
analytics. It ingests data in mini-batches and performs RDD (Resilient Distributed Datasets)
transformations on those mini-batches of data.
MLlib (Machine Learning Library)
MLlib is a distributed machine learning framework that runs on top of Spark, exploiting Spark's distributed memory-based architecture.
GraphX
GraphX is a distributed graph-processing framework on top of Spark. (User defined graphs)
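
A minimal PySpark sketch (this assumes the pyspark package; the readings are hypothetical) showing in-memory data and a parallel Spark SQL-style aggregation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iot-demo").getOrCreate()

# A tiny in-memory dataset standing in for sensor micro-batches.
readings = spark.createDataFrame(
    [("thermo-01", 22.5), ("thermo-01", 23.1), ("thermo-02", 19.8)],
    ["device", "temp_c"])

# Aggregation executed in parallel by the Spark core engine.
readings.groupBy("device").avg("temp_c").show()
spark.stop()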

Features of Apache Spark

 Speed – faster processing

 Supports multiple languages – applications can be written in different languages
 Advanced analytics – Spark supports not only 'map' and 'reduce' but also SQL queries, streaming data, machine learning (ML), and graph algorithms.

Fig 4.23: Spark core

APACHE STORM AND APACHE FLINK:


Apache Storm and Apache Flink are built for distributed stream processing and are mainly deployed in IoT systems. Storm takes data from Kafka and processes it as a data stream.

Lambda Architecture: The Lambda architecture is a data management design in which processing is performed in three layers: the batch layer, the stream layer, and the serving layer. The stream layer is responsible for real-time processing of data using Spark Streaming, Storm, or Flink; the batch layer is responsible for batch processing and storage; and the serving layer provides the results as services to users or consumers.


Fig 4.24: Lambda architecture

4.9 EDGE STREAMING ANALYTICS:

In the era of information technology, every company relies on the cloud to store and retrieve data and to market its business, and IoT integrated with the cloud plays a major role: the data stored in the cloud is analyzed and various decisions are taken from it. In automobile racing, the sensors in a car produce an enormous amount of data per second, quickly adding up to many gigabytes; similarly, weather forecasting involves masses of data generated by numerous sensors. Edge analytics is the collection, processing, and analysis of data at the edge of a network, either at the sensor or near it. Retail, manufacturing, and transportation all generate huge volumes of data at the edge of the network. Edge analytics is data analytics performed in real time and on site, where the data collection is happening, and it can be descriptive, diagnostic, or predictive analytics.

Comparing Big Data and Edge Analytics:

Big data refers to the unstructured data collected and stored in the cloud; big data analytics is performed on that data-center data as batch jobs. Edge streaming analytics, by contrast, lets you analyse and monitor streams of data at the edge to make prediction decisions quickly. In edge analytics the data is not analysed at a single edge; it is analysed across distributed edge nodes, and each node has to communicate with the others. For example, streaming analytics performed on traffic data gives a driver the information needed to take important decisions. In short, big data analytics is performed on data at rest, while streaming analytics is performed on data in motion.
Key values of edge streaming analytics:
 Reducing data at the edge
 Analysis and response at the edge
 Time sensitivity

Edge Analytics Core Functions:

Real-time data is analysed by streaming analytics, which is performed in three stages:
1) Raw input data: data from sensors is given as input.
2) Analytics processing unit (APU): takes the data streams and processes them in time windows, applying analytical functions.
3) Output streams: the output is communicated onward using a messaging protocol such as MQTT.


Fig 4.25: Edge analytics processing unit

The APU has the following functions:

Filter: The filter stage discards irrelevant data and keeps only the important data needed for processing.
Transform: The extracted data is formatted for processing.
Time: As data flows in on a real-time basis, it must be framed by time. If the data fluctuates at different times, an average value is calculated from the fluctuating readings; the average over a certain time interval is computed.
Correlate: Data obtained from different sensors is combined into a single record. For example, data coming from different instruments is finally combined into a patient's single health record; combining this real-time data with the patient's historical data gives insight into the patient's current condition. This process is called correlation.
Match patterns: Pattern matching aims at alerting the system in case of an emergency; for example, a matched pattern may alert a nurse through an alarm notification. Machine learning techniques are adopted to find the matching patterns.
Improve business intelligence: Edge analytics improves business intelligence by improving basic operations, which in turn gives better efficiency.
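
A plain-Python sketch of the filter, transform, and time stages on a hypothetical stream:

from statistics import mean

stream = [
    {"sensor": "temp", "value": "21.9"},
    {"sensor": "debug", "value": "x"},      # irrelevant, filtered out
    {"sensor": "temp", "value": "22.4"},
    {"sensor": "temp", "value": "22.0"},
]

# Filter: keep only the data needed for processing.
relevant = [r for r in stream if r["sensor"] == "temp"]

# Transform: reformat the extracted data (string -> float here).
values = [float(r["value"]) for r in relevant]

# Time: smooth fluctuations by averaging over the window.
print("Windowed average:", round(mean(values), 2))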

Advantages
 Reduced latency of data analytics.
 Scalability of analytics.
 The bandwidth needed to transmit all the data collected by thousands of edge devices would otherwise grow exponentially with the number of devices.
 Edge analytics reduces overall expenses by minimizing bandwidth use.

Fig 4.26: IoT edge analytics


Distributed Analytics Systems:


Fog analytics is performed at many nodes, and data is correlated across those nodes. Sensors communicate using MQTT: an MQTT message broker passes the data to fog processing, where streaming analytics is performed, and the results are then communicated to the cloud data center.

Fig 4.27: Distributed Analytics throughout the IoT Systems.

4.10 NETWORK ANALYTICS:


Network analytics plays an important role in the management of IoT systems. It provides a structure for understanding network traffic patterns and analyses the patterns in the communication between different nodes. This analytics helps to find abnormal behavior in the network, and it suggests ways to rectify network problems by providing optimal solutions; network analytics is the best tool for troubleshooting. The figure below depicts traffic analytics performed on the router of a smart grid: the analytics detects abnormal traffic in the distributed system by analyzing flow patterns. TCP and UDP port numbers are used in the network analytics, which helps maintain the performance of the system and enhances the security of the network. IPv4 and IPv6 traffic is analyzed in the figure below.

Fig 4.28: NetFlow example on a smart grid


Network management services are given below

 Network traffic monitoring and profiling: lets you analyze the network by monitoring the traffic and rectifying problems.
 Application traffic monitoring and profiling: monitoring of application protocols such as MQTT, CoAP, and DNP3.
 Capacity planning: analyzing the data over a period of time; this analysis helps to monitor traffic growth.
 Security analysis: done to detect attacks such as denial of service.
 Accounting: software such as Cisco Jasper is used to monitor and account for the flow of data.
 Data warehousing and data mining: data stored in the warehouse is analyzed for multiservice applications.

Flexible NetFlow Architecture:


Flexible NetFlow (FNF) is used in networks and can be deployed in IoT infrastructure. It has the advantages of flexibility and scalability, can track the progress of network packets, and monitors network behavior.

Fig 4.29: Flexible NetFlow overview


FNF Components:

 FNF flow monitor (NetFlow cache): a record with key fields (the flow record) and non-key fields (flow attributes); it monitors the information stored in the cache, and the flow exporter sends this information onward.
 FNF flow records: predefined records for monitoring NetFlow applications; information for security detection, traffic analysis, and capacity planning is kept in the flow record. User-defined records are also available.
 FNF exporter: defines where the NetFlow data has to be sent (the destination address); the information is sent to the NetFlow reporting collector.
 Flow export timers: timers indicate how often flows should be exported to the server.
 NetFlow export format: the type of flow reporting.


 NetFlow server for collection and reporting: the final destination of the NetFlow data; the server analyzes problems in the network.

Flexible NetFlow in Multiservice IoT Networks:

FNF is installed on the routers, providing a view of the multiple services carried in the IoT network. LoRaWAN cannot perform NetFlow analysis, and MQTT can do so only with the help of an IoT broker. Challenges arise if the network does not support flow analytics, or if the additional bandwidth the analytics consumes has to be reviewed.

4.11 Xively Cloud for IoT


Xively is a system for incorporating IoT applications on the cloud and is considered a PaaS (Platform as a Service). Xively provides a data collection, management, and distribution infrastructure, together with a platform to connect devices and develop applications. Xively comes under the category of Connected Product Management (CPM) platforms and has tools to strengthen your business. Xively is considered very beneficial when developing an IoT-based application or product, and it connects to many IoT frameworks and microcontrollers used to develop smart products.

Xively Python libraries are used to write Python code against the Xively APIs, and a Xively web interface is available for creating the front-end part. Xively can work with different programming language platforms; HTTP, REST APIs, and MQTT are the protocols used with Xively. Devices are connected to the Xively cloud for real-time processing and archiving. IoT application developers can write the front end for their IoT applications as per their requirements, and management of apps is very flexible with the Xively cloud and its APIs. Xively is very popular with companies that deal with IoT-based device manufacturing and development; companies using Xively get secure connectivity of devices and good data management capability.

Xively is an IoT cloud platform that is "an enterprise platform for building, managing, and deriving business value from connected products". It is a cloud-based API with an SDK that simplifies and shortens the development process. It supports several platforms, such as:

 Android
 Arduino
 ARM mbed
 C
 Java, and more.

How to use Xively?

Step 1: Register with Xively to use the cloud services (programmers or developers).

Step 2: Developers create entries for the different devices for which they are building an IoT app; templates are provided in the Xively web interface.

Step 3: A unique FEED_ID is allocated to each connected device; it identifies the data stream of that device.

Step 4: IoT devices are accessed using the available APIs, with permissions granted to perform the Create, Update, Delete, and Read operations.

Step 5: Bidirectional channels are created once a device is connected to Xively; each channel is unique to the connected device.

Step 6: The Xively cloud is reached through these channels.

Step 7: Xively APIs are used by IoT devices to create communication-enabled products.
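
A hedged sketch of steps 3 to 7 using the legacy xively-python client (the Xively service has since been retired; the API key, feed ID, and channel name below are placeholders):

import xively

api = xively.XivelyAPIClient("YOUR_API_KEY")

# The FEED_ID identifies the data stream of the connected device (step 3).
feed = api.feeds.get(12345)

# Update one channel of the device and push it to the cloud (steps 5-7).
channel = feed.datastreams.get("temperature")
channel.current_value = 22.5
channel.update()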

Fig 4.30: Xively cloud services


Fig 4.31: Xively by LogMeIn

4.12 PYTHON WEB APPLICATION FRAMEWORK

Web Frameworks for Python

A web framework is a collection of packages or modules that allows developers to write web applications (see WebApplications) or services. A framework eliminates the need to deal directly with protocols and sockets. Most web frameworks are server-side technology, although some now include AJAX code that helps developers with the programming task on the client side, in the user's browser. This "plugging in" aspect of web development is often seen as being in opposition to the classical distinction between programs and libraries, and the notion of a "main loop" dispatching events to application code is very similar to that found in GUI programming. Frameworks provide support for a number of activities such as handling requests, producing responses, and storing data, and full-stack frameworks supply components for each layer in the stack.

The need for Python frameworks

A Python framework is a platform for developing software applications. It provides a foundation on which programmers can build programs for a specific platform. A framework may include predefined classes and functions that can be used to process input, manage hardware devices, and interact with system software.


Fig 4.32: Python web frameworks

4.13 DJANGO:

DJANGO is a web framework that helps us build better web apps quickly and with less code. Django is a high-level Python web framework that encourages rapid development and clean design; the resulting web applications are fast and perform well. Django focuses on automating as much as possible and follows the DRY (Don't Repeat Yourself) principle. For developing an e-commerce website, Django is an excellent choice: execution of the work is very fast, and it is free and open source.
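
A minimal sketch of a Django view and URL route (this assumes a project created with django-admin startproject; the endpoint and names are illustrative):

# views.py
from django.http import JsonResponse

def sensor_status(request):
    # A trivial endpoint returning JSON, e.g. for an IoT dashboard.
    return JsonResponse({"device": "thermo-01", "temp_c": 22.5})

# urls.py (sensor_status would normally be imported from the app's views)
from django.urls import path

urlpatterns = [
    path("status/", sensor_status),
]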

Django Python framework advantages:

 The admin structure is powerful, and customizing a product is easy.

 Tools like the Django REST Framework are helpful for developing mobile apps.
 Django's ORM is powerful; it streamlines the process of dealing with data.

Django Python framework cons:

 The template system is not considered the most powerful available.

 A third-party library is needed to configure different types of deployment environments.
 Upgrading Django is not easy, as it requires a lot of changes to be made in the code.

Django architecture: The figure below depicts a simple Django framework with its template and caching components.


Fig 4.33: Django

4.14 AWS FOR IOT

Billions of devices are found in homes, factories, oil wells, hospitals, cars, and thousands of other places. Solutions are needed to connect them and to collect, store, and analyze their data. AWS IoT provides broad functionality, spanning the edge to the cloud, for building IoT solutions across virtually any kind of device. Because AWS IoT integrates with AI services, the devices can become smarter, and AWS IoT scales easily with the requirements of the business. AWS IoT also provides strong security features and preventive security policies that respond immediately to security-related issues.

AWS IoT provides secure, bi-directional communication between Internet-connected devices, such as sensors, actuators, embedded microcontrollers, or smart appliances, and the AWS Cloud. Data is collected from multiple devices, then stored and analyzed. Users and customers can build applications that enable them to control devices from their phones or tablets.
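
A hedged device-side sketch using the AWS IoT Device SDK for Python (AWSIoTPythonSDK); the endpoint, certificate paths, and topic are placeholders for your own AWS IoT account:

from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

client = AWSIoTMQTTClient("thermo-01")
client.configureEndpoint("YOUR_ENDPOINT.iot.us-east-1.amazonaws.com", 8883)
client.configureCredentials("rootCA.pem", "private.key", "certificate.pem")

# Secure, bi-directional MQTT link through the device gateway.
client.connect()
client.publish("sensors/thermo-01/temp", '{"temp_c": 22.5}', 1)   # QoS 1
client.disconnect()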

AWS IoT Components

AWS IoT consists of the following components:

 Alexa Voice Service (AVS) Integration for AWS IoT

o This service brings Alexa Voice to any connected device. AVS for AWS IoT reduces the cost and complexity of integrating Alexa.


o AVS for AWS IoT enables Alexa built-in functionality on MCUs, such as
the ARM Cortex M class with less than 1 MB embedded RAM. To do so,
AVS offloads memory and compute tasks to a virtual Alexa Built-in device
in the cloud.

Custom Authentication service


o This feature allows us to manage our own authentication and authorization strategy using a custom authentication service and a Lambda function. Custom authorizers allow AWS IoT to authenticate your devices and authorize operations using bearer-token authentication and authorization strategies.

o Examples: JSON Web Token verification, OAuth provider callout.

Device gateway
o This feature enables devices to securely communicate with AWS IoT.
Device provisioning service
o This feature allows us to provision devices using a template that describes the resources required for the device: a thing, a certificate, and one or more policies.

o The templates contain variables that are replaced by values from a dictionary (map).
Device shadow
o A JSON document used to store and retrieve current state information for a device.
Device Shadow service
o Keeps a device synchronized with other applications: devices publish their current state to a shadow for use by other devices and apps.
Group registry
o Several devices can be managed at once by categorizing them into groups. An action performed on a parent group is applied to its child groups, and to all the devices in its child groups as well.
Jobs service
o Defines remote operations that are sent to and executed on the devices connected to AWS IoT. For example, you can define a job that instructs a set of devices to download and install an application, reboot, or perform remote troubleshooting.


Message broker
o The MQTT protocol, including MQTT over WebSocket, is used for secure publish and subscribe; an HTTP REST interface can also be used to publish.
Registry
o Register your devices and associate up to three custom attributes with each one.
Rules engine
o Provides message processing and integration with other AWS services. An SQL-based language is used to select data from message payloads, and the data is then processed and sent to other services, such as Amazon S3, Amazon DynamoDB, and AWS Lambda.
Security and Identity service
o Provides shared responsibility for security in the AWS Cloud. The message broker and rules engine use AWS security features to send data securely to devices or other AWS services.

AWS IoT solutions

Fig 4.34: Industrial

AWS IoT customers are building industrial IoT applications for predictive quality and
maintenance and to remotely monitor operations.

Fig 4.35: Connected home

AWS IoT customers are building connected home applications for home automation, home
security and monitoring, and home networking.


Fig 4.36: Commercial

AWS IoT customers are building commercial applications for traffic monitoring, public
safety, and health monitoring.

Fig 4.37: AWS IoT Services


Fig 4.38: AWS Analytics services

Fig 4.39: AWS IoT

4.15 IoT System Management with NETCONF-YANG


NETCONF: The Network Configuration Protocol (NETCONF) is a session-based network management protocol. NETCONF retrieves state and configuration data and manipulates configuration data on network devices: it gives access to a device within a network, defining methods to manipulate its configuration database, retrieve operational data, and invoke specific operations. YANG provides the means to define the content carried via NETCONF, for both data and operations. Together, they help users build network management applications that meet the needs of network operators.


Fig 4.40: Definition of NETCONF and YANG.


The motivation behind NETCONF and YANG was, instead of managing individual devices and their functionalities, to have a network management system that manages the network at the service level, with:

 A standardized data model (YANG)

 Network-wide configuration transactions
 Validation and roll-back of configurations
 Centralized backup and restore of configurations
Businesses have used SNMP for a long time, but it was used more for reading device states than for configuring devices. NETCONF and YANG address the disadvantages of SNMP and add functionality to network management, such as:

 Configuration transactions
 Network-wide orchestrated activation
 Network-level validation and roll-back
 Saving and restoring configurations

Service provider and enterprise network teams are moving towards a service-oriented approach to managing their networks, using the IETF's Network Configuration Protocol (NETCONF) and YANG, a data modelling language, to remove the time, cost, and manual steps involved in network element configuration.

NETCONF is the standard for installing, manipulating, and deleting the configuration of network devices, while YANG is used to model both the configuration and state data of network elements. YANG structures data definitions into tree structures and provides many modelling features, including an extensible type system, formal separation of state and configuration data, and a variety of syntactic and semantic constraints. YANG data definitions are contained in modules and provide a strong set of features for extensibility and reuse.
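
A minimal NETCONF client sketch (this assumes the ncclient Python library; the host and credentials are placeholders). It opens a session and retrieves the running configuration, whose structure is defined by the device's YANG models:

from ncclient import manager

with manager.connect(host="192.0.2.1", port=830,
                     username="admin", password="admin",
                     hostkey_verify=False) as session:
    # <get-config> retrieves configuration data from the running datastore.
    reply = session.get_config(source="running")
    print(reply)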


Fig 4.41: IoT system management with NETCONF-YANG

YANG

 YANG is a data modelling language used to model the configuration and state data manipulated by the NETCONF protocol.
 YANG modules contain the definitions of configuration data, state data, RPC calls, and notifications.
 A YANG module defines the data exchanged between the NETCONF client and server.
 A module comprises a number of 'leaf' nodes organized into a hierarchical tree structure. The 'leaf' nodes are specified using the 'leaf' or 'leaf-list' constructs, and leaf nodes are organized using the 'container' or 'list' constructs. The figure below depicts the leaf node structure (a hierarchical tree).
 A YANG module can import definitions from other modules, and constraints can be defined on the data nodes, e.g. allowed values.
 YANG can model both configuration data and state data using the 'config' statement.
 As an example, the toaster YANG module is a YANG version of the toaster MIB. It begins with the header information, followed by identity declarations that define various bread types.
 The leaf nodes ('toasterManufacturer', 'toasterModelNumber' and 'toasterStatus') are defined in the 'toaster' container.
 Each leaf node definition has a type and, optionally, a description and default value.
 The module has two RPC definitions ('make-toast' and 'cancel-toast').


Fig 4.42: YANG features with leaf nodes

Fig 4.43: YANG modules
