Unit 4: IoT
Internet of Things (IoT) - Syllabus
Lecture Details:
Fundamentals Of IoT
Branch: CSM
Semester: III-II
INTRODUCTION
• In the world of IoT, the creation of massive amounts of data from sensors
is common and is one of the biggest challenges, not only from a transport
perspective but also from a data management standpoint.
• A great example of the deluge of data that can be generated by IoT is found
in the commercial aviation industry and the sensors that are deployed
throughout an aircraft.
Example:
• Modern jet engines, similar to the one shown in the figure, may be equipped
with around 5,000 sensors.
• Therefore, a twin-engine commercial aircraft with these engines, operating on
average 8 hours a day, will generate over 500 TB of data daily, and this is just
the data from the engines!
• Aircraft today have thousands of other sensors connected to the airframe and
other systems.
• In fact, a single wing of a modern jumbo jet is equipped with 10,000 sensors.
Altogether, this can add up to a petabyte (PB) of data per day per commercial
airplane.
• Across the world, there are approximately 100,000 commercial flights per
day. The amount of IoT data coming just from the commercial airline business
is overwhelming.
STRUCTURED VERSUS UNSTRUCTURED DATA
• Structured data means that the data follows a model or schema that defines
how the data is represented or organized, meaning it fits well with a
traditional relational database management system (RDBMS).
• IoT sensor data often uses structured values, such as temperature, pressure,
humidity, and so on, which are all sent in a known format.
• Structured data is easily formatted, stored, queried, and processed; for these
reasons, it has been the core type of data used for making business decisions.
Because of the highly organized format of structured data, a wide array of
data analytics tools are readily available for processing this type of data.
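To make this concrete, here is a minimal Python sketch (the table and column
names are illustrative assumptions) showing how a structured IoT sensor
reading fits a relational schema and can be queried with SQL:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The schema defines exactly how each reading is represented.
cur.execute("""
    CREATE TABLE readings (
        sensor_id   TEXT,
        ts          TEXT,   -- ISO-8601 timestamp
        temperature REAL,   -- degrees Celsius
        pressure    REAL,   -- kilopascals
        humidity    REAL    -- percent relative humidity
    )
""")

# Each reading arrives in a known format, so it is easy to store and query.
cur.execute("INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
            ("engine-1", "2024-01-01T08:00:00", 92.5, 101.3, 34.0))
cur.execute("SELECT sensor_id, temperature FROM readings WHERE temperature > 90")
print(cur.fetchall())   # [('engine-1', 92.5)]
conn.close()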
• Unstructured data lacks a logical schema for understanding and decoding the
data through traditional programming means. Examples of this data type
include text, speech, images, and video.
• Structured data is more easily managed and processed due to its well-defined
organization.
• On the other hand, unstructured data can be harder to deal with and typically
requires very different analytics tools for processing the data.
DATA IN MOTION VERSUS DATA AT REST
• Data saved to a hard drive, storage array, or USB drive is data at rest.
• From an IoT perspective, the data from smart objects is considered data in
motion as it passes through the network en route to its final destination. This
is often processed at the edge, using fog computing.
• Tools that can analyze data in motion, such as Spark, Storm, and Flink, are
relatively nascent compared to the tools for analyzing data at rest.
IOT DATA ANALYTICS OVERVIEW
Descriptive:
• Descriptive data analysis tells you what is happening, either now or in the
past.
• For example, consider a temperature sensor in a truck engine. From a
descriptive analysis perspective, you can pull this data at any moment to gain
insight into the current operating condition of the truck engine.
• If the temperature value is too high, then there may be a cooling problem or
the engine may be experiencing too much load.
Diagnostic:
• When you are interested in the “why,” diagnostic data analysis can provide the
answer.
• Continuing with the example of the temperature sensor in the truck engine, you
might wonder why the truck engine failed.
• Diagnostic analysis might show that the temperature of the engine was too
high, and the engine overheated.
Predictive:
• Predictive analysis aims to foretell problems or issues before they occur.
• For example, with historical values of temperatures for the truck engine,
predictive analysis could provide an estimate on the remaining life of certain
components in the engine.
• Or perhaps if temperature values of the truck engine start to rise slowly over
time, this could indicate the need for an oil change or some other sort of engine
cooling maintenance.
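As an illustration, here is a minimal pure-Python sketch of the descriptive and
predictive steps on a hypothetical series of hourly engine-temperature readings
(the values and thresholds are made up): descriptive analysis reports the
current value, while predictive analysis fits a simple least-squares trend to
flag a slow rise.

temps = [88.0, 88.4, 88.9, 89.5, 90.2, 91.0]   # degrees Celsius, oldest first

# Descriptive: what is happening right now?
print("current temperature:", temps[-1])

# Predictive: fit a linear trend and extrapolate ahead.
n = len(temps)
xs = list(range(n))
mean_x = sum(xs) / n
mean_y = sum(temps) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, temps))
         / sum((x - mean_x) ** 2 for x in xs))
if slope > 0.1:   # rising steadily: maintenance may be needed
    print("temperature rising %.2f deg/hour; schedule cooling maintenance" % slope)
print("24-hour forecast: %.1f" % (temps[-1] + slope * 24))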
Prescriptive:
• Prescriptive analysis goes a step beyond predictive and recommends solutions
for upcoming problems.
• These calculations could range from the cost necessary for more frequent oil
changes and cooling maintenance, to installing new cooling equipment on the
engine, or upgrading to a lease on a model with a more powerful engine.
IOT DATA ANALYTICS CHALLENGES
Scaling problems:
• Due to the large number of smart objects in most IoT networks that
continually send data, relational databases can grow incredibly large very
quickly. This can result in performance issues that can be costly to resolve,
often requiring more hardware and architecture changes.
Volatility of data:
• With relational databases, it is critical that the schema be designed correctly
from the beginning. Changing it later can slow or stop the database from
operating. Due to the lack of flexibility, revisions to the schema must be kept
at a minimum.
• IoT data, however, is volatile in the sense that the data model is likely to
change and evolve over time. A dynamic schema is often required so that
data model changes can be made daily or even hourly.
MACHINE LEARNING
• One of the core subjects in IoT is how to make sense of the data that is
generated. Because much of this data can appear incomprehensible to the
naked eye, specialized tools and algorithms are needed to find the data
relationships that will lead to new business insights.
• Performing this kind of operation manually is almost impossible (or very, very
slow and inefficient).
• Machines are needed to process information fast and react instantly when
thresholds are met.
• The term "machine learning" used to make science fiction enthusiasts dream
of bipedal robots and conscious machines, or of a Matrix-like world where
machines would enslave humankind.
Simple static rule set:
• In simple cases, behavior can be coded as a static set of rules. A simple
example is an app that can help you find your parked car.
• In more complex cases, static rules cannot simply be inserted into the
program because they require parameters that can change or that are
imperfectly understood.
• For example, in speech recognition you need to record a set of predetermined
sentences to help the tool match well-known words to the sounds you make
when you say the words.
Neural networks
• ML methods that mimic the way the human brain works.
• When you look at a human figure, multiple zones of your brain are activated
to recognize colors, movements, facial expressions, and so on.
• Your brain combines these elements to conclude that the shape you are
seeing is human. Neural networks mimic the same logic.
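The following minimal Python sketch illustrates this idea with hand-picked
(not learned) weights: two hidden "detector" neurons each score one aspect of
the input, and an output neuron combines their conclusions, loosely mirroring
how separate brain zones combine evidence.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of inputs squashed into a 0..1 activation.
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)

# Hypothetical features extracted from an image: [color, motion, face]
features = [0.9, 0.2, 0.8]

# Hidden layer: each neuron reacts to a different combination of features.
h1 = neuron(features, [2.0, 0.0, 1.0], -1.0)   # "color + face" detector
h2 = neuron(features, [0.0, 1.5, 2.0], -1.0)   # "movement + face" detector

# Output neuron combines the detectors' conclusions.
is_human = neuron([h1, h2], [2.5, 2.5], -2.0)
print("probability the shape is human: %.2f" % is_human)

In a real neural network, the weights are learned from labeled examples rather
than set by hand.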
INTRODUCTION TO NOSQL DATABASES
• In a relational database you need to create the table, define the schema, set
the data types of the fields, etc., before you can actually insert the data.
• In NoSQL you don't have to worry about that: you can insert and update on
the fly, as in the sketch below.
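For example, here is a minimal sketch using the pymongo driver (assuming a
MongoDB instance on localhost; database, collection, and field names are
illustrative): documents with different shapes can be inserted into the same
collection with no schema defined up front.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
readings = client["iot"]["readings"]   # database/collection created on first use

# No CREATE TABLE, no schema: documents with different fields coexist.
readings.insert_one({"sensor": "t1", "temperature": 21.5})
readings.insert_one({"sensor": "gps1", "lat": 17.4, "lon": 78.5, "speed": 42})

for doc in readings.find({"sensor": "t1"}):
    print(doc)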
• High availability.
• NoSQL databases come in several types (key-value, document, column, and
graph stores), each with representative database systems; MongoDB falls in
the category of document-based NoSQL databases.
• The relationships between the data you store are not that important.
• The data is growing continuously, and you need to scale the database
regularly to handle it.
HADOOP OVERVIEW
• Hadoop has made its place in industries and companies that need to work
on large data sets which are sensitive and need efficient handling.
• Hadoop is a framework that enables the processing of large data sets
distributed across clusters of machines.
• Its major elements are HDFS for storage, MapReduce for processing, and YARN
for resource management; most of the other tools or solutions in the ecosystem
are used to supplement or support these major elements.
MapReduce:
• MapReduce is a parallel programming model for writing distributed
applications, devised at Google for efficient processing of large amounts of
data (multi-terabyte data sets) on large clusters (thousands of nodes) of
commodity hardware in a reliable, fault-tolerant manner.
• The MapReduce program runs on Hadoop, which is an Apache open-source
framework.
• By making use of distributed and parallel algorithms, MapReduce makes it
possible to carry the processing logic over to the data and helps to write
applications which transform big data sets into manageable ones.
HADOOP ECOSYSTEM
MapReduce:
• MapReduce makes use of two functions, Map() and Reduce(), whose tasks are:
Map() performs sorting and filtering of the data, thereby organizing it in
the form of groups. Map generates key-value pair based results which are
later processed by the Reduce() method.
Reduce() aggregates the mapped data, summarizing these intermediate
key-value pairs into a smaller, final set of results. (A pure-Python sketch
of both phases follows below.)
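Below is a minimal pure-Python sketch of the classic word-count example (no
Hadoop cluster involved) imitating the three MapReduce phases: Map emits
key-value pairs, a shuffle step groups them by key, and Reduce aggregates each
group.

from collections import defaultdict

def map_phase(line):
    # Map(): filter/organize the data, emitting (word, 1) pairs.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce(): aggregate all values that share a key.
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]

groups = defaultdict(list)
for line in lines:                     # on a cluster, mappers run in parallel
    for key, value in map_phase(line):
        groups[key].append(value)      # shuffle/sort: group values by key

results = [reduce_phase(w, c) for w, c in sorted(groups.items())]
print(results)   # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ...]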
YARN:
• Yet Another Resource Negotiator: as the name implies, YARN helps to manage
the resources across the clusters.
• In short, it performs scheduling and resource allocation for the Hadoop
system. It consists of three major components, i.e.:
Resource Manager
Node Manager
Application Manager
• The Resource Manager has the privilege of allocating resources for the
applications in the system, whereas Node Managers work on the allocation of
resources such as CPU, memory, and bandwidth per machine and later report
back to the Resource Manager.
• The Application Manager works as an interface between the Resource Manager
and the Node Managers and performs negotiations as per the requirements of
the two.
YARN Architecture: (diagram)
• Apache YARN ("Yet Another Resource Negotiator") is the resource
management layer of Hadoop.
• YARN was introduced in Hadoop 2.x. It allows different data processing
engines, such as graph processing, interactive processing, and stream
processing, as well as batch processing, to run and process data stored in
HDFS. Apart from resource management, YARN also does job scheduling.
HIVE:
• HIVE performs reading and writing of large data sets. Its query language is
called HQL (Hive Query Language).
• It is highly scalable, as it allows both real-time and batch processing.
Also, all the SQL data types are supported by Hive, making query processing
easier.
• Similar to other query-processing frameworks, HIVE comes with two
components: JDBC (Java Database Connectivity) drivers and the HIVE command
line.
• JDBC, along with ODBC (Open Database Connectivity) drivers, works on
establishing the data storage permissions and connection, whereas the HIVE
command line helps in the processing of queries.
• Hive performs three main functions: data summarization, query, and analysis.
• Hive uses a language called HiveQL (HQL), which is similar to SQL. HiveQL
automatically translates SQL-like queries into MapReduce jobs which will
execute on Hadoop, as in the sketch below.
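As a sketch (assuming a reachable HiveServer2 instance and the third-party
PyHive library; the table name is hypothetical), a HiveQL query can be issued
from Python like this:

from pyhive import hive

conn = hive.Connection(host="localhost", port=10000)
cursor = conn.cursor()

# HiveQL looks like SQL; Hive translates it into jobs that run on Hadoop.
cursor.execute("""
    SELECT sensor_id, AVG(temperature) AS avg_temp
    FROM sensor_readings
    GROUP BY sensor_id
""")
for row in cursor.fetchall():
    print(row)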
PIG:
• Pig was originally developed by Yahoo. It works on the Pig Latin language,
a query-based language similar to SQL.
• It is a platform for structuring the data flow, and for processing and
analyzing huge data sets.
• Pig does the work of executing commands, and in the background all the
activities of MapReduce are taken care of. After the processing, Pig stores the
result in HDFS.
• The Pig Latin language is specially designed for this framework and runs on
Pig Runtime, just the way Java runs on the JVM.
• Pig helps to achieve ease of programming and optimization and hence is a
major segment of the Hadoop ecosystem.
Features of PIG:
• Extensibility: for carrying out special-purpose processing, users can create
their own functions.
Hbase:
• It's a NoSQL database which supports all kinds of data and is thus capable of
handling any type of data in a Hadoop database. It provides the capabilities of
Google's BigTable and is thus able to work on big data sets effectively.
• There are two HBase components, namely HBase Master and RegionServer.
i. HBase Master
It is not part of the actual data storage, but negotiates load balancing
across all RegionServers.
Maintains and monitors the Hadoop cluster.
Performs administration (an interface for creating, updating, and deleting
tables).
Controls the failover.
HMaster handles DDL operations.
ii. RegionServer
It is the worker node which handles read, write, update, and delete
requests from clients. The RegionServer process runs on every node in the
Hadoop cluster. The RegionServer runs on the HDFS DataNode.
HCatalog:
• It is a table and storage management layer for Hadoop.
• HCatalog is a key component of Hive that enables users to store their data in
any format and structure.
• By default, HCatalog supports RCFile, CSV, JSON, SequenceFile and ORC file
formats.
Avro:
• Avro is an open source project that provides data serialization and data
exchange services for Hadoop.
• Using Avro, big data programs written in different languages can easily
exchange data.
Apache Mahout:
• Mahout brings machine learning capability to a system or application.
• Machine learning, as the name suggests, helps a system to develop itself
based on patterns, user/environmental interaction, or algorithms.
• It allows invoking algorithms as per our need with the help of its own libraries.
• Once data is stored in Hadoop HDFS, Mahout provides the data science tools
to automatically find meaningful patterns in those big data sets.
Sqoop:
• Sqoop imports data from external sources into related Hadoop ecosystem
components like HDFS, Hbase or Hive.
Apache Flume:
• Flume efficiently collects, aggregates, and moves large amounts of data from
their origin and sends them to HDFS.
• This Hadoop Ecosystem component allows the data flow from the source into
Hadoop environment.
• It uses a simple extensible data model that allows for online analytic
applications.
• Using Flume, we can get data from multiple servers into Hadoop immediately.
Ambari:
• Ambari, another Hadoop ecosystem component, is a management platform for
provisioning, managing, monitoring, and securing Apache Hadoop clusters.
Other Components:
• Apart from all of these, there are some other components too that carry out a
huge task in order to make Hadoop capable of processing large datasets. They
are as follows:
• Solr, Lucene: These are two services that perform the task of searching and
indexing with the help of some Java libraries. Lucene in particular is based
on Java and also provides a spell-check mechanism.
Apache Spark:
• It's a platform that handles all the process-intensive tasks like batch
processing, interactive or iterative real-time processing, graph conversions,
and visualization.
• It works with in-memory resources and is thus faster than prior systems in
terms of optimization.
• Spark is best suited for real-time data, whereas Hadoop is best suited for
structured data or batch processing; hence both are used interchangeably in
most companies.
• Oozie: Oozie simply performs the task of a scheduler, scheduling jobs and
binding them together as a single unit.
• There are two kinds of jobs, i.e., Oozie workflow jobs and Oozie coordinator
jobs. Oozie workflow jobs need to be executed in a sequentially ordered
manner, whereas Oozie coordinator jobs are triggered when some data or an
external stimulus is given to them.
• Since the initial release of Hadoop in 2011, many projects have been developed
to add incremental functionality to Hadoop and have collectively become known
as the Hadoop ecosystem.
• Hadoop now comprises more than 100 software projects under the Hadoop
umbrella, covering nearly every element of the data lifecycle, from collection,
to storage, to processing, to analysis and visualization.
Apache Kafka:
• Part of processing real-time events, such as those commonly generated by smart
objects, is having them ingested into a processing engine.
• The process of collecting data from a sensor or log file and preparing it to be
processed and analyzed is typically handled by messaging systems.
• Messaging systems are designed to accept data, or messages, from where the
data is generated and deliver the data to stream-processing engines such as
Spark Streaming or Storm.
APACHE KAFKA
Apache Kafka:
• Apache Kafka is a distributed publisher-subscriber messaging system that is
built to be scalable and fast.
• Data flows from the smart objects (producers), through a topic in Kafka, to
the real-time processing engine (the consumer).
• Due to the distributed nature of Kafka, it can run in a clustered
configuration that can handle many producers and consumers simultaneously and
exchange information between nodes, allowing topics to be distributed over
multiple nodes.
• The goal of Kafka is to provide a simple way to connect to data sources and allow
consumers to connect to that data in the way they would like.
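A minimal producer/consumer sketch using the third-party kafka-python library
follows (assuming a broker on localhost:9092; the topic name and message
fields are illustrative assumptions):

import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: a smart object publishes readings to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send("sensor-readings", {"sensor": "t1", "temperature": 21.5})
producer.flush()

# Consumer: a stream-processing engine subscribes to the same topic.
consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")))
for message in consumer:
    print(message.value)   # {'sensor': 't1', 'temperature': 21.5}
    break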
APACHE SPARK
Apache Spark:
• Apache Spark is an in-memory distributed data analytics platform designed to
accelerate processes in the Hadoop ecosystem.
• The “in-memory” characteristic of Spark is what enables it to run jobs very
quickly.
• At each stage of a MapReduce operation, the data is read and written back to
the disk, which means latency is introduced through each disk operation.
• However, with Spark, the processing of this data is moved into high-speed
memory, which has significantly lower latency. This speeds up batch processing
jobs and also allows for near-real-time processing of events.
• Real-time processing is done by a component of the Apache Spark project called
Spark Streaming.
• Spark Streaming is an extension of Spark Core that is responsible for taking live
streamed data from a messaging system, like Kafka, and dividing it into smaller
micro batches.
• These microbatches are called discretized streams, or DStreams.
• The Spark processing engine is able to operate on these smaller pieces of data,
allowing rapid insights into the data and subsequent actions.
• Due to this “instant feedback” capability, Spark is becoming an important
component in many IoT deployments.
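A minimal Spark Streaming sketch follows (assuming a local Spark installation
with the classic DStream API, available up to Spark 3.x; the host, port, and
alert keyword are illustrative assumptions). The stream is divided into
5-second micro-batches, and each micro-batch is filtered for alert events.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "IoTStreamDemo")
ssc = StreamingContext(sc, batchDuration=5)   # one micro-batch every 5 seconds

# Each micro-batch of lines (e.g., sensor events) becomes a small RDD.
lines = ssc.socketTextStream("localhost", 9999)
alerts = lines.filter(lambda line: "OVERHEAT" in line)
alerts.pprint()                               # act on each micro-batch as it arrives

ssc.start()
ssc.awaitTermination()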
• Systems that control the safety and security of personnel, time-sensitive
processes in the manufacturing space, and infrastructure control in traffic
management all benefit from these real-time streaming capabilities.
XIVELY CLOUD
• The Xively Python libraries can be used to write Python code against the
Xively APIs.
• This makes device connectivity a lot easier with the Xively cloud.
• All the devices can be connected to the Xively cloud for real-time
processing and archiving.
• IoT application developers can write the frontend for IoT applications as
per their requirements.
• This helps in convenient management of apps with the Xively cloud and other
APIs.
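As a hypothetical sketch of this style of integration, the snippet below
pushes a datapoint using the requests library; the endpoint URL, feed ID, and
X-ApiKey header follow Xively's historical REST conventions but are
illustrative assumptions, not a tested API.

import json
import requests

API_KEY = "YOUR_XIVELY_API_KEY"   # placeholder
FEED_ID = "1234567"               # placeholder

payload = {"version": "1.0.0",
           "datastreams": [{"id": "temperature", "current_value": "21.5"}]}

resp = requests.put(
    "https://api.xively.com/v2/feeds/%s" % FEED_ID,
    headers={"X-ApiKey": API_KEY, "Content-Type": "application/json"},
    data=json.dumps(payload))
print(resp.status_code)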
• Companies using Xively can rely on the secure connectivity of devices as well as
the seamless data management capability.
EDGE STREAMING ANALYTICS
Here are some key aspects and benefits of edge streaming analytics in IoT:
• 1.Real-time Insights: By analyzing data locally at the edge, you can make instant
decisions without waiting for data to travel to a distant cloud server. This is
especially crucial for applications that require immediate action, such as
industrial automation, autonomous vehicles, or healthcare monitoring.
• 5.Scalability: With the increasing number of IoT devices, edge analytics allows
for distributed processing, reducing the load on central servers and improving
the scalability of the system.
PYTHON WEB APPLICATION
• A Python web application for IoT is a great way to manage, visualize, and
interact with IoT devices and the data they generate.
• Python is a popular choice for IoT applications due to its simplicity, large
number of libraries, and community support.
• Data Collection: You’ll need a mechanism to gather data from these devices,
either by sending data to a central server, cloud service, or edge device for
processing.
2.Backend Development with Flask or Django:
• Flask: Flask is a lightweight web framework for Python that’s great for building
simple web applications or APIs. It’s well-suited for IoT projects where you might
want to build an API for data collection, control, and visualization.
• You can set up endpoints to receive data from IoT devices (via HTTP POST or
MQTT messages) and store the data in a database, as in the sketch below.
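A minimal Flask sketch of such an endpoint follows (the route, field names,
and SQLite file are illustrative assumptions):

import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)

def db():
    conn = sqlite3.connect("readings.db")
    conn.execute("CREATE TABLE IF NOT EXISTS readings "
                 "(sensor TEXT, ts TEXT, value REAL)")
    return conn

@app.route("/api/readings", methods=["POST"])
def add_reading():
    data = request.get_json()   # e.g. {"sensor": "t1", "ts": "...", "value": 21.5}
    conn = db()
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)",
                 (data["sensor"], data["ts"], data["value"]))
    conn.commit()
    conn.close()
    return jsonify({"status": "ok"}), 201

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

A device (or a test client such as curl) can then POST a JSON reading to
/api/readings.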
• Django: Django is a more full-featured web framework, ideal if you need more
advanced features like user authentication, admin panels, or complex database
interactions.
• Django can also handle real-time data streams, though you may need additional
tools like Channels or Celery for background tasks.
• Relational Databases (SQL): MySQL, PostgreSQL (great for structured data
like sensor readings with timestamps).
• NoSQL Databases: MongoDB (if the data is semi-structured, like sensor logs),
or InfluxDB (specifically designed for time-series data).
• HTML/CSS: Used for creating the structure and style of your web pages.
• JavaScript: To make the web app interactive, you can use JavaScript
libraries like D3.js or Chart.js to visualize IoT data (e.g., graphs,
real-time updates).
• If you need a more dynamic UI, consider using React or Vue.js for the
frontend.
• WebSockets allow for bidirectional communication, so your web app can push
updates to the frontend as soon as new data comes in from IoT devices.
• Authentication and authorization ensure that only authorized users can
access specific data or control IoT devices.
• SocketIO allows real-time communication between the server and the client
(browser).
• Paho MQTT subscribes to the MQTT broker to receive sensor data and pushes it
to the client in real-time using WebSockets.
• The client (browser) listens for updates and displays them dynamically.
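Here is a minimal sketch combining Flask-SocketIO with the paho-mqtt client
(using the paho-mqtt 1.x-style constructor; the broker address, topic, and
event name are illustrative assumptions):

import paho.mqtt.client as mqtt
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def on_message(client, userdata, msg):
    # Push each MQTT sensor message to connected browsers via WebSockets.
    socketio.emit("sensor_update", {"topic": msg.topic,
                                    "value": msg.payload.decode()})

mqtt_client = mqtt.Client()            # paho-mqtt 1.x style client
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)
mqtt_client.subscribe("sensors/#")
mqtt_client.loop_start()               # receive MQTT messages in a background thread

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)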
8.Deployment:
• You can deploy this web application to platforms like Heroku, AWS, Google
Cloud, or Azure.
• If you want to host the application locally for a small setup, you can run it on a
Raspberry Pi or similar edge device that collects and processes IoT data.