
Internet of Things

(IoT)
Internet of Things (IoT) - Syllabus

UNIT – 4 : DATA ANALYTICS AND SERVICES

Structured vs Unstructured Data and Data in Motion vs Data at
Rest, Role of Machine Learning, NoSQL Databases, Hadoop
Ecosystem, Apache Kafka, Apache Spark, Edge Streaming
Analytics, Xively Cloud for IoT, Python Web Application.

Lecture Details:
Fundamentals Of IoT
Branch: CSM
Semester: III-II
Cont..
INTRODUCTION

• In the world of IoT, the creation of massive amounts of data from sensors
is common and one of the biggest challenges—not only from a transport
perspective but also from a data management standpoint.

• A great example of the deluge of data that can be generated by IoT is found
in the commercial aviation industry and the sensors that are deployed
throughout an aircraft
Cont..
INTRODUCTION

Commercial Jet Engine


Cont..
INTRODUCTION

Example:
• Modern jet engines, similar to the one shown in the figure, may be equipped
with around 5,000 sensors.
• Therefore, a twin engine commercial aircraft with these engines operating on
average 8 hours a day will generate over 500 TB of data daily, and this is just
the data from the engines!
• Aircraft today have thousands of other sensors connected to the airframe and
other systems.
• In fact, a single wing of a modern jumbo jet is equipped with 10,000 sensors,
bringing the potential to a petabyte (PB) of data per day per commercial airplane.
• Across the world, there are approximately 100,000 commercial flights per
day. The amount of IoT data coming just from the commercial airline business
is overwhelming
Cont..
STRUCTURED VERSUS UNSTRUCTURED DATA

• Structured data means that the data follows a model or schema that defines
how the data is represented or organized, meaning it fits well with a
traditional relational database management system (RDBMS).

• Structured data can be found in most computing systems and includes
everything from banking transactions and invoices to computer log files and
router configurations.

• IoT sensor data often uses structured values, such as temperature, pressure,
humidity, and so on, which are all sent in a known format.

• Structured data is easily formatted, stored, queried, and processed; for these
reasons, it has been the core type of data used for making business decisions.
Because of the highly organized format of structured data, a wide array of
data analytics tools are readily available for processing this type of data.
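As a minimal illustration of such a known format, a structured reading can be represented as a fixed set of named, typed fields (the sensor ID and values below are hypothetical):

```python
import json

# A hypothetical structured sensor reading: every field has a known name,
# type, and unit, so it maps directly onto an RDBMS table schema.
reading = {
    "sensor_id": "engine-temp-01",        # assumed identifier
    "timestamp": "2024-01-15T08:30:00Z",  # ISO 8601 timestamp
    "temperature_c": 92.4,
    "pressure_kpa": 101.3,
    "humidity_pct": 41.0,
}

print(json.dumps(reading))
```

Because every record shares this schema, it can be stored and queried directly with traditional RDBMS tooling.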
Cont..
STRUCTURED VERSUS UNSTRUCTURED DATA

• Unstructured data lacks a logical schema for understanding and decoding the
data through traditional programming means. Examples of this data type
include text, speech, images, and video.

• Data analytics methods such as cognitive computing and machine learning
can be applied to unstructured data.

• With machine learning applications, such as natural language processing
(NLP), you can decode speech.

• With image/facial recognition applications, you can extract critical information
from still images and video.
Cont..
STRUCTURED VERSUS UNSTRUCTURED DATA

• Smart objects in IoT networks generate both structured and unstructured
data.

• Structured data is more easily managed and processed due to its well-defined
organization.

• On the other hand, unstructured data can be harder to deal with and typically
requires very different analytics tools for processing the data.

• Being familiar with both of these data classifications is important because
knowing which data classification you are working with makes integrating with
the appropriate data analytics solution much easier.
Cont..
DATA IN MOTION VERSUS DATA AT REST

• In IoT networks, data is either in transit (“data in motion”) or
being held or stored (“data at rest”).

• Examples of data in motion include traditional client/server exchanges, such
as web browsing, file transfers, and email.

• Data saved to a hard drive, storage array, or USB drive is data at rest.

• From an IoT perspective, the data from smart objects is considered data in
motion as it passes through the network enroute to its final destination. This is
often processed at the edge, using fog computing.
Cont..
DATA IN MOTION VERSUS DATA AT REST

• When data is processed at the edge, it may be filtered and deleted, or
forwarded on for further processing and possible storage at a fog node or in
the data center. Data does not come to rest at the edge.

• When data arrives at the data center, it is possible to process it in real time,
just like at the edge, while it is still in motion.

• Tools with this sort of capability, such as Spark, Storm, and Flink, are relatively
nascent compared to the tools for analyzing stored data.
Cont..
IOT DATA ANALYTICS OVERVIEW

Descriptive:
• Descriptive data analysis tells you what is happening, either now or in the
past.

• For example, a thermometer in a truck engine reports temperature values
every second.

• From a descriptive analysis perspective, you can pull this data at any moment
to gain insight into the current operating condition of the truck engine.

• If the temperature value is too high, then there may be a cooling problem or
the engine may be experiencing too much load.
Cont..
IOT DATA ANALYTICS OVERVIEW

Diagnostic:
• When you are interested in the “why,” diagnostic data analysis can provide the
answer.

• Continuing with the example of the temperature sensor in the truck engine, you
might wonder why the truck engine failed.

• Diagnostic analysis might show that the temperature of the engine was too
high, and the engine overheated.

• Applying diagnostic analysis across the data generated by a wide range of
smart objects can provide a clear picture of why a problem or an event
occurred.
Cont..
IOT DATA ANALYTICS OVERVIEW

Predictive:
• Predictive analysis aims to foretell problems or issues before they occur.

• For example, with historical values of temperatures for the truck engine,
predictive analysis could provide an estimate on the remaining life of certain
components in the engine.

• These components could then be proactively replaced before failure occurs.

• Or perhaps if temperature values of the truck engine start to rise slowly over
time, this could indicate the need for an oil change or some other sort of engine
cooling maintenance.
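As a toy sketch of the idea, a linear trend can be fitted to hypothetical engine-temperature history with NumPy to estimate when a maintenance threshold would be crossed (the readings and threshold are made up):

```python
import numpy as np

# Hypothetical hourly engine-temperature readings (degrees Celsius).
hours = np.arange(6)
temps = np.array([88.0, 88.4, 89.1, 89.9, 90.5, 91.2])

# Fit a straight line: temps ~ slope * hours + intercept.
slope, intercept = np.polyfit(hours, temps, 1)

# Estimate how long until the trend crosses a 95 C maintenance threshold.
threshold = 95.0
hours_to_threshold = (threshold - temps[-1]) / slope
print(f"Trend: +{slope:.2f} C/hour; ~{hours_to_threshold:.1f} hours to {threshold} C")
```

Real predictive-maintenance models are far more sophisticated, but the shape is the same: learn a trend from history, then extrapolate it forward.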
Cont..
IOT DATA ANALYTICS OVERVIEW

Prescriptive:
• Prescriptive analysis goes a step beyond predictive and recommends solutions
for upcoming problems.

• A prescriptive analysis of the temperature data from a truck engine might
calculate various alternatives to cost-effectively maintain our truck.

• These calculated alternatives could range from more frequent oil changes or
cooling maintenance, to installing new cooling equipment on the engine, to
upgrading to a lease on a model with a more powerful engine.

• Prescriptive analysis looks at a variety of factors and makes the appropriate
recommendation.
Cont..
IOT DATA ANALYTICS CHALLENGES

Scaling problems:
• Due to the large number of smart objects in most IoT networks that
continually send data, relational databases can grow incredibly large very
quickly. This can result in performance issues that can be costly to resolve,
often requiring more hardware and architecture changes.
Volatility of data:
• With relational databases, it is critical that the schema be designed correctly
from the beginning. Changing it later can slow or stop the database from
operating. Due to the lack of flexibility, revisions to the schema must be kept
at a minimum.
• IoT data, however, is volatile in the sense that the data model is likely to
change and evolve over time. A dynamic schema is often required so that
data model changes can be made daily or even hourly.
Cont..
MACHINE LEARNING

• One of the core subjects in IoT is how to make sense of the data that is
generated. Because much of this data can appear incomprehensible to the
naked eye, specialized tools and algorithms are needed to find the data
relationships that will lead to new business insights.

• This brings us to the subject of machine learning (ML).

• ML is indeed central to IoT. Data collected by smart objects needs to be
analyzed, and intelligent actions need to be taken based on these analyses.

• Performing this kind of operation manually is almost impossible (or very, very
slow and inefficient).

• Machines are needed to process information fast and react instantly when
thresholds are met.
Cont..
MACHINE LEARNING

• Machine learning is, in fact, part of a larger set of technologies
commonly grouped under the term artificial intelligence (AI).

• This term used to make science fiction enthusiasts dream of biped robots and
conscious machines, or of a Matrix-like world where machines would enslave
humankind.

• AI includes any technology that allows a computing system to mimic human
intelligence using any technique, from very advanced logic to basic “if-then-
else” decision loops.
Cont..
MACHINE LEARNING

• A simple example is an app that can help you find your parked car, using a
simple static rule set.

• In more complex cases, static rules cannot be simply inserted into the
program because they require parameters that can change or that are
imperfectly understood.

• A typical example is a dictation program that runs on a computer.

• The program is configured to recognize the audio pattern of each word in a
dictionary, but it does not know your voice’s accent, tone, speed, and so on.
Cont..
MACHINE LEARNING

• You need to record a set of predetermined sentences to help the tool match
well-known words to the sounds you make when you say the words.

• This process is called machine learning. ML is concerned with any process
where the computer needs to receive a set of data that is processed to help
perform a task with more efficiency.
Cont..
MACHINE LEARNING

Neural networks
• Neural networks are ML methods that mimic the way the human brain works.

• When you look at a human figure, multiple zones of your brain are activated
to recognize colors, movements, facial expressions, and so on.

• Your brain combines these elements to conclude that the shape you are
seeing is human. Neural networks mimic the same logic
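As a toy sketch of this layered combination of signals, here is a two-layer forward pass in plain NumPy (the weights are random, not trained; a real network would learn them from data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)        # 4 hypothetical input features (e.g., edges, colors)
W1 = rng.random((3, 4))  # hidden layer: 3 neurons, each combining all inputs
W2 = rng.random((1, 3))  # output layer: 1 neuron combining the hidden ones

hidden = np.tanh(W1 @ x)                   # each neuron activates on a pattern
output = 1 / (1 + np.exp(-(W2 @ hidden)))  # sigmoid squashes to a 0-1 score
print(output)  # e.g., a "this shape is human" confidence score
```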
Cont..
INTRODUCTION TO NOSQL DATABASES

• A database management system provides the mechanism to store and
retrieve data.

• There are different kinds of database management systems:

1. RDBMS (Relational Database Management Systems)
2. OLAP (Online Analytical Processing)
3. NoSQL (Not only SQL)
Cont..
INTRODUCTION TO NOSQL DATABASES

What is a NoSQL database?

• NoSQL databases are different from relational databases such as MySQL.

• In a relational database you need to create the table, define the schema, set the
data types of fields, etc., before you can actually insert the data.

• In NoSQL you don’t have to worry about that; you can insert and update data on
the fly.
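As a minimal sketch of this insert-on-the-fly behavior, assuming a local MongoDB instance and the pymongo driver (the database and collection names are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
readings = client["iot"]["readings"]

# No CREATE TABLE and no predefined schema: documents of different
# shapes can go into the same collection as they arrive.
readings.insert_one({"sensor": "temp-01", "value": 22.5})
readings.insert_one({"sensor": "cam-02", "frame_id": 18, "objects": ["car"]})
```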
Cont..
INTRODUCTION TO NOSQL DATABASES

Limitations of relational databases:

• 1. In a relational database we need to define the structure and schema of data
first, and only then can we process the data.

• 2. Relational database systems provide consistency and integrity of data by
enforcing ACID properties (Atomicity, Consistency, Isolation and Durability).
There are some scenarios where this is useful, like a banking system. However,
in most other cases these properties impose a significant performance
overhead and can make your database responses very slow.

• 3. Most applications store their data in JSON (JavaScript Object Notation)
format, and RDBMSs don’t provide a good way of performing operations
such as create, insert, update, delete, etc., on this data. On the other hand,
NoSQL databases store their data in JSON format, which is compatible with
most of today’s applications.
Cont..
INTRODUCTION TO NOSQL DATABASES

What are the advantages of NoSQL?

• High scalability.

• High availability.

• Here are the types of NoSQL databases and the database systems
that fall in each category. MongoDB falls in the category of NoSQL document-
based databases.

Key-value store: Memcached, Redis, Coherence
Tabular: HBase, BigTable, Accumulo
Document-based: MongoDB, CouchDB, Cloudant
Cont..
INTRODUCTION TO NOSQL DATABASES

When to go for NoSQL over a relational database:

• When you want to store and retrieve huge amounts of data.

• The relationships between the data you store are not that important.

• The data is not structured and changes over time.

• Constraint and join support is not required at the database level.

• The data is growing continuously and you need to scale the database regularly
to handle it.
Cont..
HADOOP OVERVIEW

• Apache Hadoop is an open source framework intended to make interaction
with big data easier.

• Hadoop has made its place in industries and companies that need to work
on large data sets which are sensitive and need efficient handling.

• Hadoop is a framework that enables processing of large data sets which reside
in the form of clusters.

• Being a framework, Hadoop is made up of several modules that are supported
by a large ecosystem of technologies.
Cont..
HADOOP ECOSYSTEM

• The Hadoop ecosystem is a platform or suite which provides various services to
solve big data problems.

• It includes Apache projects and various commercial tools and solutions. There
are four major elements of Hadoop, i.e. HDFS, MapReduce, YARN, and Hadoop
Common.

• Most of the tools or solutions are used to supplement or support these major
elements.

• All these tools work collectively to provide services such as ingestion,
analysis, storage, and maintenance of data.
Cont..
HADOOP ECOSYSTEM

Following are the components that collectively form the Hadoop ecosystem:

• HDFS: Hadoop Distributed File System
• YARN: Yet Another Resource Negotiator
• MapReduce: Programming-based data processing
• Spark: In-memory data processing
• HIVE, PIG: Query-based processing of data services
• HBase: NoSQL database
• Mahout, Spark MLlib: Machine learning algorithm libraries
• Solr, Lucene: Searching and indexing
• Zookeeper: Managing clusters
• Oozie: Job scheduling
Cont..
HADOOP ECOSYSTEM

Hadoop Distributed File System:


• HDFS is the primary or major component of the Hadoop ecosystem and is
responsible for storing large data sets of structured or unstructured data
across various nodes, thereby maintaining the metadata in the form of log
files.
• HDFS splits files into blocks and distributes them across various nodes in
large clusters. In case of a node failure, the system keeps operating, and data
transfer between the nodes is facilitated by HDFS.
• It is highly fault-tolerant and is designed to be deployed on low-cost hardware.
• It provides high throughput access to application data and is suitable for
applications having large datasets.
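As a hedged sketch of writing IoT data into HDFS from Python, assuming the third-party hdfs package (a WebHDFS client) and a hypothetical NameNode address, user, and path:

```python
from hdfs import InsecureClient

# WebHDFS endpoint of the NameNode (address and user are assumptions).
client = InsecureClient("http://namenode:9870", user="hadoop")

# HDFS transparently splits this file into blocks and replicates them
# across DataNodes; the client sees only one logical path.
with client.write("/iot/readings/day1.csv", encoding="utf-8") as writer:
    writer.write("sensor_id,timestamp,temperature\n")
    writer.write("temp-01,2024-01-15T08:30:00Z,92.4\n")
```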
Cont..
HADOOP ECOSYSTEM

Hadoop Distributed File System:


• HDFS consists of two core components, i.e.
Name Node
Data Node
• The Name Node is the prime node that contains metadata (data about data),
requiring comparatively fewer resources than the data nodes that store the
actual data. These data nodes are commodity hardware in the distributed
environment, undoubtedly making Hadoop cost-effective.
• The NameNode does not store actual data or datasets. The NameNode stores
metadata, i.e. the number of blocks, their locations, on which rack and on which
DataNode the data is stored, and other details. It consists of files and directories.
• HDFS maintains all the coordination between the clusters and hardware, thus
working at the heart of the system.
Cont..
HADOOP ECOSYSTEM

Hadoop Distributed File System:


• The DataNode is also known as the Slave node.
• The HDFS DataNode is responsible for storing actual data in HDFS. DataNodes
perform read and write operations as per the requests of the clients.
• A replica block on a DataNode consists of 2 files on the file system. The first file is
for the data and the second file is for recording the block’s metadata.
• HDFS metadata includes checksums for data. At startup, each DataNode
connects to its corresponding NameNode and does handshaking.
• Verification of the namespace ID and software version of the DataNode takes
place during handshaking. If a mismatch is found, the DataNode goes down
automatically.
Cont..
HADOOP ECOSYSTEM

MapReduce:
• MapReduce is a parallel programming model for writing distributed
applications, devised at Google for efficient processing of large amounts of
data (multi-terabyte data sets) on large clusters (thousands of nodes) of
commodity hardware in a reliable, fault-tolerant manner.
• MapReduce programs run on Hadoop, which is an Apache open-source
framework.
• By making use of distributed and parallel algorithms, MapReduce makes it
possible to carry over the processing logic and helps to write applications
which transform big data sets into manageable ones.
Cont..
HADOOP ECOSYSTEM

MapReduce:
• MapReduce makes use of two functions, i.e. Map() and Reduce(), whose
tasks are:
Map() performs sorting and filtering of data, thereby organizing it in
the form of groups. Map generates key-value-pair-based results which are
later processed by the Reduce() method.

Reduce(), as the name suggests, does the summarization by aggregating the
mapped data. In short, Reduce() takes the output generated by Map() as
input and combines those tuples into a smaller set of tuples.
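To make the Map()/Reduce() contract concrete, here is a pure-Python simulation of the classic word count (this is only the shape of the computation, not actual Hadoop code; on a cluster the three phases run distributed across nodes):

```python
from collections import defaultdict

lines = ["hot engine hot", "engine ok"]

# Map phase: emit a (word, 1) key-value pair for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group emitted values by key.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce phase: aggregate each key's values into a smaller set of tuples.
reduced = {key: sum(values) for key, values in grouped.items()}
print(reduced)  # {'hot': 2, 'engine': 2, 'ok': 1}
```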
Cont..
HADOOP ECOSYSTEM

YARN:
• Yet Another Resource Negotiator, as the name implies, YARN helps to
manage the resources across the clusters.
• In short, it performs scheduling and resource allocation for the Hadoop
system. It consists of three major components, i.e.
Resource Manager
Node Manager
Application Manager
• The Resource Manager has the privilege of allocating resources for the
applications in the system, whereas Node Managers work on the allocation of
resources such as CPU, memory, and bandwidth per machine, and later
acknowledge the Resource Manager.
• The Application Manager works as an interface between the Resource Manager
and Node Manager and performs negotiations as per the requirements of the
two.
Cont..
HADOOP ECOSYSTEM

YARN Architecture:
Cont..
HADOOP ECOSYSTEM

YARN:
• Apache YARN, “Yet Another Resource Negotiator,” is the resource
management layer of Hadoop.

• YARN was introduced in Hadoop 2.x. YARN allows different data-processing
engines such as graph processing, interactive processing, and stream
processing, as well as batch processing, to run and process data stored in
HDFS. Apart from resource management, YARN also does job scheduling.
Cont..
HADOOP ECOSYSTEM

HIVE:
• HIVE performs reading and writing of large data sets. Its query
language is called HQL (Hive Query Language).
• It is highly scalable, as it allows both real-time processing and batch
processing. Also, all the SQL data types are supported by Hive, making
query processing easier.
• Similar to other query-processing frameworks, HIVE comes with two
components: JDBC (Java Database Connectivity) drivers and the HIVE
command line.
• JDBC, along with ODBC (Open Database Connectivity) drivers, establishes
the data storage permissions and connection, whereas the HIVE command
line helps in the processing of queries.
• Hive performs three main functions: data summarization, query, and analysis.
Cont..
HADOOP ECOSYSTEM

HIVE:
Cont..
HADOOP ECOSYSTEM

HIVE:
• Hive uses a language called HiveQL (HQL), which is similar to SQL. HiveQL
automatically translates SQL-like queries into MapReduce jobs which
execute on Hadoop.

• The main parts of Hive are:

Metastore – stores the metadata.
Driver – manages the lifecycle of a HiveQL statement.
Query compiler – compiles HiveQL into a Directed Acyclic Graph (DAG).
Hive server – provides a Thrift interface and a JDBC/ODBC server.
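As a hedged sketch of issuing HiveQL from Python, assuming the third-party PyHive library, a running HiveServer2 on the default port, and a hypothetical readings table:

```python
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, username="hadoop")
cursor = conn.cursor()

# Looks like SQL, but Hive compiles it into a DAG of cluster jobs.
cursor.execute(
    "SELECT sensor_id, AVG(temperature) FROM readings GROUP BY sensor_id"
)
for row in cursor.fetchall():
    print(row)
```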
Cont..
HADOOP ECOSYSTEM

PIG:
• Pig was developed by Yahoo. It works on Pig Latin, a query-based
language similar to SQL.
• It is a platform for structuring the data flow and for processing and analyzing
huge data sets.
• Pig does the work of executing commands, and in the background all the
activities of MapReduce are taken care of. After the processing, Pig stores the
result in HDFS.
• The Pig Latin language is specially designed for this framework and runs on the
Pig Runtime, just the way Java runs on the JVM.
• Pig helps to achieve ease of programming and optimization and hence is a
major segment of the Hadoop ecosystem.
Cont..
HADOOP ECOSYSTEM

Features of PIG:
• Extensibility – For carrying out special-purpose processing, users can create
their own functions.

• Optimization opportunities – Pig allows the system to optimize execution
automatically. This allows the user to pay attention to semantics instead of
efficiency.

• Handles all kinds of data – Pig analyzes both structured and
unstructured data.
Cont..
HADOOP ECOSYSTEM

HBase:
• It’s a NoSQL database which supports all kinds of data and is thus capable of
handling any workload of a Hadoop database. It provides the capabilities of
Google’s BigTable, and is thus able to work on big data sets effectively.

• At times when we need to search or retrieve the occurrences of something
small in a huge database, the request must be processed within a short
span of time. At such times, HBase comes in handy, as it gives us a tolerant way
of storing limited data.
Cont..
HADOOP ECOSYSTEM

HBase:
Cont..
HADOOP ECOSYSTEM

HBase:
• There are two HBase components, namely HBase Master and RegionServer.
i. HBase Master
It is not part of the actual data storage, but negotiates load balancing
across all RegionServers.
It maintains and monitors the Hadoop cluster.
It performs administration (an interface for creating, updating and deleting
tables).
It controls the failover.
HMaster handles DDL operations.
ii. RegionServer
It is the worker node which handles read, write, update and delete
requests from clients. The RegionServer process runs on every node in the
Hadoop cluster, on the HDFS DataNode.
Cont..
HADOOP ECOSYSTEM

HCatalog:
• HCatalog is a table and storage management layer for Hadoop.

• HCatalog supports different components available in the Hadoop ecosystem,
like MapReduce, Hive, and Pig, to easily read and write data from the cluster.

• HCatalog is a key component of Hive that enables users to store their data in
any format and structure.

• By default, HCatalog supports RCFile, CSV, JSON, SequenceFile and ORC file
formats.
Cont..
HADOOP ECOSYSTEM

Avro:
• Avro is an open source project that provides data serialization
and data exchange services for Hadoop.

• These services can be used together or independently.

• Using Avro, big data can be exchanged between programs written in different
languages.
Cont..
HADOOP ECOSYSTEM

Apache Mahout:
• Mahout brings machine learnability to a system or application.

• Machine learning, as the name suggests, helps a system to develop itself
based on patterns, user/environmental interaction, or algorithms.

• It provides various libraries and functionalities such as collaborative filtering,
clustering, and classification, which are concepts of machine learning.

• It allows invoking algorithms as per our need with the help of its own libraries.
Cont..
HADOOP ECOSYSTEM

Apache Mahout:
• Once data is stored in Hadoop HDFS, Mahout provides the data science tools
to automatically find meaningful patterns in those big data sets.

The algorithms of Mahout are:

• Clustering – Takes items in a particular class and organizes them into
naturally occurring groups, such that items belonging to the same group are
similar to each other.
• Collaborative filtering – Mines user behavior and makes product
recommendations (e.g. Amazon recommendations).
• Classification – Learns from existing categorizations and then assigns
unclassified items to the best category.
• Frequent pattern mining – Analyzes items in a group (e.g. items in a
shopping cart or terms in a query session) and then identifies which items
typically appear together.
Cont..
HADOOP ECOSYSTEM

Sqoop:
• Sqoop imports data from external sources into related Hadoop ecosystem
components like HDFS, HBase or Hive.

• It also exports data from Hadoop to other external sources.

• Sqoop works with relational databases such as Teradata, Netezza, Oracle and
MySQL.
Cont..
HADOOP ECOSYSTEM

Apache Flume:
• Flume efficiently collects, aggregates and moves large amounts of data from
their origin and sends them back to HDFS.

• It is a fault-tolerant and reliable mechanism.

• This Hadoop ecosystem component allows the data flow from the source into
the Hadoop environment.

• It uses a simple extensible data model that allows for online analytic
applications.

• Using Flume, we can get data from multiple servers into Hadoop immediately.
Cont..
HADOOP ECOSYSTEM

Apache Flume:
Cont..
HADOOP ECOSYSTEM

Ambari:
• Ambari, another Hadoop ecosystem component, is a management platform for
provisioning, managing, monitoring and securing an Apache Hadoop cluster.

• Hadoop management gets simpler, as Ambari provides a consistent, secure
platform for operational control.
Cont..
HADOOP ECOSYSTEM

Other Components:
• Apart from all of these, there are some other components too that carry out a
huge task in order to make Hadoop capable of processing large datasets. They
are as follows:

• Solr, Lucene: These are two services that perform the task of searching and
indexing with the help of some Java libraries. Lucene in particular is based on
Java and also allows a spell-check mechanism.

• Zookeeper: There was a huge issue of management of coordination and
synchronization among the resources and components of Hadoop, which
often resulted in inconsistency. Zookeeper overcame all the problems by
performing synchronization, inter-component communication, grouping,
and maintenance.
Cont..
HADOOP ECOSYSTEM

Apache Spark:
• It’s a platform that handles all the process-consumptive tasks like batch
processing, interactive or iterative real-time processing, graph conversions, and
visualization, etc.

• It consumes in-memory resources, thus being faster than prior tools in
terms of optimization.

• Spark is best suited for real-time data, whereas Hadoop is best suited for
structured data or batch processing; hence both are used in most
companies interchangeably.
Cont..

HADOOP ECOSYSTEM

Other Components:
• Oozie: Oozie simply performs the task of a scheduler, scheduling jobs and
binding them together as a single unit.

• There are two kinds of jobs, i.e. Oozie workflow and Oozie coordinator jobs.
Oozie workflow jobs need to be executed in a sequentially ordered manner,
whereas Oozie coordinator jobs are triggered when some data or an external
stimulus is given to them.
Cont..
HADOOP ECOSYSTEM

Other Components:
• Since the initial release of Hadoop in 2011, many projects have been developed
to add incremental functionality to Hadoop and have collectively become known
as the Hadoop ecosystem.

• Hadoop now comprises more than 100 software projects under the Hadoop
umbrella, covering nearly every element in the data lifecycle, from collection,
to storage, to processing, to analysis and visualization.

• Each of these individual projects is a unique piece of the overall data
management solution.
Cont..
APACHE KAFKA

Apache Kafka:
• Part of processing real-time events, such as those commonly generated by smart
objects, is having them ingested into a processing engine.

• The process of collecting data from a sensor or log file and preparing it to be
processed and analyzed is typically handled by messaging systems.

• Messaging systems are designed to accept data, or messages, from where the
data is generated and deliver the data to stream-processing engines such as
Spark Streaming or Storm.
Cont..
APACHE KAFKA

Apache Kafka:
• Apache Kafka is a distributed publisher-subscriber messaging system that is
built to be scalable and fast.

• It is composed of message brokers, where producers write data and consumers
read data from these brokers.

• The data flows from the smart objects (producers), through a topic in Kafka, to
the real-time processing engine.
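As a minimal sketch of this producer/topic/consumer flow, assuming the third-party kafka-python library, a broker on localhost:9092, and a hypothetical engine-temps topic:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: a smart object (or its gateway) writes readings to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("engine-temps", {"sensor": "temp-01", "value": 92.4})
producer.flush()

# Consumer side: a stream-processing engine reads from the same topic.
consumer = KafkaConsumer(
    "engine-temps",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # hand each reading to the processing engine
    break
```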
Cont..
APACHE KAFKA

Apache Kafka:
Cont..
APACHE KAFKA

Apache Kafka:
• Due to the distributed nature of Kafka, it can run in a clustered configuration that
can handle many producers and consumers simultaneously and exchanges
information between nodes, allowing topics to be distributed over multiple
nodes.

• The goal of Kafka is to provide a simple way to connect to data sources and allow
consumers to connect to that data in the way they would like.
Cont..
APACHE SPARK

Apache Spark:
• Apache Spark is an in-memory distributed data analytics platform designed to
accelerate processes in the Hadoop ecosystem.
• The “in-memory” characteristic of Spark is what enables it to run jobs very
quickly.
• At each stage of a MapReduce operation, the data is read and written back to
the disk, which means latency is introduced through each disk operation.
• However, with Spark, the processing of this data is moved into high-speed
memory, which has significantly lower latency. This speeds the batch processing
jobs and also allows for near-real-time processing of events.
Cont..
APACHE SPARK

Apache Spark:
• Real-time processing is done by a component of the Apache Spark project called
Spark Streaming.
• Spark Streaming is an extension of Spark Core that is responsible for taking live
streamed data from a messaging system, like Kafka, and dividing it into smaller
micro batches.
• These microbatches are called discretized streams, or DStreams.
• The Spark processing engine is able to operate on these smaller pieces of data,
allowing rapid insights into the data and subsequent actions.
• Due to this “instant feedback” capability, Spark is becoming an important
component in many IoT deployments.
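As a hedged sketch of DStream micro-batching with the classic Spark Streaming API (it assumes a local Spark installation and a plain-text source on localhost:9999, e.g. started with nc -lk 9999; the alert threshold is made up):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "EngineTempStream")
ssc = StreamingContext(sc, batchDuration=1)  # 1-second micro-batches (DStreams)

lines = ssc.socketTextStream("localhost", 9999)
# Each micro-batch is a small RDD; keep only readings above the threshold.
alerts = lines.map(float).filter(lambda t: t > 95.0)
alerts.pprint()

ssc.start()
ssc.awaitTermination()
```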
Cont..
APACHE SPARK

Apache Spark:
• Systems that control the safety and security of personnel,
time-sensitive processes in the manufacturing space,
and infrastructure control in traffic management
all benefit from these real-time streaming capabilities.
Cont..
XIVELY CLOUD

Xively Cloud for IoT:

• Xively (formerly known as Cosm and Pachube) is an Internet of Things (IoT)
platform owned by Google.
• Xively offers product companies a way to connect products, manage connected
devices and the data they produce, and integrate that data into other systems.
• It is a Platform as a Service built for the IoT.
It includes directory services, data services, a trust engine for security, and a
web-based management application.
• Xively's messaging is built on a protocol called MQTT.
The API supports REST, WebSockets and MQTT.
Cont..
XIVELY CLOUD

Xively Cloud for IoT:

• Xively is a system for deploying IoT applications on the cloud. It is offered as
PaaS.

• Xively is basically a data collection, management, and distribution
infrastructure. It also provides APIs to connect and develop IoT applications.

• Xively also provides tools to model your connected business.

• This capability is highly beneficial for building any IoT-based product.
Cont..
XIVELY CLOUD

Xively Cloud for IoT:

• Other than that, Xively also provides management as well as operational tools.

• Xively can be connected to most of the IoT frameworks and microcontrollers on
the market to create a ‘smart’ project or product.

• The Xively Python libraries can also be used to embed Python code as per the
Xively APIs.

• A Xively web interface is provided for easy implementation of a front-end
interface.
Cont..
XIVELY CLOUD

Xively Cloud for IoT:

• Xively also comes with multiple language and platform support. We can use
HTTP, its APIs, and MQTT.

• This makes device connectivity a lot easier with the Xively cloud.

• All the devices can be connected to the Xively cloud for real-time processing
and archiving.

• IoT application developers can write the frontend for IoT applications as per
their requirements.

• This helps in convenient management of apps with the Xively cloud and other
APIs.
Cont..
XIVELY CLOUD

Xively Cloud for IoT:

• Xively is very popular with companies which deal with IoT-based device
manufacturing and development.

• Companies using Xively can rely on the secure connectivity of devices as well as
the seamless data management capability.
Cont..
XIVELY CLOUD

How to use Xively?

• Programmers or developers have to register with Xively to use its cloud services.
• After registration and account creation, developers can create the different
devices for which they have to create an IoT app. This can be easily done using
the templates provided in the web interface of Xively.
• Each connected device is allocated a unique FEED_ID. It specifies the data
stream and metadata of the connected device.
• Once this is done, permissions on the IoT devices are assigned using the
available APIs. The available permissions are Create, Update, Delete and Read.
• One or more bidirectional channels are created after we connect a device with
Xively. Each channel is unique to the device connected.
• The Xively cloud is connected with the help of these channels.
• The Xively APIs are used by IoT devices to create communication-enabled
products, as in the historical sketch below.
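Xively has since been discontinued, so the following is only a historical sketch of what pushing a datapoint to a feed looked like over its REST API, using the requests library (the feed ID, API key, and channel name are placeholders):

```python
import requests

FEED_ID = "1234567890"           # hypothetical feed allocated by Xively
API_KEY = "YOUR_XIVELY_API_KEY"  # obtained after registration

# Update the current value of the 'temperature' channel of the feed.
body = {
    "version": "1.0.0",
    "datastreams": [{"id": "temperature", "current_value": "22.5"}],
}
resp = requests.put(
    f"https://api.xively.com/v2/feeds/{FEED_ID}.json",
    headers={"X-ApiKey": API_KEY},
    json=body,
)
print(resp.status_code)
```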
Cont..
EDGE STREAMING ANALYTICS

• Edge streaming analytics in IoT (Internet of Things) refers to the process of
analyzing data from IoT devices at or near the location where the data is
generated, instead of sending all the data to centralized cloud servers for
processing.

• This approach provides real-time insights, reduces latency, and minimizes
bandwidth usage by processing and filtering data locally on edge devices (such
as sensors, gateways, or edge servers) before transmitting only relevant or
aggregated data to the cloud.
Cont..
EDGE STREAMING ANALYTICS

Here are some key aspects and benefits of edge streaming analytics in IoT:

• 1. Real-time Insights: By analyzing data locally at the edge, you can make instant
decisions without waiting for data to travel to a distant cloud server. This is
especially crucial for applications that require immediate action, such as
industrial automation, autonomous vehicles, or healthcare monitoring.

• 2. Reduced Latency: Data doesn’t need to travel far, leading to significantly
reduced latency. This is important for time-sensitive applications that need to
respond quickly to changing conditions.

• 3. Bandwidth Optimization: Sending only the most relevant or pre-processed
data to the cloud reduces the volume of data transmitted. This is particularly
valuable in environments with limited or costly network bandwidth, such as
remote locations or devices with limited connectivity.
Cont..
EDGE STREAMING ANALYTICS

Here are some key aspects and benefits of edge streaming analytics in IoT:

• 4. Improved Security and Privacy: By processing data locally, sensitive
information can be kept on-site, reducing the risk of exposure or breaches
during transmission to the cloud. It also gives more control over data access
and privacy policies.

• 5. Scalability: With the increasing number of IoT devices, edge analytics allows
for distributed processing, reducing the load on central servers and improving
the scalability of the system.

• 6. Autonomy: Edge devices can continue functioning even if the network
connection to the cloud is intermittent or lost. They can continue to process and
act on data autonomously, ensuring continuous operation. A minimal sketch of
this local filter-and-forward pattern is shown below.
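Below is a minimal, framework-free sketch of that pattern: readings are buffered locally, alerts are forwarded immediately, and only aggregates are sent upstream (the send_to_cloud hook, window size, and threshold are hypothetical):

```python
from statistics import mean

WINDOW = 60        # keep one minute of 1 Hz samples at the edge
THRESHOLD = 95.0   # alert threshold, degrees Celsius

buffer = []

def on_reading(value, send_to_cloud):
    buffer.append(value)
    if value > THRESHOLD:
        send_to_cloud({"alert": "overtemp", "value": value})  # immediate
    if len(buffer) >= WINDOW:
        send_to_cloud({"avg": mean(buffer)})  # one aggregate, not 60 samples
        buffer.clear()

# Simulate a short stream of readings; print stands in for the uplink.
for v in [90.1, 90.3, 96.2, 90.0]:
    on_reading(v, send_to_cloud=print)
```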
Cont..
EDGE STREAMING ANALYTICS

Use Cases of Edge Streaming Analytics in IoT:

• Smart Cities: Real-time traffic monitoring, waste management, and air quality
monitoring.

• Manufacturing: Predictive maintenance, quality control, and real-time
monitoring of equipment performance.

• Healthcare: Remote patient monitoring, real-time diagnostics, and personalized
healthcare.

• Agriculture: Precision farming, monitoring of soil conditions, and crop health in
real time.

• Energy Management: Smart grids, real-time energy consumption monitoring,
and optimization.
Cont..
PYTHON WEB APPLICATION

• A Python web application for IoT is a great way to manage, visualize, and
interact with IoT devices and the data they generate.

• Python is a popular choice for IoT applications due to its simplicity, large
number of libraries, and community support.

• A Python-based web app can provide a dashboard to monitor real-time data,
control devices, and perform analytics.
Cont..
PYTHON WEB APPLICATION

1. Set up IoT Devices and Data Collection:

• IoT Devices: IoT devices (sensors, actuators, etc.) collect data such as
temperature, humidity, motion, or other environmental conditions. These
devices often communicate through protocols like MQTT, HTTP, or CoAP.

• Data Collection: You’ll need a mechanism to gather data from these devices,
either by sending data to a central server, cloud service, or edge device for
processing.

2. Backend Development with Flask or Django:
• Flask: Flask is a lightweight web framework for Python that’s great for building
simple web applications or APIs. It’s well-suited for IoT projects where you might
want to build an API for data collection, control, and visualization.

• You can set up endpoints to receive data from IoT devices (via HTTP POST or
MQTT messages) and store the data in a database, as in the sketch below.
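As a minimal sketch of such an ingestion endpoint (the route, payload fields, and the in-memory list standing in for a database are all illustrative choices):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
readings = []  # stand-in for a real database

@app.route("/api/readings", methods=["POST"])
def ingest():
    data = request.get_json()  # e.g. {"sensor": "temp-01", "value": 22.5}
    readings.append(data)
    return jsonify({"status": "ok", "count": len(readings)}), 201

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A device (or gateway) would then POST its JSON readings to /api/readings.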
Cont..
PYTHON WEB APPLICATION

• Django: Django is a more full-featured web framework, ideal if you need more
advanced features like user authentication, admin panels, or complex database
interactions.

• Django can also handle real-time data streams, though you may need additional
tools like Channels or Celery for background tasks.

3. Database for storing IoT data:

You will need a database to store and manage the IoT data. Options include:

• Relational Databases (SQL): MySQL, PostgreSQL (great for structured data like
sensor readings with timestamps).

• NoSQL Databases: MongoDB (if the data is semi-structured, like sensor logs), or
InfluxDB (specifically designed for time-series data).
Cont..
PYTHON WEB APPLICATION

3. Database for storing IoT data:

• SQLAlchemy is a common ORM (Object Relational Mapper) used with Flask and
Django to simplify database interaction, as in the sketch below.
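As a hedged sketch of a time-stamped reading model with Flask-SQLAlchemy (SQLite and the field names are illustrative; a MySQL or PostgreSQL URI works the same way):

```python
from datetime import datetime
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///iot.db"
db = SQLAlchemy(app)

class Reading(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    sensor = db.Column(db.String(64), index=True)
    value = db.Column(db.Float)
    timestamp = db.Column(db.DateTime, default=datetime.utcnow)

with app.app_context():
    db.create_all()  # creates the table on first run
    db.session.add(Reading(sensor="temp-01", value=22.5))
    db.session.commit()
```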

4. Frontend with HTML, CSS, and JavaScript:

The frontend allows users to interact with the data and devices.

• HTML/CSS: Used for creating the structure and style of your web pages.

• JavaScript: To make the web app interactive, you can use JavaScript libraries like
D3.js or Chart.js to visualize IoT data (e.g., graphs, real-time updates).

• If you need a more dynamic UI, consider using React or Vue.js for the frontend.
Cont..
PYTHON WEB APPLICATION

5. Real-time Data and WebSockets:

• For real-time data streaming (such as receiving sensor data or controlling IoT
devices in real time), you can use WebSockets.

• Flask-SocketIO or Django Channels can be used to manage real-time
communication between the server and client.

• WebSockets allow for bidirectional communication, so your web app can push
updates to the frontend as soon as new data comes in from IoT devices, as in
the sketch below.
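As a minimal sketch of server-initiated push with Flask-SocketIO (the event name and payload are hypothetical):

```python
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def broadcast_reading(sensor, value):
    # Push to all connected browsers; clients listen for 'new_reading'.
    socketio.emit("new_reading", {"sensor": sensor, "value": value})

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```

The backend calls broadcast_reading() whenever a new value arrives, e.g. from an MQTT callback.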

6. Integration with IoT Protocols (e.g., MQTT):

• IoT devices often communicate using protocols like MQTT, a lightweight
messaging protocol for small sensors and mobile devices, optimized for low-
bandwidth, high-latency, or unreliable networks.
Cont..
PYTHON WEB APPLICATION

6. Integration with IoT Protocols (e.g., MQTT):

• Paho MQTT is a popular Python library for MQTT. Your Python backend can
subscribe to MQTT topics (e.g., temperature readings) and push updates to the
web frontend via WebSockets, as in the sketch below.
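As a minimal sketch with paho-mqtt (this uses the classic 1.x callback API; paho-mqtt 2.x additionally requires a CallbackAPIVersion argument to Client(); the broker address and topic are assumptions):

```python
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    client.subscribe("sensors/temperature")  # topic published by devices

def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())
    # here you would store the value and/or push it over WebSockets

client = mqtt.Client()  # paho-mqtt 1.x style constructor
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883, 60)
client.loop_forever()
```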

7. Authentication and Authorization:


• If you want to control access to your IoT web application, you can implement
user authentication using Flask-Login (for Flask) or Django’s built-in
authentication system.

• This ensures that only authorized users can access specific data or control IoT
devices.
Cont..
PYTHON WEB APPLICATION

Putting It All Together:

• Flask serves as the web framework.

• SocketIO allows real-time communication between the server and the client
(browser).

• Paho MQTT subscribes to the MQTT broker to receive sensor data and pushes it
to the client in real time using WebSockets.

• The client (browser) listens for updates and displays them dynamically.
Cont..
PYTHON WEB APPLICATION

8. Deployment:
• You can deploy this web application to platforms like Heroku, AWS, Google
Cloud, or Azure.

• If you want to host the application locally for a small setup, you can run it on a
Raspberry Pi or similar edge device that collects and processes IoT data.
