UNIT -4
DATA ACQUIRING AND STORAGE
Data Generation:
Data generates at devices and, later on, transfers to the Internet through a gateway. Data
generates as follows:
● Passive devices data:
> Data generates at the device or system.
> Data follows as the result of interactions.
> A passive device does not have its own power source.
> An external source helps such a device to generate and send data.
> Examples: RFID tag or contactless debit card.
> A contactless card may or may not have an associated microcontroller, memory and
transceiver.
● Event data from device:
> A device can generate data on an event only once.
> For example, detection of traffic or of dark ambient conditions signals the event. The event
on darkness communicates a need for lighting up a group of streetlights.
> A system consisting of security cameras can generate data on an event of a security breach
or on detection of an intrusion.
> A waste container with an associated circuit can generate data on the event of getting filled
up to 90% or above.
● Active devices data:
> Data generates at the device or system.
> Data follows as the result of interactions.
> An active device has its own power source.
> Examples: active RFID, streetlight sensor or wireless sensor node.
> An active device also has an associated microcontroller, memory and transceiver.
● Device real-time data:
> An ATM generates data and communicates it to the server instantaneously through the
Internet.
> This initiates and enables Online Transaction Processing (OLTP) in real time.
● Event-driven device data:
> A device can generate data on an event only once.
Examples:
(i) A device receives a command from a controller or monitor, and then performs
action(s) using an actuator. When the action completes, the device sends an
acknowledgement.
(ii) When an application seeks the status of a device, the
device communicates the status.
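Example (i) above can be sketched as follows. This is a minimal, hypothetical illustration: the function and actuator names are invented for the example, not part of any real device API.

```python
# Sketch of an event-driven device: it performs action(s) using an actuator
# on receiving a command, then sends an acknowledgement (names illustrative).
def handle_command(command, actuator):
    result = actuator(command)  # perform the action(s) using the actuator
    return {"ack": True, "command": command, "result": result}

# A toy actuator standing in for a streetlight switching circuit
def streetlight_actuator(cmd):
    return f"light turned {cmd}"

ack = handle_command("on", streetlight_actuator)  # acknowledgement record
```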
Data Acquisition:
>Data acquisition means acquiring data from IoT or M2M devices.
> The data communicates after interactions with a data acquisition system (application).
>The application interacts and communicates with a number of devices for acquiring the
needed data.
>The devices send data on demand or at programmed intervals.
> Data of devices communicate using the network, transport and security layers.
>Device-management software:
* Provisions for device ID or address, activation, configuring (managing device parameters
and settings), registering, deregistering, attaching, and detaching.
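The acquisition step above can be sketched as an application polling a number of registered devices. The `Device` class and its `read` method are assumptions made for illustration; a real acquisition system would communicate over the network, transport and security layers.

```python
# Minimal sketch of a data acquisition application that interacts with
# a number of devices and collects the needed data on demand.
class Device:
    def __init__(self, device_id):
        self.device_id = device_id

    def read(self):
        # A device sends data on demand (fixed value used for illustration)
        return {"device_id": self.device_id, "value": 42}

def acquire(devices):
    # Interact with each registered device and gather its data
    return [d.read() for d in devices]

data = acquire([Device("sensor-1"), Device("sensor-2")])
```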
Data Validation:
>Data needs validation checks.
> Data validation software does the validation checks on the acquired data.
> Validation software applies logic, rules and semantic annotations.
> The applications or services depend on valid data.
> Only then can the analytics, predictions, prescriptions, diagnoses and decisions be
acceptable.
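Validation logic and rules like those described above can be sketched as follows. The field names and the temperature range are assumptions chosen for the example, not rules from the text.

```python
# Sketch of validation checks on acquired data: a reading is accepted only
# if it carries a device ID and its value lies within an assumed sensor range.
def validate(record, low=-40.0, high=85.0):
    return (
        "device_id" in record
        and isinstance(record.get("value"), (int, float))
        and low <= record["value"] <= high
    )

readings = [
    {"device_id": "t1", "value": 21.5},
    {"device_id": "t2", "value": 999.0},  # out of range, rejected
]
valid = [r for r in readings if validate(r)]
```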
Data Categorisation for Storage:
1.Data which needs to be repeatedly processed, referenced or audited in future, and
therefore, data alone needs to be stored.
2. Data which needs processing only once, and the results are used at a later time using the
analytics; both the data and the results of processing and analytics are stored. Advantages of
this case are quick visualisation and report generation without reprocessing. Also, the data
is available for reference or auditing in future.
3. Online, real-time or streaming data needs to be processed, and the results of this
processing and analysis need storage.
Assembly Software for the Events:
➢ A device can generate events. For example, a sensor can generate an event when
temperature reaches a preset value or falls below a threshold.
➢ A logic value sets or resets for an event state.
➢ Logic 1 refers to an event generated but not yet acted upon.
➢ Logic 0 refers to an event generated and acted upon or not yet generated.
➢ A software component in applications can assemble the events (logic value, event ID
and device ID) and can also add a date-time stamp.
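The event-assembly component described above can be sketched as follows; the dictionary field names are illustrative choices, but the logic values follow the convention stated in the bullets (1 = generated but not yet acted upon, 0 = acted upon).

```python
# Sketch assembling an event (logic value, event ID, device ID)
# and adding a date-time stamp.
from datetime import datetime, timezone

def assemble_event(event_id, device_id):
    # Logic 1: event generated but not yet acted upon
    return {
        "logic": 1,
        "event_id": event_id,
        "device_id": device_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def act_on(event):
    # Logic 0: event acted upon
    event["logic"] = 0
    return event

ev = assemble_event("temp_high", "sensor-7")
generated_logic = ev["logic"]   # 1 while the event is pending
ev = act_on(ev)                 # logic resets to 0 after action
```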
Data Store:
A data store is a data repository of a set of objects which integrate into the store. Features of
data store are:
● Objects in a data store are modelled using classes which are defined by the database
schemas.
● A data store is a general concept. It includes data repositories such as a database, relational
database, flat file, spreadsheet, mail server, web server, directory service and VMware.
● A data store may be distributed over multiple nodes. Apache Cassandra is an example of a
distributed data store.
● A data store may consist of multiple schemas or may consist of data in only one schema.
An example of a one-schema data store is a relational database.
Data Centre Management:
➢ A data centre is a facility which has multiple banks of computers, servers, large
memory systems, high-speed networks and Internet connectivity.
➢ Data centres also possess dust-free environments; heating, ventilation and air conditioning
(HVAC); cooling, humidification and dehumidification equipment; and a pressurisation
system within a physically highly secure environment.
➢ The manager of data centre is responsible for all technical and IT issues, operations
of computers and servers, data entries, data security, data quality control, network
quality control and the management of the services and applications used for data
processing.
Server Management: Server management means managing services, setup and
maintenance of systems of all types associated with the server. A server needs to serve
around the clock. Server management includes managing the following:
● Short reaction times when the system or network is down
● High security standards by routinely performing system maintenance and updates
● Periodic system updates for state-of-the art setups
● Optimised performance
● Monitoring of all critical services, with SMS and email notifications
● Security of systems and protection
● Maintaining confidentiality and privacy of data
● High degree of security and integrity and effective protection of data, files and
databases at the organisation
● Protection of customer data or enterprise internal documents from attackers, which
includes spam mails, unauthorised use of access to the server, viruses, malware
and worms
● Strict documentation and audit of all activities.
Spatial Storage:
➢ Spatial storage is storage as a spatial database, which is optimised to store spatial data
and, later on, answer queries from the applications.
➢ Spatial data refers to data which represents objects defined in a geometric space.
➢ A spatial database can also represent databases for 3D objects, topological coverage,
linear networks, triangular irregular networks and other complex structures.
➢ Spatial database has the following features:
● Can perform geometry constructors. For example, creating new geometries.
● Can define a shape using the vertices (points or nodes).
● Can perform observer functions using queries which reply with specific spatial information,
such as the location of the centre of a geometric object.
● Can perform spatial measurements which mean computing distance between
geometries, lengths of lines, areas of polygons and other parameters
● Can change the existing features to new ones using spatial functions and can predicate
spatial relationships between geometries using true or false type queries.
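Two of the spatial measurements listed above, distance between geometries and area of a polygon, can be sketched in plain Python; a real spatial database would provide these as built-in functions, so the code below is only an illustration of the computations involved.

```python
# Sketch of spatial measurements: distance between two point geometries
# and area of a polygon defined by its vertices (shoelace formula).
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def polygon_area(vertices):
    # Shoelace formula over the polygon's vertices (points or nodes)
    n = len(vertices)
    s = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(s) / 2.0

d = distance((0, 0), (3, 4))                         # 5.0
a = polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)])   # 12.0
```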
ORGANIZING AND ANALYTICS IN IOT/M2M AND ORGANIZING DATA:
ORGANISING THE DATA:
>Data can be organised in a number of ways.
> For example: objects, files, data stores, databases, relational databases and object-oriented
databases.
Database:
> A collection of data. The collection is organised into tables.
Relational Database:
>A collection of data into multiple tables which relate to each other through special fields.
Object Oriented Database:
> Object Oriented Database (OODB) is a collection of objects, which saves the objects in an
object-oriented design.
Database Management System:
>Database Management System (DBMS) is a software system, which contains a set of
programs specially designed for creation and management of data stored in a database.
>Database transactions can be performed on a database or relational database.
Atomicity, Data Consistency, Data Isolation and Durability (ACID) Rules
➢ The database transactions must maintain the atomicity, data consistency, data
isolation and durability during transactions. These rules are explained as
follows:
➢ Atomicity: It means a transaction must complete in full, treating it as indivisible. For
example, when a service request completes, then the pending-request field should also be
made zero.
➢ Consistency: It means that data after the transactions should remain consistent. For
example, the sum of chocolates sent should equal the sums of sold and unsold chocolates
for each flavour after the transactions on the database.
➢ Isolation: It means transactions between tables are isolated from each other.
➢ Durability: It means that after completion of a transaction, the previous transaction
cannot be recalled. Only a new transaction can effect any change.
Distributed Database:
Distributed Database (DDB) is a collection of logically interrelated databases over a
computer network. Distributed DBMS means a software system that manages a distributed
database.
Distributed Query Processing:
Distributed Query Processing means query processing operations in distributed databases
on the same system or on networked systems.
SQL: It is a language for data access control, schema creation and modification. It is also a
language for managing the RDBMS.
NOSQL
NOSQL stands for No-SQL or Not Only SQL, referring to data stores that do not rely only on
SQL-style fixed table schemas and queries. NOSQL is used in cloud data stores.
Extract, Transform and Load
Extract, Transform and Load, or ETL, is a system which enables the use of databases,
especially the ones stored at a data warehouse.
> Extract means obtaining data from homogeneous or heterogeneous data sources.
> Transform means transforming and storing the data in an appropriate structure or format.
> Load means the structured data load in the final target database or data store or data warehouse.
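The three ETL steps above can be sketched in miniature; here the source is a list of raw strings and the target a plain list standing in for a data warehouse, both assumptions made for the example.

```python
# Minimal ETL sketch: extract raw rows, transform them into a structured
# format, and load them into a target store (a list used as a stand-in).
raw_rows = ["sensor-1, 21.5", "sensor-2, 19.0"]      # Extract: raw source data

def transform(row):
    device_id, value = row.split(",")
    return {"device_id": device_id.strip(),
            "value": float(value)}                   # Transform: structure it

warehouse = []                                       # Load target
for row in raw_rows:
    warehouse.append(transform(row))
```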
ANALYTICS
Analytics requires the data to be available and accessible. It uses arithmetic, statistical,
data-mining and advanced methods, such as machine learning, to find new parameters and
information which add value to the data. Analytics enables building models based on selection
of the right data. Later, the models are tested and used for services and processes.
Analytics has three phases before deriving new facts and providing business intelligence. These are:
1. Descriptive analytics enables deriving the additional value from visualisations and reports.
Descriptive analytics enable the following:
● Actions, such as Online Analytical Processing (OLAP) for the analytics
● Reporting or generating spreadsheets
● Visualisations or dashboard displays of the analysed results
● Creation of indicators, called key performance indicators.
2. Predictive analytics is advanced analytics which enables extraction of new facts and knowledge,
and then predicts or forecasts. Predictive analytics uses algorithms, such as regression analysis,
correlation, optimisation, and multivariate statistics, and techniques such as modelling,
simulation,machine learning, and neural networks. The software tools make the predictive analytics
easy to use and understand. The examples are as follows:
● Predicting trends
● Undertaking preventive maintenance from earlier models of equipment and device failure rates
3. Prescriptive analytics enables derivation of additional value and undertaking of better
decisions on new option(s) to maximise the profits.
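Predictive analytics such as the regression analysis named above can be sketched with a simple least-squares line fit in pure Python; the sales figures are invented for the illustration.

```python
# Sketch of predictive analytics: simple linear regression (least squares)
# fitted to past readings, then used to forecast the next value.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx       # slope and intercept

days = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]        # illustrative historical data
m, c = fit_line(days, sales)
forecast_day5 = m * 5 + c               # predicted trend value for day 5
```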
TRANSACTIONS
➢ A transaction is a collection of operations that form a single logical unit. For example,
a database connect, insertion, append, deletion or modification transaction.
➢ Business transactions are transactions related in some way to a business activity.
Online Transactions and Processing
➢ OLTP means processing as soon as data or events generate, in real time.
➢ OLTP is used when requirements are availability, speed, concurrency and
recoverability in databases for real-time data or events.
Batch Transactions Processing
➢ Batch processing means transactions process in batches and in a non-interactive
way.
➢ When one set of transactions finish, the results are stored and a next batch is taken
up.
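The batch cycle described above, finish one set of transactions, store the results, take up the next batch, can be sketched as follows; the transaction values and the summing step are placeholders for a real batch computation.

```python
# Sketch of batch transaction processing: transactions accumulate and are
# processed non-interactively, one batch at a time, storing each result.
def process_batch(batch):
    return sum(batch)                   # stand-in for the real computation

transactions = [5, 3, 7, 2, 8, 1]
batch_size = 3
results = []
for i in range(0, len(transactions), batch_size):
    batch = transactions[i:i + batch_size]
    results.append(process_batch(batch))   # store result, take next batch
```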
Streaming Transactions Processing
➢ Processing on streaming data needs specialised frameworks.
➢ Apache S4 and Twitter Storm are examples of real-time streaming computation frameworks.
Interactive Transactions Processing
➢ Interactive transactions processing means the transactions which involve continual exchange
of information between the computer and a user.
Event Stream Processing and Complex Event Processing
➢ Event Stream Processing (ESP) is a set of technologies: event processing languages, Complex
Event Processing (CEP), event visualisation, event databases and event-driven middleware.
Apache S4 and Twitter Storm are examples of ESPs. SAP Sybase ESP and EsperTech Esper are
examples of CEPs. ESP and CEP do the following:
● Processes tasks on receiving streams of event data
● Identifies the meaningful pattern from the streams
● Detects relationships between multiple events
● Correlates the events data
● Detects event hierarchies
● Detects aspects such as timing, causality, subscription membership
● Builds and manages the event-driven information systems.
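Pattern identification and event correlation as listed above can be sketched with a toy detector; the event kind, field names, and 60-second window are assumptions for the example, and a real CEP engine would express such rules declaratively.

```python
# Sketch of CEP-style pattern detection over an event stream: raise an
# alert when two 'temp_high' events from the same device arrive within
# an assumed 60-second window (correlating events by device and timing).
def detect_pattern(events, kind="temp_high", window=60):
    seen = {}        # device_id -> time of the last matching event
    alerts = []
    for ev in events:                  # process tasks as events arrive
        if ev["kind"] != kind:
            continue
        last = seen.get(ev["device_id"])
        if last is not None and ev["time"] - last <= window:
            alerts.append(ev["device_id"])   # correlated pair detected
        seen[ev["device_id"]] = ev["time"]
    return alerts

stream = [
    {"device_id": "s1", "kind": "temp_high", "time": 0},
    {"device_id": "s1", "kind": "temp_high", "time": 30},
]
alerts = detect_pattern(stream)
```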
Complex Event Processing
A CEP application, for example one developed in Eclipse, is used for capturing a combination
of data and timing conditions and for efficiently recognising the corresponding events over
data streams.
Business Processes
➢ A business process consists of a series of activities which serve a particular
result.
➢ It is used when an enterprise has a number of interrelated processes which serve a
particular result or goal.
Business Intelligence
➢ Business intelligence is a process which enables a business service to extract new
facts and knowledge and then undertake better decisions.
➢ The new facts and knowledge follow from the earlier results of data processing,
aggregation and then analysing those results.
➢ The architecture of business intelligence and the business processes can be
represented as a layered diagram (figure not reproduced here).
Distributed Business Process
➢ Distributed Business Process System (DBPS) is a collection of logically interrelated
business processes in an Enterprise network.
➢ DBPS means a software system that manages the distributed BPs. DBPS features are:
➢ DBPS is a collection of logically related BPs like DDBS.
➢ DBPS exists as cooperation between the BPs in a transparent manner. Transparent
means that each user within the system may access all of the process decisions
within all of the processes as if they were a single business process.
DBPS should possess ‘location independence’, which means the enterprise BI is unaware of
where the BPs are located. It is possible to move the results of analytics and knowledge from
one physical location to another without affecting the user.
Integration and Enterprise Systems
➢ Enterprise integration uses a complex applications-integration architecture and the
SOA of cloud-based IoT services, web services and cloud services.
➢ Process orchestration means a number of business processes running in parallel and
a number of processes running in sequence.
➢ The process matrix provides the decision points which indicate which processes
should run in parallel and which in sequence.
➢ An SOA models a number of services and their interrelationships. Each service initiates
on receipt of messages from a process or service.
➢ The service discovery and selection software components select the services for
application integration.