
Introduction To Stream Data Model

This document introduces the stream data model and stream processing. It discusses that in a stream data model, data arrives continuously in streams and must be processed immediately or it will be lost. It then describes the architecture of a data stream management system, which uses working storage to answer queries on summaries or parts of streams, as it cannot store entire streams. It provides examples of sensor, image, internet, and web traffic data streams and different types of stream queries.

Uploaded by

George Fernandez
Copyright
© All Rights Reserved

Introduction to Stream Data Model

• In the stream data model, data arrives in a
stream or streams.
• If it is not processed immediately or stored,
then the data is lost forever.
• Data arrives so rapidly in the stream model
that it is not feasible to store it all in active
storage (i.e., in a conventional database) and
then interact with it later.
Stream Data Model and Architecture
A Data-Stream-Management System
• A stream processor can be viewed as a kind of
data-management system, whose high-level organization
is depicted in the figure.
• In a data-stream-management system, any number of
streams can enter the system.
• Each stream can provide elements on its own schedule.
Streams need not have the same data rates or data types,
and the time between elements of one stream need
not be uniform.
• The fact that the rate of arrival of stream elements is
not under the control of the system distinguishes stream
processing from the processing of data that goes on
within a database-management system.

A Data-Stream-Management System
• Streams may be archived in a large archival store, but we
assume it is not possible to answer queries from the
archival store.
• The archival store can be examined only under special
circumstances, using time-consuming retrieval processes.
• There is also a working store, into which summaries or
parts of streams may be placed, and which can be used for
answering queries.
• The working store might be disk, or it might be main
memory, depending on how fast we need to process
queries.
Disadvantage:
 The working store is of sufficiently limited capacity
that it cannot store all the data from all the streams.
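One way to picture "summaries" in the working store is a running summary that is updated per element and never keeps the elements themselves. The sketch below is only an illustration: the class name `RunningSummary` and the choice of count, mean, and maximum are assumptions, not something the slides prescribe.

```python
class RunningSummary:
    """Constant-size summary of a numeric stream (hypothetical helper)."""

    def __init__(self):
        self.count = 0       # number of elements seen so far
        self.total = 0.0     # sum of all elements seen so far
        self.maximum = None  # largest element seen so far

    def update(self, value):
        # Fold one stream element into the summary, then discard it.
        self.count += 1
        self.total += value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    @property
    def mean(self):
        # Mean of everything seen so far, or None for an empty stream.
        return self.total / self.count if self.count else None


summary = RunningSummary()
for reading in [2.0, 4.0, 9.0]:  # stand-in for elements arriving on a stream
    summary.update(reading)

print(summary.count, summary.mean, summary.maximum)
```

The summary occupies the same small amount of memory no matter how long the stream runs, which is the property a limited working store needs.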
Examples of Stream Sources

• Sensor Data
Example:
Consider a temperature sensor bobbing about in
the ocean, sending back to a base station a
reading of the surface temperature each hour.
Explanation:
The data produced by this sensor is a stream of real
numbers. Because the data rate is so low, the entire
stream could be kept in main memory.
Explanation
• Now suppose the sensor is given a GPS unit.
• Let it report surface height instead of temperature;
the data is then generated at a much higher rate.
• Surface height varies quite rapidly compared with
temperature.
• The sensor might send back a reading every tenth of a
second.
• If the sensor sends a 4-byte real number each time,
it produces about 3.5 megabytes per day.
• Example: to learn about ocean behavior
• Deploy a million sensors, each sending back a
stream at the rate of ten readings per second.
• One million sensors isn’t very many.
• There would be one for every 150 square miles
of ocean, and together they would produce about
3.5 terabytes of data every day.
• Concern: we must decide
 what can be kept in working storage
 what can only be archived.
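The data rates quoted above follow from simple arithmetic, worked through below. The function name `bytes_per_day` is a hypothetical helper, and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(readings_per_second, bytes_per_reading, sensors=1):
    """Daily data volume for a group of identical sensors."""
    return readings_per_second * bytes_per_reading * SECONDS_PER_DAY * sensors


# One sensor: a 4-byte reading every tenth of a second (10 per second).
one_sensor = bytes_per_day(10, 4)

# One million such sensors.
fleet = bytes_per_day(10, 4, sensors=1_000_000)

print(f"one sensor: {one_sensor / 1e6:.2f} MB per day")      # about 3.5 MB
print(f"million sensors: {fleet / 1e12:.2f} TB per day")     # about 3.5 TB
```

The exact figures are 3,456,000 bytes and 3,456,000,000,000 bytes per day, which the slides round to 3.5 megabytes and 3.5 terabytes.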
Image Data

• Satellites often send down to earth streams
consisting of many terabytes of images per
day.
• Surveillance cameras produce images with
lower resolution than satellites.
• Each camera produces a stream of images at
intervals like one second.
• Example: London is said to have six million
such cameras, each producing a stream.
Internet and Web Traffic

• A switching node in the middle of the Internet
receives streams of IP packets from many
inputs and routes them to its outputs.
• The job of the switch is to transmit data, not to
retain it or query it.
• Even so, more capability can be put into the switch,
for example:
 The ability to detect denial-of-service attacks.
 The ability to reroute packets based on
information about congestion in the network.
Internet and Web Traffic
• Web sites receive streams of various types.
For example:
 Google receives several hundred million search
queries per day.
 Yahoo accepts billions of “clicks” per day on its
various sites.
Web Traffic
• Many interesting things can be learned from
these streams.
For example:
 An increase in queries like “sore throat” enables
us to track the spread of viruses.
 A sudden increase in the click rate for a link
could indicate that
 there is some news connected to that page, or
 the link is broken and needs to be repaired.
Stream Queries

• Types
1. Standing queries
2. Ad hoc queries
Standing queries

• Standing queries are stored within the stream
processor. They execute permanently and produce
outputs at appropriate times.
• Example:
• A query to output an alert whenever the
temperature exceeds 25 degrees centigrade in
the stream produced by the ocean-surface-temperature
sensor.
• The query is easily answered, since it depends
only on the most recent stream element.
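The alert query above can be sketched as a generator that looks at one element at a time and keeps no history. This is a minimal illustration, not a real stream-processor API: the function name `temperature_alerts`, the message format, and the sample readings are all assumptions.

```python
from typing import Iterable, Iterator

THRESHOLD_C = 25.0  # alert threshold from the example, in degrees centigrade


def temperature_alerts(readings: Iterable[float],
                       threshold: float = THRESHOLD_C) -> Iterator[str]:
    """Standing query: emit an alert for each reading above the threshold."""
    for reading in readings:
        # Only the most recent element is needed, so nothing is stored.
        if reading > threshold:
            yield f"ALERT: {reading:.1f} C exceeds {threshold:.1f} C"


# Made-up sample readings standing in for the sensor stream.
sample = [24.1, 24.9, 25.3, 25.0, 26.2]
alerts = list(temperature_alerts(sample))
for message in alerts:
    print(message)
```

Because the generator consumes any iterable lazily, the same function would work unchanged on an unbounded source of readings.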
Ad hoc queries
• An ad hoc query is a question asked once
about the current state of a stream or streams.
• A common approach is to store a sliding window of
each stream in the working store.
• A sliding window can be the most recent n
elements of a stream, for some n, or it can be all
the elements that arrived within the last t time
units, e.g., one day.
• The stream-management system must keep the
window fresh, deleting the oldest elements as
new ones come in.
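Both kinds of sliding window described above can be sketched with a double-ended queue. The class names `CountWindow` and `TimeWindow` are hypothetical, and the time-based version assumes that elements arrive with non-decreasing timestamps.

```python
from collections import deque


class CountWindow:
    """Keeps the most recent n elements of a stream."""

    def __init__(self, n):
        # A deque with maxlen drops the oldest element automatically.
        self.items = deque(maxlen=n)

    def add(self, element):
        self.items.append(element)

    def elements(self):
        return list(self.items)


class TimeWindow:
    """Keeps the elements that arrived within the last `span` time units."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, element) pairs, oldest first

    def add(self, timestamp, element):
        self.items.append((timestamp, element))
        # Keep the window fresh: drop everything at or before now - span.
        cutoff = timestamp - self.span
        while self.items and self.items[0][0] <= cutoff:
            self.items.popleft()

    def elements(self):
        return [element for _, element in self.items]


count_window = CountWindow(3)
for x in [1, 2, 3, 4, 5]:
    count_window.add(x)

time_window = TimeWindow(10)
for ts, x in [(0, "a"), (5, "b"), (12, "c")]:
    time_window.add(ts, x)

print(count_window.elements())  # the three most recent elements
print(time_window.elements())   # elements newer than time 12 - 10 = 2
```

In this sketch expiry happens only when a new element arrives. A system that must answer queries between arrivals would also need to expire elements against the current clock.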
Example
• Web sites often like to report the number of unique users over the past month.
• If each login is considered a stream element, we can maintain a window that holds
all logins in the most recent month.
• We associate the arrival time with each login, so we know when it no longer belongs
to the window.
• If we think of the window as a relation Logins(name, time), then it is simple to get
the number of unique users over the past month.
• The corresponding SQL query is
    SELECT COUNT(DISTINCT(name))
    FROM Logins
    WHERE time >= t;
• Here, t is a constant that represents the time one month before the current time.
• Note that we must be able to maintain the entire stream of logins for the past month
in working storage.
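The query can be tried against an in-memory SQLite table using Python's standard `sqlite3` module. This is only a sketch under stated assumptions: the login rows are made up, time is an integer day number, and a month is taken to be 30 days.

```python
import sqlite3

# In-memory table standing in for the window relation Logins(name, time).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (name TEXT, time INTEGER)")

# Made-up logins: (user name, day number on which the login arrived).
rows = [("alice", 5), ("bob", 40), ("alice", 45), ("carol", 50), ("bob", 55)]
conn.executemany("INSERT INTO Logins VALUES (?, ?)", rows)

now = 60        # assumed current day
t = now - 30    # one "month" (assumed 30 days) before the current time

# Keep the window fresh: drop logins that no longer belong to it.
conn.execute("DELETE FROM Logins WHERE time < ?", (t,))

# The query from the slide, with t passed as a parameter.
(unique_users,) = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM Logins WHERE time >= ?", (t,)
).fetchone()

(remaining,) = conn.execute("SELECT COUNT(*) FROM Logins").fetchone()
print(unique_users, remaining)
```

With these rows, alice's login on day 5 falls outside the window, so four logins remain and three distinct users (bob, alice, carol) are counted.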
