Stream Computing
Data Science Foundations
© Copyright IBM Corporation 2018
Course materials may not be reproduced in whole or in part without the written permission of IBM.
Unit 11 Stream Computing
Unit objectives
• Define streaming data
• Describe IBM as a pioneer in streaming data, with System S and IBM Streams
• Explain streaming data - concepts & terminology
• Compare and contrast batch data vs streaming data
• List and explain streaming components & Streaming Data Engines (SDEs)
Stream Computing © Copyright IBM Corporation 2018
Generations of analytics processing
• From Relational Databases (OLTP) → Data Warehouses (OLAP) → Real-time Analytic Processing (RTAP) on Streaming Data
Generations of analytics processing
Graphic from:
• Ebbers, M., Abdel-Gayed, A., Budhi, V. B., Dolot, F., Kamat, V., Picone, R., &
Trevelin, J. (2013). Addressing data volume, velocity, and variety with IBM
InfoSphere Streams V3.0. Raleigh, NC: IBM International Technical Support
Organization (ITSO), p. 26.
• This eBook is available as a free download from:
http://www.redbooks.ibm.com/redbooks/pdfs/sg248108.pdf.
• Note that the current version of the IBM Streams product is v4.2 (as of February 2018).
Extract from this book (pp. 26-27):
Hierarchical databases were invented in the 1960s and still serve as the foundation for
online transaction processing (OLTP) systems for all forms of business and government
that drive trillions of transactions today. Consider a bank as an example. It is likely that
even today in many banks that information is entered into an OLTP system, possibly by
employees or by a web application that captures and stores that data in hierarchical
databases. This information then appears in daily reports and graphical dashboards to
demonstrate the current state of the business and to enable and support appropriate
actions. Analytical processing here is limited to capturing and understanding what
happened.
Relational databases brought with them the concept of data warehousing, which
extended the use of databases from OLTP to online analytic processing (OLAP). By
using our example of the bank, the transactions that are captured by the OLTP system
were stored over time and made available to the various business analysts in the
organization. With OLAP, the analysts can now use the stored data to determine trends
in loan defaults, overdrawn accounts, income growth, and so on. By combining and
enriching the data with the results of their analysis, they could do even more complex
analysis to forecast future economic trends or make recommendations on new
investment areas. Additionally, they can mine the data and look for patterns to help
them be more proactive in predicting potential future problems in areas such as
foreclosures. The business can then analyze the recommendations to decide whether
they must take action. The core value of OLAP is focused on understanding why things
happened to make more informed recommendations.
A key component of OLTP and OLAP is that the data is stored. Now, some new
applications require faster analytics than is possible when you have to wait until the
data is retrieved from storage. To meet the needs of these new dynamic applications,
you must take advantage of the increase in the availability of data before storage,
otherwise known as streaming data. This need is driving the next evolution in analytic
processing called real-time analytic processing (RTAP). RTAP focuses on taking the
proven analytics that are established in OLAP to the next level. Data in motion and
unstructured data might be able to provide actual data where OLAP had to settle for
assumptions and hunches. The speed of RTAP allows for the potential of action in
place of making recommendations.
So, what type of analysis makes sense to do in real time? Key types of RTAP include,
but are not limited to, the following analyses:
• Alerting
• Feedback
• Detecting failures
[These uses are discussed in more detail on pp. 26-27.]
What is streaming data?
• Applications that require on-the-fly processing, filtering, and analysis
of streaming data
▪ Sensors: environmental, industrial, surveillance video, GPS, …
▪ “Data exhaust”: network/system/webserver log files
▪ High-rate transaction data: financial transactions, call detail records
• Criteria: two or more of the following
▪ Messages are processed in isolation or in limited data windows
▪ Sources include non-traditional data (spatial, imagery, text, …)
▪ Sources vary in connection methods, data rates, and processing
requirements, presenting integration challenges
▪ Data rates / volumes require the resources of multiple processing nodes
▪ Analysis and response are needed with sub-millisecond latency
▪ Data rates and volumes are too great for store-and-mine approaches
What is streaming data?
Reading:
What is Streaming Data?
(excellent article written from the perspective of Amazon’s AWS, but with general
applicability) https://aws.amazon.com/streaming-data/
• Benefits of Streaming Data
• Streaming Data Examples
• Comparison between Batch Process and Stream Processing
• Challenges in Working with Streaming Data
IBM is a pioneer in streaming analytics
• Book:
▪ Roy, J. (2016). Streaming analytics with IBM Streams: Analyze more, act
faster, and get continuous insights. Indianapolis, IN: Wiley.
▪ Available as a free PDF download from IBM at:
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMM14184USEN&attachment=IMM14184USEN.PDF
• IBM researchers spent a decade transforming the vision of stream
computing into a product - a new programming language, IBM Streams
Processing Language (SPL), was even built just for streaming systems
IBM is a pioneer in streaming analytics
References:
• The Invention of Stream Computing
http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/streamcomputing/
• Research articles (2004-2015)
https://researcher.watson.ibm.com/researcher/view_group_pubs.php?grp=2531&t=1
IBM System S
▪ System S provides a programming model
and an execution platform for user-
developed applications that ingest, filter,
analyze, and correlate potentially massive
volumes of continuous data streams
▪ Source Adapters = an operator that
connects to a specific type of input (e.g.,
Stock Exchange, Weather, file system, …)
▪ Sink Adapters = an operator that connects
to a specific type of output external to the
streams processing system (RDBMS, …)
▪ Operator = software processing unit
▪ Stream = flow of tuples from one operator
to the next operator (they do not traverse
operators)
IBM System S
References:
• Stream Computing Platforms, Applications, and Analytics (Tab 4: System S)
https://researcher.watson.ibm.com/researcher/view_group_subpage.php?id=2534&t=4
System S: While at IBM, Dr. Ted Codd invented the relational database. In the defining
IBM Research project, it was referred to as System R, which stood for Relational. The
relational database is the foundation for data warehousing that launched the highly
successful Client/Server and On-Demand informational eras. One of the cornerstones
of that success was the capability of OLAP products that are still used in critical
business processes today.
When the IBM Research division again set its sights on developing something to
address the next evolution of analysis (RTAP) for the Smarter Planet evolution, they set
their sights on developing a platform with the same level of world-changing success,
and decided to call their effort System S, which stood for Streams. Like System R,
System S was founded on the promise of a revolutionary change to the analytic
paradigm. The research of the Exploratory Stream Processing Systems team at T.J.
Watson Research Center, which focused on advanced topics in highly scalable stream-
processing applications for the System S project, is the heart and soul of Streams.
Ebbers, M., Abdel-Gayed, A., Budhi, V. B., Dolot, F., Kamat, V., Picone, R., & Trevelin,
J. (2013). Addressing data volume, velocity, and variety with IBM InfoSphere Streams
V3.0. Raleigh, NC: IBM International Technical Support Organization (ITSO), p. 27.
Streaming data - concepts & terminology (1)
Streaming data - concepts & terminology
Reference:
Stream Computing Platforms, Applications, and Analytics (Overview)
https://researcher.watson.ibm.com/researcher/view_group_subpage.php?id=2534&t=1
Streaming data - concepts & terminology (2)
• Directed Acyclic Graph (DAG) - a directed graph that contains no cycles
• Hardware Node - one computer in a cluster of computers that can
work as a single processing unit
• Operator (or Software Node ) - a software process that handles
one unit of the processing that needs to be performed on the data
• Tuple - an ordered set of elements
(equivalent to the concept of a record)
• Source - an operator where tuples are ingested
• Target (or Sink) - an operator where tuples are consumed (and
usually made available outside the streams processing environment)
Wikipedia:
“In mathematics and computer science, a directed acyclic graph is a finite directed
graph with no directed cycles. That is, it consists of finitely many vertices and edges,
with each edge directed from one vertex to another, such that there is no way to start at
any vertex v and follow a consistently-directed sequence of edges that eventually loops
back to v again. Equivalently, a DAG is a directed graph that has a topological ordering,
a sequence of the vertices such that every edge is directed from earlier to later in the
sequence.”
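The topological-ordering property quoted above is also how acyclicity is checked in practice. A minimal sketch in Python, using hypothetical operator names for a streaming topology (this is illustrative code, not part of any SPE):

```python
from collections import deque

def topological_order(nodes, edges):
    """Return a topological ordering of the graph, or None if it has a cycle."""
    indegree = {n: 0 for n in nodes}
    adjacency = {n: [] for n in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)
        indegree[dst] += 1
    # Start from vertices with no incoming edges
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # If some vertex was never reached, the graph contains a cycle
    return order if len(order) == len(nodes) else None

# A small streaming topology: source -> functor -> sink (hypothetical names)
nodes = ["source", "functor", "sink"]
edges = [("source", "functor"), ("functor", "sink")]
print(topological_order(nodes, edges))  # → ['source', 'functor', 'sink']
```

Because a valid ordering exists, the topology is a DAG; a graph with a cycle would return None.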
Streaming data - concepts & terminology (3)
Types of Operators:
• Source - Reads the input data in the form of streams.
• Sink - Writes the data of the output streams to external storage or
systems.
• Functor - An operator that filters, transforms, and performs
functions on the data of the input stream.
• Sort - Sorts streams data on defined keys.
• Split - Splits the input streams data into multiple output streams.
• Join - Joins the input streams data on defined keys.
• Aggregate - Aggregates streams data on defined keys.
• Barrier - Combines and coordinates streams data.
• Delay - Delays a stream data flow.
• Punctor - Identifies groups of data that should be processed
together.
The terminology used here is that of IBM Streams, but similar terminology applies to
other Streams Processing Engines (SPE).
References:
• Glossary of terms for IBM Streams (formerly known as IBM InfoSphere Streams)
https://www.ibm.com/support/knowledgecenter/en/SSCRJU_4.2.1/com.ibm.streams.glossary.doc/doc/glossary_streams.html
Examples of other terminology:
• Apache Storm: http://storm.apache.org/releases/current/Concepts.html
o A spout is a source of streams in a topology. Generally spouts will read
tuples from an external source and emit them into the topology.
o All processing in topologies is done in bolts. Bolts can do anything from
filtering, functions, aggregations, joins, talking to databases, and more.
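The source/functor/sink roles listed above can be sketched as chained Python generators. This is only an analogy to show how tuples flow through operators; it is not IBM Streams or Storm code, and the names and values are hypothetical:

```python
def source(records):
    """Source: ingest raw records as a stream of tuples."""
    for rec in records:
        yield rec

def functor(stream, threshold):
    """Functor: filter and transform tuples in flight."""
    for value in stream:
        if value >= threshold:       # filter step
            yield value * 2          # transform step

def sink(stream):
    """Sink: deliver tuples to an external system (here, just a list)."""
    return list(stream)

# Wire the operators into a simple linear topology
result = sink(functor(source([1, 5, 3, 8]), threshold=4))
print(result)  # → [10, 16]
```

Each operator sees one tuple at a time, which mirrors the "messages processed in isolation" criterion for streaming workloads.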
Streaming data - concepts & terminology (4)
• Topologies
▪ Generally defined in terms of a Directed Acyclic Graph (DAG)
• Reliability
▪ Different Stream Processing Engines (SPE) provide reliability differently:
− Some take full responsibility for guaranteeing that no tuple is lost (but that can
slow the engine down)
− Some require that the developer program for reliability using, for example,
redundant processing streams in parallel
• Operators / Software Nodes / Workers
▪ To handle volume and velocity of data, most SPEs divide the work among
multiple software operators (= typically JVM tasks) that work in parallel
▪ The software operators are often spread over multiple Hardware Nodes in
a cluster
Reliability, using the example of Apache Storm
(http://storm.apache.org/releases/current/Concepts.html):
Storm guarantees that every spout tuple will be fully processed by the topology. It does
this by tracking the tree of tuples triggered by every spout tuple and determining when
that tree of tuples has been successfully completed. Every topology has a "message
timeout" associated with it. If Storm fails to detect that a spout tuple has been
completed within that timeout, then it fails the tuple and replays it later.
To take advantage of Storm's reliability capabilities, you must tell Storm when new
edges in a tuple tree are being created and tell Storm whenever you've finished
processing an individual tuple. These are done using the OutputCollector object that
bolts use to emit tuples. Anchoring is done in the emit method, and you declare that
you're finished with a tuple using the ack method.
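The timeout-and-replay mechanism described above can be illustrated with a toy tracker. This is a conceptual sketch of the at-least-once idea, not Storm's actual OutputCollector API; class and method names are hypothetical:

```python
import time

class PendingTracker:
    """Track in-flight tuples; any tuple not acked within the message
    timeout is considered failed and must be replayed."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = {}  # tuple_id -> timestamp when emitted

    def emit(self, tuple_id, now=None):
        """Record that a tuple has entered the topology."""
        self.pending[tuple_id] = now if now is not None else time.time()

    def ack(self, tuple_id):
        """Mark a tuple as fully processed."""
        self.pending.pop(tuple_id, None)

    def expired(self, now=None):
        """Return tuple ids whose processing timed out (candidates for replay)."""
        now = now if now is not None else time.time()
        return [t for t, emitted in self.pending.items()
                if now - emitted > self.timeout]

tracker = PendingTracker(timeout_seconds=30)
tracker.emit("t1", now=0)
tracker.emit("t2", now=0)
tracker.ack("t1")               # t1 completed its tuple tree
print(tracker.expired(now=60))  # → ['t2']  (t2 timed out, replay it)
```

Note the trade-off this slide mentions: tracking every tuple buys reliability at the cost of extra bookkeeping in the engine.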
Batch processing - classic approach
• With batch processing, the data is stationary (“at rest”) in a database or
an application store
• Operations are performed over all the data that is stored
Batch processing - classic approach
We are familiar with these operations in classic batch SQL. In the case of batch
processing, all that data is present at the time the SQL statement is processed.
But, with streaming data, the data is constantly flowing and thus other techniques need
to be performed. With streaming data, operations will be performed on windows of
data.
Stream processing - the real-time data approach
• Moving from a multi‐threaded program to one that takes advantage of
multiple nodes is a major rewrite of the application: we are now dealing
with multiple processes as well as the communication between them,
that is, message queues moving data as multiple streams
• Instead of all data, only windows of data are visible for processing
Stream processing - the real-time data approach
Sometimes a stream processing job needs to do something in regular time intervals,
regardless of how many incoming messages the job is processing. For example, say
you want to report the number of page views per minute. To do this, you increment a
counter every time you see a page view event. Once per minute, you send the current
counter value to an output stream and reset the counter to zero. This is a fixed time
window and is useful for reporting, for instance, sales that occurred during a clock-
hour.
Other methods of windowing of streaming data use a sliding window or a session
window (both illustrated above).
Microsoft Azure offers tumbling, hopping, and sliding windows to perform temporal
operations on streaming data. See Microsoft’s article: Introduction to Stream
Analytics Window functions, https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
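The per-minute page-view counter described above is a tumbling (fixed) window. A minimal sketch in Python; the timestamps and event names are hypothetical:

```python
from collections import defaultdict

def tumbling_counts(events, window_seconds):
    """Group (timestamp, event) pairs into fixed windows and count per window."""
    counts = defaultdict(int)
    for ts, _event in events:
        # Each event falls into exactly one non-overlapping window
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

# Page-view events at various timestamps (in seconds)
events = [(3, "view"), (15, "view"), (61, "view"), (62, "view"), (125, "view")]
print(tumbling_counts(events, window_seconds=60))
# → {0: 2, 60: 2, 120: 1}
```

A sliding window would instead let windows overlap, and a session window would close a window after a gap of inactivity; both change only how window_start is assigned.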
Streaming components & Streaming Data Engines
Open Source:
• Hortonworks HDF / NiFi
• Apache Storm
• Apache Flink
• Apache Kafka
• Apache Samza
• Apache Beam (an SDK)
• Apache Spark Streaming
Proprietary:
• IBM Streams (full SDE)
• Amazon Kinesis
• Microsoft Azure Stream Analytics
Streaming components & Streaming Data Engines
To work with streaming analytics, it is important to understand what the various
available components are and how they relate. The topic itself is quite complex and
deserving of a full workshop in itself - thus, here, we can only provide an introduction.
A full data pipeline (i.e., Streaming Application) involves:
• Accessing data at its source (“source operator” components) - Kafka can be used
here
• Processing data (serializing data, merging & joining individual streams,
referencing static data from in-memory stores and databases, transforming data,
performing aggregation and analytics, … ) - Storm is a component that is
sometimes used here
• Delivering data to long-term persistence and to on-the-fly visualization
(“sink operators”)
IBM Streams can handle all these operations using standard and custom-built
operators and is thus a full Streaming Data Engine (SDE), but open source
components can be used to build somewhat equivalent systems.
We are only going to look at HDF / NiFi and IBM Streams (data pipeline) in detail -
and separately; they are bolded in the list above.
Open source references
• http://storm.apache.org/
• http://flink.apache.org/
• http://kafka.apache.org
• http://samza.apache.org
Apache Samza is a distributed stream processing framework. It uses Apache
Kafka for messaging, and Apache Hadoop YARN to provide fault tolerance,
processor isolation, security, and resource management.
• https://beam.apache.org/
Apache Beam is an advanced, unified programming model for implementing batch
and streaming data processing jobs that run on any execution engine.
• https://spark.apache.org/streaming/
Spark Streaming makes it easy to build scalable fault-tolerant streaming
applications
• Introduction to Spark Streaming (Hortonworks Tutorial)
https://hortonworks.com/hadoop-tutorial/introduction-spark-streaming/
Proprietary SDE references:
• IBM Streams
https://www.ibm.com/ms-en/marketplace/stream-computing
• Try IBM Streams (basic) for free
https://console.bluemix.net/catalog/services/streaming-analytics
• Amazon Kinesis
https://aws.amazon.com/kinesis/
• Microsoft Azure Stream Analytics
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
• Stock-trading analysis and alerts
• Fraud detection, data, and identity protections
• Embedded sensor and actuator analysis
• Web clickstream analytics.
Hortonworks HDF / NiFi
• Hortonworks DataFlow (HDF) provides the only end-to-end platform
that collects, curates, analyzes, and acts on data in real time, on-
premises or in the cloud
• With version 3.x, HDF offers a drag-and-drop visual interface
• HDF is an integrated solution that uses Apache NiFi/MiNiFi, Apache
Kafka, Apache Storm, Superset, and Druid components where
appropriate
• The HDF streaming real-time data analytics platform includes Data
Flow Management Systems, Stream Processing, and Enterprise
Services
• The newest additions to HDF include a Schema Repository
Hortonworks HDF / NiFi
What is Hortonworks DataFlow (HDF)?
• HDF is a collection of products for data movement, streaming
analytics, and the management and governance of streaming data
What is Hortonworks DataFlow (HDF)?
Hortonworks DataFlow (HDF) is a collection of products for data movement, streaming
analytics, and the management and governance of streaming data:
• Flow management: NiFi/MiNiFi
• Stream processing: Storm, Kafka, Streaming Analytics Manager
• Enterprise services: Schema Registry, Apache Ranger, Ambari
The Streams Processing element is the only component that competes with
IBM Streams.
There is a more complete competitive presentation on Storm available, but we will
cover the basics here.
Components of the HDF Platform
Components of the HDF Platform
Reference for this graphic: https://hortonworks.com/products/data-platforms/hdf/
Hortonworks provides a 7-part Data-in-Motion webinar series (available on demand) at:
https://hortonworks.com/blog/harness-power-data-motion/
NiFi & MiNiFi
• NiFi is a disk-based, microbatch ETL tool that can duplicate
the same processing on multiple hosts for scalability
▪ Web-based user interface
▪ Highly configurable / Loss tolerant vs guaranteed delivery / Low latency vs
high throughput
▪ Tracks dataflow from beginning to end
• MiNiFi - a subproject of Apache NiFi - is a complementary data collection
approach that supplements the core tenets of NiFi in dataflow
management, focusing on the collection of data at the source of its
creation
▪ Small size and low resource consumption
▪ Central management of agents
▪ Generation of data provenance (full chain of custody of information)
▪ Integration with NiFi for follow-on dataflow management
NiFi & MiNiFi
References:
• https://nifi.apache.org
• https://nifi.apache.org/minifi/
NiFi background:
• Originated at the National Security Agency (NSA) - more than eight years of
development as a closed-source product
• Apache incubator November 2014 as part of NSA technology transfer program
• Apache top-level project in July 2015
• Java based, running on a JVM
Wikipedia (“Apache NiFi”):
• Apache NiFi (short for NiagaraFiles) is a software project from the Apache
Software Foundation designed to automate the flow of data between software
systems. It is based on the "NiagaraFiles" software previously developed by the
NSA and open-sourced as a part of its technology transfer program in 2014
• The software design is based on the flow-based programming model and offers
features which prominently include the ability to operate within clusters, security
using TLS encryption, extensibility (users can write their own software to extend
its abilities) and improved usability features like a portal which can be used to
view and modify behavior visually.
NiFi is written in Java and runs within a Java virtual machine on the server
that hosts it. The main components of NiFi are:
• Web Server - the HTTP-based component used to visually control the software
and monitor events
• Flow Controller - serves as the brains of NiFi's behavior and controls the running
of NiFi extensions, scheduling the allocation of resources for this to happen
• Extensions - various plugins that allow NiFi to interact with different kinds of
systems
• FlowFile repository - used by NiFi to maintain and track status of the currently
active FlowFile, including the information that NiFi is helping move between
systems
• Cluster manager - an instance of NiFi that provides the sole management point
for the cluster; the same data flow runs on all the nodes of the cluster
• Content repository - where data in transit is maintained
• Provenance repository - metadata detailing the provenance of the data flowing
through the system
Software development and commercial support is offered by Hortonworks. The
software is fully open source. This software is also sold and supported by IBM.
MiNiFi is a subproject of NiFi designed to solve the difficulties of managing and
transmitting data feeds to and from the source of origin, often the first/last mile of
the digital signal, enabling edge intelligence to adjust flow behavior and provide
bi-directional communication.
Reference:
• https://hortonworks.com/blog/edge-intelligence-iot-apache-minifi/
• Free eBook (PDF, 52 pp., authored by Hortonworks & Attunity):
http://discover.attunity.com/apache-nifi-for-dummies-en-report-go-c-lp8558.html
Edge Intelligence for IoT with Apache MiNiFi
• Since the first mile of data collection (the far edge) is very distributed
and likely involves a very large number of end devices (i.e., IoT), MiNiFi
carries over all the main capabilities of NiFi
Edge Intelligence for IoT with Apache MiNiFi
Reference: https://hortonworks.com/blog/edge-intelligence-iot-apache-minifi/
MiNiFi is ideal for situations where there are a large number of distributed, connected
devices. For example, first-mile data collection from routers, security cameras, cable
modems, ATMs, security appliances, point-of-sale systems, weather detection systems,
fleets of trucks/trains/planes/ships, thermostats, and utility or power meters are all good
use cases for MiNiFi due to the distributed nature of their physical locations. MiNiFi then
enables multiple applications. For example, by deploying MiNiFi on point-of-sale
terminals, retailers can gather data from hundreds or thousands of terminals.
This edge collection role was often played in the past by Flume, but Flume has been
marked as deprecated. See:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_release-notes/content/deprecated_items.html
HDP dashboard - example
• Hortonworks provides a full
tutorial - Trucking IoT on HDF -
that covers the core concepts of
Storm and the role it plays in an
environment where real-time,
low-latency and distributed data
processing are important
▪ Continuous IoT data (Kafka)
▪ Real-time stream processing
(Storm)
▪ Visualization (Web application)
• https://hortonworks.com/hadoop-tutorial/trucking-iot-hdf/
HDP dashboard - example
This tutorial discusses a real-world use case and provides an understanding of the role
Storm plays within HDF.
• https://hortonworks.com/hadoop-tutorial/trucking-iot-hdf/
The use case is a trucking company that dispatches trucks across the country. The
trucks are outfitted with sensors that collect data (IoT, Internet of Things). The data
includes:
• The name of the driver
• The route the truck is using and intends to use
• The speed of the truck
• Events that recently occurred (speeding, the truck weaving out of its lane, following too
closely, etc.)
Data like this is generated often - perhaps several times per second - and is
streamed back to the company’s servers.
Components of the tutorial are:
• Demonstration
• A dive into the Storm internals and how to build a topology from scratch
• A discussion of how to package the Storm code and deploy onto a cluster
Controlling HDF components from Ambari
Example: Storm
Controlling HDF components from Ambari
The Ambari interface provides:
• Open source management platform
• Ongoing cluster management and maintenance (cluster size irrelevant) via Web
UI and REST API (cluster operations automation)
• Visualization of cluster health and extended cluster capability by wrapping custom
services under management
• Centralized security setup
IBM Streams
• IBM Streams is an advanced computing platform that allows user-
developed applications to quickly ingest, analyze, and correlate
information as it arrives from thousands of real-time sources
• The solution can handle very high data throughput rates, up to millions
of events or messages per second
• IBM Streams provides:
▪ Development support - Rich Eclipse-based, visual IDE lets solution
architects visually build applications or use familiar programming languages
like Java, Scala or Python
▪ Rich data connections - Connect with virtually any data source whether
structured, unstructured or streaming, and integrate with Hadoop, Spark
and other data infrastructures
▪ Analysis and visualization - Integrate with business solutions. Built-in
domain analytics like machine learning, natural language, spatial-temporal,
text, acoustic, and more, to create adaptive streams applications
IBM Streams
References:
• https://www.ibm.com/cloud/streaming-analytics
• Streaming Analytics: Resources
https://www.ibm.com/cloud/streaming-analytics/resources
• Free eBook: Addressing Data Volume, Velocity, and Variety with IBM InfoSphere
Streams V3.0
IBM Redbook series, PDF, 326 pp.
http://www.redbooks.ibm.com/redbooks/pdfs/sg248108.pdf
• Toolkits, Sample, and Tutorials for IBM Streams
https://github.com/IBMStreams
• Forrester Total Economic Impact of IBM Streams:
This study provides readers with a framework to evaluate the potential financial
impact of Streams on their organizations
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMC15131USEN
Streams is based on nearly two decades of effort by the IBM Research team to extend
computing technology to handle advanced analysis of high volumes of data quickly.
How important is their research? Consider how it would help crime investigation to
analyze the output of any video cameras in the area that surrounds the scene of a
crime to identify specific faces of any persons of interest in the crowd and relay that
information to the unit that is responding. Similarly, what a competitive edge it could
provide by analyzing 6 million stock market messages per second and execute trades
with an average trade latency of only 25 microseconds (far faster than a hummingbird
flaps its wings). Think about how much time, money, and resources could be saved by
analyzing test results from chip-manufacturing wafer testers in real time to determine
whether there are defective chips before they leave the line.
System S: While at IBM, Dr. Edgar F. "Ted" Codd invented the relational database. In the
defining IBM Research project, it was referred to as System R, which stood for
Relational. The relational database is the foundation for data warehousing that
launched the highly successful Client/Server and On-Demand informational eras.
One of the cornerstones of that success was the capability of OLAP products that
are still used in critical business processes today.
When the IBM Research division again turned to the next evolution of analysis
(RTAP) for the Smarter Planet era, they set their sights on developing a
platform with the same level of world-changing success, and decided to call
their effort System S, which stood for
Streams. Like System R, System S was founded on the promise of a
revolutionary change to the analytic paradigm. The research of the Exploratory
Stream Processing Systems team at T.J. Watson Research Center, which was
set on advanced topics in highly scalable stream-processing applications for the
System S project, is the heart and soul of Streams.
Critical intelligence, informed actions, and operational efficiencies that are all available
in real time is the promise of Streams.
Comparison of IBM Streams vs NiFi
IBM Streams:
▪ Stream Data Engine (SDE)
▪ Complete cluster support
▪ C++ engine - performance & scalability
▪ Mature & proven product (v4.2)
▪ Memory-based
▪ Streaming analytics
▪ Many analytics operators
▪ Enterprise data source / sink support
▪ Web-based, command-line based, and REST-based monitoring tooling
▪ Drag-and-drop development environment + Streams Processing Language (SPL)
NiFi:
▪ Microbatch engine
▪ Master-slave, duplicate processing
▪ Java engine
▪ Evolving product
▪ Disk-based
▪ Extract-Transform-Load (ETL) oriented
▪ No analytics Processors
▪ Limited data source / sink support
▪ Web-based monitoring tooling
▪ Web-based development environment
▪ Needs Storm or Spark to provide the analytics
Comparison of IBM Streams vs NiFi
NiFi itself is not a streaming analytics engine. HDF uses Storm for the analytic
processing.
Apache Storm is a distributed stream processing computation framework written
predominantly in the Clojure programming language. Originally created by Nathan Marz
and team at BackType, the project was open sourced after being acquired by Twitter.
Why use Storm? http://storm.apache.org/
• Apache Storm is a free and open source distributed realtime computation system.
Storm makes it easy to reliably process unbounded streams of data, doing for
realtime processing what Hadoop did for batch processing. Storm is simple, can
be used with any programming language, and is a lot of fun to use!
• Storm has many use cases: realtime analytics, online machine learning,
continuous computation, distributed RPC, ETL, and more. Storm is fast: a
benchmark clocked it at over a million tuples processed per second per node. It is
scalable, fault-tolerant, guarantees your data will be processed, and is easy to set
up and operate.
• Storm integrates with the queueing and database technologies you already use.
A Storm topology consumes streams of data and processes those streams in
arbitrarily complex ways, repartitioning the streams between each stage of the
computation however needed.
• The reference provides a tutorial.
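The continuous, per-tuple computation Storm performs on unbounded streams can be illustrated with a rolling average. This is a hedged sketch in plain Python, not Storm code; the window size and readings are arbitrary:

```python
# Continuously emit the mean of the last `window` values -- the kind
# of rolling computation a stream-processing bolt performs on an
# unbounded feed, producing a result per tuple rather than per batch.
from collections import deque

def sliding_average(stream, window=3):
    buf = deque(maxlen=window)      # oldest value drops out automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40, 50]
print(list(sliding_average(readings)))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

Because the generator never materializes the full input, the same code would run unchanged over a feed that never ends, which is the defining property of stream processing.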
Apache Storm with v1.0 works with Apache Trident. Trident is a high-level
abstraction for doing realtime computing on top of Storm. Trident provides
microbatching.
• Trident allows you to seamlessly intermix high-throughput (millions of messages
per second), stateful stream processing with low latency distributed querying. If
you're familiar with high level batch processing tools like Pig or Cascading, the
concepts of Trident will be very familiar – Trident has joins, aggregations,
grouping, functions, and filters. In addition to these, Trident adds primitives for
doing stateful, incremental processing on top of any database or persistence
store. Trident has consistent, exactly-once semantics, so it is easy to reason
about Trident topologies.
• Refer: http://storm.apache.org/releases/current/Trident-tutorial.html
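Microbatching, the core idea Trident adds on top of Storm's tuple-at-a-time model, amounts to grouping the stream into small fixed-size batches before processing. A minimal sketch in plain Python (the batch size is chosen arbitrarily):

```python
# Group an incoming stream into fixed-size batches, the way a
# microbatch engine such as Trident (or NiFi) processes records in
# small groups rather than one tuple at a time.
def microbatches(stream, size):
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:                       # flush the final partial batch
        yield batch

print(list(microbatches(range(7), 3)))
# [[0, 1, 2], [3, 4, 5], [6]]
```

The trade-off this makes visible: each batch boundary adds latency, but per-batch processing amortizes overhead and simplifies exactly-once bookkeeping.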
References:
• Apache Storm: Overview
https://hortonworks.com/apache/storm/
• The Future of Apache Storm (YouTube, June 2016)
https://youtu.be/_Q2uzRIkTd8
• Storm or Spark: Choose your real-time weapon
https://www.infoworld.com/article/2854894/application-development/spark-and-
storm-for-real-time-computation.html
Advantages of IBM Streams & Streams Studio
Streams Studio provides the ability to create stream processing
topologies without programming through a comprehensive set of
capabilities:
• Pre-built Sources (input): Kafka, Event Hubs, HDFS
• Pre-built Processing (operators): Aggregate, Branch, Join, PMML,
Projection Bolt, Rule
• Pre-built Sink (output): Cassandra, Druid, Hive, HBase, HDFS,
JDBC, Kafka
• Notification (email), Open TSDB, Solr
• Pre-built visualization: 30+ business visualizations - and these can
be organized into a dashboard
• Extensible: Ability to add user-defined functions (UDF) and
aggregates (UDA) through jar files
Advantages of IBM Streams & Streams Studio
Being able to create stream processing topologies without programming is a worthwhile
goal. This is something that is possible already with IBM Streams using Streams
Studio.
IBM Streams is a complete, off-the-shelf Streaming Data Engine (SDE) that is ready to
run immediately after installation. In addition, you have all the tools to develop custom
source and sink adapters.
PMML stands for "Predictive Model Markup Language." It is the de-facto standard to
represent predictive solutions. A PMML file may contain a myriad of data
transformations (pre- and post-processing) as well as one or more predictive models.
IBM Streams comes with the ability to cross-integrate with IBM SPSS Statistics to
provide PMML capability and work with R, the open source statistical package that
supports PMML.
What is PMML?
https://www.ibm.com/developerworks/library/ba-ind-PMML1/
Where does IBM Streams fit in the processing cycle?
Where does IBM Streams fit in the processing cycle?
What if you wanted to add in fraud detection to your processing cycle before authorizing
the transaction and committing it to the database? Fraud detection has to happen in
real-time in order for it to truly be of a benefit. So instead of taking our data and running
it directly into the authorization / OLTP process, we process it as streaming data
using IBM Streams. The results of this process can be directed to our transactional
system or data warehouse or to a Data Lake or Hadoop system storage (e.g., HDFS).
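The idea of scoring each transaction in flight, before the OLTP commit, can be sketched as a simple routing rule. The field names, threshold, and checks below are invented for illustration; a production system would apply a trained model rather than two hand-written rules:

```python
# Score each transaction as it arrives and route it before the OLTP
# commit: suspicious ones go to fraud review, the rest straight to
# authorization. A stand-in for in-flight scoring, not a real model.
def authorize(transactions, limit=5000.0):
    for txn in transactions:
        if txn["amount"] > limit or txn["country"] != txn["home_country"]:
            yield {**txn, "route": "fraud_review"}
        else:
            yield {**txn, "route": "authorize"}

feed = [
    {"id": 1, "amount": 120.0, "country": "US", "home_country": "US"},
    {"id": 2, "amount": 9800.0, "country": "US", "home_country": "US"},
    {"id": 3, "amount": 75.0, "country": "BR", "home_country": "US"},
]
for decision in authorize(feed):
    print(decision["id"], decision["route"])
# 1 authorize
# 2 fraud_review
# 3 fraud_review
```

The point is the placement: the decision is made on the stream itself, so only cleared transactions ever reach the authorization step.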
Real-time processing to find new insights
Multi-channel customer
sentiment and experience
analysis
Detect life-threatening
conditions at hospitals in
time to intervene
Predict weather patterns to plan
optimal wind turbine usage, and
optimize capital expenditure on
asset placement
Make risk decisions based on
real-time transactional data
Identify criminals and threats
from disparate video, audio,
and data feeds
Real-time processing to find new insights
The graphic represents some of the situations in which IBM Streams has been applied
to perform real-time analytics on streaming data.
Today organizations are only tapping in to a small fraction of the data that is available to
them. The challenge is figuring out how to analyze ALL of the data, and find insights in
these new and unconventional data types. Imagine if you could analyze the 7 terabytes
of tweets being created each day to figure out what people are saying about your
products, and figure out who the key influencers are within your target demographics.
Can you imagine being able to mine this data to identify new market opportunities?
What if hospitals could take the thousands of sensor readings collected every hour per
patient in ICUs to identify subtle indications that the patient is becoming unwell, days
earlier than is allowed by traditional techniques? Imagine if a green energy company
could use petabytes of weather data along with massive volumes of operational data to
optimize asset location and utilization, making these environmentally friendly energy
sources more cost competitive with traditional sources.
What if you could make risk decisions, such as whether or not someone qualifies for a
mortgage, in minutes, by analyzing many sources of data, including real-time
transactional data, while the client is still on the phone or in the office? Or if law
enforcement agencies could analyze audio and video feeds in real-time without human
intervention to identify suspicious activity?
As new sources of data continue to grow in volume, variety and velocity, so too does
the potential of this data to revolutionize the decision-making processes in every
industry.
Components of IBM Streams
Application graph of an IBM Streams application
Application graph of an IBM Streams application
Applications can be developed in the IBM Streams Studio by using IBM Streams
Processing Language (SPL), a declarative language that is customized for stream
computing. Applications with the latest release are, however, generally developed using
a drag-and-drop graphical approach.
After the applications are developed, they are deployed to Streams Runtime
environment. By using Streams Live Graph, you can monitor the performance of the
runtime cluster from the perspective of individual machines and the communications
between them.
Virtually any device, sensor, or application system can be defined by using the
language. But, there are predefined source and output adapters that can further simplify
application development. As examples, IBM delivers the following adapters, among
many others:
• TCP/IP, UDP/IP, and files
• IBM WebSphere Front Office, which delivers stock feeds from major exchanges
worldwide
• IBM solidDB® includes an in-memory, persistent database that uses the Solid
Accelerator API
• Relational databases, which are supported by using industry standard ODBC
Applications, such as the one shown above, almost always feature multiple steps.
For example, some utilities began paying customers who sign up for a particular usage
plan to have their air conditioning units turned off for a short time, allowing the
temperature to be changed. An application to implement this plan collects data from
meters and might apply a filter to monitor only for those customers who selected this
service. Then, a usage model must be applied that was selected for that company.
Up-to-date usage contracts then need to be applied by retrieving them, extracting text,
filtering on key words, and possibly applying a seasonal adjustment. Current weather
information can be collected and parsed from the US National Oceanic &
Atmospheric Administration (NOAA), which has weather stations across the United
States. After the correct location is parsed for, text can be extracted and temperature
history can be read from a database and compared to historical information.
Optionally, the latest temperature history could be stored in a warehouse for future use.
Finally, the three streams (meter information, usage contract, and current weather
comparison to historical weather) can be used to take actions.
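The final step, combining the three streams to take an action, can be sketched as a keyed join. The field names, the opt-in rule, and the weather factor are all invented for illustration:

```python
# Join per-customer meter readings with usage contracts and a weather
# adjustment to decide which A/C units may be cycled off. Customers
# who did not opt in to the plan are excluded from the decision.
def combine(meters, contracts, weather_adjust):
    decisions = {}
    for cust, kw in meters.items():
        contract = contracts.get(cust)
        if contract and contract["opted_in"]:
            # apply the seasonal/weather factor before comparing to the plan
            effective = kw * weather_adjust
            decisions[cust] = ("cycle_off" if effective > contract["threshold_kw"]
                               else "leave_on")
    return decisions

meters = {"c1": 4.0, "c2": 1.5, "c3": 6.0}
contracts = {
    "c1": {"opted_in": True, "threshold_kw": 3.0},
    "c2": {"opted_in": True, "threshold_kw": 3.0},
    "c3": {"opted_in": False, "threshold_kw": 3.0},
}
print(combine(meters, contracts, weather_adjust=1.1))
# {'c1': 'cycle_off', 'c2': 'leave_on'}
```

In a real Streams application each of the three inputs would arrive as a continuous stream and the join would be windowed; the dictionary lookup here stands in for that keyed correlation.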
This example is taken from the free IBM RedBook (PDF): Ebbers, M., Abdel-Gayed, A.,
Budhi, V. B., Dolot, F., Kamat, V., Picone, R., & Trevelin, J. (2013). Addressing Data
Volume, Velocity, and Variety with IBM InfoSphere Streams V3.0. Raleigh, NC: IBM
International Technical Support Organization (ITSO), p. 31.
This RedBook is available from:
http://www.redbooks.ibm.com/redbooks/pdfs/sg248108.pdf
Demo: “Learn IBM Streams in 5 minutes” (5:53)
Play at full-screen, the YouTube Video:
https://www.ibm.com/ms-
en/marketplace/stream-computing
Demo: “Learn IBM Streams in 5 minutes” (5:53)
Example exploration: How to capture data streams and analyze them faster with
IBM Streams.
YouTube URL: https://www.youtube.com/watch?v=HLHGRy7Hif4
Demo: “IBM Streams Designer Overview” (2:54)
Play at full-screen, the YouTube Video:
https://youtu.be/i6nBhFoXZqo
Demo: “IBM Streams Designer Overview” (2:54)
This video provides an overview of IBM Streams Designer, part of IBM Watson Data
Platform.
Find more videos in the IBM Streams Designer Learning Center at:
http://ibm.biz/streams-designer-learning.
Unit summary
• Define streaming data
• Describe IBM as a pioneer in streaming data - with System S
IBM Streams
• Explain streaming data - concepts & terminology
• Compare and contrast batch data vs streaming data
• List and explain streaming components & Streaming Data Engines
(SDEs)
Unit summary