
Maestría en Ciencia de Datos

Big Data

Week 4
Working with Data Models

Dr Sandra Ortega-Martorell

In this session…
• Streaming data: what it is and why it is different

• Data Lakes

• Exploring streaming data

• DBMS-based and non-DBMS-based approaches to Big Data

• Big Data Management Systems

• Retrieving Big Data: querying relational data with PostgreSQL

Streaming data: what it is and why it is different


Streaming data
• We mentioned previously that one of the Big Data challenges is the velocity of data, which arrives at varying rates.

• For some applications this creates the need to process the data as it is generated, or in other words, as it streams.

• We call these types of applications:

Streaming Data Processing Applications

• This terminology refers to a constant stream of data flowing from a source.


– E.g. data from a sensor or data from social media.

Streaming data – example application
• FlightStats, [Link]
– It processes ~60 million flight events per week through its data acquisition system, turning them into real-time intelligence for airlines and millions of travellers, daily.


Data stream – what is it?


• A possibly unbounded sequence of data records that may or may not be related to, or correlated with, each other.

• Each data record is generally timestamped and in some cases geo-tagged.

• Data can stream from many sources.
– E.g. instruments, many Internet of Things application areas, computer programs, websites, or social media posts.

• Streaming data is sometimes referred to as event data, as each data item is treated as an individual event in a synchronised sequence.

Data stream – challenges
• Conventional data management architectures are built primarily on the concept of persistent,
static data collections.

• Streams pose very difficult challenges for these conventional data management architectures
– as most often we have only one chance to look at, and process, streaming data before receiving
more.

• Streaming data management systems cannot be separated from real-time processing of data.
– Managing and processing data in motion is a typical capability of streaming data systems.


Streaming Data Systems – characteristics

• Designed to manage relatively simple computations on a data stream: one record at a time, or a small time window of data.
• Computations are done in near-real-time, sometimes in memory.
• Computations are independent of each other.
• The processing components often subscribe to a stream source non-interactively:
– this means they send nothing back to the source, nor establish interaction with the source.
• The sheer size, variety and velocity of big data adds further challenges.

Data stream – dynamic steering
• The concept of dynamic steering involves dynamically changing the next steps or direction of an application through a continuous computational process using streaming data.
– Dynamic steering is often a part of streaming data management and processing.

Example of a dynamic steering application: a self-driving car


Streaming Data Systems

• Examples of big data streaming systems include Apache Storm, Apache Flink, Apache Kafka, Spark Streaming, and Amazon Kinesis.

Why is Streaming Data different?

Data-at-rest:
• Mostly static data from one or more sources
• Collected prior to analysis
• Its analysis is called batch or static processing

Data-in-motion:
• Analysed as it is generated
– E.g. sensor data processing in a plane or a self-driving car
• Its analysis is called stream processing

Data processing algorithms

• Static / Batch Processing – size determines time and space:
– The run time and memory usage of most algorithms is usually dependent on the data size, which can easily be calculated from files or databases.

• Stream Processing – unbounded size, but finite time and space:
– The size of the data is unbounded, and this changes the types of algorithms that can be used.
– Algorithms that require iterating or looping over the whole data set are not possible, since with stream data you never get to the end.

Streaming Data Management and Processing
• Streaming Data Management and Processing should enable:

1. Computations on one data element, or a small window of data elements, at a time (see the SQL sketch after this list).
– These computations can update metrics, and monitor and plot statistics on the streaming data.

2. Relatively fast and simple computations.
– Since computations need to be completed in real time, the tasks processing streaming data should be quicker than, or at least not much slower than, the streaming rate of the data (the data velocity).

3. No interactions with the data source.
– In most streaming systems, the management and processing system subscribes to the data source, but does not send anything back to the stream source in terms of feedback or interactions.
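To make the first requirement concrete, a windowed computation can be sketched in SQL. This is a minimal illustration, not from the original slides, assuming a hypothetical events(ts, value) table into which a stream has been landed; it computes per-minute counts and averages, the batch analogue of a tumbling one-minute window:

-- Hypothetical table: events(ts TIMESTAMPTZ, value NUMERIC)
-- Per-minute tumbling-window statistics over the landed stream
SELECT date_trunc('minute', ts) AS window_start,
       COUNT(*)                 AS n_events,
       AVG(value)               AS avg_value
FROM events
GROUP BY date_trunc('minute', ts)
ORDER BY window_start;

A true streaming engine would maintain such aggregates incrementally as records arrive, rather than rescanning the table.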

Streaming Data
• These requirements for streaming data processing are quite different from those of batch processing.

• In batch processing, the analytical steps (often) have access to all the data, and can take more time to complete a complex analytical task, with less pressure on the completion time of individual data management and processing tasks.

• Most organisations today use a hybrid architecture for processing streaming and batch jobs at the same time, sometimes referred to as the lambda architecture.

Streaming Data – Lambda architecture
• Lambda architecture is a data-processing architecture designed to handle massive
quantities of data by taking advantage of both batch- and stream-processing methods.

– Diagram: over time, each batch processing pass covers the accumulated historical data, while real-time processing covers the most recent data up to 'Now'.


Streaming data – challenges


• Scalability
– To accommodate rapid growth in traffic and data volume (scaling up)
– To adapt to decreases in demand (scaling down)

• Data replication and durability
– Data Availability vs. Data Durability: they are not the same thing.
– Data Availability refers to system uptime, i.e. the storage system is operational and can deliver data upon request.
– Data Durability refers to long-term data protection, i.e. the stored data does not suffer from bit rot, degradation or other corruption. It is concerned with data redundancy rather than hardware redundancy, so that data is never lost or compromised.
Streaming data – challenges
• These two main challenges mentioned before will need to be overcome to avoid data loss and to enable real-time analytical tasks.

The size and frequency of the stream data can significantly change over time.
Data changes may be periodic or sporadic.


Streaming data – changes in size and frequency


The size and frequency of the stream data can significantly change over time.

• Size – e.g. streaming data found on social networks can increase in volume during holidays, sports matches, or major news events.
• Frequency – changes may be periodic or sporadic.
Streaming data – periodic changes
Data changes may be periodic: evenings, weekends, etc.

Example: people may post messages on social media more in the evenings.


Streaming data – sporadic changes


Data changes may be sporadic: major events.

Examples:
• There can be an increase in data size and frequency during major events, sports matches, etc.
• Dropping or missing data when there are network problems.
Streaming data – extreme changes example
• Example of extreme data fluctuation:
– Average tweets per second: 6,000.
– During the first 10 years of Twitter, the record for most tweets per minute was set by Germany's victory over Argentina in the 2014 World Cup: there were 618,725 tweets in the 60 seconds after the final whistle was blown.


Data Lakes

Data Lakes
• With big data streaming from different sources in
varying formats, models, and speeds, we need to be
able to ingest this data into a fast and scalable
storage system that is flexible enough to serve many
current and future analytical processes.

• This is where traditional data warehouses, with their strict data models and data formats, do not fit the big data challenges of streaming and batch applications.

• The concept of a data lake was created in response to these big data storage and processing challenges.


What is a Data Lake?


• Simply speaking, a data lake is a part of a big data infrastructure that many streams can flow into, and where they get stored for processing in their original form.

• We can think of it as a massive storage depository with huge processing power and the ability to handle a very large number of concurrent data management and analytical tasks.

How do Data lakes work?
• The concept can be compared to a body of water, a lake, where water flows in, fills up a reservoir, and flows out.


How do Data lakes work? – in more detail


• In a Data Lake:
– The data gets loaded from its source and is stored in its native format until it is needed; at that point, applications can freely read the data and add structure to it.
– This is called ‘schema-on-read’.
– This approach ensures all data is stored for a potentially unknown use at a later time.

• In contrast, in a Data Warehouse:
– The data is loaded into the warehouse after transforming it into a well-defined and structured format.
– This is called ‘schema-on-write’.
– Any application using the data needs to know this format in order to retrieve and use the data.
– In this approach, data is not loaded into the warehouse unless there is a use for it.
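Schema-on-read can be illustrated in PostgreSQL terms using the JSONB type. This is a minimal sketch, not from the original slides, assuming a hypothetical raw_events landing table into which documents are loaded as-is; structure is imposed only at query time:

-- Hypothetical landing table: documents are stored as-is, no schema imposed on load
CREATE TABLE raw_events (doc JSONB);

-- Structure is imposed at read time: each query decides which fields to extract
SELECT doc->>'user'              AS username,
       (doc->>'ts')::timestamptz AS event_time
FROM raw_events
WHERE doc->>'type' = 'login';

A warehouse, by contrast, would require the user, ts and type columns to be defined and populated before loading (schema-on-write).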

Data warehouse vs. Data lake

Data warehouse:
• Stores data in a hierarchical file system with a well-defined structure.

Data lake:
• Stores data as flat files with a unique identifier.
• This often gets referred to as object storage in big data systems.

Data lake object storage

• In a data lake, each data item is stored as a Binary Large Object (BLOB) and is assigned a unique identifier.
• Each data object is tagged with a number of metadata tags.
• The data can be searched using these metadata tags to retrieve it.

• In Hadoop data architectures, data is loaded into HDFS and processed using the appropriate data management and
analytical systems on commodity clusters.
• The selection of the tools is based on the nature of the problem being solved, and the data format being accessed.
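As an illustration of tag-based retrieval, the following is a minimal SQL sketch, not from the original slides, assuming a hypothetical objects catalogue in which each BLOB identifier is stored alongside its metadata tags as JSONB:

-- Hypothetical catalogue: objects(object_id TEXT, tags JSONB)
-- Find all objects whose metadata tags include a given source
SELECT object_id
FROM objects
WHERE tags @> '{"source": "weather-station"}';

The @> containment operator matches every object whose tag set includes the given key/value pair, which is essentially how metadata-tag search behaves in object stores.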
Data lakes – summary
• A Big Data storage architecture

• Collects all data for current and future use/analysis

• Transforms data format only when needed (schema-on-read)

• Supports all types of Big data users

• Data lakes adapt easily to changes
– An infrastructure component that evolves over time, based on application-specific needs.


DBMS-based and non-DBMS-based approaches to Big Data

Storing data – Files vs. DBMS
• In the past, database operations were implemented by applications built directly on file systems.

• Problems with this approach:
– Data redundancy, inconsistency, and isolation: multiple file formats, often with duplication of the information, and large, complex files.
– Each task is a separate problem: there is no uniform way to access data. E.g. finding employees in a department sorted by their salary vs. finding employees in all departments sorted by their start date requires two separate programs.
– Data integrity: enforcement of constraints (also called integrity constraints). E.g. every employee has exactly one job title; changing this rule implies finding everywhere it was coded.
– Atomicity of updates (all of the changes must happen together, as a single unit): this has to do with system failures. E.g. if there is a failure while updating a series of records, it is not easy to fix the damage or even to start all over again.

Hence the need for the transition to a DBMS.



Advantages of a DBMS
Current DBMSs, especially relational DBMSs, have a number
of advantages:

1. Declarative query languages
– Declarative means that we state what we want to retrieve without saying how to retrieve it.
– E.g. we can say ‘find the average salary of employees in the IT division for every job title and sort from high to low’; we do not need to say how to extract or group these records (see the SQL sketch after this list).
– Hence, no more task-based programs.

2. Data independence
– Applications do not need to worry about data storage formats and locations.
– The goal of data independence is to isolate users from the record layout, so long as the logical definition of the data (tables and their attributes) is clearly specified.
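As an illustration, the salary query above can be written declaratively in SQL. This is a minimal sketch, not from the original slides, assuming a hypothetical employees(job_title, salary, division) table:

-- Hypothetical table: employees(job_title, salary, division)
SELECT job_title, AVG(salary) AS avg_salary
FROM employees
WHERE division = 'IT'
GROUP BY job_title
ORDER BY avg_salary DESC;

Note that the query states only what to compute; the grouping strategy, access paths and sort method are chosen by the query processor.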

Advantages of a DBMS
3. Effective access through optimisation
– The system automatically finds an efficient way to access data, even when there are a large number of tables and hundreds of millions of records.

4. Data integrity and security
– Methods to keep the accuracy and consistency of data despite failures:
• Transaction safety: the ACID properties of transactions.
– Atomicity: transactions are all or nothing.
– Consistency: only valid data is saved.
– Isolation: transactions do not affect each other.
– Durability: once written, data will not be lost.
• Failure recovery.

5. Concurrent access
– Many users can simultaneously access data without conflict (a transaction sketch follows below).
• E.g. in an airline reservation system with lots of people buying tickets at the same time, the DBMS must ensure that a ticket is not sold twice, and that if someone is in the middle of buying the last ticket, another person does not see that ticket as available.
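The ticket example can be sketched as a PostgreSQL transaction. This is a minimal illustration, not from the original slides, assuming a hypothetical tickets(ticket_id, flight_id, sold) table; SELECT ... FOR UPDATE locks the chosen row so a concurrent buyer cannot claim the same ticket:

BEGIN;

-- Lock one unsold ticket so no concurrent transaction can sell it at the same time
SELECT ticket_id
FROM tickets
WHERE flight_id = 1042 AND NOT sold
LIMIT 1
FOR UPDATE;

-- Mark it as sold (using the ticket_id returned above, here assumed to be 7)
UPDATE tickets SET sold = TRUE WHERE ticket_id = 7;

COMMIT;  -- atomic: the ticket is sold exactly once, or not at all

If two buyers race for the last ticket, the second transaction blocks on the row lock and, once the first commits, no longer sees the ticket as unsold.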

How do traditional databases handle large data volumes?

• Parallel Database System
– Improves performance through parallel implementation
• Operations like selection and join use parallel algorithms
– Often allows data replication
• Data redundancy guards against table corruption
• More concurrent queries

• Distributed Database System
– Data is stored across several sites, each site managed by a DBMS capable of running independently.
• In this case, one component knows some part of the schema of its neighbouring DBMS, and can pass a query, or part of a query, to the neighbour when needed.

DBMS-based approaches
• Typical question: if DBMSs are so powerful, why do we see the rise of MapReduce-style systems?

• Unfortunately, the answer is not straightforward.


DBMS and MapReduce-style Systems

• DBMSs: efficient storage, transactions and retrieval
– Partitioned data parallelism: different parts of a logical table can physically reside on different machines, so different parts of a query can access the partitions in parallel and speed up query performance.
– Optimised computation and communication cost (i.e. the time to exchange data between machines).
– However, they do not take machine failure into account.

• MapReduce (MR) was developed not for storage and retrieval, but for distributed processing of large amounts of data.

• MR-style systems: their goal was to support complex data processing over a cluster of machines.
– Since MR implementations run over HDFS, issues like node failure are automatically accounted for.
– They can be effectively used for data analytics: data mining, clustering, machine learning.
– Multi-stage, problem-specific algorithms are handled naturally (as opposed to in an RDBMS).
– They operate on a wider variety of data, including unstructured data like text.
Tension points in the data management world
The mixture of data management requirements and data processing and analysis requirements has created an interesting tension in the data management world. A few of these tension points are:

1. Data loading – a new bottleneck
– DBMSs perform storage and retrieval operations very efficiently, but first the data must be loaded into the DBMS. So, how long does loading take?
– Does the application need the data sooner than the loading time allows?

2. Too much functionality
– For very simple operations (e.g. reading records) involving a huge table, do we need all the guarantees offered by DBMSs?
– Does the application use only a few data management features?


Tension points in the data management world

3. Combination of transactional and analytical capabilities
– There is an emerging class of systems that meets all the transactional guarantees that a DBMS provides and, at the same time, supports efficient analytical operations.
– These are often required for systems like real-time decision support.

Mixed solutions
The combination of traditional requirements with new ones is leading to new capabilities and
products.
• DBMS-Hadoop interoperation
– DBMS technologies are creating new techniques that make use of MapReduce-style data processing. Many of
them run on HDFS.
– DBMSs are providing ways to perform a MapReduce-style operation on HDFS files and exchange data between
the Hadoop subsystem and the DBMS.
• Relational operations in MapReduce systems like Spark
– Simple map and reduce operations are not sufficient for many data operations.
– Spark has several kinds of join and data grouping operations in addition to map and reduce.
• Streaming input to DBMS
– Some DBMSs are making use of large distributed memory management operations to accept streaming data.
• New parallel programming models for analytical computation within the DBMS
– These algorithms use MR-style computing and are becoming part of a new generation of DBMS products that invoke them from inside the database system, e.g. finding dense regions in a graph.

Big Data Management Systems (BDMS)
[From DBMS to BDMS]

Desired characteristics of BDMS
• A flexible, semi-structured data model
– Support traditional applications, which require the development of a schema, as well as applications which require no schema, because the data can vary in terms of its attributes and relationships.

• Support for today’s common Big Data data types
– Textual, temporal, and spatial data values.

• A full query language
– To effectively manage large volumes of data, it is often more convenient to use a query language and let the query processor automatically determine optimal ways to retrieve the data.
– This query language may or may not look like SQL, but it should be at least equally powerful.

• An efficient parallel query engine running on multiple machines
– The machines can be connected in a shared-nothing architecture, a shared-memory architecture, or a shared cluster.
• Shared-nothing means the machines share neither disk nor memory.


Desired characteristics of BDMS


• Wide range of query sizes
– Some applications working on top of a BDMS will issue queries with only a few conditions and a few small objects to return.
– But some applications, especially those generated by other software tools or machine learning algorithms, can have many conditions and can return many large objects.

• Continuous data ingestion
– In many cases, a BDMS has both streaming and existing data that need to be combined to solve a problem.
– E.g. combining streaming data from weather stations with historical data to make better forecasts.

• Scale gracefully to manage and query large volumes of data
– Designed to operate over a cluster and to know how to handle a failure.
– The system should be able to handle new machines joining, or existing machines leaving, the cluster.

• Full data management capability
– Easy to install, restart and configure; provides high availability; and makes operational management as simple as possible.
ACID and BASE
• ACID properties (Atomicity, Consistency, Isolation, Durability) are hard to maintain in a BDMS:
– There is too much data, and there are too many updates from too many users.
– The effort to maintain the ACID properties may lead to a significant slowdown of the system.

• While ACID properties are still desirable, it might be more practical to relax them.

• BASE (Basically Available, Soft state, Eventually consistent) relaxes ACID.

CAP theorem
• A distributed computer system cannot simultaneously achieve:
– Consistency:
• Every read receives the most recent write or an error
– Availability:
• Every request receives a (non-error) response, without the guarantee that it contains the most recent
write
– Partition tolerance
• The system continues to operate despite an arbitrary number of messages being dropped (or delayed)
by the network between nodes

CAP theorem – pick two!

• Availability: remains accessible and operational at all times.
• Consistency: commits are atomic across the entire distributed system.
• Partition tolerance: only a total network failure can cause the system to respond incorrectly.

– CA (Consistency + Availability): traditional relational databases, e.g. PostgreSQL, MySQL.
– AP (Availability + Partition tolerance): Cassandra, CouchDB, Dynamo-like systems.
– CP (Consistency + Partition tolerance): HBase, MongoDB, Redis, BigTable-like systems.

Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape
[Link]
Retrieving Big Data


Data retrieval and Query language

• Data retrieval is the way in which the desired data is specified and retrieved from a data store.

• A query language is a language that allows us to specify the data items we need.
– A query language is declarative: specify what you need rather than how to obtain it.
– E.g. SQL (Structured Query Language).

• In contrast to a query language, a database programming language, e.g. Oracle’s PL/SQL (Procedural Language/SQL), is a high-level procedural programming language that embeds query operations.

SQL
• Example
– A Beer Drinkers Club owns many bars, and each bar sells beer.
– Not every bar sells the same brands of beer, and even when they do, they may have different prices.
– The club keeps information about its regular member customers.
– It also knows which member visits which bars, and which beers each member likes.

• Database schema – the classic ‘beers’ schema:
– Beers(name, manf); Bars(name, addr, license); Drinkers(name, addr, phone); Sells(bar, beer, price); Likes(drinker, beer); Frequents(drinker, bar)

Select-from-where
• Which beers are made by Heineken?

SELECT name               -- output attribute(s)
FROM Beers                -- table(s) to use
WHERE manf = 'Heineken'   -- the condition(s) to satisfy

– Strings like ‘Heineken’ are case-sensitive and are put in quotes.
– In terms of data operations, this form of query corresponds to a selection followed by a projection.

More example queries
• Find expensive beers.

• Which businesses have a temporary license (starting with ‘32’) in San Diego?

(SQL sketches for both queries are given below.)
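A plausible SQL rendering of these two queries follows. The first assumes the Sells(bar, beer, price) table from the beers schema and an arbitrary price threshold; the second assumes a hypothetical Businesses(name, license, city) table, since this query refers to a different dataset:

-- Find expensive beers (price threshold chosen arbitrarily)
SELECT DISTINCT beer
FROM Sells
WHERE price > 20;

-- Businesses with a temporary license (starting with '32') in San Diego
-- (hypothetical Businesses(name, license, city) table)
SELECT name
FROM Businesses
WHERE license LIKE '32%' AND city = 'San Diego';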


Select-Project queries in large tables


• Large tables can be partitioned.

• There are many partitioning schemes.
– One of them is range partitioning on the primary key.
– E.g. the Beers table split into ranges of name (‘A…’, ‘B…’, and so on), with the partitions spread across Machine 1, Machine 2, …, Machine 5.
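In PostgreSQL, this kind of layout can be declared with native range partitioning. A minimal sketch, not from the original slides, partitioning the Beers table on name (the partition bounds are illustrative):

-- Parent table, range-partitioned on the name attribute
CREATE TABLE Beers (
    name TEXT NOT NULL,
    manf TEXT
) PARTITION BY RANGE (name);

-- Illustrative partitions; in a cluster, each could live on a different machine
CREATE TABLE beers_a_f PARTITION OF Beers FOR VALUES FROM ('A') TO ('G');
CREATE TABLE beers_g_m PARTITION OF Beers FOR VALUES FROM ('G') TO ('N');
CREATE TABLE beers_n_z PARTITION OF Beers FOR VALUES FROM ('N') TO (MAXVALUE);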

Select-Project queries in large tables

(The same partitioned Beers table, spread across Machines 1 to 5, as above.)

• Two queries:
– Find records for beers whose name starts with ‘Am’:
SELECT *
FROM Beers
WHERE name LIKE 'Am%'

– Which beers are made by Heineken?
SELECT name
FROM Beers
WHERE manf = 'Heineken'

Evaluating Select-Project queries for large data


• A query processing trick
– Use the partitioning information: names starting with ‘Am’ fall entirely within the first range, so just use partition 1.

SELECT *
FROM Beers
WHERE name LIKE 'Am%'

– Thus, so long as the system knows the partitioning strategy, it can make its job much more efficient.
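With the declarative partitioning sketched earlier, PostgreSQL applies this trick automatically via partition pruning, and EXPLAIN can confirm it. A minimal check, not from the original slides, with the prefix match written as an explicit range so the planner can prune regardless of collation:

-- Equivalent range form of the prefix match
EXPLAIN SELECT * FROM Beers WHERE name >= 'Am' AND name < 'An';
-- The resulting plan scans only the beers_a_f partition,
-- skipping beers_g_m and beers_n_z entirely.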

Evaluating Select-Project queries for large data
• Let’s look at the second query in the same partition setting
– The query condition is on the second attribute, manf, so the same trick cannot be applied.
– As it stands, this time we need to look in all partitions.
– It can be done in parallel.

SELECT name
FROM Beers
WHERE manf = ‘Heineken’


Local and global indexing


• What if I had 100 machines, and the desired data is only in 20 of them?
– Should we needlessly go through all 100 machines?
– Can this be avoided?

• Yes: to do this, we need one more piece in the solution, called an index structure.
– Very simply, an index can be thought of as a reverse table: given the value in a column, you get back the records where that value appears.
– Using an index speeds up query processing significantly.
– With indexes, we can solve this problem in different ways:
• Use a local index on each machine
• Use a machine index for each value
• Use a combined index in a global index server
– These will use more space, but queries will be faster.
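In PostgreSQL terms, a local index on the attribute the Heineken query filters on would be created as follows. A minimal sketch, not from the original slides; on a partitioned table, indexing the parent automatically creates an index on each partition:

-- Index on manf, the non-partition-key attribute used by the Heineken query
CREATE INDEX beers_manf_idx ON Beers (manf);

-- Each partition can now answer this with an index lookup instead of a full scan
SELECT name FROM Beers WHERE manf = 'Heineken';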


Conclusions


Summary
1. Summarised the key characteristics of a data stream and identified the requirements of
streaming data systems.
2. Described how Data Lakes enable batch processing of streaming data, and explained the
difference between ‘schema-on-write’ and ‘schema-on-read’.
3. Explained how data streams, data lakes, and data warehouses are organised on a spectrum of big data management and storage.
4. Explained the advantages of using a DBMS over a file system; specified the differences between a parallel and a distributed database system; and described MapReduce-style systems.
5. Explained the desirable characteristics of a Big Data Management System (BDMS), gave
examples of BDMSs, and described their similarities and differences.
