Big Data and DBMS Overview

This document discusses big data and relational database management systems. It provides examples of how relational databases work using queries on employee and car owner data. It then discusses the history of relational databases and how data sizes and types have increased over time. Examples are given of the large amounts of data handled by companies like Google, Facebook, and CERN. The challenges of big data are also discussed.

Uploaded by Yogendra Uikey

Big Data

Sunnie S Chung

A Brief History: Relational database management systems

Timeline: 1975-1985, 1985-1995, 1995-2005, 2005-2010, 2020

Let us first see what a relational database system is

DataBase Management System (DBMS)

[Figure: User/Application -> Query -> Data Management -> Query -> Data]

Example: At a Company
Query 1: Is there an employee named Nemo?
Query 2: What is Nemo's salary?
Query 3: How many departments are there in the company?
Query 4: What is the name of Nemo's department?
Query 5: How many employees are there in the Accounts department?
Employee

Name   DeptID   Salary
Nemo   ?        120K
Dory   156      79K
Gill   ?        76K
Ray    ?        85K

Department

DeptID   Name
?        Accounts
156      Marketing
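The five queries above can be sketched in SQL. The following is a minimal illustration using Python's built-in sqlite3 module, not the deck's own code. The schema, the DeptID value 1 for Accounts, and the department assignments of Nemo, Gill, and Ray are assumptions made so the example runs; only the names, salaries, department names, and the ID 156 come from the slide. Salaries such as 120K are stored as 120000, also an assumption about the unit.

```python
import sqlite3

# Hypothetical schema mirroring the Employee and Department tables.
# DeptID values other than 156 are placeholders, not taken from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (
        Name   TEXT,
        DeptID INTEGER REFERENCES Department(DeptID),
        Salary INTEGER
    );
    INSERT INTO Department VALUES (1, 'Accounts'), (156, 'Marketing');
    INSERT INTO Employee VALUES
        ('Nemo', 1, 120000), ('Dory', 156, 79000),
        ('Gill', 1, 76000),  ('Ray', 156, 85000);
""")


def scalar(sql, params=()):
    """Run a query and return the first column of the first row, or None."""
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


# Query 1: is there an employee named Nemo?
q1 = scalar("SELECT EXISTS (SELECT 1 FROM Employee WHERE Name = ?)", ("Nemo",)) == 1
# Query 2: what is Nemo's salary?
q2 = scalar("SELECT Salary FROM Employee WHERE Name = ?", ("Nemo",))
# Query 3: how many departments are there?
q3 = scalar("SELECT COUNT(*) FROM Department")
# Query 4: what is the name of Nemo's department? (needs a join)
q4 = scalar(
    "SELECT d.Name FROM Employee e JOIN Department d ON e.DeptID = d.DeptID "
    "WHERE e.Name = ?",
    ("Nemo",),
)
# Query 5: how many employees are in the Accounts department?
q5 = scalar(
    "SELECT COUNT(*) FROM Employee e JOIN Department d ON e.DeptID = d.DeptID "
    "WHERE d.Name = ?",
    ("Accounts",),
)

print(q1, q2, q3, q4, q5)
```

Queries 1 to 3 touch a single table, while queries 4 and 5 need both tables, which is why they are written as joins on DeptID. The answers to queries 4 and 5 depend on the assumed department assignments.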

DataBase Management System (DBMS)

The user submits a high-level query Q to the DBMS and receives an answer.

The DBMS translates Q into the best execution plan for current conditions, then runs the plan against the data.
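One way to watch a DBMS choose a plan "for current conditions" is SQLite's EXPLAIN QUERY PLAN statement. The sketch below, using Python's sqlite3 module, asks for the plan of the same query before and after an index exists. The table layout and the index name idx_employee_name are hypothetical, and the exact plan wording varies between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Nemo", 120000), ("Dory", 79000), ("Gill", 76000), ("Ray", 85000)],
)

QUERY = "SELECT Salary FROM Employee WHERE Name = 'Nemo'"


def plan(sql):
    """Return the human-readable detail column of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index the only option is to scan the whole table.
plan_without_index = plan(QUERY)

# Conditions change: an index on Name becomes available.
conn.execute("CREATE INDEX idx_employee_name ON Employee (Name)")
plan_with_index = plan(QUERY)

# The query text and its answer are unchanged; only the plan differs.
answer = conn.execute(QUERY).fetchall()

print(plan_without_index)
print(plan_with_index)
print(answer)
```

The query is written once, at a high level. The system decides how to run it, and that decision can change when the physical design changes.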

Example: Store that Sells Cars

Query: Owners of Honda Accords who are <= 23 years old

Execution plan:
  Join ([Link] = [Link])
    Filter (Make = Honda and Model = Accord) on Cars
    Filter (Age <= 23) on Owners

Result:

Make    Model    OwnerID   ID    Name   Age
Honda   Accord   12        12    Nemo   22
Honda   Accord   156       156   Dory   21

Cars

Make     Model    OwnerID
Honda    Accord   12
Toyota   Camry    ?
Mini     Cooper   ?
Honda    Accord   156

Owners

ID    Name   Age
12    Nemo   22
?     Ray    ?
?     Gill   ?
156   Dory   21
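The plan in this example (two filters feeding a join) can be sketched as plain Python operators. This is an illustration only, not how any particular DBMS implements it. The join condition Cars.OwnerID = Owners.ID is an assumption, since the slide's join condition did not survive extraction. The OwnerID, ID, and Age values for the Camry, the Cooper, Ray, and Gill are hypothetical placeholders. The helper names filter_rows and hash_join are also made up for this sketch.

```python
cars = [
    {"Make": "Honda", "Model": "Accord", "OwnerID": 12},
    {"Make": "Toyota", "Model": "Camry", "OwnerID": 34},   # hypothetical OwnerID
    {"Make": "Mini", "Model": "Cooper", "OwnerID": 78},    # hypothetical OwnerID
    {"Make": "Honda", "Model": "Accord", "OwnerID": 156},
]
owners = [
    {"ID": 12, "Name": "Nemo", "Age": 22},
    {"ID": 34, "Name": "Ray", "Age": 40},    # hypothetical ID and Age
    {"ID": 78, "Name": "Gill", "Age": 35},   # hypothetical ID and Age
    {"ID": 156, "Name": "Dory", "Age": 21},
]


def filter_rows(rows, predicate):
    """Filter operator: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def hash_join(left, right, left_key, right_key):
    """Join operator: build a hash table on the right input, probe with the left."""
    table = {}
    for r in right:
        table.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[left_key], [])]


# Filters run first so the join sees as few rows as possible.
accords = filter_rows(cars, lambda c: c["Make"] == "Honda" and c["Model"] == "Accord")
young_owners = filter_rows(owners, lambda o: o["Age"] <= 23)
result = hash_join(accords, young_owners, "OwnerID", "ID")

for row in result:
    print(row)
```

Joining first and filtering afterwards would give the same answer. Pushing the filters below the join is the kind of choice a query optimizer makes to reduce work.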

DataBase Management System (DBMS)

The user submits a high-level query Q to the DBMS and receives an answer.

The DBMS translates Q into the best execution plan for current conditions, then runs the plan against the data.

The DBMS keeps data safe and correct despite failures, concurrent updates, online processing, etc.
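A small piece of "safe and correct despite failures" can be illustrated with a transaction that rolls back when one of its steps fails. The sketch below uses Python's sqlite3 module. The transfer function, the CHECK constraint, and the idea of moving an amount between two salary rows are all hypothetical, chosen only to force a failure halfway through. It shows atomicity only, not concurrency control or crash recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Name TEXT PRIMARY KEY, "
    "Salary INTEGER CHECK (Salary >= 0))"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [("Dory", 79000), ("Gill", 76000)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Employee SET Salary = Salary + ? WHERE Name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE Employee SET Salary = Salary - ? WHERE Name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The second UPDATE would make Gill's salary negative, so the CHECK fails
# and the first UPDATE is rolled back with it.
failed = transfer(conn, "Gill", "Dory", 100000)
after_failure = dict(conn.execute("SELECT Name, Salary FROM Employee"))

# A transfer that violates nothing commits both updates together.
succeeded = transfer(conn, "Gill", "Dory", 1000)
after_success = dict(conn.execute("SELECT Name, Salary FROM Employee"))

print(failed, after_failure)
print(succeeded, after_success)
```

Without the transaction, the failed transfer would leave Dory's row updated and Gill's row untouched, which is the kind of partial state a DBMS is expected to prevent.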

A Brief History: Relational database management systems

Timeline: 1975-1985, 1985-1995, 1995-2005, 2005-2010, 2020

Assumptions and requirements changed over time:
Semi-structured and unstructured data (Web)
Hardware developments
Developments in system software
Changes in data sizes

Big Data: How much data?

Google processes 20 PB a day (2008)

Wayback Machine has 3 PB + 100 TB/month (3/2009)

eBay has 6.5 PB of user data + 50 TB/day (5/2009)

Facebook has 36 PB of user data + 80-90 TB/day (6/2010)

CERN's LHC: 15 PB a year (any day now)

LSST: 6-10 PB a year (~2015)
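To get a feel for these figures, it can help to convert them into rates. The back-of-envelope sketch below assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes), which the slide does not specify. The variable names are made up for illustration.

```python
PB = 10**15  # decimal petabyte (assumption; the slide does not say decimal or binary)
TB = 10**12
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60

# Google: 20 PB processed per day, expressed as a sustained rate.
google_gb_per_second = (20 * PB / SECONDS_PER_DAY) / GB

# eBay: 6.5 PB of user data, growing by 50 TB per day.
# Number of days of ingestion at that rate that equals the current holdings.
ebay_days_to_match_holdings = (6.5 * PB) / (50 * TB)

# Wayback Machine: 100 TB per month on top of 3 PB, as yearly growth.
wayback_growth_per_year = (12 * 100 * TB) / (3 * PB)

print(f"Google: about {google_gb_per_second:.0f} GB/s sustained")
print(f"eBay: {ebay_days_to_match_holdings:.0f} days of ingestion equals current data")
print(f"Wayback Machine: about {wayback_growth_per_year:.0%} growth per year")
```

These are derived numbers, not figures from the slide. They only restate the listed volumes in different units.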

"640K ought to be enough for anybody."

From [Link]

From: [Link]

NEW REALITIES

"The quest for knowledge used to begin with grand theories. Now it begins with massive amounts of data. Welcome to the Petabyte Age."

TB disks < $100
Everything is data
Rise of data-driven culture
Very publicly espoused by Google, Wired, etc.
Sloan Digital Sky Survey, Terraserver, etc.

From: [Link]

FOX AUDIENCE NETWORK

Greenplum parallel DB
42 Sun X4500s (Thumper), each with:
  48 500GB drives
  16GB RAM
  2 dual-core Opterons
Big and growing:
  200 TB data (mirrored)
  Fact table of 1.5 trillion rows
  Growing 5TB per day
  4-7 Billion rows per day
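The hardware and growth figures above invite a quick sanity check. The sketch below is back-of-envelope arithmetic under two assumptions: units are decimal (1 TB = 10^12 bytes), and the 5 TB/day and 4-7 billion rows/day figures describe the same data. All derived values are estimates, not numbers reported in the slide.

```python
GB = 10**9
TB = 10**12

# Raw disk capacity: 42 machines x 48 drives x 500 GB.
nodes = 42
drives_per_node = 48
drive_bytes = 500 * GB
raw_capacity_tb = nodes * drives_per_node * drive_bytes / TB

# Average row size implied by 5 TB/day arriving as 4-7 billion rows/day.
daily_bytes = 5 * TB
bytes_per_row_max = daily_bytes / 4e9  # fewer rows per day means larger rows
bytes_per_row_min = daily_bytes / 7e9

# Time to accumulate a 1.5-trillion-row fact table at those arrival rates.
fact_rows = 1.5e12
days_at_fast_rate = fact_rows / 7e9
days_at_slow_rate = fact_rows / 4e9

print(f"Raw capacity: {raw_capacity_tb:.0f} TB")
print(f"Implied row size: {bytes_per_row_min:.0f} to {bytes_per_row_max:.0f} bytes")
print(f"Fact table: {days_at_fast_rate:.0f} to {days_at_slow_rate:.0f} days of rows")
```

Under these assumptions the cluster has roughly 1 PB of raw disk, which leaves room for mirroring 200 TB of data plus working space.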

From: [Link]

Also extensive use of R and Hadoop

Yahoo! runs a 4000-node Hadoop cluster (probably the largest). Overall, there are 38,000 nodes running Hadoop at Yahoo!

As reported by FAN, Feb 2009

A SCENARIO FROM FAN

How many female WWF fans under the age of 30 visited the Toyota community over the last 4 days and saw a Class A ad?

How are these people similar to those that visited Nissan?

Open-ended question about statistical densities (distributions)
From: [Link]

MULTILINGUAL DEVELOPMENT

SQL or MapReduce
Sequential code in a variety of languages:
  Perl
  Python
  Java
  R

Mix and Match!
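To make "SQL or MapReduce" concrete, the sketch below computes the same aggregate both ways: the number of cars per make, using the Make and Model values from the car store example. The SQL side uses Python's sqlite3 module. The MapReduce side is a single-process simulation of the map, shuffle, and reduce phases, not a real Hadoop job. The function names map_phase, shuffle, and reduce_phase are made up for illustration.

```python
import sqlite3
from collections import defaultdict

cars = [
    ("Honda", "Accord"),
    ("Toyota", "Camry"),
    ("Mini", "Cooper"),
    ("Honda", "Accord"),
]

# --- SQL: declare what is wanted, let the DBMS decide how ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (Make TEXT, Model TEXT)")
conn.executemany("INSERT INTO Cars VALUES (?, ?)", cars)
sql_counts = dict(conn.execute("SELECT Make, COUNT(*) FROM Cars GROUP BY Make"))


# --- MapReduce style: sequential user code for each phase ---
def map_phase(records, mapper):
    """Apply the mapper to every record and collect its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def emit_make(record):
    make, _model = record
    yield (make, 1)


def sum_values(_key, values):
    return sum(values)


mr_counts = reduce_phase(shuffle(map_phase(cars, emit_make)), sum_values)

print(sql_counts)
print(mr_counts)
```

Both routes give the same counts. The SQL version states the result declaratively, while the MapReduce version spells out the mapper and reducer as ordinary sequential code. Such code could equally be written in Perl, Java, or R.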

From: [Link]

SE HABLA MAPREDUCE
SQL SPOKEN HERE
QUI SI PARLA PYTHON
HIER JAVA GESPROCKEN
R PARLÉ ICI

From: [Link]

What is important to learn

Principles of query processing (35%)
  Indexes
  Query execution plans and operators
  Query optimization
Data storage (15%)
  Databases vs. filesystems (Google/Hadoop Distributed FileSystem)
  Data layouts (row-stores, column-stores, partitioning, compression)
Scalable data processing (40%)
  Parallel query plans and operators
  Systems based on MapReduce
  Scalable key-value stores
  Processing rapid, high-speed data streams
Concurrency control and recovery (10%)
  Consistency models for data (ACID, BASE, Serializability)
  Write-ahead logging
