4. Big Data Technology Landscape

The document provides an overview of the Big Data technology landscape, focusing on NoSQL and NewSQL databases, as well as the Hadoop framework. It outlines the characteristics, advantages, and differences between SQL, NoSQL, and NewSQL, while also detailing the features and versions of Hadoop. The content is structured for a learning session, including objectives, outcomes, and further readings.

Uploaded by

srujanashetty33

Big Data and Analytics

Seema Acharya
Subhashini Chellappan

Big Data and Analytics by Seema Acharya and Subhashini Chellappan


Copyright 2015, WILEY INDIA PVT. LTD.
Chapter 4

The Big Data Technology Landscape

Learning Objectives and Learning Outcomes

Learning Objectives (the big data technology landscape):
1. What are NoSQL databases?
2. Why NoSQL?
3. Key advantages of NoSQL.
4. What is NewSQL?
5. SQL vs. NoSQL.
6. Getting familiar with Hadoop.

Learning Outcomes:
a) To understand the significance of NoSQL databases.
b) To understand the need for NewSQL.
c) To understand the Hadoop platform and be able to appreciate the difference between Hadoop 1.0 and Hadoop 2.0.

Session Plan

Lecture time 45 to 60 minutes

Q/A 15 minutes

Agenda

• NoSQL
  - What is it?
  - Types of NoSQL Databases
  - Why NoSQL?
  - Advantages of NoSQL
  - NoSQL Vendors
  - SQL versus NoSQL
• NewSQL
  - Comparison of SQL, NoSQL and NewSQL
• Hadoop
  - Features of Hadoop
  - Key Advantages of Hadoop
  - Versions of Hadoop
What is NoSQL?

• Non-relational data storage systems
• No fixed table schema
• No joins
• No multi-document transactions
• Relaxes one or more ACID properties

Types of NoSQL

Key-value data store | Column-oriented data store | Document data store | Graph data store
Riak | Cassandra | MongoDB | InfiniteGraph
Redis | HBase | CouchDB | Neo4j
Membase | Hypertable | RavenDB | AllegroGraph
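
To make the four categories above more concrete, here is a minimal sketch of how one small piece of data might be shaped under each model. It uses plain Python structures only; the record, the key names, and the helper functions are hypothetical and do not reflect the API of any product listed.

```python
import json

# Hypothetical user record, shaped four ways (illustration only).

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Column-oriented (wide-column) store: row key -> column family -> columns.
wide_column = {
    "42": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2015-01-10"},
    }
}

# Document store: a self-describing, nested document.
document = {
    "_id": 42,
    "name": "Asha",
    "address": {"city": "Pune"},
    "tags": ["analytics", "hadoop"],
}

# Graph store: nodes plus labelled edges between them.
nodes = {42: {"name": "Asha"}, 43: {"name": "Ravi"}}
edges = [(42, "FOLLOWS", 43)]


def city_from_key_value(key):
    """The store cannot see inside the value; the application decodes it."""
    return json.loads(key_value[key])["city"]


def followers_of(node_id):
    """Return ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for src, label, dst in edges if label == "FOLLOWS" and dst == node_id]
```

The point of the sketch is the shape of the data, not the storage engine: the same facts are reachable by key, by row and column family, by nested field, or by traversing an edge.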

Advantages of NoSQL

• Cheap, easy to implement
• Easy to distribute
• Can easily scale up and down
• Relaxes the data consistency requirement
• Doesn't require a pre-defined schema
• Data can be replicated to multiple nodes and can be partitioned
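
The last advantage, replication and partitioning, can be sketched roughly as follows. This is a simplified illustration, assuming hash-based partitioning and a fixed replication factor; the node names, the replication factor, and both function names are hypothetical, and real NoSQL systems use more elaborate placement schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 2  # assumed value, for illustration only


def partition_for(key, num_nodes=len(NODES)):
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes holding copies of the key: its home node plus the next rf - 1 nodes."""
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]
```

Calling `replicas_for("user:42")` returns two distinct node names, and it returns the same two on every call, so any client can locate the data without a central lookup.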

NoSQL Vendors

Company | Product | Most widely used by
Amazon | DynamoDB | LinkedIn, Mozilla
Facebook | Cassandra | Netflix, Twitter, eBay
Google | BigTable | Adobe Photoshop

SQL Vs. NoSQL

SQL | NoSQL
Relational database | Non-relational, distributed database
Relational model | Model-less approach
Pre-defined schema | Dynamic schema for unstructured data
Table-based databases | Document-based, graph-based, wide-column store, or key-value pair databases
Vertically scalable (by increasing system resources) | Horizontally scalable (by creating a cluster of commodity machines)
Uses SQL | Uses UnQL (Unstructured Query Language)
Not preferred for large datasets | Largely preferred for large datasets
Not the best fit for hierarchical data | Best fit for hierarchical storage, as it stores data as key-value pairs, similar to JSON (JavaScript Object Notation)
Emphasis on ACID properties | Follows Brewer's CAP theorem
Excellent support from vendors | Relies heavily on community support
Supports complex querying and data-keeping needs | Does not have good support for complex querying
Can be configured for strong consistency | A few support strong consistency (e.g., MongoDB); others can be configured for eventual consistency (e.g., Cassandra)
Examples: Oracle, DB2, MySQL, MS SQL, PostgreSQL, etc. | Examples: MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, Couchbase, Riak, etc.
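
The "pre-defined schema" versus "dynamic schema" contrast can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database and a plain list of dictionaries as a stand-in for a document store; the table, the field names, and the helper function are hypothetical.

```python
import sqlite3

# Relational side: the schema is fixed before any row is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Asha"))


def insert_with_unknown_column():
    """Try to insert a field the schema does not define; report whether it was accepted."""
    try:
        conn.execute(
            "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
            (2, "Ravi", "Pune"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement because the table has no 'city' column.
        return False


schema_accepted = insert_with_unknown_column()

# Document side (stand-in): each record carries its own structure,
# so records with different and nested fields can sit side by side.
documents = []
documents.append({"_id": 1, "name": "Asha"})
documents.append({"_id": 2, "name": "Ravi", "address": {"city": "Pune"}})

cities = [d["address"]["city"] for d in documents if "address" in d]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

In the relational case the new field needs a schema change first (for example an `ALTER TABLE`), whereas the document-style collection accepts the nested `address` field immediately. The trade-off is that the application, not the database, must then cope with records that lack it.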

NewSQL

Characteristics of NewSQL:
• SQL interface for application interaction
• ACID support for transactions
• An architecture that provides higher per-node performance vis-à-vis traditional RDBMS solutions
• Scale-out, shared-nothing architecture
• Non-locking concurrency control mechanism, so that real-time reads will not conflict with writes
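
One common way to obtain non-locking reads is multi-version concurrency control (MVCC). The slides do not say which mechanism a given NewSQL product uses, so the following is only a toy, single-threaded sketch of the idea, with a hypothetical `VersionedStore` class: writers append new versions, and a reader sees the latest version committed at or before its snapshot.

```python
class VersionedStore:
    """Toy multi-version store: writers append versions, readers use snapshots."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), in commit order
        self._clock = 0      # logical commit timestamp

    def write(self, key, value):
        """Commit a new version of key without touching older versions."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return the timestamp a reader should use for a consistent view."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, or None."""
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("balance", 100)
snap = store.snapshot()      # a reader starts here
store.write("balance", 80)   # a later write does not disturb that reader
```

Here `store.read("balance", snap)` still returns 100 after the second write, while a reader that takes a fresh snapshot sees 80. Neither reader had to wait on a lock held by the writer.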

SQL Vs. NoSQL Vs. NewSQL

Feature | SQL | NoSQL | NewSQL
Adherence to ACID properties | Yes | No | Yes
OLTP/OLAP | Yes | No | Yes
Schema rigidity | Yes | No | Maybe
Adherence to data model | Adherence to relational model | |
Data format flexibility | No | Yes | Maybe
Scalability | Scale up (vertical scaling) | Scale out (horizontal scaling) | Scale out
Distributed computing | Yes | Yes | Yes
Community support | Huge | Growing | Slowly growing

Hadoop

Apache open-source software framework

Inspired by:
- Google MapReduce
- Google File System

Core components:
- Hadoop Distributed File System (HDFS)
- MapReduce
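
The MapReduce programming model can be sketched in a few lines of plain Python. This is not Hadoop's API; it is a single-process illustration of the map, shuffle, and reduce steps using the classic word-count example, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big analytics"])` counts "big" twice and the other two words once each. In Hadoop the map and reduce calls run on many machines, with the input lines read from HDFS blocks, but the data flow is the same.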

Key Advantages of Hadoop

• Stores data in its native format
• Scalable
• Cost-effective
• Resilient to failure
• Flexible
• Fast

Versions of Hadoop

Hadoop 1.0:
- MapReduce (Cluster Resource Manager & Data Processing)
- HDFS (redundant, reliable storage)

Hadoop 2.0:
- MapReduce (Data Processing) and Others (Data Processing)
- YARN (Cluster Resource Manager)
- HDFS (redundant, reliable storage)

Answer a few quick questions …

Fill in the blanks

1. The expansion of CAP is _____________, ____________ and ___________________.
2. The expansion of BASE is ___________________.
3. MongoDB is ___________________ and ___________________.
4. Cassandra is ___________________ and ___________________.
5. ___________________ has no support for ACID properties of transactions.
6. ___________________ is a robust database that supports ACID properties of transactions and has the scalability of NoSQL.

Answer Me

• Cite the difference between Hadoop 1.0 and Hadoop 2.0.
• Compare and contrast SQL, NoSQL and NewSQL.

Summary please…

Ask a few participants of the learning program to summarize the lecture.

References …

Further Readings

• http://www.mongodb.com/nosql-explained
• http://nosql-database.org/
• http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
• http://hadoop.apache.org/

Thank you
