Big Data and Analytics
Seema Acharya
Subhashini Chellappan
Big Data and Analytics by Seema Acharya and Subhashini Chellappan
Copyright 2015, WILEY INDIA PVT. LTD.
Chapter 4
The Big Data Technology Landscape
Learning Objectives and Learning Outcomes
Learning Objectives
The big data technology landscape:
1. What are NoSQL databases?
2. Why NoSQL?
3. Key advantages of NoSQL.
4. What is NewSQL?
5. SQL Vs. NoSQL.
6. Getting familiar with Hadoop.
Learning Outcomes
a) To understand the significance of NoSQL databases.
b) To understand the need for NewSQL.
c) To understand the Hadoop platform and be able to appreciate the difference between Hadoop 1.0 and Hadoop 2.0.
Session Plan
Lecture time: 45 to 60 minutes
Q/A: 15 minutes
Agenda
NoSQL
What is it?
Types of NoSQL Databases
Why NoSQL?
Advantages of NoSQL
NoSQL Vendors
SQL versus NoSQL
NewSQL
Comparison of SQL, NoSQL and NewSQL
Hadoop
Features of Hadoop
Key Advantages of Hadoop
Versions of Hadoop
What is NoSQL?
Characteristics of NoSQL:
• Non-relational data storage systems
• No fixed table schema
• No joins
• No multi-document transactions
• Relaxes one or more ACID properties
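The "no fixed table schema" point can be made concrete with a small Python sketch. It uses the standard-library sqlite3 module as a stand-in for a relational database and plain Python dictionaries as a stand-in for documents in a document store; neither is a real NoSQL client, and the table, field, and customer names are hypothetical.

```python
import sqlite3

# Relational side: the table schema is fixed before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")

try:
    # A field the schema does not declare is rejected by the database.
    conn.execute("INSERT INTO customers (id, name, city) VALUES (2, 'Ravi', 'Pune')")
    relational_accepted_new_field = True
except sqlite3.OperationalError:
    relational_accepted_new_field = False

# Document side (simulated with dicts): each record carries its own set of fields.
collection = []
collection.append({"_id": 1, "name": "Asha"})
collection.append({"_id": 2, "name": "Ravi", "city": "Pune", "tags": ["premium"]})

print(relational_accepted_new_field)   # False
print(sorted(collection[1].keys()))    # ['_id', 'city', 'name', 'tags']
```

Adding `city` to the relational table would first need an `ALTER TABLE`, whereas the second document simply includes the extra fields.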
Types of NoSQL
• Key-value data store: Riak, Redis, Membase
• Column-oriented data store: Cassandra, HBase, Hypertable
• Document data store: MongoDB, CouchDB, RavenDB
• Graph data store: InfiniteGraph, Neo4j, AllegroGraph
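The four types differ mainly in the shape of the data they store and the lookups that shape makes cheap. The sketch below models each shape with plain Python dictionaries purely as an illustration; it does not reflect the storage engines or client APIs of Riak, Cassandra, MongoDB, or Neo4j, and all keys and values are made up.

```python
# Key-value store: an opaque value looked up by a single key.
kv_store = {"session:42": "user=asha;cart=3"}

# Column-oriented (wide-column) store: row key -> column family -> column -> value.
wide_column = {
    "user#1001": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2015-06-01"},
    }
}

# Document store: self-describing, possibly nested documents.
documents = {
    "order-7": {"customer": "Asha", "items": [{"sku": "B12", "qty": 2}]}
}

# Graph store: nodes plus edges, here kept as an adjacency list.
graph = {"Asha": ["Ravi"], "Ravi": ["Meena"], "Meena": []}


def friends_of_friends(g, person):
    """Follow edges two hops out, the kind of traversal graph stores are built for."""
    return sorted({fof for friend in g.get(person, []) for fof in g.get(friend, [])})


print(kv_store["session:42"])                       # user=asha;cart=3
print(wide_column["user#1001"]["profile"]["city"])  # Pune
print(documents["order-7"]["items"][0]["qty"])      # 2
print(friends_of_friends(graph, "Asha"))            # ['Meena']
```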
Advantages of NoSQL
• Cheap and easy to implement
• Easy to distribute
• Can easily scale up and down
• Relaxes the data consistency requirement
• Does not require a pre-defined schema
• Data can be replicated to multiple nodes and can be partitioned
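The last advantage, partitioning data across nodes and replicating each partition, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the node names and the replication factor of 3 are hypothetical, and the placement uses a simple hash-modulo rule with successor replicas. Dynamo-style systems such as Cassandra use consistent hashing on a token ring instead, which moves far less data when nodes join or leave.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 3  # assumed: each key is stored on 3 distinct nodes


def stable_hash(key: str) -> int:
    # md5 gives the same value on every run; Python's built-in hash() for str is salted per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, nodes=NODES, rf=REPLICATION_FACTOR):
    """Partition by hash, then replicate to the next rf - 1 nodes in ring order."""
    start = stable_hash(key) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


for key in ["user:1", "user:2", "order:99"]:
    print(key, "->", replicas_for(key))
```

Because every node can compute `replicas_for` independently, any node can route a request for a key without consulting a central directory.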
NoSQL Vendors
Company | Product | Most widely used by
Amazon | DynamoDB | LinkedIn, Mozilla
Facebook | Cassandra | Netflix, Twitter, eBay
Google | BigTable | Adobe Photoshop
SQL Vs. NoSQL
SQL | NoSQL
Relational database | Non-relational, distributed database
Relational model | Model-less approach
Pre-defined schema | Dynamic schema for unstructured data
Table-based databases | Document-based, graph-based, wide-column, or key-value databases
Vertically scalable (by increasing system resources) | Horizontally scalable (by creating a cluster of commodity machines)
Uses SQL | No standard query language; UnQL (Unstructured Query Language) was proposed as one
Not preferred for large datasets | Largely preferred for large datasets
Not the best fit for hierarchical data | Best fit for hierarchical storage, as it stores data as key-value pairs similar to JSON (JavaScript Object Notation)
Emphasis on ACID properties | Follows Brewer's CAP theorem
Excellent support from vendors | Relies heavily on community support
Supports complex querying and data-keeping needs | Does not have good support for complex querying
Can be configured for strong consistency | Some support strong consistency (e.g., MongoDB); others can be configured for eventual consistency (e.g., Cassandra)
Examples: Oracle, DB2, MySQL, MS SQL, PostgreSQL, etc. | Examples: MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, Couchbase, Riak, etc.
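The last rows contrast strong and eventual consistency. One common way replicated stores let users tune this trade-off is with read and write quorums (Cassandra, for example, exposes per-request consistency levels such as ONE and QUORUM). With N replicas, a write waits for W acknowledgements and a read contacts R replicas; when R + W > N, every read set overlaps every write set, so a read can see the latest write. The sketch below checks that overlap by brute force and simulates replicas as versioned values; the replica contents are hypothetical and failure handling is ignored.

```python
from itertools import combinations


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True when every possible read set intersects every possible write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def read_latest(replicas, read_set):
    """Return the value with the highest version among the replicas contacted."""
    _, value = max(replicas[i] for i in read_set)
    return value


# Three replicas; a write of version 2 reached only replicas 0 and 1 (W = 2).
replicas = [(2, "new"), (2, "new"), (1, "old")]

print(quorum_overlaps(3, 2, 2))       # True:  R + W = 4 > 3
print(quorum_overlaps(3, 1, 1))       # False: R + W = 2 <= 3
print(read_latest(replicas, [1, 2]))  # new: a 2-replica read still sees the write
print(read_latest(replicas, [2]))     # old: a 1-replica read may be stale
```

Choosing R = W = 1 gives the fastest, eventually consistent behaviour; raising R and W until R + W > N trades latency for stronger consistency.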
NewSQL
Characteristics of NewSQL:
• SQL interface for application interaction
• ACID support for transactions
• An architecture that provides higher per-node performance compared with traditional RDBMS solutions
• Scale-out, shared-nothing architecture
• Non-locking concurrency control mechanism, so that real-time reads do not conflict with writes
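The last characteristic, non-locking concurrency control, is commonly implemented with multi-version concurrency control (MVCC): writers add new versions instead of overwriting, and each reader sees a snapshot as of its start time, so reads and writes do not block each other. The slide does not say which scheme a given NewSQL product uses; this is a minimal single-process sketch, and the class and method names are hypothetical.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: writes add versions, reads use a snapshot timestamp."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._versions = {}               # key -> list of (commit_ts, value), oldest first

    def begin_snapshot(self) -> int:
        # A reader remembers "now" and ignores anything committed later.
        return next(self._clock)

    def write(self, key, value) -> int:
        commit_ts = next(self._clock)
        # Older versions are kept, so readers holding an earlier snapshot are unaffected.
        self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts

    def read(self, key, snapshot_ts):
        visible = [v for ts, v in self._versions.get(key, []) if ts < snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("balance", 100)
snap = store.begin_snapshot()   # a long-running report starts here
store.write("balance", 80)      # a concurrent update commits afterwards
print(store.read("balance", snap))                    # 100: the report's view is stable
print(store.read("balance", store.begin_snapshot()))  # 80: a new reader sees the update
```

A real engine also needs garbage collection of old versions and conflict detection between concurrent writers, both omitted here.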
SQL Vs. NoSQL Vs. NewSQL
Feature | SQL | NoSQL | NewSQL
Adherence to ACID properties | Yes | No | Yes
OLTP/OLAP | Yes | No | Yes
Schema rigidity | Yes | No | Maybe
Adherence to data model | Adherence to relational model | - | -
Data format flexibility | No | Yes | Maybe
Scalability | Scale up (vertical scaling) | Scale out (horizontal scaling) | Scale out
Distributed computing | Yes | Yes | Yes
Community support | Huge | Growing | Slowly growing
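The first row, adherence to ACID properties, is easiest to see through atomicity: either every statement in a transaction takes effect or none does. The sketch uses Python's built-in sqlite3 module (an embedded SQL database) as a convenient stand-in for the SQL column; the accounts table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("asha", 100), ("ravi", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


print(transfer(conn, "asha", "ravi", 30))   # True
print(transfer(conn, "asha", "ravi", 500))  # False: the CHECK fails and the credit is undone
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'asha': 70, 'ravi': 80}
```

In the failed transfer, the credit to `ravi` had already been applied when the debit violated the CHECK constraint; the rollback removes it, so no money is created.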
Hadoop
• Apache open-source software framework
• Inspired by Google MapReduce and the Google File System
• Core components: Hadoop Distributed File System (HDFS) and MapReduce
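The MapReduce model splits a job into a map phase that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The sketch runs the three phases in a single Python process to show the data flow only; it does not use Hadoop's Java API or Hadoop Streaming, and the input lines are made up.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate all values for one key.
    return key, sum(values)


lines = ["Hadoop stores big data", "MapReduce processes big data"]
mapped = (pair for line in lines for pair in map_phase(line))
result = dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))
print(result)
# {'big': 2, 'data': 2, 'hadoop': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

On a cluster, map tasks run next to the HDFS blocks holding the input, and each reducer receives only the keys assigned to it.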
Key Advantages of Hadoop
Stores data in its native format
Scalable
Cost-effective
Resilient to failure
Flexible
Fast
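"Resilient to failure" largely comes from how HDFS stores data: a file is split into large blocks and each block is copied to several DataNodes (the defaults in Hadoop 2.x are a 128 MB block size and a replication factor of 3). The sketch computes the blocks for a file and checks that losing any single node leaves every block readable. The DataNode names are hypothetical, and the round-robin placement is a simplification of HDFS's rack-aware policy.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default dfs.blocksize in Hadoop 2.x
REPLICATION = 3      # HDFS default dfs.replication
DATANODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical cluster


def place_blocks(file_size_mb: int) -> dict[int, list[str]]:
    """Split a file into blocks and give each block REPLICATION distinct nodes.

    Round-robin placement is a simplification; real HDFS placement is rack-aware.
    """
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        b: [DATANODES[(b + r) % len(DATANODES)] for r in range(REPLICATION)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement, failed_node: str) -> bool:
    # A block survives if at least one replica lives on a node that is still up.
    return all(any(n != failed_node for n in nodes) for nodes in placement.values())


placement = place_blocks(500)  # a 500 MB file -> 4 blocks (the last one partly filled)
print(len(placement))                            # 4
print(readable_after_failure(placement, "dn2"))  # True
```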
Versions of Hadoop
Hadoop 1.0 (layers, top to bottom):
• MapReduce (cluster resource manager and data processing)
• HDFS (redundant, reliable storage)
Hadoop 2.0 (layers, top to bottom):
• MapReduce (data processing) and others (data processing)
• YARN (cluster resource manager)
• HDFS (redundant, reliable storage)
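The key change in Hadoop 2.0 is that YARN takes over cluster resource management from MapReduce, so MapReduce becomes one of several processing frameworks that request resources (containers) from a shared resource manager. The toy sketch below models only that idea; the class and method names are hypothetical and are not YARN's API, and real YARN also involves per-node NodeManagers, per-application ApplicationMasters, and pluggable schedulers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for YARN's ResourceManager: tracks free cluster memory."""
    free_memory_mb: int
    grants: list[tuple[str, int, int]] = field(default_factory=list)

    def request_containers(self, app: str, count: int, memory_mb: int) -> int:
        """Grant as many containers of memory_mb as currently fit; return how many."""
        granted = min(count, self.free_memory_mb // memory_mb)
        self.free_memory_mb -= granted * memory_mb
        self.grants.append((app, granted, memory_mb))
        return granted


rm = ToyResourceManager(free_memory_mb=16384)
# Different processing frameworks share one cluster through one resource manager.
print(rm.request_containers("mapreduce-wordcount", count=4, memory_mb=2048))  # 4
print(rm.request_containers("other-framework-job", count=6, memory_mb=2048))  # 4
print(rm.free_memory_mb)  # 0
```

In Hadoop 1.0 the equivalent bookkeeping lived inside MapReduce itself, so only MapReduce jobs could use the cluster's resources.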
Answer a few quick questions …
Fill in the blanks
1. The expansion of CAP is _____________, ____________ and ___________________.
2. The expansion of BASE is ___________________.
3. MongoDB is ___________________ and ___________________.
4. Cassandra is ___________________ and ___________________.
5. ___________________ has no support for ACID properties of transactions.
6. ___________________ is a robust database that supports ACID properties of
transactions and has the scalability of NoSQL.
Answer Me
Cite the difference between Hadoop 1.0 and Hadoop 2.0.
Compare and contrast SQL, NoSQL and NewSQL.
Summary please…
Ask a few participants of the learning program to summarize the lecture.
Thank you