The Hadoop Ecosystem:
So much free stuff!
After this video you will be able to:
• Differentiate the major layers in the Hadoop ecosystem
• Recognize key tools of the Hadoop ecosystem, including HDFS, YARN, and MapReduce
Yahoo created
Hadoop in 2005
More Big Data frameworks released
Now there are over 100!
Layer Diagram
[Figure: generic layer diagram with A at the bottom, B and C side by side above it, and D on top]
One possible layer diagram for Hadoop
Hive Pig
Giraph
Spark
Storm
Flink
MapReduce
HBase
Cassandra
MongoDB
Zookeeper
YARN
HDFS
One possible layer diagram for Hadoop
Higher levels:
Interactivity
Lower levels:
Storage and scheduling
Distributed file system as foundation
Scalable storage
Fault tolerance
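One way to picture scalable storage and fault tolerance together is a toy model in which a file is cut into fixed-size blocks and each block is copied to several machines. The sketch below is only an illustration: the 128 MB block size and replication factor of 3 are assumed values (not stated on the slide), the node names are made up, and the round-robin placement is a simplification rather than HDFS's actual placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this illustration
REPLICATION = 3      # assumed number of copies kept per block


def place_blocks(file_size_mb, nodes):
    """Split a file into blocks and assign each block to REPLICATION distinct nodes."""
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement = {}
    for block in range(n_blocks):
        # Toy round-robin placement; a real file system uses a smarter policy.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(REPLICATION)]
    return placement


def readable_after_failure(placement, failed_node):
    """A file stays readable if every block has a replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(500, nodes)          # a 500 MB file
for block, replicas in placement.items():
    print(block, replicas)
print("readable if node2 fails:", readable_after_failure(placement, "node2"))
```

Adding nodes gives the placement more room to spread blocks (scalability), and losing any single node still leaves copies of every block elsewhere (fault tolerance).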
Flexible scheduling and
resource management
YARN schedules jobs on
40,000 servers at Yahoo
Simplified programming model
Map → apply()
Reduce → summarize()
Google used MapReduce
for indexing web sites
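The "map = apply, reduce = summarize" idea can be sketched in a few lines of plain Python using word counting as the task. This runs on one machine and is not the Hadoop MapReduce API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample input are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Apply the same operation to every record: emit (word, 1) for each word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Summarize the values collected for each key: here, add up the counts."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["my apple is red", "my rose is red"]
word_counts = reduce_phase(shuffle(map_phase(lines)))
print(word_counts)
```

The programmer writes only the per-record operation and the per-key summary; in a real cluster the framework takes care of distributing the records and grouping the keys.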
Higher-level programming models
Pig = dataflow scripting
Hive = SQL-like queries
Pig created at Yahoo,
Hive created at Facebook
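The difference between dataflow scripting and SQL-like queries can be hinted at with a small Python analogy. This is neither Pig Latin nor HiveQL: the first half spells out the steps one at a time, the second half states the desired result as a single query using Python's built-in `sqlite3` module. The table name, column names, and sample records are invented for the example.

```python
import sqlite3

records = [("alice", 30), ("bob", 25), ("alice", 12), ("carol", 7)]

# Dataflow style: name each step and pass its output to the next one.
filtered = [(customer, amount) for customer, amount in records if amount >= 10]
grouped = {}
for customer, amount in filtered:
    grouped.setdefault(customer, []).append(amount)
dataflow_totals = {customer: sum(amounts) for customer, amounts in grouped.items()}

# SQL-like style: describe the result and let the engine decide how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", records)
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases "
    "WHERE amount >= 10 GROUP BY customer"
).fetchall()
conn.close()
sql_totals = dict(rows)

print(dataflow_totals)
print(sql_totals)
```

Both halves compute the same totals; the choice between the two styles is mostly about how the analyst prefers to express the work.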
Specialized models
for graph processing
Giraph used by Facebook
to analyze social graphs
Real-time and
in-memory processing
In-memory 100x faster
for some tasks
NoSQL for non-files
Key-values
Sparse tables
HBase used for Facebook’s
Messaging Platform
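The two data models named on this slide, key-values and sparse tables, can be mimicked with ordinary Python dictionaries. This is only a mental model, not the API of HBase, Cassandra, or MongoDB; the keys, column names, and the helper `get_cell` are hypothetical.

```python
# Key-value model: each key maps to one opaque value.
kv_store = {}
kv_store["user:42"] = "Alice"

# Sparse table model: every row stores only the columns it actually has,
# so rows may have different columns and missing cells take no space.
sparse_table = {
    "row1": {"name": "Alice", "email": "alice@example.com"},
    "row2": {"name": "Bob", "phone": "555-0100"},
}


def get_cell(table, row, column, default=None):
    """Look up one cell; absent rows or columns fall back to the default."""
    return table.get(row, {}).get(column, default)


print(kv_store["user:42"])
print(get_cell(sparse_table, "row1", "email"))
print(get_cell(sparse_table, "row2", "email"))  # no such column in row2
```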
Zookeeper for management
Synchronization
Configuration
High-availability
Created by Yahoo to wrangle
services named after animals
All these tools are open-source
Large community
for support
Download separately
or part of pre-built image
Growing number of open-source tools