BIG DATA AND ANALYTICS
Course Code: CS8T02 Credit: 4-0-0-4
COURSE OBJECTIVES:
Understand the Big Data platform and its use cases.
Introduce students to the concepts and challenges of big data.
Provide HDFS concepts and interfacing with HDFS.
Teach students to apply skills and tools to manage and analyze big data.
UNIT I: Getting an Overview of Big Data 10 Hrs
What is Big Data? History of Data Management: Evolution of Big Data, Structuring Big Data:
Types of Data, Elements of Data, Advantages of Big Data Analytics. Introducing Technologies
for Handling Big Data: Distributed and Parallel Computing for Big Data, Introducing Hadoop,
Cloud Computing and Big Data: Cloud Delivery Models, Cloud Services for Big Data, Cloud
Providers in Big Data Market, In-Memory Computing Technology for Big Data.
UNIT II: Understanding Hadoop Ecosystem 10 Hrs
Hadoop Ecosystem, Hadoop Distributed File System: HDFS Architecture, Concept of Blocks in
HDFS Architecture, NameNodes and DataNodes, The Command-line Interface, Using
HDFS Files, HDFS High Availability, Features of HDFS, MapReduce, Hadoop YARN,
Introducing HBase: HBase Architecture, Regions, Storing Big Data with HBase, Interacting with
Hadoop Ecosystem, HBase in Operation: Programming with HBase,
Combining HBase and HDFS: REST and Thrift, Data Integrity in HDFS, Features of HBase,
Hive, Pig and Pig Latin, Sqoop, Zookeeper, Flume, Oozie.
UNIT III: Understanding MapReduce Fundamentals and HBase 11 Hrs
The MapReduce Framework: Exploring the Features of MapReduce, Working of MapReduce,
Exploring Map and Reduce Functions.
Techniques to Optimize MapReduce Jobs: Hardware/Network Topology, Synchronization, File
System. Uses of MapReduce, Role of HBase in Big Data Processing: Characteristics of HBase,
Installation of HBase.
UNIT IV: Introduction to MongoDB and Cassandra 10 Hrs
Introduction to MongoDB: What is MongoDB? Why MongoDB? Terms used in RDBMS and
MongoDB, Data Types in MongoDB, MongoDB Query Language.
Apache Cassandra: Features, CQL Data Types, CQLSH, Keyspaces, CRUD, Collections, TTL,
Using a Counter, ALTER Commands, Import and Export, Querying System Tables.
UNIT V: Introduction to Hive and Pig 11 Hrs
What is Hive? Hive Architecture, Hive Data Types, Hive File Format, Hive Query Language
(HQL), RCFile Implementation, SerDe, User-Defined Function (UDF).
What is Pig? The Anatomy of Pig, Pig on Hadoop, Pig Philosophy, Use Case for Pig: ETL
Processing, Pig Latin Overview, Data Types in Pig, Running Pig, Execution Modes of Pig,
HDFS Commands, Relational Operators, Eval Function, Complex Data Types, Piggy Bank,
User-Defined Functions (UDF), Parameter Substitution, Diagnostic Operator, Word Count
Example using Pig, When to use Pig? When not to use Pig? Pig at Yahoo!, Pig versus Hive.
TEXT BOOKS
1. Big Data: Black Book, DT Editorial Services, Dreamtech Press, Edition 2016 (Chapter 1).
2. Big Data and Analytics, Seema Acharya, Subhashini Chellappan, Infosys Limited,
Publication: Wiley India Private Limited, 1st Edition 2015.
REFERENCE BOOKS
1. Hadoop in Practice, Alex Holmes, Manning Publications Co., September 2014, Second
Edition.
2. Programming Pig, Alan Gates, O’Reilly, Kindle Publication.
3. Programming Hive, Dean Wampler, O’Reilly, Kindle Publication.
COURSE OUTCOMES
1. Identify the characteristics of datasets and compare trivial data with big data for
various applications.
2. Demonstrate the open-source software framework Hadoop and its supporting tools to
enable meaningful discussion of big data and analytics.
3. Compare and contrast different Hadoop supporting tools with traditional tools.
4. Explain how big data can be analyzed to extract knowledge, and apply tools for big data analytics.