DR. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, UTTAR PRADESH, LUCKNOW
Big Data (BCS061)
Course Outcome (CO) Bloom’s Knowledge Level (KL)
At the end of the course, the student will be able to:
CO 1 Demonstrate knowledge of Big Data Analytics concepts and its applications in business. K1, K2
CO 2 Demonstrate functions and components of Map Reduce Framework and HDFS. K1, K2
CO 3 Discuss Data Management concepts in NoSQL environment. K6
CO 4 Explain process of developing Map Reduce based distributed processing applications. K2, K5
CO 5 Explain process of developing applications using HBase, Hive, Pig etc. K2, K5
DETAILED SYLLABUS 3-0-0
Unit Topic Proposed Lectures
Unit I (Proposed Lectures: 06)
Introduction to Big Data: Types of digital data, history of Big Data innovation, introduction
to Big Data platform, drivers for Big Data, Big Data architecture and characteristics, 5 Vs of
Big Data, Big Data technology components, Big Data importance and applications, Big Data
features – security, compliance, auditing and protection, Big Data privacy and ethics, Big
Data Analytics, challenges of conventional systems, intelligent data analysis, nature of data,
analytic processes and tools, analysis vs reporting, modern data analytic tools.
Unit II (Proposed Lectures: 08)
Hadoop: History of Hadoop, Apache Hadoop, the Hadoop Distributed File System,
components of Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop
streaming, Hadoop pipes, Hadoop Ecosystem.
Map Reduce: Map Reduce framework and basics, how Map Reduce works, developing a
Map Reduce application, unit tests with MRUnit, test data and local tests, anatomy of a Map
Reduce job run, failures, job scheduling, shuffle and sort, task execution, Map Reduce types,
input formats, output formats, Map Reduce features, real-world Map Reduce.
Unit III (Proposed Lectures: 08)
HDFS (Hadoop Distributed File System): Design of HDFS, HDFS concepts, benefits and
challenges, file sizes, block sizes and block abstraction in HDFS, data replication, how
HDFS stores, reads, and writes files, Java interfaces to HDFS, command line interface, Hadoop
file system interfaces, data flow, data ingest with Flume and Sqoop, Hadoop archives,
Hadoop I/O: compression, serialization, Avro and file-based data structures.
Hadoop Environment: Setting up a Hadoop cluster, cluster specification, cluster setup
and installation, Hadoop configuration, security in Hadoop, administering Hadoop, HDFS
monitoring & maintenance, Hadoop benchmarks, Hadoop in the cloud.
Unit IV (Proposed Lectures: 09)
Hadoop Ecosystem and YARN: Hadoop ecosystem components, schedulers (fair and
capacity), Hadoop 2.0 new features: NameNode high availability, HDFS federation, MRv2,
YARN, running MRv1 in YARN.
NoSQL Databases: Introduction to NoSQL.
MongoDB: Introduction, data types, creating, updating and deleting documents, querying,
introduction to indexing, capped collections.
Spark: Installing Spark, Spark applications, jobs, stages and tasks, Resilient Distributed
Datasets, anatomy of a Spark job run, Spark on YARN.
Scala: Introduction, classes and objects, basic types and operators, built-in control
structures, functions and closures, inheritance.
Unit V (Proposed Lectures: 09)
Hadoop Ecosystem Frameworks: Applications on Big Data using Pig, Hive and HBase.
Pig: Introduction to Pig, execution modes of Pig, comparison of Pig with databases,
Grunt, Pig Latin, user defined functions, data processing operators.
Curriculum & Evaluation Scheme: CS, Computer Engineering and CSE (V & VI Semester) 26
Hive: Apache Hive architecture and installation, Hive shell, Hive services, Hive metastore,
comparison with traditional databases, HiveQL, tables, querying data and user defined
functions, sorting and aggregating, Map Reduce scripts, joins & subqueries.
HBase: HBase concepts, clients, example, HBase vs RDBMS, advanced usage, schema
design, advanced indexing.
Zookeeper: How it helps in monitoring a cluster, how to build
applications with Zookeeper.
IBM Big Data strategy, introduction to InfoSphere, BigInsights and BigSheets, introduction
to Big SQL.
Text books and References:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business
Intelligence and Analytic Trends for Today's Businesses", Wiley.
2. DT Editorial Services, "Big-Data Black Book", Wiley.
3. Dirk deRoos, Chris Eaton, George Lapis, Paul Zikopoulos, Tom Deutsch, "Understanding Big Data: Analytics for
Enterprise Class Hadoop and Streaming Data", McGraw Hill.
4. Thomas Erl, Wajid Khattak, Paul Buhler, "Big Data Fundamentals: Concepts, Drivers and Techniques", Prentice
Hall.
5. Bart Baesens, "Analytics in a Big Data World: The Essential Guide to Data Science and its Applications (Wiley
Big Data Series)", John Wiley & Sons.
6. Arshdeep Bahga, Vijay Madisetti, "Big Data Science & Analytics: A Hands-On Approach", VPT.
7. Anand Rajaraman and Jeffrey David Ullman, "Mining of Massive Datasets", CUP.
8. Tom White, "Hadoop: The Definitive Guide", O'Reilly.
9. Eric Sammer, "Hadoop Operations", O'Reilly.
10. Chuck Lam, "Hadoop in Action", Manning Publishers.
11. Deepak Vohra, "Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools",
Apress.
12. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly.
13. Lars George, "HBase: The Definitive Guide", O'Reilly.
14. Alan Gates, "Programming Pig", O'Reilly.
15. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer.
16. Bill Franks, "Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced
Analytics", John Wiley & Sons.
17. Glenn J. Myatt, "Making Sense of Data", John Wiley & Sons.
18. Pete Warden, "Big Data Glossary", O'Reilly.