CSE 3002 Big Data Technologies - 7sem
SEMESTER/YEAR: 7 / 4th
Ms. Ayesha Taranum, Mr. Krishna Mehar P Tirumala, Ms. Kimmi Kumari, Mr. Praveen P
PROGRAM OUTCOMES :
PO-1: Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and
an engineering specialization to the solution of complex engineering problems.
PO-2: Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
PO-3: Design/development of solutions: Design solutions for complex engineering problems and design system
components or processes that meet the specified needs with appropriate consideration for the public health and
safety, and the cultural, societal, and environmental considerations.
PO-4: Conduct investigations of complex problems: Use research-based knowledge and research methods including
design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid
conclusions.
PO-5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering
and IT tools including prediction and modeling to complex engineering activities with an understanding of the
limitations.
PO-6: The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health,
safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO-7: Environment and sustainability: Understand the impact of the professional engineering solutions in societal and
environmental contexts, and demonstrate the knowledge of and need for sustainable development.
PO-8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the
engineering practice.
PO-9: Individual and teamwork: Function effectively as an individual, and as a member or leader in diverse
teams, and in multidisciplinary settings.
PO-10: Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and design
documentation, make effective presentations, and give and receive clear instructions.
PO-11: Project management and finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one's own work, as a member and leader in a team, to manage
projects and in multidisciplinary environments.
PO-12: Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and
life-long learning in the broadest context of technological change.
PSO 01. [Problem Analysis]: Identify, formulate, research literature, and analyse complex engineering
problems related to Software Engineering principles & practice, Programming, Big Data computing &
analytics, reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering
sciences.
PSO 02. [Design/development of Solutions]: Design solutions for complex engineering problems related to
Software Engineering principles & practice, Programming, Big Data Computing & analytics and design system
components or processes that meet the specified needs with appropriate consideration for the public health and
safety, and the cultural, societal, and environmental considerations.
PSO 03. [Modern Tool usage]: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modelling to complex engineering activities related to
Software Engineering principles & practice, Programming, Big Data Computing & analytics with an
understanding of the limitations.
COURSE PREREQUISITES:
Database Management Systems (DDL and DML SQL queries) and Java programming (creation of classes and objects,
interfaces, reading and writing a file, control statements).
COURSE DESCRIPTION:
The purpose of the course is to provide the fundamentals of Big Data technology and to emphasize the importance of
choosing suitable tools for processing and analyzing big data to gain insights.
The student should have the knowledge and skill to select and use the most appropriate big data tools to solve business
problems. The associated laboratory provides an opportunity to implement the concepts and enhance critical thinking
and analytical skills.
With a good knowledge of the fundamentals of Big Data technology, the student can gain practical experience in
implementing them, enabling the student to be an effective solution provider for applications that involve huge volumes
of data.
COURSE OBJECTIVES:
The objective of the course is to familiarize the learners with the concepts of Big Data Technologies and to attain Skill
Development through Experiential Learning techniques.
COURSE OUTCOMES: On successful completion of the course the students shall be able to
CO No.  PO-1  PO-2  PO-3  PO-4  PO-5  PO-6  PO-7  PO-8  PO-9  PO-10  PO-11  PO-12
CO1     M     H     H     -     H     -     -     -     M     L      L      -
CO2     M     H     M     -     M     -     -     -     M     L      L      -
CO3     M     H     M     -     H     -     -     -     M     L      L      -
Introduction to Big Data and its importance: Basics of Distributed File System, Four Vs, Drivers for Big Data, Big
Data applications, Structured, unstructured, semi-structured and quasi-structured data. Big Data challenges:
Traditional versus Big Data approach. The Big Data Technology Landscape: NoSQL.
Hadoop: History of Hadoop, Hadoop use cases, The Design of HDFS, Blocks and replication management, Rack
awareness, HDFS architecture, HDFS Federation, NameNode and DataNode, Anatomy of a File Write, Anatomy of a File
Read, Hadoop MapReduce paradigm, Map and Reduce tasks, JobTracker and TaskTracker, MapReduce execution
pipeline, Key-value pairs, Shuffle and sort, Combiner and Partitioner, APIs used to write/read files into/from Hadoop,
Need for Flume and Sqoop.
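As an illustration of the MapReduce execution pipeline listed in this module (map, combiner, partitioner, shuffle and sort, reduce), the following is a minimal single-process sketch in plain Python. It does not use the Hadoop API; all function names (map_phase, combine, partition, shuffle_and_sort, reduce_phase, word_count) and the toy hash in the partitioner are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Mapper: emit a (word, 1) key-value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate counts locally on a single mapper's output."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key (deterministic toy hash, an assumption)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Shuffle: route each pair to its reducer; sort: order each reducer's input by key."""
    buckets = [[] for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket, key=itemgetter(0)) for bucket in buckets]


def reduce_phase(sorted_pairs):
    """Reducer: sum the values of each run of identical keys."""
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_count(lines, num_reducers=2):
    """Run the whole pipeline; each input line stands in for one mapper's split."""
    mapper_outputs = [combine(map_phase(line)) for line in lines]
    result = {}
    for bucket in shuffle_and_sort(mapper_outputs, num_reducers):
        result.update(reduce_phase(bucket))
    return result


if __name__ == "__main__":
    print(word_count(["big data big", "data tools"]))
```

Because the partitioner sends every occurrence of a key to the same reducer, the final counts do not depend on the number of reducers chosen.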
Anatomy of YARN: Hadoop 2.0 Features, NameNode High Availability, YARN Architecture, Introduction to
Schedulers, YARN scheduler policies: FIFO, Fair and Capacity schedulers.
Introduction to Sqoop: Sqoop features, Sqoop Architecture, Sqoop Import All Tables, Sqoop Export All Tables,
Sqoop Connectors, Sqoop Import from MySQL to HDFS, Sqoop vs. Flume.
Hive: Apache Hive and Hive Installation, Hive Data Types, Hive Table partitioning, Hive DDL commands, Hive
DML commands, Hive SORT BY vs. ORDER BY, Hive Joining tables, Hive bucketing.
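The difference between the Hive sort by and order by clauses named in this module can be sketched in plain Python: order by produces one totally ordered result, while sort by only orders the rows within each reducer. This is a conceptual sketch, not Hive code; the function names and the round-robin assignment of rows to reducers are assumptions made for illustration.

```python
def order_by(rows, key):
    """ORDER BY: all rows pass through one reducer, giving a total order."""
    return sorted(rows, key=key)


def sort_by(rows, key, num_reducers):
    """SORT BY: each reducer sorts only the rows it received.

    Rows are dealt to reducers round-robin here, which is an assumption
    of this sketch and not how Hive distributes rows.
    """
    reducers = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(part, key=key) for part in reducers]


if __name__ == "__main__":
    salaries = [5, 1, 4, 2, 3]
    print(order_by(salaries, key=lambda s: s))
    print(sort_by(salaries, key=lambda s: s, num_reducers=2))
```

Each list returned by sort_by is sorted, but concatenating them does not in general give a globally sorted result, which is why sort by is cheaper on large data and order by is needed when a total order matters.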
HBase: Introduction to HBase and its working architecture; commands for creation and listing of tables; disable and
is_disabled of a table; enable and is_enabled of a table; describing and dropping of a table; put and get commands; delete
and deleteall commands; commands for scan, count, and truncate of tables.
Introduction to Apache Spark: A Unified Stack, Who uses Spark and for what?, A Brief History of Spark, Spark versions
and releases, Storage layers for Spark. Programming with RDDs: RDD Basics, Creating RDDs, RDD Operations,
Passing functions to Spark, Common Transformations and Actions, Persistence. Spark SQL: Linking with Spark SQL,
Using Spark SQL in Applications, Loading and Saving Data, JDBC/ODBC Server, User-defined functions, Spark
SQL Performance.
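The RDD topics in this module (transformations, actions, persistence) rest on lazy evaluation: transformations only describe a computation, an action triggers it, and without persistence each action recomputes the lineage. The following is a minimal plain-Python sketch of that idea. MiniRDD is a hypothetical class written for illustration; it is not the Spark API, although its method names are chosen to mirror common RDD operations.

```python
class MiniRDD:
    """Toy stand-in for an RDD: holds a recipe for producing items, not the items."""

    def __init__(self, source):
        # source is a zero-argument callable returning a fresh iterator
        self._source = source

    @classmethod
    def parallelize(cls, data):
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new MiniRDD and do no work yet.
    def map(self, fn):
        return MiniRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred):
        return MiniRDD(lambda: (x for x in self._source() if pred(x)))

    def persist(self):
        """Cache the computed items the first time an action runs."""
        cache = []
        parent = self._source

        def cached():
            if not cache:
                cache.append(list(parent()))
            return iter(cache[0])

        return MiniRDD(cached)

    # Actions: run the recipe and return a result to the caller.
    def collect(self):
        return list(self._source())

    def count(self):
        return sum(1 for _ in self._source())


if __name__ == "__main__":
    squares = MiniRDD.parallelize([1, 2, 3, 4]).map(lambda x: x * x)
    evens = squares.filter(lambda v: v % 2 == 0)
    print(evens.collect())
    print(evens.count())
```

In this sketch, calling collect and then count on an unpersisted MiniRDD applies the mapped function twice per element, while calling persist first makes the second action read from the cache.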
REFERENCE MATERIALS:
(i) Textbooks
T1. Seema Acharya and Subhashini Chellappan, "Big Data and Analytics", 2nd Edition, Wiley, 2016.
T2. Bart Baesens, "Analytics in a Big Data World", 2nd Edition, Wiley, 2018.
R1. Radha Shankarmani and Vijayalakshmi, "Big Data Analytics", 2nd Edition, Wiley, 2017.
R3. Tom White, "Hadoop: The Definitive Guide", 3rd Edition, O'Reilly, 2016.
Ebook:
http://182.72.188.195/cgi-bin/koha/opac-search.pl?idx=ti&q=Big%20data%20and%20analytics&sort_by=relevance_dsc&count=200&limit=au:Acharya,%20Seema
SPECIFIC GUIDELINES TO STUDENTS:
COURSE SCHEDULE:
Sl. No. 8, Session L10 (RBT: L1 - L2, CO1)
    Topic: Introduction to Schedulers, YARN scheduler policies, FIFO, Fair and Capacity scheduler
    LO1: Define Schedulers.
    LO2: Explain FIFO, Fair and Capacity scheduler.
    Delivery: PPT / Interactive Lecture, Participative Learning
    Reference: T2, CH-2 (12-15)

Sl. No. 23, Session L25 (RBT: L1 - L3, CO3)
    Topic: User-defined functions, Spark SQL Performance, Scala: The Basics
    LO1: List User-defined functions.
    LO2: Demonstrate Spark SQL Performance.
    Delivery: PPT / Interactive Lecture
    Reference: R2, CH11 (Pg. 216-243)

Sl. No. 24, Session L26 (RBT: L1 - L3, CO3)
    Topic: Control Structures and functions, Working with arrays, Maps and Tuples
    LO1: Describe Maps and Tuples concepts.
    LO2: Apply Control Structures and functions.
    Delivery: PPT / Interactive Lecture
    Reference: R1, CH4 (Pg. 99-119)

Session L27 (CO1, CO2, CO3)
    Topic: Revision and Conclusion of the Course
    Delivery: PPT / Interactive Lecture
COURSE CONTENT & TASK SCHEDULE FOR LABORATORY COMPONENT:
Columns: Sl. No.; Task No.; Task & Learning Outcome (LO: Student shall be able to); RBT, LOL (Lower Order Learning) - HOL (Higher Order Learning); Number of Lab Sessions required to complete the task; Skills to be developed; Course Outcome to be developed.

Sl. No. 01, Task P1: Installation of Hadoop single node cluster using Ubuntu operating system
    LO1: Explain Hadoop concepts.
    LO2: Demonstrate Installation of Hadoop single node cluster.
    RBT: L1 - L3
    Number of Lab Sessions: 1
    Skills: SK1, SK3, SK5, SK8
    Course Outcome: CO 1

Sl. No. 02, Task P2: Working with Hadoop Commands
    LO1: Describe Hadoop concepts.
    LO2: Demonstrate various Hadoop Commands.
    RBT: L2 - L3
    Number of Lab Sessions: 1
    Skills: SK1, SK2, SK3, SK5, SK6, SK7, SK8, SK9
    Course Outcome: CO 1

Sl. No. 03, Task P3: Word Count analysis using sample data set (MapReduce)
    LO1: Describe MapReduce concepts.
    LO2: Demonstrate Word Count analysis.
    RBT: L1 - L3
    Number of Lab Sessions: 1
    Skills: SK1, SK2, SK3, SK5, SK6, SK7, SK8, SK9
    Course Outcome: CO 2

Sl. No. 04, Task P4: Stock analysis using sample data set (MapReduce)
    LO1: Describe MapReduce concepts.
    LO2: Demonstrate Stock analysis program.
    RBT: L1 - L3
    Number of Lab Sessions: 2
    Skills: SK1, SK2, SK3, SK5, SK6, SK7, SK8, SK9
    Course Outcome: CO 2

Sl. No. 05, Task P5: Web log analysis using sample data set (MapReduce)
    LO1: Describe MapReduce concepts.
    LO2: Demonstrate Web log analysis program.
    RBT: L1 - L3
    Number of Lab Sessions: 2
    Skills: SK1, SK2, SK3, SK5, SK6, SK7, SK8, SK9
    Course Outcome: CO 2
Topics relevant to Entrepreneurial Skills: Project Life Cycle, Risk Management, and Project Planning for Entrepreneurship
Development, through Problem Solving methodologies / Participative Learning Techniques / Experiential Learning
Techniques.
This is attained through the Assignment / Presentation / Lab experiments as mentioned in the assessment component.
ASSESSMENT SCHEDULE:
1  Assignment    Module 1      CO1             1        10    5%
   [Review of digital / e-resources from Pres. Univ. link given in the References Section. Mandatory to submit a
   screenshot of accessing the digital resource; otherwise it will not be evaluated.]
   https://web.s.ebscohost.com/ehost/detail/detail?vid=9&sid=cbc51846-7bf7-482b-8aac-fbd99ab97ee4%40redis&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#
2  Mini Project  Module 1      CO1                      30    15%
2  Midterm       Module 1, 2   CO1, CO2        1.30hr   60    30%   22-10-2023
3  Endterm Lab   All modules   CO1, CO2, CO3   3hr      100   50%
TABLE 8: TARGET SET FOR ATTAINMENT OF EACH CO and ATTAINMENT ANALYSIS AFTER RESULTS
Columns: Sl. No.; C.O. No.; Course Outcomes; Threshold Set for the CO; Target set for attainment in percentage; Actual C.O. Attainment in Percentage (*); Remarks on attainment & Measures to enhance the attainment (*).

01  CO1  Apply MapReduce programming on the given datasets to extract required insights.
         Threshold: 65    Target: 65%
02  CO2  Employ appropriate Hadoop Ecosystem tools such as Sqoop, HBase, and Hive to perform data analytics for a
         given problem.
         Threshold: 65    Target: 65%
03  CO3  Use the Spark tool to analyze the given dataset for a given problem.
         Threshold: 60    Target: 60%
APPROVAL:
Name and signature of the Instructor In-Charge (s) AFTER completing entries in Table number 3 and 8 at end of
semester:
Name and signature of the DAC Chairperson AFTER completing entries in Table number 3 and 8 at end of semester:
BLOOM'S TAXONOMY SAMPLE VERBS
Learning Outcomes Verbs at Each Bloom Taxonomy Level to be used for writing the course Outcomes.