Ibda Course File

The document outlines the course structure for 'Introduction to Big Data Analytics' (Course Code: 2012PE05) for B.Tech students in the Department of Information Technology for the academic year 2024-2025. It includes course objectives, outcomes, specific outcomes, and a detailed syllabus covering topics such as Hadoop, MapReduce, and data visualization tools. The document also emphasizes the department's vision and mission to empower students, particularly women, in the field of Information Technology.

DEPARTMENT OF INFORMATION TECHNOLOGY

COURSE FILE
ACADEMIC YEAR: 2024-2025

Course Title INTRODUCTION TO BIG DATA ANALYTICS

Course Code 2012PE05

Program B.Tech.

Year & Semester IV Year II Sem

Course Type PROFESSIONAL ELECTIVE- 1

Regulation R20

Course Structure
Theory: Lecture 3, Tutorials 0, Credits 3
Practical: Laboratory 0, Credits 3

Course Coordinator Mr. SUVARNA SUNIL KUMAR

Faculty In-Charge HOD


DEPARTMENT OF INFORMATION TECHNOLOGY

INSTITUTE VISION

• Visualizing a great future for the intelligentsia by imparting state-of-the-art
technologies in the field of Engineering and Technology for the bright future and
prosperity of the students.

• To offer world-class training to the promising Engineers.


DEPARTMENT OF INFORMATION TECHNOLOGY

INSTITUTE MISSION

• To nurture a high level of Decency, Dignity and Discipline in women to attain high
intellectual abilities.
• To produce employable students at National and International levels through effective
training programmes.
• To create a pleasant academic environment for generating high-level learning attitudes.
DEPARTMENT OF INFORMATION TECHNOLOGY

DEPARTMENT VISION

To empower women in the field of Information Technology through quality education,
nurturing them into globally competent professionals with strong technical skills, ethical
values, and leadership qualities, ready to meet the challenges of the evolving IT industry.
DEPARTMENT OF INFORMATION TECHNOLOGY
DEPARTMENT MISSION

❖ M1: To offer a high-quality education that integrates cutting-edge technologies,
nurtures creativity and analytical skills, and shapes ethical, globally competitive
professionals.
❖ M2: To develop leadership qualities and enhance employability through hands-on
training, industry collaboration, and research in emerging technologies,
preparing women to address the dynamic challenges of the IT sector.
❖ M3: To impart technological education with a strong emphasis on dignity, decency
and discipline to develop professional engineers who are both technically
competent and socially responsible.
DEPARTMENT OF INFORMATION TECHNOLOGY
PROGRAMME EDUCATIONAL OBJECTIVES (PEO)

PEO 1: Technical Proficiency and Innovation. Graduates will develop a solid foundation in Information
Technology, employing modern tools and innovative methodologies to effectively solve industry
challenges.
PEO 2: Leadership and Professional Excellence. Graduates will demonstrate leadership abilities, effective
teamwork, and ethical practices, enabling them to achieve career success and contribute to the
global IT sector.
PEO 3: Lifelong Learning and Societal Impact. Graduates will engage in continuous learning, adapt to
technological advancements, and apply their skills to positively influence society, with a special
emphasis on empowering students in the field of Information Technology.
DEPARTMENT OF INFORMATION TECHNOLOGY
PROGRAMME SPECIFIC OUTCOMES (PSOs)

PSO 1: Problem Solving and Application Development. Graduates will be able to analyze real-world
problems and design efficient IT solutions, applying programming, database management, and
software development methodologies.
PSO 2: Modern Tool Usage and Technological Adaptability. Graduates will be proficient in using modern IT
tools and technologies, while continuously adapting to emerging trends to enhance system
development and deployment.
PSO 3: Professional Ethics and Societal Contributions. Graduates will uphold professional ethics,
effectively contribute to team-based projects, and apply IT solutions to address societal needs,
with a focus on women’s empowerment and community development.
DEPARTMENT OF INFORMATION TECHNOLOGY
SYLLABUS
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
B.Tech. IV Year II Sem    L T P C: 3 0 0 3

COURSE OBJECTIVES:
➢ Gain an understanding of what constitutes "Big Data" and the key characteristics
(e.g., volume, velocity, variety, and veracity).
➢ Learn about the challenges of working with big data and how these challenges
differ from traditional data analysis.
➢ Introduction to popular big data tools and platforms like Hadoop, Apache Spark,
and NoSQL databases.
➢ Learn how to set up, configure, and work with these technologies to process and
analyze big data.
➢ Learn various data analysis techniques including descriptive, diagnostic, predictive,
and prescriptive analytics.
➢ Explore how to visualize big data to derive actionable insights using tools like
Tableau, Power BI, or Python visualization libraries.

COURSE OUTCOMES:
➢ Demonstrate a clear understanding of the characteristics and challenges of big data
(volume, velocity, variety, and veracity).
➢ Identify the key differences between traditional data analysis and big data analytics.
➢ Gain hands-on experience with essential big data tools and technologies, such as
Hadoop, Apache Spark, NoSQL databases (e.g., MongoDB), and data storage
solutions (e.g., HDFS).
➢ Be able to configure and use big data tools to process large datasets efficiently.
➢ Effectively store, manage, and retrieve big data using appropriate data storage
systems and distributed computing frameworks.
➢ Apply data processing techniques such as MapReduce, Spark, and batch processing
to analyze large datasets.
UNIT-I INTRODUCTION TO BIG DATA Introduction to Big data: Overview,
Characteristics of Data, Evolution of Big Data, Definition of Big Data, Challenges
with Big Data. Big data analytics: Classification of Analytics, Importance and
challenges of big data, Terminologies, Data storage and analysis.

UNIT-II HADOOP TECHNOLOGY Introduction to Hadoop: A brief history of
Hadoop, Conventional approach versus Hadoop, Introduction to the Hadoop
Ecosystem, Processing data with Hadoop, Hadoop distributors, Use case,
Challenges in Hadoop.

UNIT-III HADOOP FILE SYSTEM Introduction to Hadoop distributed file
system (HDFS): Overview, Design of HDFS, Concepts, Basic File Systems vs.
Hadoop File Systems, Local File System, File-Based Data Structures, Sequential
File, Map File. The Java Interface: Library Classes, reading data from a Hadoop
URL, reading data using the file system API, storing data.

UNIT-IV FUNDAMENTALS OF MAPREDUCE Introduction to MapReduce:
Its framework, Features of MapReduce, Its working, Analyze MapReduce
functions, MapReduce techniques to optimize jobs, Uses, Controlling input formats
in MapReduce execution, Different phases in MapReduce, Applications.
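The phases this unit lists (map, shuffle/sort, reduce) can be sketched in plain Python as a single-process stand-in for a Hadoop word-count job; the function names and sample documents below are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle/sort: group intermediate pairs by key (the word)
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in grouped}

docs = ["big data needs big tools", "data tools"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts: {"big": 2, "data": 2, "needs": 1, "tools": 2}
```

In a real Hadoop job the map and reduce functions run on different nodes and the framework performs the shuffle over the network; the control flow, however, is exactly this pipeline.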
UNIT-V BIG DATA PLATFORMS Sqoop, Cassandra, MongoDB, Hive, Pig,
Storm, Flink, Apache.

TEXT BOOK:
1. Seema Acharya, Subhashini Chellappan, “Big Data and
Analytics”, Wiley Publications, First Edition, 2015.
REFERENCE BOOKS:
1. Judith Hurwitz, Alan Nugent, Fern Halper, Marcia
Kaufman, “Big Data For Dummies”, John Wiley &
Sons, Inc., 2013.
2. Tom White, “Hadoop: The Definitive Guide”, O’Reilly
Publications, Fourth Edition, 2015.
3. Dirk Deroos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce Brown, Rafael Coss,
DEPARTMENT OF INFORMATION TECHNOLOGY
ACADEMIC CALENDAR
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
COURSE OBJECTIVES
➢ Gain an understanding of what constitutes "Big Data" and the key characteristics
(e.g., volume, velocity, variety, and veracity).
➢ Learn about the challenges of working with big data and how these challenges
differ from traditional data analysis.
➢ Introduction to popular big data tools and platforms like Hadoop, Apache Spark,
and NoSQL databases.
➢ Learn how to set up, configure, and work with these technologies to process and
analyze big data.
➢ Learn various data analysis techniques including descriptive, diagnostic, predictive,
and prescriptive analytics.
➢ Explore how to visualize big data to derive actionable insights using tools like
Tableau, Power BI, or Python visualization libraries.
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
COURSE OUTCOMES

➢ Demonstrate a clear understanding of the characteristics and challenges of big data
(volume, velocity, variety, and veracity).
➢ Identify the key differences between traditional data analysis and big data analytics.
➢ Gain hands-on experience with essential big data tools and technologies, such as
Hadoop, Apache Spark, NoSQL databases (e.g., MongoDB), and data storage
solutions (e.g., HDFS).
➢ Be able to configure and use big data tools to process large datasets efficiently.
➢ Effectively store, manage, and retrieve big data using appropriate data storage
systems and distributed computing frameworks.
➢ Apply data processing techniques such as MapReduce, Spark, and batch processing
to analyze large datasets.
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
PROGRAMME SPECIFIC OUTCOMES (PSOs)

PSO 1: Students will gain hands-on experience with industry-standard tools and
technologies used to manage, process, and analyze large-scale data. They will
become proficient in using distributed computing frameworks like Apache Hadoop
and Apache Spark for big data processing.
PSO 2: Students will develop the ability to clean, preprocess, and transform raw data
into a usable format for analysis and machine learning. They will master data
cleaning techniques to handle missing values, remove duplicates, and deal with
outliers.
PSO 3: Students will gain expertise in exploring large datasets using statistical
methods to identify patterns, trends, and anomalies. They will also learn how to
present these insights using data visualization techniques.
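A minimal sketch of the cleaning steps named in PSO 2 (removing duplicates, imputing missing values, flagging outliers), assuming a simple list-of-dicts record layout; the field names and the median-absolute-deviation outlier rule are illustrative choices, not a prescribed pipeline:

```python
import statistics

def clean(records):
    # Remove exact duplicate records while preserving order
    seen, deduped = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(r))
    # Impute missing 'value' fields with the mean of the observed values
    observed = [r["value"] for r in deduped if r["value"] is not None]
    mean = statistics.mean(observed)
    for r in deduped:
        if r["value"] is None:
            r["value"] = mean
    return deduped

def is_outlier(x, values, k=3.0):
    # Median-absolute-deviation rule: robust even when the
    # outlier itself is included in the reference values
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return abs(x - med) > k * mad

records = [{"id": 1, "value": 10.0},
           {"id": 1, "value": 10.0},   # duplicate, dropped
           {"id": 2, "value": None},   # missing, imputed with mean 12.0
           {"id": 3, "value": 14.0}]
cleaned = clean(records)
```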
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
COURSE OUTCOMES MAPPING WITH POs AND PSOs:

PROGRAM OUTCOMES (PO) AND PROGRAM SPECIFIC OUTCOMES (PSO)

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3

CO1 2 2 2 1 3 2 3

CO2 2 2 3 2 1 3 2 3

CO3 2 2 3 2 1 3 2 3

CO4 2 2 3 2 3 1 3 3 3

CO5 2 2 3 2 1 3 2 3 3

CO6 2 2 3 2 1 3 3

Avg. 2 2 3 3 2 3 1 3 2 3 3

3 – High
2 - Medium
1 - Low
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
APPLICATIONS OF EACH UNIT
UNIT-I
1. Healthcare & Medicine, 2. Finance & Banking, 3. E-commerce & Retail, 4. Social Media & Digital Marketing, 5.
Transportation & Traffic Management, 6. Education & Research, 7. Entertainment & Media, 8. Cybersecurity & Fraud Prevention,
9. Manufacturing & Industry.
UNIT-II
1. Healthcare & Life Sciences, 2. Finance & Banking, 3. E-commerce & Retail,4. Social Media & Digital Marketing, 5.
Government & Smart Cities , 6. Telecommunications, 7. Manufacturing & Industry 4.0 , 8. Cybersecurity & Fraud Detection. 9.
Media & Entertainment , 10. Education & Research.
UNIT-III
1. Big Data Storage & Processing, 2. Data Warehousing & ETL Pipelines, 3. Healthcare & Bioinformatics, 4. Finance &
Banking, 5. Social Media & Web Analytics, 6. Telecommunications & IoT, 7. E-commerce & Retail, 8. Cybersecurity &
Log Management, 9. Media & Entertainment, 10. Scientific Research & Smart Cities.

UNIT-IV
1. Data Processing & Analytics, 2. Search Engines (Google, Bing, Yahoo, etc.), 3. Social Media & Sentiment Analysis
4. E-commerce & Retail, 5. Healthcare & Bioinformatics , 6. Finance & Banking, 7. Telecommunications & IoT
8. Cybersecurity & Threat Detection, 9. Scientific Research & Climate Modeling, 10. Media & Entertainment

UNIT-V
1. Hadoop Distributed File System (HDFS)

📌 Application: Distributed storage of large datasets

🔹 Industries: Data warehousing, cloud storage, and archival systems

🔹 Examples:

• Storing and managing massive datasets in organizations like Facebook, Yahoo, and LinkedIn
• Handling petabytes of genomic data for medical research
• Storing and retrieving financial transactions for fraud analysis

2. MapReduce
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
APPLICATIONS OF EACH UNIT
Application: Distributed data processing using parallel computing
🔹 Industries: Search engines, finance, and big data analytics

🔹 Examples:
• Google's PageRank algorithm for web indexing
• Fraud detection in banking by analyzing transaction patterns
• Analyzing social media trends on platforms like Twitter and Facebook

3. Apache Hive
Application: Data warehousing and SQL-like querying on big data
🔹 Industries: Business intelligence, e-commerce, and financial analytics

🔹 Examples:
• Querying large-scale sales data in retail businesses like Amazon and Walmart
• Processing financial reports and stock market analysis
• Ad-hoc querying for insights in customer data
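The SQL-like ad-hoc querying Hive provides can be previewed with Python's built-in sqlite3 module standing in for a Hive warehouse; the table, columns, and data below are illustrative, and a production deployment would run HiveQL over tables stored in HDFS:

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a Hive warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("north", 50.0)])

# Ad-hoc aggregation: the kind of workload Hive serves over big data
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
# rows: [("north", 170.0), ("south", 80.0)]
```

The point of Hive is that this same declarative query scales to petabytes, because it is compiled into distributed jobs rather than executed on one machine.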

4. Apache Pig

Application: High-level data processing for ETL (Extract, Transform, Load)

🔹 Industries: Telecommunications, social media analytics, and sensor data processing

🔹 Examples:
• Analyzing call detail records (CDR) in telecom companies
• Processing raw logs from web servers to analyze website traffic
• Transforming sensor data from IoT devices for predictive maintenance

5. Apache HBase

Application: Real-time NoSQL database for large-scale applications


🔹 Industries: Internet of Things (IoT), real-time analytics, and fraud detection

🔹 Examples:
• Storing and retrieving data from billions of social media posts
• Managing real-time sensor data in IoT applications
• Processing real-time customer transactions in e-commerce
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
APPLICATIONS OF EACH UNIT
6. Apache Kafka

📌 Application: Real-time data streaming and messaging

🔹 Industries: Stock market analysis, IoT, and fraud detection

🔹 Examples:

• Streaming stock market data for algorithmic trading


• Processing IoT sensor data in real-time for predictive maintenance
• Detecting fraudulent transactions by analyzing live banking data
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
PROJECT RELATED TO SUBJECT
1. Sentiment Analysis on Social Media Data

📌 Objective: Analyze social media posts (e.g., tweets, Facebook comments) to determine public sentiment on a
specific topic (e.g., a political event, a product launch).
🔹 Data Source: Twitter API, Kaggle sentiment datasets
🔹 Technologies: Hadoop, Apache Spark, Python (NLTK, TextBlob), Apache Hive
🔹 Key Features:

• Collect and preprocess social media data


• Apply Natural Language Processing (NLP) for sentiment detection
• Classify sentiments as Positive, Negative, or Neutral
• Visualize results using dashboards (Tableau, Power BI)
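The classification step of this project can be sketched with a tiny lexicon-based scorer; a real implementation would use NLTK's VADER or TextBlob as the project lists, and the word lists and post texts here are purely illustrative:

```python
# Tiny illustrative lexicon; a real project would use NLTK or TextBlob
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def classify(post):
    # Score = (# positive words) - (# negative words)
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"
```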

2. Customer Purchase Behavior Analytics in E-Commerce

📌 Objective: Analyze shopping trends and recommend products based on customer behavior.
🔹 Data Source: Amazon product review dataset (Kaggle), e-commerce transaction records
🔹 Technologies: Hadoop, Spark MLlib, Python, Apache Hive
🔹 Key Features:

• Process customer purchase data


• Use recommendation algorithms (Collaborative Filtering, Content-based Filtering)
• Predict best-selling products and optimize stock inventory
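The collaborative-filtering step above can be sketched as cosine similarity over user rating vectors; in practice Spark MLlib would be used at scale, and the ratings matrix and item names below are illustrative assumptions:

```python
import math

def cosine(u, v):
    # Cosine similarity between two rating vectors (0 means unrated)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(target, others, items):
    # Score each item the target user has not rated by the
    # similarity-weighted ratings of the other users
    scores = {}
    for i, item in enumerate(items):
        if target[i] == 0:
            scores[item] = sum(cosine(target, o) * o[i] for o in others)
    return max(scores, key=scores.get)

items = ["A", "B", "C"]
target = [5, 0, 0]                     # rated A highly, B and C unrated
others = [[5, 4, 1], [4, 5, 1], [1, 0, 5]]
best = recommend(target, others, items)
# Users similar to the target liked B, so B is recommended
```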

3. Real-Time Fraud Detection in Banking Transactions

📌 Objective: Detect fraudulent financial transactions using machine learning and big data analytics.
🔹 Data Source: Bank transaction datasets (Kaggle), financial transaction logs
🔹 Technologies: Hadoop, Apache Spark Streaming, Kafka, Python (Scikit-learn), HBase
🔹 Key Features:

• Analyze customer transaction patterns


• Detect anomalies using ML models
• Generate real-time alerts for suspicious transactions
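The anomaly-detection step can be sketched with a simple z-score check; production systems would use ML models over streaming features as the project describes, and the threshold and transaction amounts here are illustrative:

```python
import statistics

def flag_anomalies(amounts, threshold=3.0):
    # Flag transactions whose z-score exceeds the threshold
    mean = statistics.mean(amounts)
    sd = statistics.pstdev(amounts)
    return [a for a in amounts if sd and abs(a - mean) / sd > threshold]

# Twenty routine transactions around 50, plus one suspicious 5000
amounts = [48, 52, 50, 49, 51] * 4 + [5000]
suspicious = flag_anomalies(amounts)
# suspicious: [5000]
```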
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
PROJECT RELATED TO SUBJECT
4. Healthcare Data Analytics for Disease Prediction
📌 Objective: Predict diseases (e.g., diabetes, heart disease) based on patient records.
🔹 Data Source: Public healthcare datasets (WHO, CDC, Kaggle)
🔹 Technologies: Hadoop, Spark, Python (Scikit-learn, TensorFlow), Apache Hive
🔹 Key Features:

• Process and analyze large-scale patient records


• Identify patterns in medical conditions
• Use ML models for disease prediction

5. Traffic Flow Prediction and Congestion Analysis

📌 Objective: Predict and analyze traffic congestion using GPS and real-time data.
🔹 Data Source: Google Maps API, open traffic datasets
🔹 Technologies: Hadoop, Spark, Kafka, Python, NoSQL (MongoDB)
🔹 Key Features:

• Collect real-time traffic data


• Analyze congestion patterns and peak traffic hours
• Suggest alternative routes using predictive models

6. Crime Rate Prediction Using Big Data

📌 Objective: Predict crime hotspots based on historical crime data and external factors.
🔹 Data Source: FBI crime datasets, Kaggle datasets, open government data
🔹 Technologies: Hadoop, Spark MLlib, Python, Apache Hive
🔹 Key Features:

• Analyze past crime records based on location and time


• Identify high-crime areas
• Use ML models to predict future crime trends
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
PROJECT RELATED TO SUBJECT
7. Energy Consumption Analytics Using IoT Data

📌 Objective: Analyze smart meter data to predict and optimize energy consumption.
🔹 Data Source: IoT-based smart meter datasets
🔹 Technologies: Hadoop, Spark Streaming, Kafka, NoSQL (MongoDB), Python
🔹 Key Features:

• Collect real-time energy consumption data


• Identify peak usage times
• Provide recommendations for energy savings
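Identifying peak usage times, as listed above, reduces to aggregating meter readings by hour; a sketch with illustrative (hour, kWh) pairs standing in for a real smart-meter stream:

```python
from collections import defaultdict

def peak_hours(readings, top=2):
    # readings: iterable of (hour, kWh) pairs from smart meters
    totals = defaultdict(float)
    for hour, kwh in readings:
        totals[hour] += kwh
    # Hours ranked by total consumption, highest first
    return sorted(totals, key=totals.get, reverse=True)[:top]

readings = [(8, 1.2), (9, 3.5), (9, 2.0), (18, 4.0), (18, 3.0), (23, 0.5)]
peaks = peak_hours(readings)
# peaks: [18, 9] — evening peak first, then morning
```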

8. Airline Customer Satisfaction Analytics

📌 Objective: Analyze customer feedback and complaints to improve airline services.


🔹 Data Source: Airline review datasets (Kaggle), customer feedback forms
🔹 Technologies: Hadoop, Apache Spark, Python (NLP), Apache Hive
🔹 Key Features:

• Process and analyze customer reviews


• Categorize sentiments into positive, neutral, or negative
• Suggest improvements for airline services

9. Climate Change and Weather Pattern Analysis

📌 Objective: Analyze global temperature and climate data to detect patterns.


🔹 Data Source: NASA, NOAA, Kaggle climate datasets
🔹 Technologies: Hadoop, Apache Spark, Python, Power BI
🔹 Key Features:

• Process large-scale climate datasets


• Identify trends in temperature rise and CO₂ emissions
• Predict future climate changes using ML models
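The trend-identification step can be sketched as an ordinary least-squares slope over yearly values (pure Python; the sample series is illustrative, not real climate data):

```python
def trend_slope(years, temps):
    # Ordinary least-squares slope: change per year
    n = len(years)
    mx = sum(years) / n
    my = sum(temps) / n
    num = sum((x - mx) * (y - my) for x, y in zip(years, temps))
    den = sum((x - mx) ** 2 for x in years)
    return num / den

years = [2000, 2001, 2002, 2003, 2004]
temps = [14.0, 14.1, 14.2, 14.3, 14.4]   # illustrative values
slope = trend_slope(years, temps)        # about 0.1 per year
```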

10. Customer Churn Prediction in Telecom Industry


📌 Objective: Predict customer churn based on call records and usage patterns.
🔹 Data Source: Telecom datasets (Kaggle), real-world CDR data
🔹 Technologies: Hadoop, Apache Spark, Python (Scikit-learn), Apache Pig
🔹 Key Features:
• Process customer call and usage data
• Identify factors leading to customer churn
• Use ML models to predict at-risk customers
DEPARTMENT OF INFORMATION TECHNOLOGY
INTRODUCTION TO BIGDATA ANALYTICS (2012PE05)
SUPPORTING DOCUMENTS PPTS
DEPARTMENT: IT | COURSE: IV B.TECH II SEM | SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS
UNIT-I,II,III | R20 REGULATION | ACADEMIC YEAR: 2024-2025

ASSIGNMENT SHEET-I
SET-1
1. Could you elucidate the intricacies encompassed within the deployment journey of a
Big Data solution?

2. In the academic context, could you delve into the intricate reasoning that drives the adoption of
Hadoop as a pivotal component within the expansive domain of Big Data Analytics?

3. As an esteemed academician within the university, could you exemplify the practical application
of Big Data Analytics and elucidate its profound significance in contemporary domains?

4. As a distinguished academician within the university, could you undertake an in-depth
examination and deconstruction of the myriad methodologies employed in unlocking the latent
potential inherent in the realm of Big Data analytics?

5. As an esteemed scholar in the field, could you expound upon a plethora of fundamental
attributes intricately woven into the complex fabric of the Hadoop framework?

SET-2
1. As a scholar in the field, could you delve into the intricate details encapsulating the architectural
features of Hadoop?
2. As an academician within the scholarly realm, could you delineate the disparities between
RDBMS and Hadoop?

3. In the capacity of an academic professional, could you elaborate on how Big Data Analytics
exemplifies its pivotal role?

4. As an esteemed academic, could you elaborate on the intricate subject of Big Data Modelling?
5. Could you delve into the multifaceted intricacies surrounding Apache Spark from a scholarly
perspective?
SET-3

1. As an academician, could you delineate the distinctions between Hadoop and Spark?
2. Could you elaborate on the concepts of HDFS blocks and InputSplits?
3. Elucidate the intricate web of components and frameworks comprising the Hadoop Ecosystem.
4. Enumerate the constraints and drawbacks associated with Hadoop.
5. Could you delve into the intricacies surrounding the distributed cache paradigm within the domain of Big Data
Analytics?

SET-4

1. Could you elaborate on the diverse array of configuration files utilized within the Hadoop ecosystem?
2. As part of your academic assignment, please analyze and expound upon the multifaceted attributes
that epitomize Big Data Analytics
3. Elucidate the intricate components comprising Apache HBase's Region Server architecture,
highlighting their respective functionalities and interdependencies.
4. In the academic context, your task is to delve into the complexities surrounding the elucidation of the
"V's" that delineate the landscape of Big Data Analytics, exploring the nuances inherent in each
dimension: volume, velocity, variety, veracity, and value
5. In the academic domain, your assignment is to undertake a comprehensive exploration of the
multifaceted components comprising Apache HBase, delving into their intricate functionalities,
interconnections, and contributions to the overarching architecture of the system.

SET-5:

1. Within the academic realm, your scholarly task is to embark on a detailed exploration delineating the
intricacies surrounding Apache Spark, encompassing its multifaceted architecture, core functionalities,
and pivotal role within the domain of big data processing and analytics.
2. Underlying rationales driving the adoption of Hadoop in the domain of Big Data

Analytics?
3. Could you outline the methodologies for utilizing Big Data Analytics?
4. Enumerate the attributes of Apache Spark?
5. Could you elucidate the sequential processes entailed within the realm of Big Data Analytics?
SET-6:
1. Could you enumerate a selection of optimal methodologies adhered to in the domain of Big Data
Analytics?
2. Compose a comprehensive analysis detailing five distinctive features inherent to the practice of Big
Data Analytics, emphasizing their significance and implications within the field.
3. Could you explicate the concept of the various "V's" in the context of data analytics, elucidating their
respective roles and contributions to the overarching framework, with a focus on their implications for
data management and processing strategies?
4. Could you provide a detailed analysis of the intricate interplay and components comprising the
Hadoop Ecosystem, elucidating its structural framework, constituent technologies, and their respective
functionalities, with an emphasis on their collective impact on distributed data processing and analytics?
5. Could you provide a comprehensive analysis delineating the operational dynamics and significance of
distributed cache within the realm of Big Data Analytics, focusing on its role in enhancing data
processing efficiency, mitigating latency, and optimizing resource utilization, while also exploring its
integration within distributed computing frameworks?

FACULTY HOD
DEPARTMENT: IT | COURSE: IV B.TECH II SEM | SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS
UNIT-I,II,III | R20 REGULATION | ACADEMIC YEAR: 2024-2025

ASSIGNMENT SHEET-II
SET-1
1. Provide an overview of Hadoop and discuss its methodology for managing extensive data processing tasks.
2. Explore the nuanced roles and significance of the Map phase in the context of the MapReduce framework,
emphasizing its primary functions and contributions to distributed data processing.
3. Identify and discuss two distinctive attributes inherent to the MapReduce framework, elucidating their
significance in distributed data processing environments.
4. Explore the fundamental operational characteristics of Sqoop, highlighting its role and impact in facilitating
data integration processes between relational databases and Hadoop ecosystems.
5. Investigate the core principles and structural foundations of the document-oriented data model employed by
MongoDB, emphasizing its architectural aspects and implications for database design and management
strategies.

SET-2
1. Identify and discuss a selection of prominent vendors offering Hadoop distributions within the current market
landscape, highlighting their key contributions and market positioning in the realm of big data technologies.
2. Analyze the intricate process by which MapReduce decomposes extensive computational tasks into
manageable, granular components, elucidating the mechanisms underlying the subdivision of labor within
distributed computing frameworks.
3. Conduct a comprehensive examination of the distinct characteristics and operational disparities between the
Map and Reduce functions within the MapReduce paradigm, delving into their respective roles, methodologies,
and contributions to distributed data processing workflows.
4. Investigate the operational mechanisms and strategic approaches employed by Sqoop to facilitate seamless
data interchange between Hadoop environments and relational databases, emphasizing the intricacies of data
migration and synchronization processes across disparate data storage systems.
5. Write a MongoDB query to retrieve documents that match specific criteria.
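For the MongoDB query question above, a find() call in pymongo/mongo-shell style looks like `db.customers.find({"age": {"$gt": 30}, "city": "Hyderabad"})`. The pure-Python stand-in below evaluates the same filter semantics against a list of dicts, so the matching behaviour can be seen without a running MongoDB server; the collection and field names are illustrative:

```python
def matches(doc, query):
    # Evaluate a MongoDB-style filter document against one record
    for field, cond in query.items():
        if isinstance(cond, dict):          # operator form, e.g. {"$gt": 30}
            for op, val in cond.items():
                if op == "$gt" and not doc.get(field, 0) > val:
                    return False
                if op == "$lt" and not doc.get(field, 0) < val:
                    return False
        elif doc.get(field) != cond:        # plain equality form
            return False
    return True

def find(collection, query):
    # Counterpart of collection.find(query): return all matching documents
    return [d for d in collection if matches(d, query)]

customers = [{"name": "Asha", "age": 34, "city": "Hyderabad"},
             {"name": "Ravi", "age": 28, "city": "Hyderabad"},
             {"name": "Meena", "age": 40, "city": "Chennai"}]
result = find(customers, {"age": {"$gt": 30}, "city": "Hyderabad"})
# result: only Asha matches both conditions
```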
SET-3

1. Examine the ways in which Hadoop distributors contribute to the optimization and enrichment of the Hadoop
ecosystem's capabilities, emphasizing their role in augmenting the platform's functionalities and expanding its
utility within diverse data processing environments.
2. Illustrate a hypothetical situation demonstrating the strategic advantage of implementing a Combiner
function in MapReduce, emphasizing its practical utility in optimizing data processing workflows within
distributed computing environments.
3. Present a case study showcasing a specific input format utilized in MapReduce, highlighting its functional
importance and operational impact on data processing efficiencies within distributed computing frameworks.
4. Outline a step-by-step procedure showcasing the integration of data from a MySQL database into Hadoop
through Sqoop, emphasizing practical methodologies and techniques employed in transferring data between
relational databases and distributed computing environments.
5. Develop a comprehensive set of instructions to establish a Hive table and populate it with data sourced
externally, focusing on the practical implementation steps involved in configuring data storage and retrieval
within a distributed computing environment.
SET-4
1. Evaluate the suitability of employing Hadoop for data processing tasks across diverse scenarios, considering
factors such as data volume, complexity, and performance requirements to formulate informed
recommendations regarding its applicability within specific contexts.
2. Investigate the fault tolerance mechanisms within MapReduce's job execution framework, considering its
distributed nature and complex interdependencies. Challenge students to critically evaluate the effectiveness of
these mechanisms through real-world case studies or simulations, culminating in proposals for advancing fault
tolerance in similar distributed computing environments.
3. Examine how different partitioning strategies affect MapReduce job optimization. Analyze their impact on
performance and resource utilization. Assign students to compare various partitioning approaches and explore
advanced techniques for further optimization.
4. Analyze Sqoop alongside other data transfer tools in the Hadoop ecosystem. Explore their similarities,
differences, strengths, and weaknesses. Assign students to conduct in-depth evaluations of each tool's features,
performance benchmarks, and compatibility with different data sources. Encourage them to present their
findings through comparative studies and propose recommendations for selecting the most suitable tool for
specific data transfer requirements.
5. Investigate how Pig streamlines data processing tasks within the Hadoop ecosystem. Explore its role in
abstracting complex operations and facilitating efficient data manipulation. Assign students to analyze real-
world use cases where Pig's scripting language simplifies data processing workflows, and challenge them to
propose advanced optimization techniques or integration strategies to enhance its functionality further.

SET-5:
1. Enumerate the potential obstacles encountered when handling varied datasets using Hadoop.
2. Explore a scenario where the implementation of a Hadoop distributor demonstrably enhanced the efficiency
of data processing.
3. Evaluate the scalability of MapReduce and its importance in big data processing.
4. Compare the pros and cons of employing MapReduce in data warehousing versus machine learning
applications.
5. List the primary characteristics of the Cassandra NoSQL database.
6. Detail the roles of spouts and bolts within the Storm architecture.
SET-6:
1. Demonstrate the file-based data structures in a sequential file system.
2. Outline the applications of Hive and how it helps.
3. Compare the Pig and Storm tools.
4. Extract the features and applications of Flink and Apache.
5. Summarize the benefits of Big Data resources in IT.
6. Describe the steps involved in support-vector-based inference methodology.
7. What are sampling and sampling distributions? Give a detailed analysis.
8. Define Arcing classifiers and Bagging predictors in detail.

FACULTY HOD
DEPARTMENT: IT | COURSE: IV B.TECH II SEM | SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS

UNIT-II | R20 REGULATION | 2024-2025

TUTORIAL SHEET-II
1. Compare and contrast the Conventional approach with Hadoop, emphasizing the
strengths and weaknesses of each approach in handling large-scale data processing
tasks.
2. Analyze the significance of various components within the Hadoop Ecosystem, such as
HDFS, MapReduce, and YARN, in addressing specific challenges associated with big
data processing.
3. Analyze the fundamental principles behind processing data with Hadoop, focusing on
the MapReduce programming model and how it facilitates parallel and distributed
computing.
4. Explore the role of Hadoop distributors in the ecosystem, discussing how different
distributors contribute to the adoption and implementation of Hadoop in various
enterprise environments.
5. Identify and analyze challenges associated with Hadoop implementation, addressing
issues such as data security, complexity in configuration, and optimizing performance in
different deployment scenarios.
6. Compare and contrast the different core methods of a Reducer.

7. Elucidate HDFS and YARN and extract their respective components.

8. Outline some of the main configuration files used in Hadoop
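The MapReduce questions above (notably question 3) can be grounded with a single-process Python sketch of the programming model. This is an illustrative simulation only, not the Hadoop Java API; all function names are our own choices:

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle/sort: group all intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate (here, sum) the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "hadoop processes big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"])  # "big" appears three times across the two lines
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; the sequential sketch only shows the data flow between the three phases.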

Signature of Faculty Signature of HOD


SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS
COURSE: IV B.TECH II SEM
DEPARTMENT: IT

UNIT – V R20 REGULATION 2024-2025

TUTORIAL SHEET-V
1. Analyze the challenges and benefits associated with using Sqoop for importing and
exporting data in a big data environment.
2. Compare Cassandra with traditional relational databases, highlighting scenarios where
Cassandra excels in terms of performance and data handling.
3. Formulate the use cases where MongoDB is particularly suitable, considering factors
such as flexibility, scalability, and ease of development.
4. Explore the purpose of Apache Pig in the context of big data processing. How does Pig
simplify the development of complex data processing tasks in comparison to raw
MapReduce?
5. Identify the significance of Apache projects in the big data ecosystem and their impact
on the evolution of data processing technologies.
6. Illustrate the commands you can use to start and stop all the Hadoop daemons at one time.
7. Elaborate on what makes the HDFS environment fault-tolerant.
8. Elucidate ZooKeeper and illustrate the benefits of using ZooKeeper.
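Question 6 asks for the daemon-control commands. As a hedged reminder, these are the standard helper scripts shipped in $HADOOP_HOME/sbin of a typical Hadoop installation (exact paths depend on the install):

```shell
# Start or stop every daemon (HDFS + YARN) at once — deprecated but still shipped:
start-all.sh
stop-all.sh

# The recommended equivalents start each layer separately:
start-dfs.sh && start-yarn.sh   # NameNode, DataNodes, ResourceManager, NodeManagers
stop-yarn.sh && stop-dfs.sh
```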

Signature of Faculty Signature of HOD


SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS
COURSE: IV B.TECH II SEM
DEPARTMENT: IT

UNIT - I R20 REGULATION 2024-2025

TUTORIAL SHEET-I
1. Can you trace the evolution of big data and highlight the major milestones that have
contributed to its current significance in the realm of information technology?
2. Analyze the challenges associated with big data, considering factors such as volume,
velocity, variety, and complexity. How do these challenges impact data management
and analysis?
3. Classify the different types of analytics used in big data analytics, and provide insights
into how each type contributes to extracting meaningful insights from large datasets.
4. Explore the terminologies commonly associated with big data analytics, elucidating
their meanings and contextual relevance within the analytics process.
5. Examine the methods and techniques employed in big data storage and analysis, with a
focus on how these approaches facilitate efficient processing and extraction of valuable
information.
6. Elaborate on how Hadoop and Big Data are correlated.
7. Examine the data management tools used with Edge Nodes in Hadoop.

Signature of Faculty Signature of HOD


SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS
COURSE: IV B.TECH II SEM
DEPARTMENT: IT

UNIT - IV R20 REGULATION 2024-2025

TUTORIAL SHEET-IV
1. Can you delineate the critical stages involved in the MapReduce process and highlight
their respective functions?
2. How can continued functionality be ensured, and what measures contribute to
maintaining high system reliability in the face of node failures?
3. Give the purpose of GroupingByKey in distributed data processing. How does it
enhance the efficiency of data manipulation?
4. Could you expound upon the role played by FileInputFormat in Hadoop's data
processing, emphasizing its significance in managing and processing large-scale
datasets?
5. Identify the control mechanism for executing MapReduce tasks using InputFormat in
Hadoop, and how does it influence the overall execution flow?
6. Justify what makes an HDFS environment fault-tolerant.
7. Factorize Hadoop YARN and describe its main components.

8. Elucidate the MapReduce architecture and its working in Hadoop.

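For question 3, the shuffle's group-by-key step can be sketched in pure Python. Hadoop sorts mapper output by key so that equal keys become adjacent and can be streamed to a single reducer call; the sort-then-group pattern below mirrors that behavior (the key/value data is invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Intermediate (key, value) pairs as a mapper might emit them.
pairs = [("in", 1), ("hdfs", 1), ("in", 1), ("yarn", 1), ("in", 1)]

pairs.sort(key=itemgetter(0))                 # sort phase: equal keys adjacent
grouped = {key: [v for _, v in group]         # group phase: one list per key
           for key, group in groupby(pairs, key=itemgetter(0))}

print(grouped["in"])  # all three values for key "in" arrive together
```

The efficiency gain the question asks about comes from this locality: each reducer receives every value for its keys in one contiguous stream, so aggregation needs no random lookups.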

Signature of Faculty Signature of HOD


SUBJECT: INTRODUCTION TO BIG DATA ANALYTICS
COURSE: IV B.TECH II SEM
DEPARTMENT: IT

UNIT - III R20 REGULATION 2024-2025

TUTORIAL SHEET-III
1. Analyze key concepts associated with HDFS, such as block storage, fault tolerance, and
data replication. How do these concepts contribute to the reliability and scalability of
HDFS?
2. Discuss scenarios where traditional file systems might be more appropriate and
instances where Hadoop File Systems offer distinct advantages.
3. Analyze the role of the Local File System in Hadoop, highlighting its function and how
it interacts with Hadoop's distributed architecture.
4. Examine the Java Interface in Hadoop, focusing on key library classes and their
functions. How does the Java Interface facilitate interaction with HDFS for data
processing tasks?
5. Assess best practices for efficient data storage in HDFS, considering factors like data
compression, partitioning, and the impact on overall performance.
6. Name the most popular data management tools used with Edge Nodes in Hadoop.

7. Review the different file formats that can be used in Hadoop.

8. Compare and contrast Active and Passive Namenodes.
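For question 1, a toy Python model of block storage and replication shows why losing a single DataNode leaves every block readable. The block size, node names, and placement policy below are invented for illustration (real HDFS uses 128 MB blocks by default and a rack-aware placement policy):

```python
BLOCK_SIZE = 4            # bytes, for illustration only
REPLICATION = 3           # HDFS's default replication factor
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks, as HDFS does on write."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=DATANODES, replication=REPLICATION):
    """Round-robin each block's replicas across distinct DataNodes."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello big data!")
placement = place_blocks(blocks)

# Fault tolerance: even if "dn1" fails, every block keeps >= 2 live replicas,
# and the NameNode would re-replicate to restore the target factor.
assert all(len([n for n in nodes if n != "dn1"]) >= 2
           for nodes in placement.values())
```

This is the core of the reliability argument the question targets: replication trades storage for availability, and block-level granularity lets recovery proceed in parallel across the cluster.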

Signature of Faculty Signature of HOD
