Big Data Analytics Lab Manual
(DS605PC)
Experiment 1: Create a Hadoop Cluster
Objective:
To set up and configure a Hadoop cluster for distributed data processing.
Requirements:
- One or more computers (or virtual machines)
- Linux operating system (e.g., Ubuntu, CentOS)
- Java Development Kit (JDK) installed on each machine
Procedure:
1. Install Hadoop on each machine:
- Download the Hadoop binary from the official Apache Hadoop website.
- Extract the files to a designated directory.
2. Edit the configuration files (a minimal sample is sketched after this procedure):
- core-site.xml: core Hadoop settings, such as the default filesystem URI (fs.defaultFS).
- hdfs-site.xml: HDFS settings, such as the replication factor and NameNode/DataNode directories.
- yarn-site.xml and mapred-site.xml: YARN resource management and MapReduce job settings.
3. Set up SSH key-based authentication to allow password-less login between machines.
4. Format the HDFS filesystem by running the 'hdfs namenode -format' command on the
master node.
5. Start the Hadoop daemons by executing the following on the master node:
- start-dfs.sh (starts the NameNode, SecondaryNameNode, and DataNode daemons)
- start-yarn.sh (starts the ResourceManager and NodeManager daemons)
6. Verify the setup with the 'jps' command and through the NameNode and ResourceManager web UIs.
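The following is a minimal configuration sketch, not the only valid setup. fs.defaultFS and dfs.replication are standard Hadoop properties; the host name master, port 9000, and the replication factor of 2 are placeholder values to be adapted to the actual cluster.

    <!-- core-site.xml: default filesystem URI ("master" is a placeholder host name) -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: replication factor (2 is an example value for a small cluster) -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>

Steps 3-6 can then be carried out with commands along these lines, assuming a dedicated hadoop user and worker hosts named worker1, worker2, and so on (both names are placeholders):

    # generate a key pair on the master and copy it to every node
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    ssh-copy-id hadoop@worker1        # repeat for each worker

    # one-time formatting of the NameNode metadata (master only)
    hdfs namenode -format

    # start the HDFS and YARN daemons from the master
    start-dfs.sh
    start-yarn.sh

    # list the Java daemons running on each node
    jps

By default, the NameNode web UI listens on port 9870 (50070 on Hadoop 2.x) and the ResourceManager UI on port 8088.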
Expected Result:
A running Hadoop cluster capable of distributed data processing.
Key Notes:
- Master node manages the file system namespace, job scheduling, and resource allocation.
- Worker nodes store actual data and perform computations.
Advantages:
- Enables scalable and reliable data storage and processing.
- Fault tolerance through data replication across nodes.
- Efficient resource management using YARN.
Figure 1: Hadoop Cluster Architecture
Experiment 2: Implement MapReduce Job for Inverted Index
Objective
To implement a simple MapReduce job that builds an inverted index on the set of input
documents using Hadoop.
Requirements
- Hadoop installed and configured
- Java installed
- Input text files
- Basic knowledge of Java MapReduce programming
Procedure
1. Create input files containing sample text.
2. Write the Mapper class that tokenizes each input line and emits (word, document) pairs.
3. Write the Reducer class that collects, for each word, the list of documents containing it (a Java sketch follows this procedure).
4. Compile the Java classes and create a JAR file.
5. Run the job using the Hadoop command line and pass the input/output paths.
6. Check the output directory for the inverted index results.
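A minimal Java sketch of the Mapper, Reducer, and driver is given below. The class names (InvertedIndex, IndexMapper, IndexReducer) are illustrative, not prescribed by the manual. The Mapper tags every word with the name of the file it came from; the Reducer collects the distinct file names for each word.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class InvertedIndex {

        // Mapper: for every word in a line, emit (word, source file name)
        public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
            private final Text word = new Text();
            private final Text docId = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Name of the input file the current split belongs to
                String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
                docId.set(fileName);
                for (String token : value.toString().toLowerCase().split("\\W+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, docId);
                    }
                }
            }
        }

        // Reducer: collect the distinct document names for each word
        public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                Set<String> docs = new HashSet<>();
                for (Text v : values) {
                    docs.add(v.toString());
                }
                context.write(key, new Text(String.join(", ", docs)));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "inverted index");
            job.setJarByClass(InvertedIndex.class);
            job.setMapperClass(IndexMapper.class);
            job.setReducerClass(IndexReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The job can then be packaged into a JAR and run, for example:

    hadoop jar invertedindex.jar InvertedIndex /input /output

where invertedindex.jar, /input, and /output are example names and HDFS paths. With sample files doc1.txt and doc2.txt that both contain the word "hadoop", a line of output would look like: hadoop  doc1.txt, doc2.txt.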
Expected Result
A text-based inverted index where each word is mapped to a list of documents containing it.
Key Notes
- Demonstrates core MapReduce workflow
- Enhances understanding of text processing and Hadoop jobs
Advantages
- Facilitates faster search queries in distributed document sets
Figure 2: Word-Document mapping from MapReduce output
Experiment 3: Process Big Data in HBase
Objective
To store and retrieve large volumes of data using HBase, a distributed, scalable NoSQL
database.
Requirements
- Hadoop and HBase installed
- Sample dataset (e.g., student records)
- HBase shell or Java API
Procedure
1. Start Hadoop and HBase services.
2. Create a new table from the HBase shell: create 'students', 'info'.
3. Insert data with the put command: put 'students', '1', 'info:name', 'John'.
4. Retrieve data using the get and scan commands.
5. Perform updates and deletions from the HBase shell; the same operations are available through the Java API (see the sketch after this procedure).
6. Explore integration with MapReduce for batch processing.
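As a minimal sketch of the Java API route mentioned in step 5, the program below performs the same put and get as the shell commands above. It assumes the 'students' table from step 2 already exists, that hbase-site.xml is on the classpath so the client can reach the cluster, and that a recent HBase client library is used; the class name StudentStore is illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class StudentStore {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath to locate the cluster
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("students"))) {

                // Equivalent of: put 'students', '1', 'info:name', 'John'
                Put put = new Put(Bytes.toBytes("1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("John"));
                table.put(put);

                // Equivalent of: get 'students', '1'
                Result result = table.get(new Get(Bytes.toBytes("1")));
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println("info:name = " + Bytes.toString(name));
            }
        }
    }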
Expected Result
Data inserted, updated, and retrieved successfully using HBase shell or API.
Key Notes
- HBase provides random real-time read/write access to big data
- Flexible, column-family-based schema
Advantages
- Scalability and strong consistency for real-time read/write workloads
Figure 3: HBase Table with Column Families