Hadoop compression: This is the process of compressing data stored in the Hadoop Distributed File System
(HDFS) to reduce storage space and improve processing performance. Compression is especially
important in big data environments, where storage and processing requirements can quickly become
prohibitively expensive.
Hadoop provides a number of built-in compression codecs, including gzip, bzip2, and Snappy, that can
be used to compress and decompress data. In addition, Hadoop allows users to create custom
compression codecs if they have specific compression requirements.
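Before the command-line example, here is a minimal Java sketch of the codec API itself: it writes a gzip-compressed file to HDFS. The class name GzipWriteExample and the path /user/hadoop/demo/sample.gz are illustrative placeholders, not part of any standard example.

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class GzipWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Instantiate the gzip codec; any CompressionCodec implementation works here.
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        // Placeholder output path; the .gz extension matches the codec.
        Path outPath = new Path("/user/hadoop/demo/sample.gz");

        // Wrap the raw HDFS stream in a compressing stream and write through it.
        try (OutputStream out = codec.createOutputStream(fs.create(outPath))) {
            out.write("hello, compressed HDFS\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}

The same pattern works for the other built-in codecs by swapping the codec class, for example BZip2Codec or SnappyCodec.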
Here is an example of how to apply gzip compression in a Hadoop Streaming MapReduce job:
1. First, we need to create a Hadoop input directory and copy the input data into it:
$ hadoop fs -mkdir input
$ hadoop fs -put /path/to/data.txt input/
2. Next, we run a MapReduce job that uses gzip compression on the input data. Here is an example
command to do this:
$ hadoop jar /path/to/hadoop-streaming.jar \
-D mapreduce.map.output.compress=true \
-D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
-input input \
-output output \
-mapper mapper.py \
-reducer reducer.py
In this command, we use the -D option to set two configuration properties:
mapreduce.map.output.compress: This property enables compression for the intermediate
data produced by the mapper.
mapreduce.map.output.compress.codec: This property specifies the compression codec to use,
in this case the gzip codec (org.apache.hadoop.io.compress.GzipCodec).
The rest of the command is standard MapReduce job configuration, including the input and
output directories and the mapper and reducer scripts. The same two properties can also be
set programmatically in a job driver, as sketched after this list.
3. After the job completes, we can view the output data using the following command:
$ hadoop fs -cat output/part-00000
Because only the intermediate (map output) data was compressed, the final output files are plain text
and hadoop fs -cat prints them directly. If job output compression were also enabled (via
mapreduce.output.fileoutputformat.compress), the output files would carry a .gz extension and would
need to be piped through zcat to decompress them.
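For comparison, here is a sketch of how the same two properties could be set programmatically in a job driver instead of with -D options. The class name CompressedJobDriver is a placeholder, and no mapper or reducer classes are set, so Hadoop falls back to its identity classes; the point is only to show where the compression settings live.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Same settings as the -D options above, expressed in code.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      GzipCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compression demo");
        job.setJarByClass(CompressedJobDriver.class);

        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        // Compressing the final job output is a separate setting from map
        // output compression; uncomment to enable it as well:
        // FileOutputFormat.setCompressOutput(job, true);
        // FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}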
Here is an outline of the steps involved in Hadoop compression:
1. Input data is stored in HDFS in uncompressed form.
2. The MapReduce job is configured to enable compression and specify the compression codec to
be used. This can be done using the following configuration properties:
mapreduce.map.output.compress: Enables compression for the intermediate data
produced by the mapper.
mapreduce.map.output.compress.codec: Specifies the compression codec to use.
These properties can be set with -D options on the hadoop jar command line, as shown above, or in the job's driver code.
3. The input data is processed by the mapper, which produces intermediate data in uncompressed
form.
4. The intermediate data is passed through a compression stage, where it is compressed using the
specified codec.
5. The compressed data is shuffled to the reducer, and the framework decompresses it transparently
before the reducer processes it. (The reduce stage itself is optional for map-only jobs.) A client-side
analogue of this decompression is sketched after this list.
6. The output data is stored in HDFS in uncompressed form, unless job output compression is also enabled (for example via mapreduce.output.fileoutputformat.compress).
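To make the decompression side concrete, here is a minimal Java sketch that reads a compressed file back from HDFS. CompressionCodecFactory chooses the codec from the file extension (.gz, .bz2, .snappy), which mirrors how Hadoop's input formats detect compressed files; the path /user/hadoop/demo/sample.gz is again a placeholder.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CompressedReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path to a gzip-compressed file in HDFS.
        Path inPath = new Path("/user/hadoop/demo/sample.gz");

        // The factory maps the .gz extension to GzipCodec; it returns null
        // for files with no recognized compression extension.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(inPath);

        InputStream raw = fs.open(inPath);
        InputStream in = (codec == null) ? raw : codec.createInputStream(raw);

        // Read and print the decompressed lines.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}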