
Savitribai Phule Pune University

Modern Education Society’s Wadia College of Engineering, Pune
19, Bund Garden, V.K. Joag Path, Pune – 411001.

ACCREDITED BY NBA AND NAAC WITH ’A++’ GRADE

DEPARTMENT OF COMPUTER ENGINEERING

MINI PROJECT REPORT

SUBMITTED BY
Ms. Khushi Ranjitsing Rajput (55)
Ms. Anushka Jaywant Kharade (56)
Ms. Aditi Pankaj Pawar (58)

(Academic Year: 2023-2024)


Savitribai Phule Pune University
Modern Education Society’s Wadia College of Engineering, Pune
19, Bund Garden, V.K. Joag Path, Pune – 411001.

ACCREDITED BY NBA AND NAAC WITH ’A++’ GRADE

DEPARTMENT OF COMPUTER ENGINEERING

Certificate

This is to certify that the “Mini Project Report” submitted by Khushi
Ranjitsing Rajput (55), Anushka Jaywant Kharade (56), and Aditi Pankaj
Pawar (58) is work done by them and submitted during the 2023-24 academic year, in
partial fulfillment of the requirements for the award of the degree of BACHELOR
OF ENGINEERING in COMPUTER ENGINEERING, at MES Wadia
College of Engineering, Pune.

Prof. A. D. Dhawale (Dr. (Mrs.) N. F. Shaikh)


Supervisor Head of Department

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to our HOD, Dr. (Mrs.) N. F. Shaikh,
for guiding us through the mini project on cloud computing and for her continuous
support and encouragement during the process.
Special thanks to my subject teacher and mentor, Prof. A. D. Dhawale, for his
invaluable mentorship, patience, and expertise during my mini project. Additionally, I
appreciate my entire team for their collaborative spirit.

Ms. Khushi Ranjitsing Rajput (F21111066)


T.E. Computer



Contents

1 INTRODUCTION 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 PROBLEM STATEMENT 3
2.1 Project Problem Statements . . . . . . . . . . . . . . . . . . . . . . . . 3

3 SYSTEM ANALYSIS 4
3.1 Architecture of Hadoop with Focus on HDFS: . . . . . . . . . . . . . . 4
3.2 Key Features of HDFS: . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.3 Hardware Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.4 Software Requirement Specifications . . . . . . . . . . . . . . . . . . . . 6

4 METHODOLOGY 7

5 RESULTS 11

6 CONCLUSION 13

7 REFERENCES 14

List of Figures

3.1 HDFS Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

4.1 Implementation step 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 8


4.2 Implementation step 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.3 Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

5.1 Result shown in the browser . . . . . . . . . . . . . . . . . . . . . . 11

Abstract

This report presents the implementation and evaluation of a file management system
using Hadoop Distributed File System (HDFS) within the context of beginner-level
cloud computing. The project aimed to demonstrate the scalability, fault tolerance,
and efficiency of HDFS in handling basic file management tasks, including file creation,
retrieval, and deletion. Through practical experimentation and testing, the system’s
performance and reliability were rigorously evaluated, showcasing its suitability for
cloud computing projects of varying complexities. The results highlight the system’s
ability to efficiently manage data across distributed clusters, maintain data integrity,
and provide a user-friendly interface for executing file management operations. Furthermore,
the report discusses the significance of implementing a file management system
using HDFS in the context of cloud computing, emphasizing its role in streamlining
data storage, retrieval, and processing tasks, and fostering innovation in data
management practices. Lastly, potential future enhancements and extensions to the
project are discussed, offering insights into opportunities for further exploration and
improvement in cloud-based file management and data processing.
Chapter 1

INTRODUCTION

1.1 Introduction
Cloud computing has revolutionized technology by providing on-demand access to
computing services over the internet, transforming how businesses and individuals utilize
computational resources. This shift offers unparalleled flexibility, scalability, and
accessibility globally. Cloud computing’s significance lies in its unmatched scalability,
cost-effectiveness, and accessibility, allowing organizations to dynamically adjust
resources, optimize IT budgets, and collaborate seamlessly from anywhere. Its reliable
infrastructure ensures high availability and fault tolerance, while democratizing access
to advanced technologies like AI and big data analytics, driving innovation and growth
in the digital era.
The project focuses on implementing a File Management System using Hadoop
Distributed File System (HDFS). In distributed systems like Hadoop, efficient file
management is crucial for handling large volumes of data across multiple nodes. HDFS
provides a scalable and fault-tolerant solution for storing and managing files in such
environments. This project explores the significance of effective file management in
distributed systems and demonstrates how HDFS enables seamless storage, retrieval, and
manipulation of data, laying the foundation for robust data processing and analytics
workflows.


1.2 Motivation
The project on ’File Management System using HDFS’ is motivated by the escalating
importance of big data and the necessity for scalable file storage solutions to handle
the massive volumes of data generated in today’s digital era. Traditional file systems
often struggle to manage such extensive datasets, prompting the exploration of HDFS
(Hadoop Distributed File System) as a distributed file storage solution tailored for
big data workloads. HDFS, with its fault-tolerant architecture and scalability
features, offers a platform capable of storing and processing vast amounts of data across
distributed clusters, aligning with the evolving data storage and processing needs of
modern organizations.
Hadoop and HDFS play a crucial role in managing large data volumes across
distributed clusters by enabling parallel processing of massive datasets on commodity
hardware. The scalability and cost-effectiveness of Hadoop and HDFS make them
essential tools for organizations dealing with growing data requirements, allowing for
efficient storage, management, and analysis of extensive datasets from diverse sources like
IoT devices and social media platforms. These technologies have become indispensable
for organizations seeking to leverage big data analytics for informed decision-making
and driving innovation in a data-driven world.



Chapter 2

PROBLEM STATEMENT

2.1 Project Problem Statements


The project aims to address the challenges faced by traditional file management systems
in distributed environments, particularly in handling big data. These challenges include
limited scalability, performance bottlenecks, and a lack of fault tolerance, all of which
hinder efficient data management and processing across distributed clusters. Recognizing
the need for a robust and scalable file management system capable of handling big
data, the project explores the implementation of HDFS (Hadoop Distributed File System)
as a solution to these challenges, enabling organizations to store, manage, and analyze
large datasets in distributed environments with enhanced scalability, fault tolerance,
and performance.

Chapter 3

SYSTEM ANALYSIS

3.1 Architecture of Hadoop with Focus on HDFS:


• Master-Slave Architecture: Hadoop follows a master-slave architecture where
there is one master node (NameNode) and multiple slave nodes (DataNodes).
The NameNode manages the file system namespace and regulates access to files
by clients. DataNodes store the actual data blocks of files and execute read and
write requests.

• NameNode: Acts as the central metadata repository for the file system. Stores
information about the directory tree structure, file permissions, and the mapping
of data blocks to DataNodes. Handles client requests for file system operations
such as opening, closing, and renaming files.

• DataNodes: Store and manage the actual data blocks of files in the distributed file
system. Report their status to the NameNode periodically, providing information
about available storage capacity and health.
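As an illustration of this master-slave interaction, the state of a running cluster can be inspected from the command line. The two commands below are a minimal sketch, assuming the daemons are already running and that /1.txt is an example file that exists in HDFS:

hdfs dfsadmin -report                          # ask the NameNode to list every registered DataNode with its capacity, used space, and status
hdfs fsck /1.txt -files -blocks -locations     # trace which blocks make up the file and which DataNodes hold each replica

The first command reports the cluster-wide view maintained by the NameNode; the second exposes the block-to-DataNode mapping described above.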

Figure 3.1: HDFS Architecture


3.2 Key Features of HDFS:


1. Fault Tolerance: HDFS achieves fault tolerance through data replication. Data
blocks are replicated across multiple DataNodes, typically three replicas by
default. If a DataNode fails, the replicas hosted on other DataNodes ensure data
availability and durability.

2. Scalability: HDFS is designed to scale horizontally to accommodate growing data
volumes. New DataNodes can be added to the cluster to increase storage capacity
and throughput. The distributed nature of HDFS allows it to handle petabytes
or even exabytes of data seamlessly.

3. Data Locality: Data locality is a fundamental principle of HDFS that aims to
optimize data processing performance. HDFS tries to execute computations on
the nodes where the data resides to minimize data movement across the network.
By co-locating computation with data, HDFS reduces network congestion and
improves processing efficiency.

4. High Throughput: HDFS is optimized for streaming data access patterns, making
it suitable for applications that require high throughput. It prioritizes sequential
reads and writes over random access, making it ideal for large-scale data
processing tasks such as batch processing and data warehousing.

5. Data Integrity: HDFS ensures data integrity through checksums and periodic
data integrity checks. Checksums are used to verify data consistency during reads
and writes. Periodic data integrity checks detect and correct data corruption
issues proactively.

6. Compression: HDFS supports data compression to reduce storage requirements
and improve data transfer efficiency. It provides built-in compression codecs such
as Gzip, Snappy, and LZO, allowing users to compress data transparently.

These key features make HDFS a robust and reliable distributed file system, well-
suited for storing and processing large-scale data sets in Hadoop clusters.
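To make the replication-based fault tolerance described above concrete, the replication factor of an individual file can be changed and verified from the HDFS shell. The commands below are an illustrative sketch, assuming a running cluster; /1.txt is an example path and the factor 2 is arbitrary:

hadoop fs -setrep -w 2 /1.txt        # set the file's replication factor to 2 and wait until the replicas are in place
hdfs fsck /1.txt -files -blocks      # confirm the replication count reported for each block

The cluster-wide default replication factor is governed by the dfs.replication property in hdfs-site.xml, which defaults to 3.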

3.3 Hardware Infrastructure


• Servers with high processing power and memory.

• Large-scale distributed storage (HDDs or SSDs).

• Networking Equipment: Gigabit Ethernet or higher-speed networking; network
switches for inter-node communication.

• Rack Infrastructure: Rack-mounted servers for space efficiency; proper cable
management and PDUs.

• Cluster management software and monitoring systems.

• Redundant hardware components and failover mechanisms.

• Off-site backup storage and replication.


3.4 Software Requirement Specifications


• Apache Hadoop

• Linux-based operating system

• Java Development Kit (JDK)

• Hadoop Command-Line Interface (CLI)

• Hadoop Distributed File System (HDFS)

• Hadoop Common

• Additional Tools and Libraries:

1. Apache Hive
2. Apache HBase
3. Apache Spark
4. Apache Pig
5. Apache Sqoop
6. Apache Flume
7. Apache Oozie
8. Apache Mahout
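Before attempting any file management operations, the installed software stack can be verified from a terminal. The checks below are a suggested sketch rather than a mandated procedure:

java -version       # confirm that the JDK is installed and on the PATH
hadoop version      # print the installed Hadoop release
jps                 # list running JVM processes; NameNode and DataNode should appear once the daemons are started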



Chapter 4

METHODOLOGY

The methodology for implementing file management tasks in Hadoop involves a series
of structured steps to leverage the Hadoop ecosystem effectively. The process begins
by starting the Hadoop daemons and initializing the local host, setting
the groundwork for subsequent operations. Accessing the Hadoop Distributed File
System (HDFS) through a web browser interface provides a centralized platform for file
management actions. From there, utilizing command-line interfaces becomes pivotal for
executing various operations. Creating files and directories is achieved through specific
commands, such as ’hadoop fs -touchz’ for files and ’hadoop fs -mkdir’ for directories,
allowing users to establish and organize data structures within the distributed file
system seamlessly.
1. Start the Hadoop Daemons and Initialize the Local Host: Begin by launching the Hadoop
daemons and initializing the local host on your workstation so that the NameNode and DataNode services are running.
2. Access HDFS System Files: Open a web browser and navigate to ”localhost:9870”
to access the Hadoop Distributed File System (HDFS) through the Hadoop web
interface. Within the utilities section, select ”Browse” to explore system files and
directories stored in HDFS.
3. Creating Files: Open the command prompt. Utilize the following commands to
create files:
Example: hadoop fs -touchz /1.txt
Additional example: hadoop fs -touchz /docx
4. Creating Directories: Use the following command to create directories:
hadoop fs -mkdir /mydir
5. Creating Subdirectories: Employ the command below to create subdirectories
within existing directories: hadoop fs -mkdir /mydir/dir1
6. Retrieving Files: To copy a file from HDFS to the local file system, use:
hadoop fs -get /1.txt <local destination path>
Conversely, to upload a file from the local file system into HDFS, use:
hadoop fs -put <local file path> <HDFS destination path>
7. Deleting Files: To delete a file from HDFS, execute:
hadoop fs -rm /1.txt
8. Deleting Directories: For deleting directories:
Utilize the command: hadoop fs -rm -r /directoryname
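Taken together, the steps above can be reproduced as a single command-line session. The listing below is an illustrative sketch rather than a transcript of the actual run; the file name /1.txt, the directory /mydir, and the local path /tmp are example values:

start-dfs.sh                              # start the NameNode and DataNode daemons
hadoop fs -touchz /1.txt                  # create an empty file in HDFS
hadoop fs -mkdir /mydir                   # create a directory
hadoop fs -mkdir /mydir/dir1              # create a subdirectory
hadoop fs -ls /                           # list the HDFS root to confirm the objects exist
hadoop fs -get /1.txt /tmp/1.txt          # copy the file from HDFS to the local file system
hadoop fs -put /tmp/1.txt /mydir/1.txt    # upload the local copy back into HDFS
hadoop fs -rm /1.txt                      # delete a file from HDFS
hadoop fs -rm -r /mydir                   # delete a directory and its contents
stop-dfs.sh                               # stop the daemons when finished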


Figure 4.1: Implementation step 1


Figure 4.2: Implementation step 2

Following these steps ensured effective utilization of Hadoop for file management
tasks, enabling the creation, deletion, and retrieval of files and directories within the
HDFS environment.
Moreover, the methodology underscores the importance of clarity and precision in
command execution to avoid errors and streamline operations effectively. Retrieving
files from HDFS involves employing ’hadoop fs -get’ or ’hadoop fs -put’ commands,
facilitating seamless data transfer between local and distributed file systems. Similarly,
deletion operations for files and directories are executed with the ’hadoop fs -rm’
command, ensuring efficient management of data resources within the Hadoop ecosystem.
By adhering to this structured methodology, users can harness the power of Hadoop
for file management tasks, enabling efficient creation, retrieval, and deletion of files
and directories, thereby facilitating robust data management practices in distributed
computing environments.


Figure 4.3: Terminal



Chapter 5

RESULTS

The project’s results demonstrate the successful implementation of a file management
system using Hadoop Distributed File System (HDFS) for our cloud computing subject.
Through practical experimentation and testing, we showcased the system’s ability to
handle basic file management tasks efficiently within a Hadoop cluster environment.
The system exhibited notable scalability, allowing it to manage varying workloads and
data volumes effectively, showcasing its suitability for beginner-level cloud computing
projects.

Figure 5.1: Result shown in the browser

Furthermore, the fault tolerance mechanisms inherent in HDFS were demonstrated,
ensuring data integrity and availability even under simulated failure scenarios. The
simplicity and ease of use of the system’s command-line interface (CLI) were evident,
providing us with a straightforward means of executing file management operations
using familiar commands. Overall, the project’s results underscore our grasp of fundamental
cloud computing concepts and our ability to apply them in practical scenarios,
setting a strong foundation for our future endeavors in the field.



Chapter 6

CONCLUSION

In our project, we successfully implemented a file management system using Hadoop
Distributed File System (HDFS), showcasing its scalability, fault tolerance, and
efficiency within a beginner-level cloud computing context. Through rigorous testing,
we demonstrated the system’s ability to handle basic file management tasks, such as
creating, retrieving, and deleting files and directories, while maintaining consistent
performance. The system’s scalability was evident as it efficiently managed varying
workloads and data volumes, showcasing its suitability for cloud computing projects of
varying complexities. Additionally, the fault tolerance mechanisms inherent in HDFS
ensured data integrity and availability, even under simulated failure scenarios,
highlighting the reliability of the system in handling adverse conditions. The simplicity
and ease of use of the system’s command-line interface (CLI) provided a user-friendly
means of executing file management operations, underscoring our grasp of fundamental
cloud computing concepts and our ability to apply them effectively.
Implementing a file management system using HDFS is significant in the context
of cloud computing as it provides a scalable, reliable, and efficient solution for
managing large volumes of data across distributed clusters. This not only streamlines data
storage and retrieval processes but also lays the foundation for more advanced data
processing and analytics tasks. By leveraging HDFS, organizations can unlock new
possibilities for collaboration, innovation, and insights, driving efficiency and
competitiveness in the digital era. Furthermore, integrating fault tolerance mechanisms ensures
data reliability and continuity, enhancing the resilience of cloud-based applications and
services. Moving forward, potential enhancements to the project could include
exploring the integration of additional Hadoop ecosystem components, optimizing the system
for specific use cases, and incorporating advanced security features to address evolving
data privacy and security concerns in cloud environments.

Chapter 7

REFERENCES

[1]https://aws.amazon.com/what-is/hadoop/.

[2]https://www.simplilearn.com/tutorials/hadoop-tutorial

[3]https://www.youtube.com/watch?v=UcU8XiqL7MA

[4]https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs.

[5]https://data-flair.training/blogs/top-hadoop-hdfs-commands-tutorial/

[6]https://www.youtube.com/watch?v=JcRqicngvrA

[7]https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCom
