Study of HADOOP

Ankit Naik

Study of HADOOP

Ankit Naik

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

Hadoop is a software framework that supports data intensive distributed application. Hadoop creates clusters of machine and coordinates the work among them. It include two major component, HDFS (Hadoop Distributed File System) and Map Reduce. HDFS is designed to store large amount of data reliably and provide high availability of data to user application running at client. It creates multiple data blocks and store each of the block redundantly across the pool of servers to enable reliable, extreme rapid computation. Map Reduce is software framework for the analyzing and transforming a very large data set in to desired output. This paper describe introduction of hadoop, types of hadoop, architecture of HDFS and Map Reduce, benefit of HDFS and Map Reduce.

Key takeaways

Hadoop consists of two main parts: Map Reduce and Hadoop Distributed File System (HDFS) , in which Map Reduce is responsible of parallel computing and the HDFS is responsible for data management.
After the NameNode accepted, the HDFS client directly writes the file to the assigned DataNodes [9].The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software.
The rate at which HDFS can supply data to the programming layers of Hadoop equates to faster batch processing times and quicker answers to complex analytic questions.
One benefit of HDFS is its portability between various Hadoop distributions, which helps minimize vendor lock-in.

Hanan M . Shukur, Lailan M Haji, Subhi R M Zeebaree, Rizgar R . Zebari

Technology Reports of Kansai University, 2020

The last days, the data and internet are become increasingly growing which occurring the problems in big-data. For these problems, there are many software frameworks used to increase the performance of the distributed system. This software is used for available of large data storage. One of the most beneficial software frameworks used to utilize data in distributed systems is Hadoop. This software creates machine clustering and formatting the work between them. The Hadoop consists of two major components which are Hadoop Distributed File System (HDFS) and Map Reduce (MR). By Hadoop, we can process, count and distribute of each word in a large file and know the number of affecting for each of them. In this paper, we will explain what is Hadoop and its architectures, how it works and its performance analysis in a distributed system according to many authors. In addition, assessing each paper and compare with each other.

Log In

Study of HADOOP

Sign up for access to the world's latest research

Abstract

Key takeaways

Related papers

Related papers