
Study of HADOOP

Abstract

Hadoop is a software framework that supports data-intensive distributed applications. It forms clusters of machines and coordinates the work among them. It includes two major components: HDFS (Hadoop Distributed File System) and Map Reduce. HDFS is designed to store large amounts of data reliably and to make that data highly available to user applications running on clients. It splits data into multiple blocks and stores each block redundantly across a pool of servers, which enables reliable and very fast computation. Map Reduce is a software framework for analyzing and transforming very large data sets into the desired output. This paper gives an introduction to Hadoop and describes the types of Hadoop, the architecture of HDFS and Map Reduce, and the benefits of HDFS and Map Reduce.
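
To make the Map Reduce idea in the abstract concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task itself are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines.

    In a real cluster the map and reduce calls would run in parallel on
    different machines; here they run sequentially for illustration only.
    """
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["Hadoop stores data", "hadoop processes data"])` counts "hadoop" and "data" twice each and the remaining words once. The point of the split is that each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, which is what allows the work to be distributed.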

Key takeaways

  • Hadoop consists of two main parts: Map Reduce and the Hadoop Distributed File System (HDFS). Map Reduce is responsible for parallel computing, and HDFS is responsible for data management.
  • After the NameNode accepts the write request, the HDFS client writes the file directly to the assigned DataNodes [9]. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
  • HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software.
  • The high rate at which HDFS can supply data to the programming layers of Hadoop translates into faster batch processing and quicker answers to complex analytic questions.
  • One benefit of HDFS is its portability between various Hadoop distributions, which helps minimize vendor lock-in.
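
The write path summarized above (the NameNode assigns DataNodes, and blocks are stored redundantly) can be sketched as a toy model. This is not HDFS code: the 128 MB block size, the replication factor of 3, the round-robin placement, and the names `split_into_blocks` and `assign_datanodes` are all assumptions made for illustration. Real HDFS uses a rack-aware placement policy rather than round-robin.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (not stated in the paper)
REPLICATION = 3                 # assumed number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (block_index, length_in_bytes) covering the file."""
    count = math.ceil(file_size / block_size)
    blocks = []
    for index in range(count):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - index * block_size)
        blocks.append((index, length))
    return blocks


def assign_datanodes(blocks, datanodes, replication=REPLICATION):
    """Toy stand-in for the NameNode: choose `replication` distinct
    DataNodes per block using simple round-robin placement."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    plan = {}
    for index, _length in blocks:
        plan[index] = [
            datanodes[(index + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return plan
```

Under these assumptions, a 300 MB file splits into three blocks (128 MB, 128 MB, and 44 MB), and with four hypothetical DataNodes `dn1` to `dn4` each block is assigned to three different nodes, so the loss of any single node leaves at least two copies of every block.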