2014
A file system is used for the organization, storage [1], retrieval, naming, sharing, and protection of files. A distributed file system offers the user and the system certain degrees of transparency, such as access transparency [2], location transparency, failure transparency, heterogeneity, and replication transparency [1][3]. NFS (Network File System), RFS (Remote File Sharing), and the Andrew File System (AFS) are examples of distributed file systems. Distributed file systems are generally used for cloud computing applications based on the MapReduce programming model [4]. A MapReduce program consists of a Map() procedure that performs filtering and a Reduce() procedure that performs a summary operation. However, in a cloud computing environment, failures sometimes occur, and nodes may be upgraded, replaced, and added to the system; therefore, a load imbalance problem arises. To solve this problem, a load rebalancing algorithm is implemented in this paper so that the central node is not overloaded. The implementation is done in the Hadoop Distributed File System. Because Apache Hadoop is used, security issues arise. To solve these issues and increase security, the Kerberos authentication protocol [20] is implemented to handle multiple nodes. This paper presents a real-time implementation experiment on a cluster, with results.
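The Map/Reduce split described above can be illustrated with a minimal word-count sketch in plain Python (this mimics the programming model only, not Hadoop's actual API):

```python
from collections import defaultdict

def map_phase(chunk):
    """Map(): filtering/transformation -- emit a (word, 1) pair per word."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce(): summary operation -- sum the counts for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# A file is partitioned into chunks; each chunk is mapped independently
# (in HDFS the chunks would live on different data nodes).
chunks = ["the cloud stores the data", "the data grows"]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
result = reduce_phase(intermediate)
print(result["the"])  # 3
```

In a real cluster the map calls run in parallel on the nodes holding each chunk, and the framework shuffles the intermediate pairs to the reducers.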
2014
The MapReduce programming paradigm plays a vital role in the development of cloud computing applications using distributed file systems, where nodes concurrently provide computing as well as storage functions. Initially, a file is partitioned into a number of chunks allocated to different nodes so that MapReduce tasks can be performed on them. Since cloud computing is a dynamic environment, upgrading, replacing, and adding new nodes is a frequent concern. Relying on a central load balancer is clearly insufficient in a large-scale, failure-prone environment, since it is put under a significant workload that scales linearly with the system size and may become a performance bottleneck and a single point of failure. To overcome this, a fully distributed load rebalancing algorithm is presented in this paper to handle the load imbalance problem. The proposed algorithm is compared against a centralized approach in a production system ...
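A toy illustration of the rebalancing goal, chunk counts converging toward the per-node average: the paper's algorithm is fully distributed, while this sketch computes the target state centrally for clarity, so it is a model of the outcome, not of the protocol.

```python
def rebalance(loads):
    """Move chunks from heavy nodes to light nodes until every node
    is within one chunk of the average. `loads[i]` is the number of
    file chunks stored on node i."""
    total = sum(loads)
    n = len(loads)
    avg = total / n
    loads = loads[:]
    heavy = [i for i in range(n) if loads[i] > avg]
    light = [i for i in range(n) if loads[i] < avg]
    for h in heavy:
        for l in light:
            while loads[h] - 1 >= avg and loads[l] + 1 <= avg:
                loads[h] -= 1  # migrate one chunk h -> l
                loads[l] += 1
    return loads

print(rebalance([10, 2, 0, 4]))  # [4, 4, 4, 4]
```

Each migration corresponds to a chunk transfer, so the number of iterations of the inner loop is a proxy for the movement cost the abstract mentions.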
This paper proposes a cloud system that combines on-demand allocation of resources with improved utilization, opportunistically provisioning cycles from idle cloud nodes to other processes. Providing all the demanded services to cloud customers is extremely difficult, and meeting cloud consumers' needs is a significant issue. Hence, an on-demand cloud infrastructure using a Hadoop configuration with improved CPU utilization and improved storage utilization is proposed, using the Fair4s job scheduling algorithm. All cloud nodes that would otherwise remain idle are put to use; security challenges are also addressed, and the system achieves load balancing and fast processing of huge data in less time, handling jobs of any size, whether large or small. Here we compare the GFS read/write algorithm and the Fair4s job scheduling algorithm for file uploading and file downloading, and enhance CPU utilization and storage utilization. Cloud computing moves application software and databases to large data centres, where the management of the data and services may not be fully trustworthy. This security problem is therefore solved by encrypting the data using an encryption/decryption algorithm, while the Fair4s job scheduling algorithm solves the problem of utilizing all idle cloud nodes for larger data.
Journal of Information Security, 2022
Hadoop technology is accompanied by some security issues. At its beginnings, developers paid attention mostly to the development of basic functionalities, and the design of security components was not of prime interest. Because of that, the technology remained vulnerable to malicious activities of unauthorized users whose purpose is to endanger system functionalities or to compromise private user data. Researchers and developers are continuously trying to solve these issues by upgrading Hadoop's security mechanisms and preventing undesirable malicious activities. In this paper, the most common HDFS security problems and a review of unauthorized access issues are presented. First, the Hadoop mechanism and its main components are described as the introduction to the leading research problem. Then, the HDFS architecture is given, and all of its components and functionalities are introduced. Further, all possible types of users are listed, with an accent on unauthorized users, who are of great importance for the paper. One part of the research is dedicated to the consideration of Hadoop security levels, environment, and user assessments. The review also includes an explanation of Log Monitoring and Audit features, and a detailed consideration of authorization and authentication issues. Possible consequences of unauthorized access to a system are covered, and a few recommendations for solving problems of unauthorized access are offered. Honeypot nodes, security mechanisms for collecting valuable information about malicious parties, are presented in the last part of the paper. Finally, the idea of developing a new type of intrusion detector, based on an artificial neural network, is presented. The detector will be an integral part of a new kind of virtual honeypot mechanism and represents the initial base for future scientific work of the authors.
This paper proposes a cloud infrastructure that combines on-demand allocation of resources with improved utilization, opportunistically provisioning cycles from idle cloud nodes to other processes. Providing all the demanded services to cloud consumers is very difficult, and meeting cloud consumers' requirements is a major issue. Hence, an on-demand cloud infrastructure using a Hadoop configuration with improved CPU utilization and storage utilization is proposed, using a splitting algorithm with MapReduce. All cloud nodes which would otherwise remain idle are put to use; security challenges are also addressed, and the system achieves load balancing and fast processing of large data in less time. Here we compare FTP and HDFS for file uploading and file downloading, and enhance CPU utilization and storage utilization. Cloud computing moves application software and databases to large data centres, where the management of the data and services may not be fully trustworthy. This security problem is therefore solved by encrypting the data using an encryption/decryption algorithm and a MapReduce algorithm, which solves the problem of utilizing all idle cloud nodes for larger data.
User access limitations are very valuable in Hadoop distributed file systems for access to sensitive and personal data. Even though a user may have access to the database, an access-limit check is highly relevant at MapReduce time to control the user and return only permissible data. Data nodes do not enforce any access control on reads or writes to their data blocks. Therefore, the Kerberos ticket-granting model for user login, together with user access permissions on MapReduce jobs, does not prevent unauthorized data from being obtained from an access-granted database. In addition, to secure the data during processing, authentication and authorization of the data are required. The problems broadly include a) who will access the data, b) how it will be encrypted, and c) the stability of data processing while the data are continuously growing. The current study covers the security mechanisms currently available in Hadoop systems, the requirements of access control mechanisms, and changes of access control ...
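The missing block-level check the abstract describes might look like the following sketch. The ACL layout, block IDs, and `serve_block` helper are hypothetical illustrations, not part of HDFS or its API:

```python
# Hypothetical ACL: per-block permission sets, consulted by a data node
# before serving a read or write (the check stock HDFS data nodes lack).
ACL = {
    "block_001": {"alice": {"read", "write"}, "bob": {"read"}},
    "block_002": {"alice": {"read"}},
}

def serve_block(user, block_id, op):
    """Refuse the operation unless the ACL grants it to this user."""
    perms = ACL.get(block_id, {}).get(user, set())
    if op not in perms:
        raise PermissionError(f"{user} may not {op} {block_id}")
    return f"{block_id}:data"

print(serve_block("bob", "block_001", "read"))  # block_001:data
```

The point of the abstract is that even with Kerberos login in place, a check of this kind has to run at the data node itself, or a MapReduce job can still read blocks its user was never granted.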
2015
Cloud computing has seen tremendous growth in recent years, but there is no segregation on shared clouds. Distributed file systems are key building blocks for cloud computing applications based on the MapReduce programming paradigm. In such file systems, nodes simultaneously serve computing and storage functions; a file is partitioned into a number of chunks allocated to distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. Clouds are most commonly used where data storage and communication must be done in huge amounts. "The cloud" also focuses on increasing the effectiveness of public resources. Cloud resources are usually not only shared by multiple users but also dynamically reallocated on demand when apportioning resources to users, and it is during this apportionment that protection is needed. In this paper we introduce a novel mechanism and investigate how to implement security for cloud computing and ...
There has been rapid progress in the cloud, and with growing numbers of organizations relying on resources in the cloud, there is a need to secure the data of the various customers using those shared resources. Cloud storage services avoid exorbitant expenditure on software and staff maintenance, and give better performance, lower storage cost, and flexibility; however, delivering cloud services over the web increases their exposure to security vulnerabilities, and security is one of the critical weaknesses holding large organizations back from entering the distributed computing environment. The proposed work examines Hadoop storage strategies and the MapReduce approach with synchronization between tasks, together with their advantages and limitations.
Cloud computing and Hadoop have become a new distributed storage model for many organizations, industries, etc. They provide a pay-per-use model in which the customer pays only for the data he stores on the cloud. However, relying on a single cloud storage provider has resulted in problems such as vendor lock-in. Therefore, a multi-cloud environment is used to address the security and data availability problems. In this paper, we propose a system that uses the Hadoop computing platform in a multi-cloud domain for storing customers' data reliably. Hadoop is used to conquer the single-point-of-failure problem, which has been a main issue in centralized environments, as well as to deal with remote uploading.
2017
With the advent of new technologies, managing the tremendous amount of overflowing and exponentially growing data is a major area of concern today, particularly in terms of storing and organizing data securely. The exponentially growing data due to the Internet of Things (IoT) has led to many challenges for governmental and non-governmental organizations (NGOs). Security threats have forced private and public organizations to develop their own Hadoop-based cloud storage architectures. The Apache Hadoop architecture creates various clusters of machines and efficiently coordinates the work among them. The Hadoop Distributed File System (HDFS) and MapReduce are two important components of Hadoop. HDFS is the primary storage system used by different Hadoop applications; it enables reliable and extremely rapid computations, and provides rich, highly available data to user applications running at the client end. MapReduce is a software framework for analyzing and tr...
IOSR Journal of Computer Engineering, 2014
Cloud computing is an emerging technology based on on-demand service, in which shared resources, information, software, and other devices are provided according to clients' requirements at a specific time, given the availability of the internet. Load balancing is one of the challenging issues in cloud computing. Efficient load balancing makes cloud computing more efficient and improves user satisfaction; it brings fault tolerance, high availability, scalability, flexibility, reduced overhead for users, reduced cost of ownership, on-demand services, etc. Distributed file systems are key building blocks for cloud computing applications based on the MapReduce programming paradigm. In such file systems, nodes serve computing and storage functions at the same time. Files can be created, deleted, and appended dynamically. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the nodes.
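One plausible way to quantify the non-uniform chunk distribution described above is a load imbalance factor. The abstract does not fix a formula, so the definition sketched here, maximum deviation from the uniform share, normalized by that share, is an assumption for illustration:

```python
def imbalance_factor(chunk_counts):
    """0.0 means every node holds exactly the ideal (uniform) share of
    file chunks; larger values mean a more skewed distribution."""
    ideal = sum(chunk_counts) / len(chunk_counts)
    return max(abs(c - ideal) for c in chunk_counts) / ideal

print(imbalance_factor([5, 5, 5, 5]))   # 0.0  (perfectly uniform)
print(imbalance_factor([10, 2, 4, 4]))  # 1.0  (one node holds double its share)
```

A rebalancing algorithm can then be evaluated by how far it drives this factor toward zero and at what chunk-migration cost.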
2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), 2014
Trusted computing and the security of services is one of the most challenging topics today and lies at the core of cloud computing, currently a focus of the international IT community. Hadoop, as an open-source cloud computing and big data framework, is increasingly used in the business world, while the weakness of its security mechanism has become one of the main problems obstructing its development. This paper first describes the Hadoop project and its present security mechanisms, then analyzes its security problems and risks, considering some methods to enhance its trust and security, and finally, based on the previous descriptions, summarizes Hadoop's security challenges.
International Journal of Science and Research (IJSR), 2017
The Hadoop framework is used to process big data in a parallel fashion. Big data is not only big in size; it also comes in different formats, at different sizes, and at different speeds. A relational database management system is not suitable for processing big data; Hadoop is the most popular framework for doing so. The Hadoop framework architecture has many components, such as the name node, data node, job tracker, and task tracker, and Hadoop's performance depends on how these components execute. The challenge in the Hadoop framework is to reduce the processing time of a job, but this depends on various factors such as scheduling, the performance of MapReduce after data encryption, resource allocation, and data encryption. The proposed research focuses on how to overcome these challenges of scheduling, resource allocation, and security. Hadoop data security is also a proposed research area, i.e., finding the most suitable encryption algorithm, one that encrypts Hadoop data without affecting Hadoop performance.
International Journal for Research in Applied Science and Engineering Technology IJRASET, 2020
Since big data is so huge that it has become difficult to handle, it requires special technology. Hadoop is the Apache Foundation's framework which aims to provide efficient storage and analytics of big data; it is also open-source software. Two core technologies are associated with Hadoop: HDFS and MapReduce. HDFS stands for Hadoop Distributed File System, a special file system which provides efficient storage for big data on a cluster of commodity hardware and is based on a streaming access pattern. The HDFS cluster architecture is based on a distributed file system and therefore follows a client-server design. Since in an HDFS cluster there is no way to check the authenticity of a client, a method is proposed to incorporate the Kerberos protocol between the client and the HDFS cluster to secure the system. Kerberos is a network authentication protocol which provides secured communication between client and server over an unsecured network. Moreover, an agent has been incorporated which is authorized to access the client's buffer, take data out of it, and load it into the HDFS cluster.
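The Kerberos exchange proposed above can be caricatured in a few lines. The toy XOR "cipher" and the principal name below are illustrative stand-ins for real Kerberos encryption (AES) and message formats, not the actual protocol:

```python
import os, hmac, hashlib

def toy_encrypt(key, data):
    # Toy XOR stream keyed by a hash -- a stand-in for Kerberos's real
    # symmetric encryption; never use this for anything real.
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ stream[i % 32] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR is its own inverse

# Long-term secrets the KDC shares with each principal in advance.
client_key = os.urandom(16)
server_key = os.urandom(16)

# 1. The KDC issues a fresh session key twice: once encrypted for the
#    client, once inside a "ticket" only the server can open.
session_key = os.urandom(16)
for_client = toy_encrypt(client_key, session_key)
ticket = toy_encrypt(server_key, session_key)

# 2. The client recovers the session key and sends the ticket plus an
#    authenticator (a MAC over its name) to the HDFS server.
k_c = toy_decrypt(client_key, for_client)
authenticator = hmac.new(k_c, b"client/hdfs", hashlib.sha256).digest()

# 3. The server opens the ticket and verifies the authenticator,
#    proving the client got the session key from the KDC.
k_s = toy_decrypt(server_key, ticket)
ok = hmac.compare_digest(
    authenticator, hmac.new(k_s, b"client/hdfs", hashlib.sha256).digest())
print(ok)  # True
```

The key point for HDFS is step 3: the name node or data node never needs the client's long-term secret, only its own, yet it can still verify who is connecting.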
This paper proposes a cloud infrastructure that combines on-demand allocation of resources with improved utilization, opportunistically provisioning cycles from idle cloud nodes to other processes. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. Providing all the demanded services to cloud consumers is very difficult, and meeting cloud consumers' requirements is a major issue. Hence, an on-demand cloud infrastructure using a MapReduce configuration with improved CPU utilization and storage utilization is proposed, using the Google File System with MapReduce. All cloud nodes which would otherwise remain idle are put to use; security challenges are also addressed, and the system achieves load balancing and fast processing of large data in less time. Here we compare FTP and GFS for file uploading and file downloading, and enhance CPU utilization, storage utilization, and fault tolerance. Cloud computing moves application software and databases to large data centres, where the management of the data and services may not be fully trustworthy. This security problem is therefore solved by encrypting the data using an encryption/decryption algorithm and a MapReduce algorithm, which solves the problem of utilizing all idle cloud nodes for larger data.
Contemporary Engineering Sciences, 2015
File storage load can be balanced across the storage nodes available in a cloud system by using a fully distributed load rebalancing algorithm. Large-scale distributed systems such as cloud applications come with rising challenges on how to transfer data and where to store and compute it. Cloud computing is distributed computing over a network, in which a node concurrently serves computing and storage tasks. In a cloud computing environment, files can also be dynamically created, deleted, and appended; the file chunks are then not distributed equally among the nodes, leading to load inequity in the distributed file system. Existing distributed file systems depend on a single node to manage almost all operations, such as chunk reallocation of every data block in the file system; as a result, that node can become a bottleneck resource and a single point of failure. A new technique, Random Linear Network Coding (RLNC), is employed in the proposed system. RLNC is performed at the outset, when the file is stored in the cloud: a file is split into different parts, and distinct parts are sent to each chunk server. The RSA algorithm is applied to calculate the response time by file size and for deadlock detection, and is used to detect anomalies in the system. A dynamic scheduling algorithm is present in the proposed system to overcome the load inequity problem. This algorithm is compared against a centralized approach in a production system. The results indicate that our proposal considerably outperforms the prior distributed approach in terms of load inequity factor, movement cost, and algorithmic overhead.
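The RLNC step described above, splitting a file and sending coded combinations to chunk servers, can be sketched over GF(2), where a coded chunk is simply the XOR of a random subset of the original chunks. Real RLNC typically works over GF(2^8) with full Gaussian-elimination decoding; this GF(2) simplification is only to show the encoding shape:

```python
import random

def rlnc_encode(chunks, n_coded, seed=0):
    """Produce n_coded coded chunks, each tagged with its GF(2)
    coefficient vector. Any k linearly independent coded chunks are
    enough to recover the k originals (decoding not shown)."""
    rng = random.Random(seed)
    k = len(chunks)
    size = len(chunks[0])
    coded = []
    for _ in range(n_coded):
        coeffs = [rng.randint(0, 1) for _ in range(k)]
        if not any(coeffs):
            coeffs[rng.randrange(k)] = 1  # avoid the useless all-zero combination
        block = bytes(size)
        for c, chunk in zip(coeffs, chunks):
            if c:
                block = bytes(a ^ b for a, b in zip(block, chunk))
        coded.append((coeffs, block))
    return coded

# A file split into three equal-size chunks, coded into five blocks
# destined for five chunk servers.
chunks = [b"aaaa", b"bbbb", b"cccc"]
coded = rlnc_encode(chunks, 5)
```

Because every coded block carries its coefficient vector, a reader does not care *which* servers respond, only that it collects enough independent combinations, which is what makes the scheme attractive for failure-prone storage.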
Asian Journal of Research in Computer Science, 2021
In recent years, data and the internet have grown rapidly, giving rise to big data. To address the resulting problems, many software frameworks are used to increase the performance of distributed systems and to provide ample data storage. One of the most beneficial frameworks for utilizing data in distributed systems is Hadoop. It clusters machines and distributes the work between them. Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and MapReduce (MR). With Hadoop, we can process, count, and distribute each word in a large file and determine the influence of each of them. HDFS is designed to effectively store and transmit colossal data sets to high-bandwidth user applications; the differences between it and other file systems are significant. HDFS is intended for low-cost hardware and is exceptionally tolerant of faults. Thousands of computers in a vast cluster have both directly attached storage and user programs. The resource scales with demand while remaining cost-effective at all sizes by distributing storage and computation across numerous servers. Based on these characteristics of HDFS, many researchers have worked in this field, trying to enhance the performance and efficiency of the file system so that it becomes one of the most active cloud systems. This paper offers an adequate study reviewing the essential investigations as a beneficial trend for researchers wishing to operate in this field.
For cloud computing applications, the distributed file system is used as a key building block; it is simply a classical model. In such a file system, a file is partitioned into a number of chunks allocated to distinct nodes; each chunk is allocated to a separate node so that the MapReduce function can be performed in parallel over each node. In the cloud, the central node (the master in MapReduce) becomes a bottleneck as the number of storage nodes, the number of files, and the accesses to those files increase. In this survey paper, to overcome this load imbalance problem, a fully distributed load rebalancing algorithm is used to remove the load on the central node and also to reduce the movement cost.
2020
Developing a trusted Hadoop, essentially a cloud computing system, is an essential challenge of the cloud. A protection policy can be utilized across various cloud service models such as Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Software as a Service (SaaS), and can support most requirements in cloud computing. This motivates the need for a policy that controls these challenges. Hadoop is a recommended platform for overcoming this big data problem, and usually utilizes the MapReduce design to organize huge amounts of information in the cloud system. Hadoop has no policy to ensure the privacy and protection of the files saved within the Hadoop Distributed File System (HDFS). In the cloud, the safety of sensitive data is a significant problem in which encryption schemes play a vital role. This paper proposes a hybrid method between a pair of well-known asymmetric-key cryptosystems (RSA and Rabin) to encipher the files saved in HDFS. Therefore, before ...
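The asymmetric encryption that the hybrid RSA/Rabin scheme builds on can be sketched with textbook-small RSA parameters. This is a toy illustration of the encrypt/decrypt shape only, with classroom primes; it does not reproduce the paper's hybrid construction and is far too weak for real data:

```python
# Textbook RSA with tiny primes (illustration only).
p, q = 61, 53
n = p * q               # public modulus (3233)
phi = (p - 1) * (q - 1)
e = 17                  # public exponent, coprime with phi
d = pow(e, -1, phi)     # private exponent (Python 3.8+ modular inverse)

def encrypt(m):
    """Encrypt an integer plaintext m < n with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c):
    """Decrypt with the private key (d, n)."""
    return pow(c, d, n)

m = 1234                # a small block of plaintext, as an integer < n
c = encrypt(m)
print(decrypt(c) == m)  # True
```

In practice a scheme like the paper's would asymmetrically encrypt only a small key or block header and use it to protect the bulk HDFS file data, since public-key operations on whole files are prohibitively slow.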
Cloud computing is an upcoming era in the software industry; it is a very vast and developing technology. Distributed file systems play an important role in cloud computing applications based on MapReduce techniques. When distributed file systems are used for cloud computing, nodes serve computing and storage functions at the same time. A given file is divided into small parts so that MapReduce algorithms can be applied in parallel. The problem is that in cloud computing, nodes may be added, deleted, or modified at any time, and operations on files may also be performed dynamically. This causes unequal distribution of load among the nodes, which leads to a load imbalance problem in the distributed file system. Newly developed distributed file systems mostly depend upon a central node for load distribution, but this method is not helpful at large scale or where the chances of failure are high. Using a central node for load distribution creates a single point of dependency and raises the chances of a performance bottleneck. Issues such as movement cost and network traffic caused by the migration of nodes and file chunks also need to be resolved. So we propose an algorithm which overcomes all these problems and helps achieve uniform load distribution efficiently. To verify the feasibility and efficiency of our algorithm, we will use a simulation setup and compare our algorithm with existing techniques on factors such as load imbalance factor, movement cost, and network traffic.