Cloud computing is a distributed computing technology that combines hardware and software delivered as a service to store, manage, and process data. A new system is proposed to allocate resources dynamically for task scheduling and execution. Virtual machines are introduced in the proposed architecture for efficient parallel data processing in the cloud; they are automatically instantiated and terminated during job execution. An extended evaluation of MapReduce-style processing is also presented in this approach.
International Journal of Computer Applications, 2013
Parallel data processing has become an increasingly widespread phenomenon due to the realization of cloud computing, especially on IaaS (Infrastructure as a Service) clouds. Cloud service providers such as IBM, Google, Microsoft, and Oracle have made provisions for parallel data processing in their cloud services. Nevertheless, the frameworks in use today are static and homogeneous in nature, designed for cluster environments. The problem with these frameworks is that resource allocation for large submitted jobs is inefficient: they take more time for processing and incur more cost. In this paper we discuss the possibilities of parallel processing and its challenges, and present one of the IaaS products meant for parallel processing. VMs are allocated to tasks dynamically for the execution of jobs. With the proposed framework we perform parallel job processing based on MapReduce and compare the results with Hadoop.
IEEE Transactions on Parallel and Distributed Systems, 2011
In recent years ad-hoc parallel data processing has emerged to be one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.
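To make the mechanism concrete, here is a minimal sketch of the allocation pattern the abstract describes: each task declares the VM type it needs, and machines are started just before a task runs and terminated as soon as it finishes. The `iaas` client and its methods are hypothetical placeholders, not Nephele's actual API.

```python
# Minimal sketch of dynamic per-task VM allocation: each stage of a job
# requests its own VM type, and VMs live only as long as their task.
# The `iaas` object and its methods are assumed placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    vm_type: str            # e.g. "m1.small" for I/O-bound, "c1.xlarge" for CPU-bound
    run: Callable           # the task body, called with the VM handle

def execute_job(tasks, iaas):
    for task in tasks:
        vm = iaas.start_instance(task.vm_type)   # instantiate on demand
        try:
            task.run(vm)                         # execute the stage on that VM
        finally:
            iaas.terminate_instance(vm)          # release as soon as the stage ends
```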
Infrastructure as a Service (IaaS) clouds have emerged as a promising new platform for massively parallel data processing. By eliminating the need for large upfront capital expenses, operators of IaaS clouds offer their customers the unprecedented possibility to acquire access to a highly scalable pool of computing resources on a short-term basis and enable them to execute data analysis applications at a scale which has been traditionally reserved to large Internet companies and research facilities. However, despite the growing popularity of these kinds of distributed applications, the current parallel data processing frameworks, which support the creation and execution of large-scale data analysis jobs, still stem from the era of dedicated, static compute clusters and have disregarded the particular characteristics of IaaS platforms so far. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. However, the current algorithms do not consider resource overload or underutilization during job execution. In this paper, we focus on increasing the efficacy of the scheduling algorithm for real-time cloud computing services. Our algorithm uses the turnaround-time utility efficiently by differentiating it into a gain function and a loss function for a single task. The algorithm also assigns high priority to tasks that complete early and lower priority to real-time tasks at risk of abortion. The algorithm has been implemented on top of the Round Robin (RR) method; it outperforms existing utility-based scheduling algorithms, and we compare its performance against them.
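As an illustration of the gain/loss split described above, here is a small sketch; the linear utility shapes, the rates, and the task attributes (`estimated_runtime`, `deadline`) are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a turnaround-time utility split into a gain and a loss component.
# Shapes and rates are illustrative assumptions.
def utility(turnaround, deadline, gain_rate=1.0, loss_rate=2.0):
    if turnaround <= deadline:
        # gain function: earlier completion earns more utility
        return gain_rate * (deadline - turnaround)
    # loss function: late (or aborted) tasks are penalized
    return -loss_rate * (turnaround - deadline)

def prioritize(tasks, now):
    # tasks with higher expected utility (likely early completion) run first;
    # the underlying RR scheduler then serves ties round-robin
    return sorted(tasks,
                  key=lambda t: -utility(now + t.estimated_runtime, t.deadline))
```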
Dynamic resource allocation is one of the most challenging problems in resource management. Dynamic resource allocation in cloud computing has attracted the attention of the research community in the last few years, and many researchers around the world have come up with new ways of facing this challenge. Ad-hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. A number of cloud providers have started to include frameworks for parallel data processing in their products, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use have been designed for static, homogeneous cluster setups, so the allocated resources may be inadequate for large parts of the submitted tasks and unnecessarily increase processing cost and time. Moreover, owing to the opaque nature of the cloud, static allocation of resources is straightforward, but dynamic allocation is not. The proposed generic data processing framework is intended to explicitly exploit dynamic resource allocation in the cloud for task scheduling and execution.
2020
Cloud computing is one of the most important subjects on which many researchers rely, applying many algorithms and methods. Some of these methods are used to enhance performance, speed, and the advantages of task-level parallelism, and some are used to deal with big data and scheduling. Many others decrease the amount of computation during execution, especially the memory space required. Parallel data processing is one of the common applications of infrastructure offered as a service in cloud computing. The purpose of this paper is to review parallel processing in the cloud. The reported results and methods are inconsistent; nevertheless, scheduling concepts provide an easy way to use resources, process data in parallel, and decrease the overall execution time of processing algorithms. Overall, this review opens new doors for choosing a suitable technique in the field of parallel data processing, and our work shows which strategies are better according to several factors.
2012
The ever-growing reach of technology has resulted in the need to store and process excessively large amounts of data on the cloud. The current volume of data is enormous and was expected to replicate over 650 times by the year 2014, out of which 85% would be unstructured; this is known as the 'Big Data' problem. The techniques of Hadoop, an efficient resource scheduling method, and probabilistic redundant scheduling are presented so the system can efficiently organize "free" computer storage resources existing within enterprises to provide low-cost, high-quality storage services. The proposed methods and system provide a valuable reference for the implementation of a cloud storage system; the proposed method includes a Linux-based cloud.
2015
Abstract- Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (U.S. National Institute of Standards and Technology (NIST)) [1]. Cloud services are popular because they can reduce the cost and complexity of owning and operating computers and networks. Since cloud users do not have to invest in information technology infrastructure, purchase hardware, or buy software licenses, the benefits are low up-front costs, rapid return on investment, rapid deployment, customization, flexible use, and solutions that can make use of new innovations [2]. Big data analytics and cloud computing are two IT initiatives, and both technologies continue to evolve. Organizations are moving beyond questions of what and how to store big data to addressing how to deriv...
Circuits and Systems, 2016
Nowadays, companies are faced with the task of processing huge quantities of data. As traditional database systems cannot handle this task in a cost-efficient manner, companies have built customized data processing frameworks. Cloud computing has emerged as a promising approach to rent a large IT infrastructure on a short-term pay-per-usage basis. This paper attempts to schedule tasks on compute nodes so that data sent from one node to the other has to traverse as few network switches as possible. The challenges and opportunities for efficient parallel data processing in cloud environments have been demonstrated and Nephele, the first data processing framework to exploit the dynamic resource provisioning offered by IaaS clouds, has been presented. The overall utilisation of resources has been improved by assigning specific virtual machine types to specific tasks of a processing job and by automatically allocating or deallocating virtual machines in the course of a job execution. This has led to a substantial reduction in the cost of parallel data processing.
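A toy sketch of the switch-minimizing placement idea mentioned above, under an assumed two-level (rack/core) topology; all names and data structures are illustrative, not the paper's scheduler.

```python
# Greedy topology-aware placement: tasks that exchange data are packed onto
# nodes under the same switch so traffic crosses as few switches as possible.
def switch_distance(node_a, node_b, rack_of):
    if node_a == node_b:
        return 0           # same node: no switch traversed
    if rack_of[node_a] == rack_of[node_b]:
        return 1           # same rack: one top-of-rack switch
    return 2               # different racks: core switch as well

def place(tasks, edges, nodes, rack_of):
    # `edges` is a list of (producer, consumer) task pairs
    placement = {}
    for task in tasks:
        def cost(node):
            # total switch hops to all already-placed upstream peers
            return sum(switch_distance(node, placement[p], rack_of)
                       for p, q in edges if q == task and p in placement)
        placement[task] = min(nodes, key=cost)
    return placement
```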
International Journal of Advanced Research in Computer Science
Cloud computing uses the Hadoop framework for processing Big Data in parallel. The Hadoop MapReduce programming paradigm, used in the context of Big Data, is one of the popular approaches that abstract the characteristics of parallel and distributed computing and comes off as a solution to Big Data. Improving the performance of MapReduce is a major concern, as it affects energy efficiency, and improving the energy efficiency of MapReduce will have a significant impact on energy savings for data centers. Many parameters influence MapReduce performance; scheduling, resource allocation, and data flow in particular have a significant impact. Hadoop also has certain limitations which, if addressed, would allow jobs to execute more efficiently, and efficient resource allocation remains a challenge in cloud computing MapReduce platforms. We propose an enhanced Hadoop architecture that reduces the computation cost associated with Big Data analysis, as sketched below.
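As one concrete instance of the kind of cost-reducing optimization such an architecture can apply, the sketch below simulates a local combiner that pre-aggregates map output before the shuffle, shrinking both the data volume crossing the network and the downstream computation; this is a plain-Python illustration, not the paper's specific design.

```python
# Word count with a combiner: the combiner collapses duplicate keys on the
# mapper node, so fewer pairs are shuffled to the reducer.
from collections import Counter
from itertools import chain

def map_phase(split):
    # map: emit a (word, 1) pair per occurrence
    return [(word, 1) for word in split.split()]

def combine(pairs):
    # combiner: pre-aggregate locally before the shuffle
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def reduce_phase(all_pairs):
    totals = Counter()
    for word, n in all_pairs:
        totals[word] += n
    return dict(totals)

splits = ["big data big cloud", "cloud cloud data"]
mapped = [combine(map_phase(s)) for s in splits]    # 4+3 pairs shrink to 3+2
print(reduce_phase(chain.from_iterable(mapped)))    # {'big': 2, 'data': 2, 'cloud': 3}
```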
IAEME PUBLICATION, 2020
The cloud is one of today's key platforms, where extensive data processing has become one of the main problems, especially with respect to deadlines and costs. Data comes from various sources such as Facebook, Twitter, YouTube videos, log files, etc., and storing it requires processing large amounts of data in the cloud. We therefore identify the data processing problem and address the data storage problem by using the MapReduce programming model to handle parallel data processing and in-memory processing issues, combined with a new scheduling model based on the Grey Wolf Optimization (GWO) technique. The MapReduce framework (MRF) mainly consists of two parts: 1. a computation (map) part and 2. a reducing part. In this paper we compare the map-reduce framework with and without the GWO algorithm, making full use of the system to get better performance.
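For reference, the core GWO position update that such a scheduler would iterate looks roughly as follows; the schedule encoding as a real-valued vector and the fitness function (e.g. estimated makespan) are placeholder assumptions, not the paper's exact setup.

```python
# One iteration of the standard Grey Wolf Optimization update: the three best
# candidates (alpha, beta, delta) pull every wolf toward their consensus.
import random

def gwo_step(wolves, fitness, a):
    # rank the pack: alpha, beta, delta are the three best candidates
    ranked = sorted(wolves, key=fitness)
    alpha, beta, delta = ranked[0], ranked[1], ranked[2]
    new_pack = []
    for x in wolves:
        pos = []
        for d in range(len(x)):
            guided = []
            for leader in (alpha, beta, delta):
                A = 2 * a * random.random() - a      # exploration/exploitation factor
                C = 2 * random.random()
                D = abs(C * leader[d] - x[d])        # distance to the leader
                guided.append(leader[d] - A * D)
            pos.append(sum(guided) / 3)              # move toward the leaders' consensus
        new_pack.append(pos)
    return new_pack

# `a` is conventionally decreased linearly from 2 to 0 over the iterations,
# shifting the pack from exploration to exploitation.
```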
International Journal of Grid and Distributed Computing, 2016
Cloud computing is a technology in which Cloud Service Providers (CSPs) provide many virtual servers to users to store their information in the cloud. The faults occurring during the assignment and release of virtual machines, as well as the processing cost of resource allocation, must also be considered, and the parallel processing of information on the virtual machines must be done effectively and efficiently. A variety of systems have been developed to facilitate Many Task Computing (MTC); these systems aim to hide the issues of parallelism and fault tolerance and are used in many applications. In this paper, we introduce Nephele, a data processing framework that exploits the dynamic resource provisioning offered by IaaS clouds. The performance of the virtual machines has been evaluated, and the allocation and deallocation of job tasks to specific virtual machines has also been considered, along with a performance comparison with the well-known data processing framework Hadoop. Thus this paper addresses the effective and efficient processing of data by parallel processing and by allocating the correct resources for the desired task; it also helps reduce the cost of resource utilization by exploiting dynamic resource allocation.
Journal of Physics: Conference Series, 2014
In recent years, large-scale computer systems have emerged to meet the demands of high storage, supercomputing, and applications using very large data sets. The emergence of cloud computing offers the potential for analysis and processing of large data sets. MapReduce is the most popular programming model used to support the development of such applications. It was initially designed by Google for building large datacenters on a large scale, to provide Web search services with rapid response and high availability. In this paper we test the K-means clustering algorithm in a cloud computing setting. This algorithm is implemented on MapReduce; it has been chosen for its characteristics, which are representative of many iterative data analysis algorithms. We then modify the CloudSim framework to simulate the MapReduce execution of K-means clustering on different cloud configurations, depending on their size and the characteristics of the target platforms. The experiments show that the implementation of K-means clustering gives good results, especially for large data sets, and that the cloud infrastructure has an influence on these results.
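A single MapReduce-style iteration of K-means can be sketched as follows: the map step assigns each point to its nearest centroid, and the reduce step recomputes each centroid as the mean of its assigned points. This is a pure-Python illustration of the algorithm's structure, not the paper's CloudSim setup.

```python
# One MapReduce-style iteration of K-means (empty clusters are not handled
# in this sketch).
from collections import defaultdict
import math

def kmeans_map(points, centroids):
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))  # nearest centroid
        yield idx, p

def kmeans_reduce(assignments, k, dim):
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for idx, p in assignments:
        counts[idx] += 1
        for d in range(dim):
            sums[idx][d] += p[d]
    # each new centroid is the mean of the points assigned to it
    return [[s / counts[i] for s in sums[i]] for i in range(k)]

points = [(0, 0), (0, 1), (9, 9), (10, 10)]
centroids = [(0, 0), (10, 10)]
centroids = kmeans_reduce(kmeans_map(points, centroids), k=2, dim=2)
print(centroids)  # roughly [[0.0, 0.5], [9.5, 9.5]]
```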
Because of the rapid development of computer and network technologies, parallel, distributed, and cloud computing have been widely used to solve problems that require high-performance computing power. Developments in parallel computing and their applications are summarized and briefly explained. Research on distributed computing has increased because of developments in network technologies and the increased data transfer rate per unit time (network bandwidth), and in recent years research has focused on cloud computing using many computers. In this study, parallel, distributed, and cloud computing are examined with respect to various parameters.
International Journal of Computer Science and Information Security (IJCSIS), Vol. 18, No. 11, November 2020
There is an increasing demand for processing tremendous volumes of data, which promotes research on systems and techniques to optimize Big Data processing time. Traditional systems do not offer the required flexibility and scalability, and they are often inefficient at processing complex analytical tasks within finite, allowable time or at providing scalable storage to accommodate the growing data. In this paper, we propose a system that improves Big Data processing by implementing a MapReduce program, two cloud services, and several optimization techniques. The system is implemented in the cloud using a cloud Big Data platform and compares several frameworks such as Hadoop, Apache Spark, and HBase. Cloud computing offers a complete and efficient infrastructure: powerful, scalable, and, especially, with effectively unlimited resources during high seasonal demands. The evaluations show that the proposed system produces more accurate results and improves processing time by a factor of up to 85 compared with the traditional system.
With the rapid growth of emerging applications like social network analysis, semantic Web analysis, and bioinformatics network analysis, the variety of data to be processed continues to increase quickly. Effective management and analysis of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry, and government. This paper introduces several big data processing techniques from system and application aspects. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the cloud computing platform, cloud architecture, cloud databases, and data storage schemes. Following the MapReduce parallel processing framework, we then introduce MapReduce optimization strategies and applications reported in the literature. Finally, we discuss the open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
Today, we're surrounded by data like oxygen. The exponential growth of data first presented challenges to cutting-edge businesses such as Google, Yahoo, Amazon, Microsoft, Facebook, and Twitter. Data volumes to be processed by cloud applications are growing much faster than computing power, and this growth demands new strategies for processing and analyzing information. Hadoop MapReduce has become a powerful computation model that addresses these problems. Hadoop HDFS became the most popular of the Big Data tools as it is open source with flexible scalability and lower total cost of ownership, and it allows data stores of any form without the need to define data types or schemas. Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. In this paper I provide an overview of the architecture and components of Hadoop, HCFS (Hadoop Cluster File System), and the MapReduce programming model, with its various applications and implementations in cloud environments.
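To make the programming model concrete, here is a minimal word-count pair in the style of Hadoop Streaming, which runs arbitrary executables as mapper and reducer over stdin/stdout with tab-separated key/value pairs; the two snippets are separate scripts.

```python
# mapper.py -- emits one tab-separated (word, 1) pair per word on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- Hadoop sorts mapper output by key before the reduce phase,
# so equal words arrive contiguously and can be summed with a group-by
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
for word, group in groupby(pairs, key=lambda kv: kv[0]):
    print(f"{word}\t{sum(int(n) for _, n in group)}")
```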
Concurrency and Computation: Practice and Experience, 2015
Efficiently scheduling MapReduce tasks is considered one of the major challenges facing MapReduce frameworks, and many algorithms have been introduced to tackle this issue. Most of these algorithms focus on the data locality property for task scheduling. Data locality may cause lower physical resource utilization in non-virtualized clusters and more power consumption. Virtualized clusters provide a viable solution that supports both data locality and better cluster resource utilization. In this paper, we evaluate the major MapReduce scheduling algorithms, such as FIFO, Matchmaking, Delay, and multithreading locality (MTL), on virtualized infrastructure. Two major factors are used to test the evaluated algorithms: the simulation time and the energy consumption. The evaluated schedulers are compared, and the results show the superiority of the MTL scheduler over the other existing schedulers. We also present a comparison between virtualized and non-virtualized clusters for MapReduce task scheduling.
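As a reference point for one of the evaluated policies, here is a simplified sketch of Delay scheduling: a job without node-local data is skipped for a bounded number of scheduling opportunities before it is allowed to launch non-locally. The data structures (`pending`, `input_locations`, `skips`) are illustrative assumptions.

```python
# Simplified Delay scheduling: trade a short wait for data locality.
# Each job is assumed to start with job.skips = 0.
def delay_schedule(jobs, node, max_delay):
    for job in jobs:                      # jobs in fairness order
        local = [t for t in job.pending if node in t.input_locations]
        if local:
            job.skips = 0
            return local[0]               # data-local launch, no network read
        job.skips += 1
        if job.skips > max_delay:
            job.skips = 0
            return job.pending[0]         # give up on locality for this job
    return None                           # node stays idle this heartbeat
```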
Infrastructure-as-a-Service (IaaS) is one of the emerging services provided by cloud companies, and today there are many frameworks for parallel data processing on IaaS. Some of these frameworks use a static, homogeneous resource allocation technique that cannot accommodate the dynamic nature of cloud computing. Newer frameworks like Nephele overcome these drawbacks by using dynamic resource allocation, but they still have setbacks such as resource underutilization: Nephele does not deallocate a virtual machine (VM) that holds intermediate results, even after it has completed its execution, so that those results remain available to other virtual machines. Our new framework introduces an Alternative Virtual Machine (AVM) to store intermediate results, so this wastage of resources can be reduced; a sketch follows.
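A minimal sketch of how the AVM idea could work, assuming hypothetical `vm`, `avm`, and `iaas` objects; this illustrates the stated design, not the paper's implementation.

```python
# Sketch of the Alternative Virtual Machine (AVM) pattern: a finished VM's
# intermediate results are handed to a shared AVM before the VM is reclaimed,
# so downstream tasks can still read them. All names are illustrative.
def finish_task(vm, avm, iaas):
    for result in vm.intermediate_results:
        avm.store(result)                 # preserve outputs beyond the VM's life
    iaas.terminate_instance(vm)           # reclaim the idle VM immediately

def read_input(task, avm):
    # consumers fetch upstream results from the AVM instead of the source VM
    return [avm.load(name) for name in task.input_names]
```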