2015, Lecture Notes in Electrical Engineering
9 pages
In this paper, we introduce a task scheduling methodology that helps systems maintain their availability and reliability resiliently. In particular, the method speeds up system recovery from failures and achieves optimized performance while taking into account the customers' monetary cost and network conditions. A comparison with similar existing approaches shows that our method is more effective and more efficient.
International Conference on Computing, Management and Telecommunications, 2014
To achieve high performance, thousands of servers in cloud datacenters coordinate tasks to provide reliable and highly available cloud computing services, especially for multitasking workloads. Effective mechanisms are therefore required to prepare for failures of these computing nodes. A number of studies have addressed this problem, but they cannot always guarantee acceptable performance. In this paper, we present a scheduling algorithm, based on cost and bandwidth, that makes efficient recovery possible in heterogeneous computing environments. Our algorithm considers not only the network bandwidth but also the monetary cost. We justify our proposed work through extensive simulations and compare it with existing studies. The results indicate the potential benefit of our approach.
2013
Nowadays, thousands of servers in a cloud datacenter coordinate tasks to provide more reliable and highly available cloud computing services, especially in multi-task processing. Therefore, we need mechanisms to prepare for the failure of computing nodes. So far, a number of research studies have tried to eliminate these problems, yet few have been found efficient. In this paper, we present a multi-task scheduling algorithm that recovers from a saved state. The approach can improve execution time, including recovery time in case of failure, while the overhead when no failure occurs is small in typical scenarios.
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication - ICUIMC '14, 2014
Nowadays, thousands of servers in a cloud datacenter coordinate tasks to provide more reliable and highly available cloud computing services, especially in multi-task processing, as a crucial step toward high performance. Therefore, we need effective mechanisms to prepare for the failure of computing nodes. So far, a number of research studies have tried to eliminate these problems, yet few have been found efficient. In this paper, we present a cost- and bandwidth-based scheduling algorithm that makes recovery from a saved state faster in heterogeneous computing environments. The algorithm considers not only the network bandwidth but also the monetary cost that cloud customers (CCs) pay for using cloud resources. To justify our proposal, we conducted numerous simulations and compared our method with existing ones. The results show that our approach achieves higher performance, including recovery time in case of failure, while the overhead when no failure occurs is small in typical scenarios.
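To make the idea of weighing bandwidth against monetary cost concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a hypothetical `Node` record and a hypothetical recovery-time model (checkpoint transfer time plus remaining compute time), and it picks the fastest recovery node whose estimated cost stays within the customer's budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    bandwidth_mbps: float  # link speed to checkpoint storage, megabits/s
    speed: float           # work units processed per second
    price_per_s: float     # monetary cost per second of use


def recovery_time(node, checkpoint_mb, remaining_work):
    """Seconds to fetch the saved state and finish the remaining work."""
    transfer = checkpoint_mb * 8 / node.bandwidth_mbps  # megabytes -> megabits
    return transfer + remaining_work / node.speed


def pick_recovery_node(nodes, checkpoint_mb, remaining_work, budget):
    """Return (time, cost, name) of the fastest node within budget, or None."""
    candidates = []
    for node in nodes:
        t = recovery_time(node, checkpoint_mb, remaining_work)
        cost = t * node.price_per_s
        if cost <= budget:
            candidates.append((t, cost, node.name))
    return min(candidates) if candidates else None


nodes = [
    Node("fast", bandwidth_mbps=800, speed=10, price_per_s=0.05),
    Node("cheap", bandwidth_mbps=100, speed=5, price_per_s=0.01),
]
print(pick_recovery_node(nodes, checkpoint_mb=100, remaining_work=100, budget=1.0))
print(pick_recovery_node(nodes, checkpoint_mb=100, remaining_work=100, budget=0.3))
```

With a generous budget the high-bandwidth node wins on recovery time; with a tight budget only the cheaper, slower node qualifies. A real scheduler would likely combine both criteria differently, so the budget filter here is only one possible design.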
Cloud computing is a paradigm that focuses on sharing data and computation over a scalable network of nodes such as end users, computers, data centers, and web services. Task scheduling is one of the best-known combinatorial optimization problems and plays a key role in improving the performance of flexible and reliable systems. Cloud-based application services such as social networking, web hosting, and content delivery process large amounts of data. These applications require a large amount of network bandwidth because the traffic between nodes is heavy. As network bandwidth is a limited resource, scheduling policies that reduce bandwidth usage are essential in cloud computing. Task scheduling algorithms based on data locality reduce network access, thus reducing bandwidth usage and job completion time. The Balance Reduce algorithm (BAR) is a heuristic algorithm based on data locality that minimizes the makespan (job completion time) of a job. This paper proposes an improved balance reduce algorithm, an enhancement of BAR that handles machine failure. For this purpose, we propose an algorithm similar to the primary-backup approach. Compared to the existing BAR algorithm, the proposed algorithm effectively reduces the job completion time when a failure happens.
Cloud computing infrastructure poses many design challenges. Dealing with unreliability is one of the most important of them, because cloud platforms offer a variety of services to a variety of clients. In this paper, we present a model for assessing the reliability of cloud infrastructure (computing nodes, mostly virtual machines). This reliability assessment mechanism supports scheduling on cloud infrastructure and fault tolerance based on the reliability values acquired during the assessment. In our model, every compute instance (a virtual machine in PaaS or a physical processing node in IaaS) has a reliability value associated with it. The system assesses reliability for different types of applications. We use different mechanisms to assess the reliability of general applications and of real-time applications. For real-time applications, we use time-based reliability assessment algorithms. All the algorithms are m...
Journal of Telecommunication, Electronic and Computer Engineering, 2017
Driven by its pay-per-usage policy, cloud computing is used extensively in scientific fields such as biomedicine and healthcare, as well as in online financial applications. Fault tolerance is one of the biggest challenges in guaranteeing the reliability and availability of critical services. We must keep the system available by minimizing the impact of failures. In this paper, we conduct a comparative analysis of various approaches for tolerating faults through scheduling in cloud computing environments, based on their policies. The goal of this paper is not only to analyze the existing methods but also to identify areas that need future research.
2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2019
Cloud platforms offer different types of virtual machines that provide different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand instances, while spot instances are unused resources available at a lower price. Despite the monetary advantages, a spot instance can be terminated or hibernated by EC2 at any moment. Using both hibernation-prone spot instances (for the sake of cost) and on-demand instances, we propose in this paper a static scheduling approach for applications composed of independent tasks (bag-of-tasks) with deadline constraints. If a spot instance hibernates and does not resume within a time that guarantees the application's deadline, a temporal failure takes place. Our scheduling thus aims at minimizing the monetary cost of bag-of-tasks applications in the EC2 cloud while respecting their deadlines and avoiding temporal failures. Performance results with task execution traces, configurations of Amazon EC2 virtual machines, and EC2 market history confirm the effectiveness of our scheduling and show that it tolerates temporal failures.
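The notion of a temporal failure can be illustrated with a small Python sketch. This is not the paper's scheduler: the function names, the `migration_overhead` parameter, and the idea of moving the remaining work to an on-demand instance at the last safe moment are assumptions made for illustration.

```python
def latest_resume_time(deadline, remaining_runtime):
    """Latest instant at which a hibernated spot instance can resume
    and still finish its remaining work before the deadline."""
    return deadline - remaining_runtime


def must_migrate(now, deadline, remaining_runtime, migration_overhead=0.0):
    """True when waiting any longer for the spot instance would risk the
    deadline, so the remaining work should move to an on-demand instance
    (hypothetical policy)."""
    return now + migration_overhead + remaining_runtime >= deadline


# Times are in arbitrary units (for example, seconds since job start).
print(latest_resume_time(deadline=100, remaining_runtime=50))
print(must_migrate(now=10, deadline=100, remaining_runtime=50))
print(must_migrate(now=45, deadline=100, remaining_runtime=50, migration_overhead=5))
```

In this model a temporal failure corresponds to the hibernated instance still being unavailable after `latest_resume_time` with no migration having been triggered.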
Cloud computing has become an important aspect of computer science. For systematic use of the cloud, an efficient load balancing algorithm is required to schedule tasks in a well-organized and logical manner. The Min-Min algorithm is an efficient approach to reducing the total completion time of the tasks. Its major shortcoming is that it leaves the load imbalanced across the resources. This drawback can be removed by using an improved load balancing Min-Min algorithm (LBIMM). User priority is another aspect, which plays a vital role under the pay-per-use model. Cloud providers offer different types of QoS to meet the demands of different types of users. To provide these guarantees, a load balancing algorithm must consider user priority and failure recovery. Availability is a growing and recurring concern in software-intensive systems. PA-LBIMM considers user priority and seeks to minimize the total completion time, but it does not define what happens if a resource fails or crashes. To remove this limitation, a failure recovery policy is proposed in this paper: if a resource fails, the tasks are rescheduled to achieve the minimum completion time. Finally, the proposed policy is simulated using the Matlab toolbox. The results show that the policy can lead to a significant rise in resource utilization performance.
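For readers unfamiliar with Min-Min, here is a minimal Python sketch of the classic heuristic, followed by a simple rescheduling step after a resource failure. The rescheduling function is an assumption for illustration (tasks on the failed resource restart from scratch on the survivors); it is not the LBIMM or PA-LBIMM policy described in the paper.

```python
def min_min(etc, tasks, resources, ready=None):
    """Classic Min-Min. etc[t][r] is the expected run time of task t on
    resource r; ready maps each resource to the time it becomes free."""
    ready = dict(ready) if ready else {r: 0.0 for r in resources}
    pending = set(tasks)
    assignment = {}
    while pending:
        # Earliest completion time over all (task, resource) pairs: this is
        # the task whose best completion time is the smallest overall.
        finish, task, res = min(
            (ready[r] + etc[t][r], t, r) for t in pending for r in resources
        )
        assignment[task] = res
        ready[res] = finish
        pending.remove(task)
    return assignment, ready


def reschedule_after_failure(etc, assignment, ready, failed):
    """Hypothetical recovery step: rerun Min-Min for the tasks that were
    placed on the failed resource, using only the surviving resources."""
    survivors = [r for r in ready if r != failed]
    lost = [t for t, r in assignment.items() if r == failed]
    surviving_ready = {r: ready[r] for r in survivors}
    moved, new_ready = min_min(etc, lost, survivors, surviving_ready)
    return {**assignment, **moved}, new_ready


etc = [[4, 6], [3, 5], [10, 2]]  # 3 tasks x 2 resources, illustrative values
assignment, ready = min_min(etc, range(3), [0, 1])
print(assignment, ready)
print(reschedule_after_failure(etc, assignment, ready, failed=0))
```

In this small example Min-Min places two of the three tasks on resource 0 while resource 1 finishes early, which shows the load imbalance that the abstract identifies as the heuristic's main shortcoming.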
International Journal of Grid and Distributed Computing
Cloud computing is a business model based on the "pay as you go" principle. It provides IT services to users in a flexible and dynamic manner with minimal management effort. The most important feature of cloud computing is the ability to dynamically schedule an application on the best resource according to the load. This paper proposes a task scheduling algorithm for the cloud environment. The proposed algorithm allocates each incoming task to the best resource at runtime. It measures the current state of each resource, namely its availability level in terms of processing power, cost, and number of running tasks, to determine its fitness to receive the incoming task, and then chooses the best resource for that task. To evaluate the performance of the proposed algorithm, a comparative study was carried out between the proposed algorithm, the Round Robin (RR) algorithm, and the Minimum Completion Time (MCT) algorithm. The experimental results show that the proposed algorithm outperforms the RR and MCT algorithms by reducing the makespan and the cost of the running tasks.
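The fitness-based selection described above might look like the following Python sketch. The abstract does not give the fitness formula, so the score used here (processing power shared among running tasks, per unit cost) and the `Resource` fields are assumptions chosen only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Illustrative fields; the paper's actual resource model may differ.
    name: str
    mips: float            # processing power
    cost_per_hour: float   # monetary cost
    running_tasks: int = 0


def fitness(res):
    """Hypothetical fitness: higher power, lower cost, and fewer running
    tasks all make a resource more attractive."""
    return res.mips / ((res.running_tasks + 1) * res.cost_per_hour)


def dispatch(resources):
    """Assign one incoming task to the currently fittest resource."""
    best = max(resources, key=fitness)
    best.running_tasks += 1
    return best


pool = [
    Resource("a", mips=1000, cost_per_hour=1.0),
    Resource("b", mips=2000, cost_per_hour=3.0),
]
order = [dispatch(pool).name for _ in range(3)]
print(order)
```

Because the score is recomputed for every incoming task, load shifts to the second resource as soon as the first becomes busy, unlike Round Robin, which ignores power and cost altogether.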
Information and Communication Technology for Competitive Strategies
Nowadays, to a large extent, clients look at the cloud not just as a service provider but also as a partner, so they want the cloud to deliver timely and accurate services. Cloud nodes must be reliable in order to provide quality of service that meets customer requirements. Further, the physical size of high-performance computing environments is increasing day by day. The larger the system, the more failures are likely to occur, which eventually results in poor system reliability, something highly undesirable for time-critical applications. To deal with reliability, the service provider must know the failure characteristics of the cloud computing nodes in order to better handle failures using fault-tolerance-aware techniques when scheduling application tasks. Thus, in this paper, we present a survey of fault-tolerance-aware techniques, which are classified as proactive and reactive fault tolerance. This survey provides a foundation for researchers working on fault-tolerance-aware scheduling, with the aim of making better scheduling decisions that enhance the performance and reliability of application execution.
Keywords: Reliability ⋅ Fault tolerance ⋅ Virtualization
1 Introduction
Cloud is an Internet-based computing paradigm that provides basic services such as Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Platform as a Service (PaaS) [1]. Different types of cloud providers, i.e., public, private, or hybrid, are responsible for providing the above services to users. Nowadays, usage of
International Journal of Intelligent Computing and Information Sciences
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2020
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2020
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13, 2013
Resilience Assessment and Evaluation of Computing Systems, 2012
Turkish Journal of Computer and Mathematics Education, 2021
International Journal of Computer Networks and Applications (IJCNA), 2023
Neural Computing and Applications, 2016
Scientific Programming, 2016