Cloud computing infrastructure poses many design challenges. Dealing with unreliability is one of the most important of them, because cloud platforms offer a variety of services to a variety of clients. In this paper, we present a model for assessing the reliability of cloud infrastructure (computing nodes, mostly virtual machines). This reliability assessment mechanism supports scheduling on cloud infrastructure and drives fault tolerance on the basis of the reliability values acquired during assessment. In our model, every compute instance (a virtual machine in PaaS or a physical processing node in IaaS) has a reliability value associated with it. The system assesses reliability for different types of applications. We use different mechanisms to assess the reliability of general applications and real-time applications. For real-time applications, we have time-based reliability assessment algorithms. All the algorithms are m...
Lecture Notes in Computer Science, 2013
With virtualization technology, Cloud computing utilizes resources more efficiently. A physical server can deploy many virtual machines and operating systems. However, with the increase in software and hardware components, more failures are likely to occur in the system. Hence, one should understand failure behavior in the Cloud environment in order to better utilize the cloud resources. In this work, we propose a reliability model and estimate the mean time to failure and failure rate based on a system of k nodes and s virtual machines under four scenarios. Results show that if the failure of the hardware and/or the software in the system exhibits a degree of dependency, the system becomes less reliable, which means that the failure rate of the system increases and the mean time to failure decreases. Additionally, an increase in the number of nodes decreases the reliability of the system.
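The relationship between node count, failure rate, and mean time to failure can be illustrated with a minimal sketch. It does not reproduce the paper's four scenarios. It assumes the simplest case only: every node and virtual machine fails independently with a constant (exponential) failure rate, and the system fails as soon as any component fails (a series system). The function names and the example rates are hypothetical.

```python
import math


def series_failure_rate(node_rate: float, vm_rate: float, k: int, s: int) -> float:
    """Failure rate of a series system of k nodes and s VMs.

    Assumption: components fail independently with constant rates,
    so the rates of all components simply add up.
    """
    return k * node_rate + s * vm_rate


def mttf(rate: float) -> float:
    """Mean time to failure of a system with a constant failure rate."""
    return 1.0 / rate


def reliability(rate: float, t: float) -> float:
    """Probability that the system is still running at time t."""
    return math.exp(-rate * t)


# Hypothetical rates (failures per hour), not taken from the paper.
small = series_failure_rate(node_rate=0.001, vm_rate=0.002, k=2, s=4)
large = series_failure_rate(node_rate=0.001, vm_rate=0.002, k=8, s=4)
print(mttf(small), mttf(large))
```

Under these assumptions, adding nodes raises the system failure rate and lowers the mean time to failure, which matches the direction of the result stated in the abstract. Dependent failures would need a different model.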
International Journal of Computer Applications, 2014
The performance of cloud computing depends on effective utilization of resources and on reliability. With resource allocation algorithms such as the banker's algorithm, resources can be utilized effectively in cloud computing. Reliability lets us estimate the fault tolerance capability of a system. Reliability improvement depends largely on the availability of an operational profile that statistically models the pattern in which the system is most likely to be used in its operating environment. A system is less reliable if its hardware and software failures exhibit a degree of dependency, and more reliable if they occur independently. In a cloud computing environment, hundreds of thousands of systems are hosted that consume cloud computing services. These services depend on a great deal of hardware, software platform, and infrastructure support, each of which, though carefully engineered, is still capable of failure. These failure rates and the complexity of the database make the cloud less reliable. In this paper, we propose a reliability model that estimates the mean time to failure and failure rate based on a delayed exponential distribution. Through this model, we study the effect of older and newer systems that consume cloud computing services on cloud computing reliability.
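As a rough illustration of a delayed exponential model, the sketch below assumes the common two-parameter (shifted) form: no failures occur before a delay `delay`, and after that the failure rate is a constant `rate`. The abstract does not give the paper's exact parameterization, so this form, the function names, and the example values are assumptions.

```python
import math


def delayed_exp_reliability(t: float, rate: float, delay: float) -> float:
    """R(t) for a shifted exponential: 1 before the delay, exponential decay after."""
    if t < delay:
        return 1.0
    return math.exp(-rate * (t - delay))


def delayed_exp_failure_rate(t: float, rate: float, delay: float) -> float:
    """Hazard rate: zero during the failure-free delay, constant afterwards."""
    return 0.0 if t < delay else rate


def delayed_exp_mttf(rate: float, delay: float) -> float:
    """Mean time to failure: the delay plus the mean of the exponential part."""
    return delay + 1.0 / rate


# Hypothetical values: a 10-hour failure-free period, then 0.1 failures per hour.
print(delayed_exp_mttf(rate=0.1, delay=10.0))
```

In this form, a longer failure-free delay raises the mean time to failure without changing the later failure rate.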
Ain Shams Engineering Journal, 2018
The growing use of cloud computing in various fields makes the dependability of clouds a major concern in both industry and academia, especially for real-time applications. In cloud computing, the processing is done on remote cloud nodes; therefore, the chances of errors occurring are increased because of the loose control over the remote nodes and the unexpected latency. Hence, fault tolerance in cloud computing is important to ensure the reliability and availability of real-time applications. The contribution of this paper is twofold: first, it proposes a comprehensive framework that adopts a number of fault tolerance techniques to improve the dependability of cloud environments to host real-time applications while achieving reliability and availability requirements. Second, a fault tolerant real-time scheduling algorithm has been developed to minimize the rate of missing deadlines, makespan, and the degree of load imbalance.
Computing, 2019
A cloud environment uses data centers with a huge number of computational resources, and the probability that some resource fails increases with scale. Failures cause unavailability of services, which affects the reliability of the system. It is essential to consider reliability, and in particular the failure of resources, when deploying applications in the cloud. In this work, we address the reliability-aware scheduling of tasks with hard deadlines in the cloud environment. We design, analyze and provide solutions for two special cases of the problem, where (a) tasks have a common deadline on machines with equal failure rates, and (b) tasks have equal execution times. For the general case of the problem, we propose two-phase heuristic approaches: the first phase orders the tasks, and the second maps the tasks to machines. The performance of different task ordering and task mapping approaches is evaluated through simulation using synthetic and real traces. Based on the simulation results, earliest-due-date ordering of tasks, combined with mapping the current task to the most reliable machine and dropping long tasks, performs better in general settings. We observe that task repetition and replication further improve the performance of the heuristics.
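The two-phase idea (order the tasks, then map each task to a machine) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' algorithm: it orders tasks by earliest due date and assigns each one to the lowest-failure-rate machine that can still finish it before its deadline. The `Task` and `Machine` types are hypothetical, and dropping a task when no machine can meet its deadline is a simplification that stands in for the long-task dropping rule, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    exec_time: float
    deadline: float


@dataclass
class Machine:
    name: str
    failure_rate: float
    free_at: float = 0.0  # time at which the machine becomes idle


def schedule_edd_most_reliable(tasks, machines):
    """Phase 1: earliest-due-date ordering. Phase 2: most reliable feasible machine."""
    plan, dropped = {}, []
    for task in sorted(tasks, key=lambda t: t.deadline):
        # Machines that can finish this task before its hard deadline.
        feasible = [m for m in machines
                    if m.free_at + task.exec_time <= task.deadline]
        if not feasible:
            dropped.append(task.name)  # assumption: infeasible tasks are dropped
            continue
        best = min(feasible, key=lambda m: m.failure_rate)
        plan[task.name] = best.name
        best.free_at += task.exec_time
    return plan, dropped


tasks = [Task("C", 4, 7), Task("A", 4, 5), Task("B", 4, 6)]
machines = [Machine("m1", 0.01), Machine("m2", 0.05)]
plan, dropped = schedule_edd_most_reliable(tasks, machines)
print(plan, dropped)
```

Here task A goes to the more reliable machine m1, task B falls back to m2 because m1 is busy until after B's deadline, and task C is dropped because neither machine can finish it in time.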
Computer and Information Science (ICIS), 2013 IEEE/ACIS 12th International Conference on, 2013
Cloud computing is widely referred to as the next generation of computing systems. Reliability is a key metric for assessing performance in such systems. Redundancy and diversity are prevalent approaches to enhancing reliability in Cloud Computing Systems (CCS). Proper resource allocation is an alternative approach to reliability improvement in such systems. In contrast to redundancy, appropriate resource allocation can improve system reliability without imposing extra cost. On the other hand, considering reliability irrespective of Quality of Service (QoS) requirements may be undesirable in most CCSs. In this paper, we focus on the resource allocation approach and introduce an analytical model to analyze system reliability while also considering application and resource constraints. Task precedence structure and QoS are taken into account as the application constraints. Memory and storage limitation of each server as well as maximum communication load on each link are considered ...
2016
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (networks, servers, storage, applications, and services). Given the rapidly growing demand for highly scalable environments, cloud computing technology is increasingly associated with real-time applications. It supports a highly scalable virtual environment, which helps to improve scalability, availability and reliability. Cloud computing supports a distributed system architecture. Because of this distributed approach, all work has to be done on virtual machines. Cloud computing has many advantages, such as ubiquitous network access, location-independent resource pooling, rapid elasticity, pay per use, virtualization, and flexibility. But it still has to face many challenges, such as security, data migration, interoperability, data availability, performance and reliability. In this research paper Fault Tolerance techniques along with some other techniques were discussed. The ado...
Information and Communication Technology for Competitive Strategies
Nowadays, to a large extent, clients look at the cloud not just as a service provider but also as a partner. So, they want the cloud to deliver timely and accurate services. Cloud nodes must be reliable in order to provide quality of service as per the customer requirements. Further, the physical size of high-performance computing environments is increasing day by day. The larger the system, the more failures are likely to occur, which eventually results in poor reliability of the system, and this is highly undesirable for time-critical applications. To deal with reliability, the service provider must know the failure characteristics of the cloud computing nodes in order to better handle failures using fault-tolerance-aware techniques at the time of scheduling the application tasks. Thus, in this paper, we present a survey of fault-tolerance-aware techniques, which are classified as proactive and reactive fault tolerance. This survey provides a foundation for researchers working in the area of fault-tolerance-aware scheduling, so that better scheduling decisions can be made with the aim of enhancing the performance and reliability of application execution.
Keywords: Reliability ⋅ Fault tolerance ⋅ Virtualization
1 Introduction
Cloud is an Internet-based computing paradigm that provides basic services as Infrastructure as a Service (IaaS), Software as a Service (SaaS), Platform as a Service (PaaS) [1]. Different types of cloud providers, i.e., public, private, or hybrid, are responsible for providing the above services to users. Nowadays, usage of
2012 Ieee International Conference on Communications, 2012
Modern day data centers coordinate hundreds of thousands of heterogeneous tasks and aim at delivering highly reliable cloud computing services. Although offering equal reliability to all users benefits everyone at the same time, users may find such an approach either too inadequate or too expensive to fit their individual requirements, which may vary dramatically. In this paper, we propose a novel method for providing reliability as an elastic and on-demand service. Our scheme makes use of peer-to-peer checkpointing and allows user reliability levels to be jointly optimized based on an assessment of their individual requirements and total available resources in the data center. We show that the joint optimization can be efficiently solved by a distributed algorithm using dual decomposition. The solution improves resource utilization and presents an additional source of revenue to data center operators. Our validation results suggest a significant improvement of reliability over existing schemes.
International Journal of Computer Networks and Applications (IJCNA), 2023
Cloud computing has emerged as a feasible paradigm for satisfying the computing requirements of high-performance applications through an ideal distribution of tasks to resources. However, attaining multiple scheduling objectives such as throughput, makespan, and resource use at the same time is problematic. To resolve this problem, many Task Scheduling Algorithms (TSAs) have recently been developed using single or multi-objective metaheuristic strategies. Among these, the TS based on a Multi-objective Grey Wolf Optimizer (TSMGWO) handles multiple objectives to discover ideal tasks and assign resources to the tasks. However, it only maximizes resource use and throughput while reducing the makespan, whereas it is also crucial to optimize other parameters such as memory and bandwidth utilization. Hence, this article proposes a hybrid TSA based on the linear matching method and backfilling, which uses the memory and bandwidth requirements for effective TS. Initially, a Long Short-Term Memory (LSTM) network is adopted as a meta-learner to predict the task runtime reliability. Then, the tasks are divided into predictable and unpredictable queues. The tasks with higher expected runtime are scheduled by a plan-based scheduling approach based on the Tuna Swarm Optimization (TSO). The remaining tasks are backfilled by the VIKOR technique. To reduce resource use, a particular fraction of CPU cores is kept for backfilling, which is modified dynamically depending on the Resource Use Ratio (RUR) of predictable tasks among freshly submitted tasks. Finally, a general simulation reveals that the proposed algorithm outperforms the earlier metaheuristic, plan-based, and backfilling TSAs.
Asia-Pacific Journal of Information Technology and Multimedia, 2016
Cloud computing is an important infrastructure for distributed systems with the main objective of reducing the use of resources. In a cloud environment, users may face thousands of resources on which to run each task. However, allocation of resources to tasks by the user is an impossible endeavor. Accurate scheduling of system resources results in their optimal use as well as an increase in the reliability of cloud computing. This study designs a system based on fuzzy logic and then introduces an efficient and precise algorithm for scheduling resources to improve the reliability of cloud computing. Waiting and turnaround times of the proposed method were compared to those of previous works. In the proposed method, the waiting time is equal to 26.99 and the turnaround time is equal to 82.99. According to the results, the proposed method outperforms other methods in terms of waiting time and turnaround time as well as accuracy.
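For readers unfamiliar with the two metrics, the sketch below shows how average waiting time and average turnaround time are conventionally computed. It uses plain first-come, first-served execution on a single resource as a stand-in. It does not reproduce the fuzzy-logic scheduler from the study, and the job list is made up for illustration.

```python
def fcfs_metrics(jobs):
    """Average waiting and turnaround time under first-come, first-served.

    jobs: list of (arrival_time, burst_time) tuples, sorted by arrival time.
    Waiting time = start - arrival; turnaround time = finish - arrival.
    """
    clock = 0.0
    waits, turnarounds = [], []
    for arrival, burst in jobs:
        start = max(clock, arrival)  # the resource may sit idle until arrival
        finish = start + burst
        waits.append(start - arrival)
        turnarounds.append(finish - arrival)
        clock = finish
    n = len(jobs)
    return sum(waits) / n, sum(turnarounds) / n


# Hypothetical jobs: all arrive at time 0 with burst times 5, 3 and 8.
avg_wait, avg_turnaround = fcfs_metrics([(0, 5), (0, 3), (0, 8)])
print(avg_wait, avg_turnaround)
```

For each job, turnaround time equals waiting time plus burst time, so the two averages always differ by the average burst time.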
IJSR, 2023
Journal of Telecommunication, Electronic and Computer Engineering, 2017
International Journal of Intelligent Systems and Applications, 2016
IEEE Transactions on Cloud Computing, 2016
The Journal of Supercomputing
Cluster Computing, 2019
Resilience Assessment and Evaluation of Computing Systems, 2012