S. Ghosh et al. (1998) presented a novel approach for providing fault tolerance for sets of independent, periodic tasks with rate-monotonic scheduling. We extend this approach to tasks that share logical or physical resources (and hence require synchronization). We show that if the simple rate-monotonic dispatch is replaced by stack scheduling (T. Baker, 1991), the worst-case blocking overhead of the stack resource policy and the worst-case retry overhead for fault tolerance are not additive; rather, only the maximum of the two overheads is incurred.
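As an illustration of that claim (the notation below is ours, not the paper's), a fixed-priority response-time style test under this combination would charge only the larger of the two overheads. With C_i and T_i the execution time and period of task i, B_i its worst-case blocking under the stack resource policy, F_i its worst-case re-execution overhead for fault recovery, and hp(i) the set of higher-priority tasks:

    R_i = C_i + \max(B_i, F_i) + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j

The task set is deemed schedulable if the smallest fixed point satisfies R_i \le T_i for every task; a purely additive accounting would instead charge B_i + F_i.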
Real-time Systems, 1998
Due to the critical nature of the tasks in hard real-time systems, it is essential that faults be tolerated. In this paper, we present a scheme which can be used to tolerate faults during the execution of preemptive real-time tasks. We describe a recovery scheme which can be used to re-execute tasks in the event of single and multiple transient faults and discuss conditions that must be met by any such recovery scheme. We then extend the original Rate Monotonic Scheduling (RMS) scheme and the exact characterization of RMS to provide tolerance for single and multiple transient faults. We derive schedulability bounds for sets of real-time tasks given the desired level of fault tolerance for each task or subset of tasks. Finally, we analyze and compare those bounds with existing bounds for non-fault-tolerant and other variations of RMS.
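As a rough sketch of this kind of analysis (our simplifying assumptions, not the paper's exact scheme: recovery re-executes the faulty task in full, deadlines equal periods, and the recovery overhead is bounded by k re-executions of the costliest task at the same or higher priority), a completion-time style test might look like:

from math import ceil

def ft_completion_time_test(tasks, k_faults=1):
    # tasks: list of (C, T) pairs sorted by period T (rate-monotonic priority),
    # with deadline assumed equal to period. Recovery overhead is modelled as
    # k_faults re-executions of the costliest task at this or higher priority.
    for i, (C_i, T_i) in enumerate(tasks):
        recovery = k_faults * max(C for C, _ in tasks[:i + 1])
        R = C_i + recovery
        while R <= T_i:
            R_next = C_i + recovery + sum(ceil(R / T_j) * C_j
                                          for C_j, T_j in tasks[:i])
            if R_next == R:
                break
            R = R_next
        if R > T_i:
            return False
    return True

For example, ft_completion_time_test([(1, 5), (2, 10), (5, 30)], k_faults=1) checks whether the three tasks still meet their deadlines when one re-execution is budgeted at each priority level.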
Embedded and Real-Time …, 2008
Real-time systems typically have to satisfy complex requirements, mapped to the task attributes, eventually guaranteed by the underlying scheduler. These systems consist of a mix of hard and soft tasks with varying criticality, as well as associated fault tolerance requirements. Additionally, the relative criticality of tasks could undergo changes during the system evolution. Time redundancy techniques are often preferred in embedded applications and, hence, it is extremely important to devise appropriate methodologies for scheduling real-time tasks under failure assumptions.
Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, 2002
Periodic and aperiodic tasks co-exist in many real-time systems. The periodic tasks typically arise from sensor data or control loops, while the aperiodic tasks generally arise from arbitrary events. Their time constraints need to be met even in the presence of faults. Considering the unpredictability of aperiodic tasks, this paper proposes a fault-tolerant reservation-based strategy (FTRB) to schedule aperiodic tasks by utilizing the processor time left unused by periodic tasks. The least upper bound of reserved processor time is derived analytically such that all available processor time may be exploited for servicing aperiodic tasks. Any newly arrived aperiodic task is scheduled, using an extended dynamic schedulability criterion, on the first processor in which it fits. A primary/backup approach is used to schedule the primary and backup copy of each task on different processors to tolerate a processor failure. Our analysis and simulation results show that the processors can achieve high utilization and that the on-line implementation of aperiodic task scheduling is feasible.
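A minimal sketch of the placement step under our own naming (place_primary_backup and admits are hypothetical; admits stands in for the paper's extended dynamic schedulability criterion):

def place_primary_backup(task, processors, admits):
    # First-fit: the primary copy goes on the first processor whose admission
    # test accepts it; the backup copy goes on the first *different* processor
    # that also accepts it, so a single processor failure is tolerated.
    primary = next((p for p in processors if admits(p, task, "primary")), None)
    if primary is None:
        return None
    backup = next((p for p in processors
                   if p is not primary and admits(p, task, "backup")), None)
    return None if backup is None else (primary, backup)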
2007 IEEE International Parallel and Distributed Processing Symposium, 2007
The recent introduction of multicore system-on-a-chip architectures for embedded systems opens a new range of possibilities for both increasing the processing power and improving the fault-robustness of real-time embedded applications. Fault-tolerance and performance are often contrasting requirements. Techniques to improve robustness to hardware faults are based on replication of hardware and/or software. Conversely, techniques to improve performance are based on exploiting inherent parallelism of multiprocessor architectures. In this paper, we propose a technique that allows the user to trade-off parallelism with fault-tolerance in a multicore hardware architecture. Our technique is based on a combination of hardware mechanisms and real-time operating system mechanisms. In particular, we apply hierarchical scheduling techniques to efficiently support fault-tolerant, fault-silent and non-fault-tolerant tasks in the same system.
2001
We address the problem of off-line fault-tolerant scheduling of an algorithm onto a multiprocessor architecture with distributed memory and provide a generic algorithm which solves this problem. We take into account two kinds of failures: fail-silent and omission. The basic technique we use is the replication of operations and data communications. We then discuss the principles which govern the execution of schedules with replication under the state-machine and the primary/backup arbitrations between replicas. We also show how to compute the execution date for each operation and the timeouts which are used for detecting failures. We end with a heuristic which, using this calculus, computes a possibly non-optimal schedule by finding plain schedules for each failure pattern and then combining them into a single schedule with replication.
Proceedings 19th IEEE Real-Time Systems Symposium (Cat. No.98CB36279), 1998
Ultra-dependable real-time computing architectures have been specialised to meet the requirements of a particular application domain. Over the last two years, a consortium of European companies and academic institutions has been investigating the design and development of a Generic Upgradable Architecture for Real-time Dependable Systems (GUARDS). The architecture aims to be tolerant of permanent and temporary, internal and external, physical faults, and should provide confinement or tolerance of software design faults. GUARDS critical applications are intended to be replicated across the channels which provide the primary hardware fault containment regions. In this paper, we present our approach to real-time scheduling of the GUARDS architecture. We use an extended response-time analysis to predict the timing properties of replicated real-time transactions. Consideration is also given to the scheduling of the inter-channel communications network.
IEEE Transactions on Parallel and Distributed Systems, 1999
Hard real-time systems require predictable performance despite the occurrence of failures. In this paper, fault tolerance is implemented by using a novel duplication technique where each task scheduled on a processor has either an active backup copy or a passive backup copy scheduled on a different processor. An active copy is always executed, while a passive copy is executed only in the case of a failure. First, the paper considers the ability of the widely used Rate-Monotonic scheduling algorithm to meet the deadlines of periodic tasks in the presence of a processor failure. In particular, the Completion Time Test is extended so as to check the schedulability on a single processor of a task set including backup copies. Then, the paper extends the well-known Rate-Monotonic First-Fit assignment algorithm, where all the task copies, including the backup copies, are considered in Rate-Monotonic priority order and assigned to the first processor in which they fit. The proposed algorithm determines which tasks must use active duplication and which can use passive duplication. Passive duplication is preferred whenever possible, so as to overbook each processor with many passive copies whose primary copies are assigned to different processors. Moreover, the space allocated to active copies is reclaimed as soon as a failure is detected. Passive-copy overbooking and active-copy deallocation allow many passive copies to be scheduled in the same time intervals on the same processor, thus reducing the total number of processors needed. Simulation studies reveal a remarkable saving of processors with respect to those needed by the usual active duplication approach, in which the schedule of the non-fault-tolerant case is duplicated on two sets of processors.
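A rough sketch of the assignment loop under stated assumptions (the callables fits and can_be_passive are our placeholders for the extended Completion Time Test and for the rule deciding whether a passive backup can still meet its deadline after fault detection; overbooking and deallocation would live inside fits):

def rm_first_fit_with_backups(tasks, fits, can_be_passive):
    # tasks: list of dicts with at least "name" and "period".
    # fits(proc, task, kind) stands in for the extended Completion Time Test;
    # kind is "primary", "active" or "passive".
    processors = []                      # each processor: list of (task, kind)
    assignment = {}

    def first_fit(task, kind, avoid=None):
        for idx, proc in enumerate(processors):
            if idx != avoid and fits(proc, task, kind):
                proc.append((task, kind))
                return idx
        processors.append([(task, kind)])    # open a new processor
        return len(processors) - 1

    for task in sorted(tasks, key=lambda t: t["period"]):
        p = first_fit(task, "primary")
        kind = "passive" if can_be_passive(task) else "active"
        b = first_fit(task, kind, avoid=p)
        assignment[task["name"]] = (p, b)
    return assignment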
2010
This article studies the scheduling of critical embedded systems, which consist of a set of communicating periodic tasks with constrained deadlines. Currently, tasks are usually sequenced manually, partly because available scheduling policies do not ensure the determinism of task communications. Ensuring this determinism requires scheduling policies supporting task precedence constraints (which we call dependent tasks), which are used to force the order in which communicating tasks execute. We propose fixed priority scheduling policies for different classes of dependent tasks: with simultaneous or arbitrary release times, with simple precedences (between tasks of the same period) or extended precedences (between tasks of different periods). We only consider policies that do not require synchronization mechanisms (like semaphores). This completely prevents deadlocks or scheduling anomalies without requiring further proofs.
Proceedings 16th IEEE Real-Time Systems Symposium
We present a scheme to guarantee that the execution of real-time tasks can tolerate transient and intermittent faults assuming any queue-based scheduling technique. The scheme is based on reserving sufficient slack in a schedule such that a task can be re-executed before its deadline without compromising guarantees given to other tasks. Only enough slack is reserved in the schedule to guarantee fault tolerance if at most one fault occurs within a time interval. This results in increased schedulability and a very low percentage of deadline misses even if no restriction is placed on the fault separation. We provide two algorithms to solve the problem of adding fault tolerance to a queue of real-time tasks. The first is an optimal dynamic-programming solution and the second is a greedy heuristic which closely approximates the optimal.
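As a simplified illustration of the underlying feasibility question (not the paper's dynamic-programming or greedy algorithm), the following O(n^2) check asks whether a FIFO queue of non-preemptive tasks has enough slack for any single task to be re-executed before deadlines are violated; the dict keys 'c' and 'd' are our own convention:

def tolerates_single_fault(queue):
    # queue: list of dicts with 'c' (computation time) and 'd' (absolute
    # deadline), in dispatch order. Returns True if, for every possible
    # position k of the single faulty task, charging that task's cost twice
    # still lets every task in the queue finish by its deadline.
    for k in range(len(queue)):
        t = 0.0
        for i, task in enumerate(queue):
            t += task["c"] * (2 if i == k else 1)
            if t > task["d"]:
                return False
    return True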
Real-time Systems, 2001
Consider a distributed real-time program which is executed on a system with a limited set of hardware resources. Assume the program is required to satisfy some timing constraints, despite the occurrence of anticipated hardware failures. For efficient use of resources, scheduling decisions must be taken at run-time, considering deadlines, the load, and hardware failures. The paper demonstrates how to reason about such dynamically scheduled programs in the framework of a timed process algebra and modal logic. The algebra provides a uniform process encoding of programs, hardware, and schedulers, with the operational semantics of a process depending on the assumptions about faults. The logic specifies the timing properties of a process and verifies them via this fault-affected semantics, establishing fault-tolerance. The approach lends itself to the application of existing tools and results supporting reasoning in process algebras and modal logics.
Journal of Systems Architecture, 1998
Many time-critical applications require predictable performance. Tasks corresponding to these applications have deadlines to be met despite the presence of faults. Failures can happen either due to processor faults or due to task errors. To tolerate both processor and task failures, the copies of every task have to be mutually excluded in space and also in time in the schedule. We assume that each task has two versions, namely, a primary copy and a backup copy. We believe that the position of the backup copy in the task queue with respect to the position of the primary copy (the distance) is a crucial parameter which affects the performance of any fault-tolerant dynamic scheduling algorithm. To study the effect of the distance parameter, we make fault-tolerant extensions to the well-known myopic scheduling algorithm [Ramamritham et al., IEEE Trans. Parallel Distrib. Syst. 1(2) (1990) 184], which is a dynamic scheduling algorithm capable of handling resource constraints among tasks. We have conducted an extensive simulation to study the effect of the distance parameter on the schedulability of the fault-tolerant myopic scheduling algorithm.
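To make the distance parameter concrete, here is a minimal, purely illustrative sketch of how a backup copy might be placed relative to its primary in the dispatch queue (the function name and arguments are ours, not the paper's):

def insert_backup(queue, primary_index, backup_copy, distance):
    # Place the backup copy 'distance' positions after the primary copy,
    # clamped to the end of the queue; a larger distance separates the two
    # copies further in time within the schedule.
    pos = min(len(queue), primary_index + distance)
    queue.insert(pos, backup_copy)
    return pos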
1996
We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site. A system architecture for coordinating the replicas is described. The fault tolerance mechanism is being added to a wide-area scheduler prototype in the Legion parallel processing system. A performance evaluation of the fault-tolerant scheduler and a comparison to the traditional means of fault tolerance, checkpoint-recovery, is planned.
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006
In this paper we tackle the problem of scheduling a periodic real-time system on identical multiprocessor platforms, where the tasks considered may fail with a given probability. For each task we compute its duplication rate in order to (1) given a maximum tolerated probability of failure, minimize the size of the platform such that at least one replica of each job meets its deadline (and does not fail), using a variant of EDF, namely EDF(k), or (2) given the size of the platform, achieve the best possible reliability under the same constraints. Thanks to our probabilistic approach, no assumption is made on the number of failures which can occur. We propose several approaches to duplicate tasks and we show that we are able to find solutions that are always very close to the optimal one.
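Under the simplest independence assumption (each replica of a job fails independently with probability p, which may or may not match the paper's exact failure model), the duplication rate needed to keep the miss probability below a target follows directly; the sketch below is ours:

from math import ceil, log

def replicas_needed(p_fail, max_miss_prob):
    # With r independent replicas, a job is lost only if all r fail,
    # i.e. with probability p_fail**r; return the smallest such r.
    # Assumes 0 < p_fail < 1 and 0 < max_miss_prob < 1.
    return max(1, ceil(log(max_miss_prob) / log(p_fail)))

For example, replicas_needed(0.01, 1e-6) gives 3, since 0.01**3 = 1e-6.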
2003
Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because the system is critical. Our starting point is a source algorithm, a target distributed architecture, some distribution constraints, some indications on the execution times of the algorithm operations on the processors of the target architecture, some indications on the communication times of the data dependencies on the communication links of the target architecture, a number Npf of fail-silent processor failures that the obtained system must tolerate, and finally some real-time constraints that the obtained system must satisfy. In this article, we present a scheduling heuristic which, given all these inputs, produces a fault-tolerant, distributed, and static scheduling of the algorithm on the architecture, with an indication of whether or not the real-time constraints are satisfied. The algorithm we propose consists of a list-scheduling heuristic based on an active replication strategy, which schedules at least Npf + 1 replicas of each operation on different processors; the replicas run in parallel to tolerate at most Npf failures. Simulation results show that the scheduling strategy used improves performance, both in the absence and in the presence of failures.
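A minimal sketch of the replication idea under our own simplifying assumptions (load-based placement rather than the paper's full list-scheduling heuristic):

def replicate_operation(op_cost, proc_loads, n_pf):
    # Place the n_pf + 1 replicas of one operation on the n_pf + 1 currently
    # least-loaded, pairwise distinct processors; with replicas running in
    # parallel, up to n_pf fail-silent processor failures leave one alive.
    # Requires len(proc_loads) >= n_pf + 1.
    order = sorted(range(len(proc_loads)), key=lambda p: proc_loads[p])
    chosen = order[:n_pf + 1]
    for p in chosen:
        proc_loads[p] += op_cost
    return chosen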
2001
This paper investigates fault-tolerance issues in real-time distributed embedded systems. Our goal is to propose solutions to automatically produce distributed and fault-tolerant code. We first characterize the systems considered by giving the main assumptions about the physical and logical architecture of these systems. In particular, we consider only processor failures, with a fail-stop behavior. Then, we give a state of the art of the techniques used for fault-tolerance. We also briefly present the Algorithm Architecture Adequation (AAA) method, used to obtain automatically distributed code. The heart of AAA is a scheduling heuristic that automatically produces a static distributed schedule of a given algorithm onto a given distributed architecture. Our idea is to adapt the AAA method so that it automatically produces a static, distributed, and fault-tolerant schedule. For this purpose, we discuss several possible approaches to software-implemented fault-tolerance within the AAA method. We present in detail two new scheduling heuristics that achieve this goal.
Microprocessors and Microsystems, 1997
Many time-critical applications require predictable performance, and tasks in these applications have deadlines to be met even in the presence of faults. Three different approaches have evolved for fault-tolerant scheduling of real-time tasks in multiprocessor systems: Triple Modular Redundancy (TMR), Primary Backup (PB), and Imprecise Computation (IC). In the TMR approach, fault detection is by voting, whereas in the PB and IC approaches, it is by acceptance test. The different methods employed for error detection in the three approaches often make one approach preferable to the others in certain applications. Also, some applications can have tasks which require more than one fault-tolerant approach. Hence, it is necessary to have a single fault-tolerant scheduling algorithm which supports different fault-tolerant approaches. Moreover, the redundancy introduced in terms of executing more versions of a task reduces the number of tasks meeting their deadlines (the guarantee ratio). In this paper, we address these two issues: (i) by proposing a scheduling algorithm which supports all three fault-tolerant approaches, and (ii) by proposing guarantee-ratio-improving techniques such as the distance concept and task parallelization, and better algorithms for reclaiming unused resources at run-time.
IEEE Transactions on Parallel and Distributed Systems, 1997
Real-time systems are being increasingly used in several applications which are time critical in nature. Fault-tolerance is an important requirement of such systems, due to the catastrophic consequences of not tolerating faults. In this paper, we study a scheme that provides fault-tolerance through scheduling in real-time multiprocessor systems. We schedule multiple copies of dynamic, aperiodic, nonpreemptive tasks in the system, and use two techniques that we call deallocation and overloading to achieve high acceptance ratio (percentage of arriving tasks scheduled by the system). This paper compares the performance of our scheme with that of other fault-tolerant scheduling schemes, and determines how much each of deallocation and overloading affects the acceptance ratio of tasks. The paper also provides a technique that can help real-time system designers determine the number of processors required to provide fault-tolerance in dynamic systems. Lastly, a formal model is developed for the analysis of systems with uniform tasks.
ACS/IEEE International Conference on Computer Systems and Applications (Book of Abstracts), 2003
This work studies the development of integrated fault-tolerant scheduling algorithms. The proposed algorithms ensure ultra-reliable execution of tasks, considering both hardware and software failures, while also improving system performance. In addition, the proposed algorithms have the capability for on-line system-level fault diagnosis.
Lecture Notes in Computer Science, 2004
In this paper, we propose a feedback-based combined scheduling algorithm with fault tolerance for applications that have both periodic and aperiodic tasks in real-time uniprocessor systems. Each periodic task is assumed to have a primary copy and a backup copy. Using rate-monotonic scheduling and the deferrable server algorithm, we create two servers, one for serving aperiodic tasks and the other for executing backup copies of periodic tasks. The goal is to maximize the schedulability of aperiodic tasks while keeping the recovery rate of periodic tasks close to 100%. Our algorithm uses a feedback control technique to balance the CPU allocation between the backup server and the aperiodic server. Our simulation studies show that the algorithm can adapt the parameters of the servers to recover the failed periodic tasks.
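A very rough sketch of such a feedback step, with the function name, gain, and proportional control law all our own assumptions rather than the paper's controller:

def adjust_budgets(backup_budget, aperiodic_budget, recovery_rate,
                   target=1.0, gain=0.1):
    # Proportional controller: when the measured recovery rate of periodic
    # backups falls below the target, shift CPU budget from the aperiodic
    # server to the backup server; give it back when recovery is on target.
    total = backup_budget + aperiodic_budget
    error = target - recovery_rate            # > 0 means backups are missing
    backup_budget = min(total, max(0.0, backup_budget + gain * error * total))
    return backup_budget, total - backup_budget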
Proceedings 11th International Parallel Processing Symposium, 1997
This paper presents a new fault-tolerant scheduling algorithm for multiprocessor hard real-time systems. The so-called partitioning method is used to schedule a set of tasks in a multiprocessor system. Fault-tolerance is achieved by using a combined duplication technique where each task scheduled on a processor has either an active or a passive copy scheduled on a different processor. Simulation experiments reveal a saving of processors with respect to those needed by the usual approach of duplicating the schedule of the non-fault-tolerant case.