2003
Several approaches for handling the multi-processor scheduling problem have been presented over the last decades, such as scheduling techniques, clustering techniques and task merging techniques. However, the increasing gap between processor speed and communication speed demands more accurate models of parallel computation and efficient algorithms that work under the restrictions of these models.
1994
In this paper, we explore the problem of scheduling parallel programs using task duplication for message-passing multicomputers. Task duplication means scheduling a parallel program by redundantly executing some of the tasks on which other tasks of the program critically depend. This can reduce the start times of tasks waiting for messages from tasks residing on other processors. There have been a few scheduling algorithms using task duplication. We discuss two such previously reported algorithms and describe their differences, limitations and suitability for different environments. A new algorithm is proposed which outperforms both of these algorithms and is more efficient for low as well as high values of the communication-to-computation ratio. The algorithm takes into account arbitrary computation and communication costs. All three algorithms are tested by scheduling some commonly encountered graph structures.
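To make the idea of task duplication concrete, here is a minimal sketch (not taken from the paper) comparing a child task's earliest start time with and without duplicating its parent onto the child's processor; all cost values are assumed for illustration.

```python
# Hypothetical costs, chosen only to illustrate why duplication can help.
comp_parent = 4           # computation cost of the parent task
comm_parent_child = 6     # cost of sending the parent's result between processors
parent_finish_remote = 4  # the parent finishes at t=4 on some other processor

# Without duplication: the child waits for the message from the remote processor.
start_without_dup = parent_finish_remote + comm_parent_child   # 10

# With duplication: the parent is re-executed locally (starting at t=0),
# so its result is available without any interprocessor communication.
start_with_dup = comp_parent                                   # 4

print(start_without_dup, start_with_dup)  # 10 4 -> duplication lowers the start time
```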
1996
In this paper, we survey algorithms that allocate a parallel program represented by an edge-weighted directed acyclic graph (DAG), also called a task graph or macrodataflow graph, to a set of homogeneous processors, with the objective of minimizing the completion time. We analyze 21 such algorithms and classify them into four groups. The first group includes algorithms that schedule the DAG to a bounded number of processors directly; these are called the bounded number of processors (BNP) scheduling algorithms. The algorithms in the second group schedule the DAG to an unbounded number of clusters and are called the unbounded number of clusters (UNC) scheduling algorithms. The algorithms in the third group schedule the DAG using task duplication and are called the task duplication based (TDB) scheduling algorithms. The algorithms in the fourth group perform allocation and mapping on arbitrary processor network topologies; these are called the arbitrary processor network (APN) scheduling algorithms. The design philosophies and principles behind these algorithms are discussed, and the performance of all of the algorithms is evaluated and compared on a unified basis using various scheduling parameters.
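As a point of reference for the survey's terminology, the following is a minimal, assumed representation of such an edge-weighted task graph, with node weights as computation costs and edge weights as communication costs; the task names and numbers are hypothetical.

```python
# Hypothetical edge-weighted task graph (DAG): node weights are computation
# costs, edge weights are communication costs between dependent tasks.
task_cost = {"t1": 2, "t2": 3, "t3": 3, "t4": 1}
edges = {("t1", "t2"): 4, ("t1", "t3"): 1, ("t2", "t4"): 2, ("t3", "t4"): 5}

def successors(task):
    """Direct successors of a task in the DAG."""
    return [v for (u, v) in edges if u == task]

print(successors("t1"))  # ['t2', 't3']
```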
IEEE Transactions on Parallel and Distributed Systems, 1998
One of the main obstacles in obtaining high performance from message-passing multicomputer systems is the inevitable communication overhead which is incurred when tasks executing on different processors exchange data. Given a task graph, duplication-based scheduling can mitigate this overhead by allocating some of the tasks redundantly on more than one processor. In this paper, we focus on the problem of using duplication in static scheduling of task graphs on parallel and distributed systems. We discuss five previously proposed algorithms and examine their merits and demerits. We describe some of the essential principles for exploiting duplication in a more useful manner and, based on these principles, propose an algorithm which outperforms the previous algorithms. The proposed algorithm generates optimal solutions for a number of task graphs. The algorithm assumes an unbounded number of processors. For scheduling on a bounded number of processors, we propose a second algorithm which controls the degree of duplication according to the number of available processors. The proposed algorithms are analytically and experimentally evaluated and are also compared with the previous algorithms.
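One principle commonly exploited by duplication-based schedulers, in the spirit of the principles discussed above, is to duplicate the "critical" parent, i.e. the parent whose message would arrive at the child last. The sketch below illustrates only that selection step; the finish times and edge weights are hypothetical and this is not the paper's algorithm.

```python
# Hypothetical parents of a child task: finish times on their own processors
# and the communication cost of the edge to the child.
finish_time = {"a": 3, "b": 5, "c": 2}
comm_cost = {"a": 4, "b": 1, "c": 6}

def critical_parent(parents):
    """Parent whose message would arrive at the child latest -- the natural
    candidate for duplication onto the child's processor."""
    return max(parents, key=lambda p: finish_time[p] + comm_cost[p])

print(critical_parent(["a", "b", "c"]))  # 'c' (arrival time 2 + 6 = 8)
```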
ACM SIGARCH Computer Architecture News, 1984
The paper describes a technique for estimating the minimum execution time of an algorithm or a mix of algorithms on a distributed processing system. Bottlenecks that would have to be removed to further reduce the execution time are identified. The main applications are for the high level design of special purpose distributed processing systems. The distributed systems are modelled by P, a set of nonidentical processors and R, a set of resources that the processors can use. The algorithms are modelled by T, an ordered set of tasks. The problem of optimally assigning the processors to the tasks while meeting the resource constraints is NP-complete. However, a heuristic using maximum weighted matchings on graphs has been devised that is extremely fast and comes reasonably close to the optimal solutions.
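For illustration, a maximum weighted matching of the kind the heuristic relies on can be computed with an off-the-shelf graph library; the bipartite "suitability" weights below are hypothetical and do not reflect the paper's cost model.

```python
# Pair processors with tasks via a maximum weighted matching.
# Weights are assumed "suitability" scores for illustration only.
import networkx as nx

G = nx.Graph()
suitability = {
    ("P1", "taskA"): 8, ("P1", "taskB"): 3,
    ("P2", "taskA"): 5, ("P2", "taskB"): 7,
}
for (proc, task), w in suitability.items():
    G.add_edge(proc, task, weight=w)

matching = nx.max_weight_matching(G)
print(matching)  # e.g. {('P1', 'taskA'), ('taskB', 'P2')}
```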
Proceedings 11th International Parallel Processing Symposium, 1997
This paper demonstrates the effectiveness of the two-phase method of scheduling, in which task clustering is performed prior to the actual scheduling process. Task clustering determines the optimal or near-optimal number of processors on which to schedule the task graph. In other words, there is never a need to use more processors (even though they are available) than the number of clusters produced by the task clustering algorithm. The paper also indicates that when task clustering is performed prior to scheduling, load balancing (LB) is the preferred approach for cluster merging. LB is fast, easy to implement, and produces significantly better final schedules than communication traffic minimizing (CTM). In summary, the two-phase method consisting of task clustering and load balancing is a simple yet highly effective strategy for scheduling task graphs on distributed memory parallel architectures.
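A rough sketch of the load-balancing (LB) cluster-merging step described above: each cluster, with its aggregate computation load, is greedily mapped to the currently least-loaded processor. The cluster loads and processor count are assumed for illustration, and this is not necessarily the paper's exact procedure.

```python
# Greedy LB-style cluster merging: heavier clusters first, each placed on the
# least-loaded processor. All numbers are hypothetical.
cluster_load = {"C1": 9, "C2": 4, "C3": 7, "C4": 3}
num_processors = 2
proc_load = [0] * num_processors
mapping = {}

for cluster in sorted(cluster_load, key=cluster_load.get, reverse=True):
    p = proc_load.index(min(proc_load))   # least-loaded processor
    mapping[cluster] = p
    proc_load[p] += cluster_load[cluster]

print(mapping)    # {'C1': 0, 'C3': 1, 'C2': 1, 'C4': 0}
print(proc_load)  # [12, 11]
```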
Programming models for distributed systems often construct a task graph for the program to be executed on a distributed system of processors. While the topology of the task graph can be constructed from the program structure, the task execution times and data transfer costs between tasks often depend on the input data, or more specifically, on the particular problem instance. Though this indicates that the optimal schedule of a task graph cannot be determined until the input data is available, it is possible to estimate the worst-case processor requirement for the optimal schedule of a program solely from the topology of its task graph. In this paper, we study the problem of estimating worst-case processor requirements for scheduling (with cloning) layered task graphs based on their topology. We show that computing an accurate processor bound for layered graphs is NP-hard (even for two layers) and present a polynomial time algorithm which computes an upper bound on the processor requirement. We show that the algorithm provides tight bounds for several common classes of layered task graphs.
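The bound described above is derived from topology alone. As a purely illustrative sketch (not the paper's algorithm), the snippet below extracts one such topological quantity, the width of each layer, from a hypothetical layered task graph.

```python
# Hypothetical layered DAG: each task is assigned to a layer.
from collections import defaultdict

layer_of = {"s": 0, "a": 1, "b": 1, "c": 1, "d": 2, "e": 2}

widths = defaultdict(int)
for task, layer in layer_of.items():
    widths[layer] += 1

# Layer widths are a purely topological quantity, available before input data.
print(dict(widths))  # {0: 1, 1: 3, 2: 2}
```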
IEEE Transactions on Parallel and Distributed Systems, 1994
We present a low-complexity heuristic named the Dominant Sequence Clustering algorithm (DSC) for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is on average comparable to, or better than, that of many other higher complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAGs) is NP-complete. DSC finds optimal schedules for special classes of DAGs such as fork, join, coarse-grain trees and some fine-grain trees. It guarantees a performance within a factor of two of the optimum for general coarse-grain DAGs. We compare DSC with three higher complexity general scheduling algorithms: the MD algorithm by Wu and Gajski [19], the ETF algorithm by Hwang, Chow, Anger and Lee [12], and Sarkar's clustering algorithm. We also give a sample of important practical applications where DSC has been found useful.
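The "dominant sequence" DSC tracks is essentially the critical path of the (partially scheduled) task graph. The sketch below only computes that critical-path quantity via bottom levels on a small hypothetical graph; it is not the DSC algorithm itself.

```python
# Critical-path length of a hypothetical task graph, counting both computation
# costs (on nodes) and communication costs (on edges).
from functools import lru_cache

comp = {"t1": 2, "t2": 3, "t3": 4, "t4": 1}
comm = {("t1", "t2"): 5, ("t1", "t3"): 1, ("t2", "t4"): 2, ("t3", "t4"): 3}

def successors(u):
    return [v for (a, v) in comm if a == u]

@lru_cache(maxsize=None)
def bottom_level(u):
    """Longest path from u to an exit task, including edge (communication) costs."""
    succs = successors(u)
    if not succs:
        return comp[u]
    return comp[u] + max(comm[(u, v)] + bottom_level(v) for v in succs)

print(bottom_level("t1"))  # 13, along the path t1 -> t2 -> t4
```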
Journal of Computer Systems, Networks, and Communications, 2008
Distributed computing systems [DCSs] offer the potential for improved performance and resource sharing. To make the best use of the computational power available, it is essential to assign tasks dynamically to the processor whose characteristics are most appropriate for their execution in a distributed processing system. We have developed a mathematical model for allocating "M" tasks of a distributed program to "N" processors (M > N) that minimizes the total cost of the program. The relocation of tasks from one processor to another at certain points during the execution of the program, which contributes to the total cost of the running program, has been taken into account. Phasewise execution cost [EC], intertask communication cost [ITCT], residence cost [RC] of each task on different processors, and relocation cost [REC] for each task have been considered while preparing the dynamic task allocation model. The present model is suitable for an arbitrary number of phases and processors and for random program structures.
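As a hedged illustration of how the cost components named above might combine, the sketch below sums execution, residence, inter-task communication and relocation costs over the phases of a small assignment; all numbers and the exact formulation are assumptions, not the paper's model.

```python
# Hypothetical two-phase assignment of two tasks to two processors.
phases = 2
tasks = ["t1", "t2"]
assignment = [{"t1": "P1", "t2": "P2"},   # phase 0
              {"t1": "P2", "t2": "P2"}]   # phase 1 (t1 relocates)

exec_cost  = {("t1", "P1"): 3, ("t1", "P2"): 5, ("t2", "P1"): 4, ("t2", "P2"): 2}
resid_cost = {("t1", "P1"): 1, ("t1", "P2"): 1, ("t2", "P1"): 2, ("t2", "P2"): 1}
comm_cost  = {("t1", "t2"): 6}        # paid in a phase only if the tasks sit on different processors
reloc_cost = {"t1": 2, "t2": 3}       # paid when a task changes processor between phases

total = 0
for k in range(phases):
    for t in tasks:
        p = assignment[k][t]
        total += exec_cost[(t, p)] + resid_cost[(t, p)]
    for (a, b), c in comm_cost.items():
        if assignment[k][a] != assignment[k][b]:
            total += c
    if k > 0:
        for t in tasks:
            if assignment[k][t] != assignment[k - 1][t]:
                total += reloc_cost[t]

print(total)  # 24 for this assignment
```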
2011
Finding an efficient schedule for a task graph on several processors is a trade-off between maximising concurrency and minimising interprocessor communication. Task duplication is a technique that has been employed to reduce or avoid interprocessor communication. Certain tasks are duplicated on several processors to produce the data locally and avoid the communication among processors. Most of the algorithms using task duplication have been proposed for the classic scheduling model, which allows concurrent communication and ignores contention for communication resources. It is increasingly recognised that this classic model is unrealistic and does not permit creating accurate and efficient schedules. The recently proposed contention model introduces contention awareness into task scheduling by assigning the edges of the task graph to the links of the communication network. It is intuitive that scheduling under such a model benefits even more from task duplication, yet no such algorithm has been proposed as it is not trivial to duplicate tasks under the contention model. This paper proposes a contention-aware task duplication scheduling algorithm. We investigate the fundamentals for task duplication in the contention model and propose an algorithm that is based on state-of-the-art techniques found in task duplication and contention-aware algorithms. An extensive experimental evaluation demonstrates the significant improvements to the speedup of the produced schedules.
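The core of the contention model can be sketched as follows: an edge of the task graph is scheduled onto a communication link, so two messages sharing a link are serialized instead of overlapping. The link name, ready times and message durations below are hypothetical, and this shows only the model's idea, not the proposed duplication algorithm.

```python
# Contention-aware message placement: a message may not start before both its
# data-ready time and the time its link becomes free.
link_free_at = {"L1": 0}   # earliest time each (hypothetical) link is available

def schedule_message(link, ready_time, duration):
    """Schedule one task-graph edge on a communication link, serializing it
    behind any message already occupying that link."""
    start = max(ready_time, link_free_at[link])
    link_free_at[link] = start + duration
    return start

print(schedule_message("L1", ready_time=2, duration=4))  # starts at 2
print(schedule_message("L1", ready_time=3, duration=2))  # delayed to 6 by contention
```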