Load balancing is an important prerequisite for efficiently executing dynamic computations on parallel computers. In this context, this project has focused on two topics: cost-efficiently balancing dynamically generated workload in a network, and partitioning graphs to distribute connected tasks equally across the processing nodes while reducing the communication overhead. We summarize new insights and results in these areas.
In parallel computing, dynamic load balancing of parallel codes is considered a crucial problem. The goal is to distribute roughly equal amounts of computational load across a number of processors while minimizing inter-processor communication, with the objective of optimizing the execution time of the simulation. In some applications, the load grows in unpredictable ways, which is why a new distribution must be computed dynamically. Graph partitioning and repartitioning are usually combined to solve the dynamic load-balancing problem. In this paper we study and evaluate heuristic partitioning methods, such as region expansion, multilevel, and Kernighan-Lin algorithms, as well as methods for repartitioning graphs, and compare these different methods. The advantages and limitations of the different heuristics existing in the literature are made clear.
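Since the abstract names Kernighan-Lin among the heuristics it evaluates, a minimal sketch of one KL-style pass may help fix ideas. The function names and the toy 4-cycle graph are ours, not the paper's, and a real implementation repeats such passes, locking swapped vertices, until no positive-gain sequence remains.

```python
# A minimal, illustrative sketch of one Kernighan-Lin pass on an
# unweighted graph (adjacency given as a dict of neighbor sets).

def kl_gain(graph, part, v):
    """External minus internal edges of v: the cut reduction if v moves."""
    ext = sum(1 for u in graph[v] if part[u] != part[v])
    internal = sum(1 for u in graph[v] if part[u] == part[v])
    return ext - internal

def kl_single_pass(graph, part):
    """Greedily swap the best pair (a in side 0, b in side 1), if any."""
    side0 = [v for v in graph if part[v] == 0]
    side1 = [v for v in graph if part[v] == 1]
    best, best_pair = 0, None
    for a in side0:
        for b in side1:
            # Swap gain: g(a) + g(b) - 2 if a and b are adjacent.
            g = kl_gain(graph, part, a) + kl_gain(graph, part, b)
            g -= 2 * (1 if b in graph[a] else 0)
            if g > best:
                best, best_pair = g, (a, b)
    if best_pair:
        a, b = best_pair
        part[a], part[b] = part[b], part[a]
    return best

# Example: a 4-cycle split badly; one swap halves the cut from 4 to 2.
graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
part = {0: 0, 1: 1, 2: 0, 3: 1}
print(kl_single_pass(graph, part), part)  # gain 2, cut drops to 2
```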
Scalable Computing: Practice and Experience
In modern computing, high-performance computing (HPC) and parallel computing devote much of their decision-making to distributing the payload (input) uniformly across the available set of resources, mainly processors; the former deals with the hardware and its better utilization. In parallel computing, a large, complex problem is broken down into multiple smaller calculations that are executed simultaneously on several processors. The efficient use of resources (processors) plays a vital role in achieving maximum throughput, which necessitates uniform load distribution across the available processors, i.e. load balancing. Load balancing in parallel computing is modeled as a graph partitioning problem. In the graph partitioning problem, the weighted nodes represent the computing cost at each node, and the weighted edges represent the communication cost between the connected nodes. The goal is to partition the graph G into k partitions such that: I) the sum of weights on the ...
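The partitioning model described in this abstract (node weights as compute cost, edge weights as communication cost) can be made concrete with a few lines that score a candidate k-way partition. The names `edge_cut` and `imbalance`, and the toy data, are our own illustration.

```python
# Evaluate a k-way partition under the stated model: node weights are
# compute costs, edge weights are communication costs between nodes.

def edge_cut(edges, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for (u, v, w) in edges if part[u] != part[v])

def imbalance(node_w, part, k):
    """Max part load divided by average part load (1.0 = perfect)."""
    loads = [0.0] * k
    for v, w in node_w.items():
        loads[part[v]] += w
    avg = sum(loads) / k
    return max(loads) / avg

node_w = {0: 2, 1: 1, 2: 1, 3: 2}                     # compute costs
edges = [(0, 1, 3), (1, 2, 1), (2, 3, 3), (3, 0, 1)]  # comm costs
part = {0: 0, 1: 0, 2: 1, 3: 1}                       # a 2-way partition
print(edge_cut(edges, part), imbalance(node_w, part, 2))  # 2, 1.0
```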
Applied Numerical Mathematics, 2005
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy-to-use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.
2012
Load imbalance leads to an increasing waste of resources as an application is scaled to more and more processors. Achieving the best parallel efficiency for a program requires optimal load balancing, which is an NP-hard problem. However, finding near-optimal solutions to this problem for complex computational science and engineering applications is becoming increasingly important. Charm++, a migratable-objects-based programming model, provides a measurement-based dynamic load balancing framework. This framework instruments and then migrates over-decomposed objects to balance computational load and communication at runtime. This paper explores the use of graph partitioning algorithms, traditionally used for partitioning physical domains/meshes, for measurement-based dynamic load balancing of parallel applications. In particular, we present repartitioning methods developed in a graph partitioning toolbox called SCOTCH that consider the previous mapping to minimize migration costs. We also discuss a new imbalance reduction algorithm for graphs with irregular load distributions. We compare several load balancing algorithms using microbenchmarks on Intrepid and Ranger and evaluate the effect of communication, number of cores, and number of objects on the benefit achieved from load balancing. New algorithms developed in SCOTCH lead to better performance compared to the METIS partitioners for several cases, both in terms of application execution time and the number of objects migrated.
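A hedged sketch of the general remapping idea the abstract alludes to, considering the previous mapping to cut migration: after partitioning from scratch, relabel the new parts to maximize overlap with the old assignment. This greedy matching is our own illustration, not SCOTCH's actual algorithm.

```python
# Relabel new part ids so they overlap the old assignment as much as
# possible, so fewer objects migrate. Greedy matching by overlap count.
from collections import Counter

def greedy_remap(old_part, new_part, k):
    overlap = Counter()
    for v in old_part:
        overlap[(new_part[v], old_part[v])] += 1
    mapping, used_new, used_old = {}, set(), set()
    # Match (new part, old part) pairs in decreasing overlap order.
    for (np_, op) , _ in overlap.most_common():
        if np_ not in used_new and op not in used_old:
            mapping[np_] = op
            used_new.add(np_); used_old.add(op)
    # Unmatched new parts take any leftover old label.
    leftover = iter(sorted(set(range(k)) - used_old))
    for p in range(k):
        if p not in mapping:
            mapping[p] = next(leftover)
    return {v: mapping[new_part[v]] for v in new_part}

old = {0: 0, 1: 0, 2: 1, 3: 1}
new = {0: 1, 1: 1, 2: 0, 3: 0}    # same cut, labels flipped
print(greedy_remap(old, new, 2))  # matches old exactly: zero migration
```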
1997
Parallel computing has become increasingly ubiquitous in recent years. Problem sizes continue to increase rapidly for solving real-world applications. A recent survey indicates that molecular dynamics simulations require a minimum of teraflops to tens of petaflops. The survey continues to report that computational cosmology needs tens of exaflops. The computational requirements for some physical problems, such as a study of the interactions between atoms, are currently unknown due to their complex nature; however, they will be extremely large. One way to satisfy these computational demands is to use large-scale distributed-memory multiprocessors. The Intel ASCI Red machine and the IBM and SGI/Cray Blue machines, consisting of hundreds to thousands of processors, are designed specifically to satisfy such computational demands. Another possibility for meeting such immense computational needs is to use a cluster of personal computers or workstations. The Berkeley Millennium project aims to provide computational resources through clusters of clusters of Intel-based personal computers.
7th International Symposium on Parallel Architectures, Algorithms and Networks, Proceedings, 2004
Efficient load balancing algorithms are the key to many efficient parallel applications. Until now, research in this area has mainly focused on static networks. However, observations show that diffusive algorithms, originally designed for these networks, can also be applied in non-static scenarios. In this paper we prove that the general diffusion scheme can be deployed on dynamic networks and show that its convergence rate depends on the average value of the quotient of the second smallest eigenvalue and the maximum vertex degree of the networks occurring during the iterations. In the presented experiments we illustrate that even if communication links of static networks fail with high probability, load can still be balanced quite efficiently. Simulating diffusion on ad-hoc networks, we demonstrate that diffusive schemes provide a reliable and efficient load balancing strategy in mobile environments as well.
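For readers unfamiliar with diffusion schemes, here is a minimal first-order diffusion round: each node repeatedly averages load with its neighbors. The step size alpha = 1/(max degree + 1) is a standard stable choice; the toy path graph is ours. In the dynamic setting studied in the paper, the graph passed to each round may change between rounds.

```python
# First-order diffusion (FOS): each node moves alpha * (difference)
# of load across every incident edge, per round.

def diffusion_round(graph, load, alpha):
    new = dict(load)
    for v, nbrs in graph.items():
        for u in nbrs:
            new[v] += alpha * (load[u] - load[v])
    return new

graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # a path; could vary per round
load = {0: 8.0, 1: 0.0, 2: 0.0, 3: 0.0}
alpha = 1.0 / (2 + 1)                            # max degree is 2
for _ in range(50):
    load = diffusion_round(graph, load, alpha)
print({v: round(x, 2) for v, x in load.items()})  # all loads near 2.0
```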
2007
Adaptive scientific computations require that periodic repartitioning (load balancing) occur dynamically to maintain load balance. Hypergraph partitioning is a successful model for minimizing communication volume in scientific computations, and partitioning software for the static case is widely available. In this paper, we present a new hypergraph model for the dynamic case, where we minimize the sum of communication in the application plus the migration cost to move data, thereby reducing total execution time. The new model can be solved using hypergraph partitioning with fixed vertices. We describe an implementation of a parallel multilevel repartitioning algorithm within the Zoltan load-balancing toolkit, which to our knowledge is the first code for dynamic load balancing based on hypergraph partitioning. Finally, we present experimental results that demonstrate the effectiveness of our approach on a Linux cluster with up to 64 processors. Our new algorithm compares favorably to the widely used ParMETIS partitioning software in terms of quality, and would have reduced total execution time in most of our test cases. (* Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company.)

Much of the early work in load balancing focused on diffusive methods, where overloaded processors give work to neighboring processors that have lower than average loads. A quite different approach is to partition the new problem "from scratch" without accounting for existing partition assignments, and then try to remap partitions to minimize the migration cost. These two strategies have very different properties. Diffusive schemes are fast and have low migration cost, but may incur high communication volume. Scratch-remap schemes give low communication volume but may incur higher migration cost.
Theoretical Computer Science, 2006
This work considers the problem of efficiently performing a set of tasks using a network of processors in a setting where the network is subject to dynamic reconfigurations, including partitions and merges. A key challenge for this setting is the implementation of dynamic load balancing that reduces the number of tasks performed redundantly because of the reconfigurations. We explore new approaches for load balancing in dynamic networks that can be employed by applications using a group communication service. The group communication services that we consider include a membership service (establishing new groups to reflect dynamic changes) but do not include maintenance of a primary component. For the n-processor, n-task load balancing problem defined in this work, the following specific results are obtained. For the case of fully dynamic changes, including fragmentations and merges, we show that the termination time of any on-line task assignment algorithm is greater than the termination time of an off-line task assignment algorithm by a factor greater than n/12. We present a load balancing algorithm that guarantees completion of all tasks in all fragments caused by partitions with work O(n + f·n) in the presence of f fragmentation failures. We develop an effective scheduling strategy for minimizing task execution redundancy and we prove that our strategy provides each of the n processors with a schedule of Θ(n^(1/3)) tasks such that at most one task is performed redundantly by any two processors.
Parallel Computing, 2011
One of the most significant causes of performance degradation for scientific and engineering applications on high performance computing systems is the uneven distribution of the computational work across the resources of the system. This effect, known as load imbalance, is even more noticeable in the case of irregular applications and heterogeneous distributed systems. This has motivated the parallel and distributed computing research community to focus on methods that provide good load balancing for scientific and engineering applications running on (heterogeneous) distributed systems. Efficient load balancing and scheduling methods are employed for scientific applications from various fields, such as mechanics, materials, physics, chemistry, biology, applied mathematics, etc. Such applications typically employ a large number of computational methods in order to simulate complex phenomena, on very large scales of time and magnitude. These simulations consist of routines that perform repetitive computations (in the form of DO/FOR loops) over very large data sets, which, if not properly implemented and executed, may suffer from poor performance. The number of repetitive computations in the simulation codes is not always constant. Moreover, the computational nature of these simulations may in fact be irregular, leading to cases where one computation takes (unpredictably) more time than others.

For successful and timely results, large scale simulations require the use of large scale computing systems, which often are widely distributed and highly heterogeneous. Moreover, large scale computing systems are usually shared among multiple users, which causes the quality and quantity of the available resources to be highly unpredictable. There are numerous load balancing methods in the literature for different parallel architectures. The most recent of these methods typically follow the master-worker paradigm, where a single coordinator (master) is responsible for making all the scheduling decisions based on information provided by the workers. Depending on the application requirements, the scheduling policy, and the computational environment, the benefits of this paradigm may be limited as follows: (1) its efficiency may not scale as the number of processors increases, and (2) it is quite probable that the scheduling decisions are made based on outdated information, especially on systems where the workload changes rapidly.

In an effort to address these limitations, we propose a distributed (master-less) load balancing scheme, in which the scheduling decisions are made by the workers in a distributed fashion. We implemented this method, along with two other master-worker schemes (a previously existing one and a recently modified one), for three different scientific computational kernels. In order to validate the usefulness and efficiency of the proposed scheme, we conducted a series of comparative performance tests with the two master-worker schemes for each computational kernel. The target system is an SMP cluster, on which we simulated three different patterns of system load fluctuation. The experiments strongly support the belief that the distributed approach offers greater performance and better scalability on such systems, showing an overall improvement ranging from 13% to 24% over the master-worker approaches.
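To make the master-less idea concrete, a small thread-based stand-in is sketched below: workers claim chunks from a shared counter themselves instead of asking a master. This is our illustration of decentralized self-scheduling, not the authors' implementation.

```python
# Master-less self-scheduling: each worker atomically claims the next
# chunk index from a shared counter; no coordinator hands out work.
import threading

def worker(counter, lock, chunk, tasks, results, wid):
    n = len(tasks)
    while True:
        with lock:                 # atomic claim of the next chunk
            start = counter[0]
            counter[0] += chunk
        if start >= n:
            return
        for i in range(start, min(start + chunk, n)):
            results[i] = (wid, tasks[i] ** 2)   # stand-in computation

tasks = list(range(100))
results = [None] * len(tasks)
counter, lock = [0], threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(counter, lock, 8, tasks, results, w))
           for w in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(all(r is not None for r in results))  # True: every task done once
```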
Journal of Parallel and Distributed Computing, 2009
In parallel adaptive applications, the computational structure of the applications changes over time, leading to load imbalances even though the initial load distributions were balanced. To restore balance and to keep communication volume low in further iterations of the applications, dynamic load balancing (repartitioning) of the changed computational structure is required. Repartitioning differs from static load balancing (partitioning) due to the additional requirement of minimizing migration cost to move data from an existing partition to a new partition. In this paper, we present a novel repartitioning hypergraph model for dynamic load balancing that accounts for both communication volume in the application and migration cost to move data, in order to minimize the overall cost. Use of a hypergraph-based model allows us to accurately model communication costs rather than approximating them with graph-based models. We show that the new model can be realized using hypergraph partitioning with fixed vertices and describe our parallel multilevel implementation within the Zoltan load-balancing toolkit. To the best of our knowledge, this is the first implementation for dynamic load balancing based on hypergraph partitioning. To demonstrate the effectiveness of our approach, we conducted experiments on a Linux cluster with 1024 processors. The results show that, in terms of reducing total cost, our new model compares favorably to the graph-based dynamic load balancing approaches, and multilevel approaches improve the repartitioning quality significantly.
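The total cost such a repartitioning model minimizes can be illustrated in a few lines: a hypergraph connectivity (lambda minus one) metric for communication volume, plus a migration term for vertices that change parts. The data and function names below are our own sketch, not the paper's code.

```python
# Total repartitioning cost = communication volume (connectivity metric
# over hyperedges) + migration cost of vertices that changed parts.

def connectivity_cost(hyperedges, part):
    """Sum over hyperedges of (number of parts spanned - 1) * weight."""
    return sum((len({part[v] for v in pins}) - 1) * w
               for pins, w in hyperedges)

def migration_cost(old_part, new_part, sizes):
    """Total data size of vertices assigned to a different part."""
    return sum(sizes[v] for v in old_part if old_part[v] != new_part[v])

hyperedges = [({0, 1, 2}, 2), ({2, 3}, 1)]   # (pins, weight)
sizes = {0: 4, 1: 1, 2: 1, 3: 4}             # per-vertex data sizes
old = {0: 0, 1: 0, 2: 0, 3: 1}
new = {0: 0, 1: 0, 2: 1, 3: 1}
total = connectivity_cost(hyperedges, new) + migration_cost(old, new, sizes)
print(total)  # 2 (comm) + 1 (migration) = 3
```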
2017
Acceleration of graph applications on GPUs has found large interest due to the ubiquitous use of graph processing in various domains. The inherent irregularity in graph applications leads to several challenges for parallelization. A key challenge, which we address in this paper, is that of load-imbalance. If the work-assignment to threads uses node-based graph partitioning, it can result in skewed task-distribution, leading to poor load-balance. In contrast, if the work-assignment uses edge-based graph partitioning, the load-balancing is better, but the memory requirement is relatively higher. This makes it unsuitable for large graphs. In this work, we propose three techniques for improved load-balancing of graph applications on GPUs. Each technique brings in unique advantages, and a user may have to employ a specific technique based on the requirement. Using Breadth First Search and Single Source Shortest Paths as our processing kernels, we illustrate the effectiveness of ...
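The node-based versus edge-based trade-off described above is easy to see on a CSR-style graph with one high-degree hub. The sketch below, using our own toy graph rather than the paper's kernels, compares the per-thread work each assignment produces.

```python
# Per-thread work under the two assignments discussed above, on a
# CSR-style row-pointer array (row_ptr[v]..row_ptr[v+1] = edges of v).

def node_based_work(row_ptr, n_threads):
    """Each thread owns a contiguous block of nodes; work = edges scanned."""
    n = len(row_ptr) - 1
    per = (n + n_threads - 1) // n_threads
    return [row_ptr[min((t + 1) * per, n)] - row_ptr[min(t * per, n)]
            for t in range(n_threads)]

def edge_based_work(row_ptr, n_threads):
    """Each thread owns a contiguous slice of edges; work is near-uniform."""
    m = row_ptr[-1]
    per = (m + n_threads - 1) // n_threads
    return [min((t + 1) * per, m) - min(t * per, m) for t in range(n_threads)]

# One hub node with 12 edges, three nodes with 1 edge each (skewed degrees).
row_ptr = [0, 12, 13, 14, 15]
print(node_based_work(row_ptr, 4))  # [12, 1, 1, 1] -> imbalanced
print(edge_based_work(row_ptr, 4))  # [4, 4, 4, 3]  -> balanced
```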
Lecture Notes in Computer Science, 2004
The task of balancing dynamically generated work load occurs in a wide range of parallel and distributed applications. Diffusion based schemes, which belong to the class of nearest neighbor load balancing algorithms, are a popular way to address this problem. Originally created to equalize the amount of arbitrarily divisible load among the nodes of a static and homogeneous network, they have been generalized to heterogeneous topologies. Additionally, some simple diffusion algorithms have been adapted to work in dynamic networks as well. However, if the load is not arbitrarily divisible but consists of indivisible unit-size tokens, diffusion schemes are not able to balance the load properly. In this paper we consider the problem of balancing indivisible unit-size tokens on dynamic and heterogeneous systems. By modifying a randomized strategy invented for homogeneous systems, we can achieve an asymptotically minimal expected overload in the l_1, l_2 and l_∞ norms while only slightly increasing the run-time by a logarithmic factor. Our experiments show that this additional factor is usually not required in applications.
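As a loose illustration of randomized balancing of indivisible tokens (not the paper's actual strategy), each node in the sketch below compares load with one random neighbor per round and ships a single token toward the lighter side.

```python
# Randomized unit-token balancing: per round, each node picks a random
# neighbor and sends one indivisible token if it is strictly heavier.
import random

def token_round(graph, load):
    sends = []                               # decide all sends first
    for v, nbrs in graph.items():
        u = random.choice(sorted(nbrs))
        if load[v] > load[u] + 1:            # strictly heavier: send one
            sends.append((v, u))
    for v, u in sends:                       # then apply synchronously
        if load[v] > 0:
            load[v] -= 1
            load[u] += 1

random.seed(1)
graph = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}  # a 4-cycle
load = {0: 9, 1: 1, 2: 1, 3: 1}                        # 12 tokens total
for _ in range(60):
    token_round(graph, load)
print(load)  # close to 3 tokens per node (average load)
```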
2007
This paper examines MPI's ability to support continuous, dynamic load balancing for unbalanced parallel applications. We use an unbalanced tree search benchmark (UTS) to compare two approaches, 1) work sharing using a centralized work queue, and 2) work stealing using explicit polling to handle steal requests. Experiments indicate that in addition to a parameter defining the granularity of load balancing, message-passing paradigms require additional parameters such as polling intervals to manage runtime overhead. Using these additional parameters, we observed an improvement of up to 2X in parallel performance. Overall we found that while work sharing may achieve better peak performance on certain workloads, work stealing achieves comparable if not better performance across a wider range of chunk sizes and workloads.
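A thread-based analogue of the work-sharing variant may clarify the role of the chunk-size parameter. MPI, polling intervals, and the real UTS benchmark are omitted; the toy tree generator below is our own stand-in.

```python
# Work sharing via a centralized queue of chunks: workers take a chunk,
# process it, and push newly generated work back in chunks.
import queue, random, threading

def worker(q, chunk, counted, lock):
    while True:
        try:
            nodes = q.get(timeout=0.1)       # grab one chunk of work
        except queue.Empty:
            return                           # no work seen for a while
        new = []
        for depth in nodes:
            with lock:
                counted[0] += 1
            # Stand-in for UTS: nodes spawn 0-2 children up to depth 3.
            if depth < 3:
                new.extend([depth + 1] * random.randint(0, 2))
        for i in range(0, len(new), chunk):  # share work back in chunks
            q.put(new[i:i + chunk])

random.seed(0)
q = queue.Queue()
q.put([0])                                   # root of the tree
counted, lock = [0], threading.Lock()
threads = [threading.Thread(target=worker, args=(q, 4, counted, lock))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counted[0], "tree nodes processed")
```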
Proc. of the International Conference on Computing and Systems 2010 (ICCS-2010), ISBN: 93-80813-01-5, 2010
A parallel computer system is a collection of processing elements that communicate and cooperate to solve large computational problems efficiently. To achieve this, the large computational problem is first partitioned into several tasks with different workloads, which are then assigned to different processing elements for computation. The distribution of the workload is known as load balancing. An appropriate distribution of workload across the various processing elements is very important, as disproportionate workloads can eliminate the performance benefit of parallelizing the job. Hence, load balancing on parallel systems is a critical and challenging activity. Load balancing algorithms can be broadly categorized as static or dynamic. Static load balancing algorithms distribute the tasks to processing elements at compile time, while dynamic algorithms bind tasks to processing elements at run time. This paper briefly explains the different dynamic load balancing techniques used in parallel systems, concluding with a comparative performance analysis of these algorithms.
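The static/dynamic distinction drawn in this abstract can be demonstrated in a few lines: a compile-time round-robin mapping versus a run-time greedy assignment to the least-loaded element. The task sizes below are invented to show the gap.

```python
# Static round-robin fixes the task-to-processor mapping up front;
# dynamic greedy assigns each task to the currently least-loaded element.
import heapq

def static_round_robin(task_costs, p):
    loads = [0] * p
    for i, c in enumerate(task_costs):
        loads[i % p] += c                # mapping fixed at "compile time"
    return loads

def dynamic_least_loaded(task_costs, p):
    heap = [(0, pe) for pe in range(p)]  # (current load, processor id)
    for c in task_costs:
        load, pe = heapq.heappop(heap)   # decided at "run time"
        heapq.heappush(heap, (load + c, pe))
    return sorted(load for load, _ in heap)

tasks = [4, 1, 4, 1, 4, 1, 4, 1]         # irregular task sizes
print(static_round_robin(tasks, 2))      # [16, 4]  -> imbalanced
print(dynamic_least_loaded(tasks, 2))    # [10, 10] -> balanced
```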
Journal of Parallel and Distributed Computing, 1994
In this paper we analyze the scalability of a number of load balancing algorithms which can be applied to problems that have the following characteristics: the work done by a processor can be partitioned into independent work pieces; the work pieces are of highly variable sizes; and it is not possible (or very difficult) to estimate the size of total work at a given processor. Such problems require a load balancing scheme that distributes the work dynamically among different processors.
PDCN, 2004
Dynamic load balancing techniques, in practice, do not assume any information about the tasks to be executed at compilation time. Parameters like execution time or communication time are unknown at compilation time. These techniques are used to distribute the computation tasks of an application between different processors at execution time to achieve some defined performance objectives [1]. In this paper we present a dynamic load balancing algorithm designed especially for heterogeneous networks of workstations. The algorithm distributes the parallel tasks dynamically, attempting to minimize their execution time. The experiments are done over a network of workstations interconnected via Fast Ethernet. It is a Linux cluster with some degree of heterogeneity in the processing nodes. Our algorithm is shown to be efficient in increasing resource utilization and reducing the total execution time of the applications.
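A minimal sketch of one heterogeneity-aware policy consistent with the description above (our own illustration, not necessarily the authors' algorithm): assign each task to the workstation whose estimated finish time, load divided by speed, is currently smallest. Speeds and task costs here are invented.

```python
# Greedy heterogeneous assignment: give each task to the node with the
# smallest estimated finish time (accumulated work / node speed).
import heapq

def assign_hetero(task_costs, speeds):
    heap = [(0.0, i) for i in range(len(speeds))]  # (finish time, node)
    heapq.heapify(heap)
    work = [0.0] * len(speeds)
    for c in task_costs:
        _, i = heapq.heappop(heap)
        work[i] += c
        heapq.heappush(heap, (work[i] / speeds[i], i))
    return work

speeds = [1.0, 2.0, 4.0]             # relative node speeds (heterogeneous)
tasks = [1.0] * 28                   # 28 unit-cost tasks
work = assign_hetero(tasks, speeds)
print(work, [w / s for w, s in zip(work, speeds)])
# [4.0, 8.0, 16.0] -> all nodes finish at the same time (4.0)
```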
Parallel & Distributed …, 1996
The authors compare the performances of five dynamic load-balancing strategies. The simulator they've developed lets them measure these performances across a range of network topologies, including a 2D mesh, a 4D hypercube, a linear array, and a composite Fibonacci cube. A multiprocessor network without load balancing processes processor-generated tasks locally, with little or no sharing of computational resources. Load balancing, on the other hand, uses a multiprocessor network's inherently redundant processing power by redistributing the workload among the processors to improve the application's overall performance.