CS 584
Load Balancing
Goal: All processors working all the time
Efficiency of 1
Distribute the load (work) to meet the goal
Two types of load balancing
Static
Dynamic
Load Balancing
The load balancing problem can be reduced to the bin-packing problem
NP-complete
For simple cases we can do well, but heterogeneity complicates things
Different types of resources
Processor
Network, etc.
Evaluation of load balancing
Efficiency
Are the processors always working?
How much processing overhead is associated with the load balancing algorithm?
Communication
Does load balancing introduce or affect the communication pattern?
How much communication overhead is associated with the load balancing algorithm?
How many edges are cut in the communication graph?
Partitioning Techniques
Regular grids (-: Easy :-)
striping
blocking
use processing power to divide load more
fairly
Generalized Graphs
Levelization
Scattered Decomposition
Recursive Bisection
Levelization
Begin with a boundary
Number these nodes level 1
All nodes connected to a level 1 node are labeled level 2, etc.
Partitioning is performed
determine the number of nodes per processor
count off the nodes of a level until exhausted
proceed to the next level
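A minimal Python sketch of the level-numbering and counting-off steps above, assuming the graph is a connected adjacency dict and the boundary is a list of starting nodes; the names levelize and partition_by_levels are illustrative, not from the slides:

    from collections import deque

    def levelize(adj, boundary):
        # Label nodes by level: boundary nodes are level 1, their unvisited
        # neighbors level 2, and so on (a breadth-first sweep).
        level = {v: 1 for v in boundary}
        frontier = deque(boundary)
        while frontier:
            v = frontier.popleft()
            for u in adj[v]:
                if u not in level:
                    level[u] = level[v] + 1
                    frontier.append(u)
        return level

    def partition_by_levels(adj, boundary, p):
        # Walk the levels in order and count off roughly n/p nodes per processor.
        level = levelize(adj, boundary)
        order = sorted(level, key=level.get)          # nodes grouped by level
        per_proc = max(1, len(order) // p)            # target nodes per processor
        return {v: min(i // per_proc, p - 1) for i, v in enumerate(order)}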
Levelization
We want to ensure nearest-neighbor communication.
Let p be the number of processors and n the number of nodes.
Let r_i be the sum of the number of nodes in contiguous levels i and i + 1
Let r = max_i { r_i }
Nearest-neighbor communication is assured if n/p > r
Scattered Decomposition
Used for highly irregular grids
Partition load into a large number r of
rectangular clusters such that r >> p
Each processor is given a disjoint set
of r/p clusters.
Communication overhead can be a
problem for highly irregular problems.
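A small sketch of the assignment step above, assuming the r rectangular clusters have already been formed; dealing them out round-robin is one simple way to give each processor a disjoint, scattered set of about r/p clusters:

    def scatter(clusters, p):
        # Deal the r clusters out to the p processors like cards, so each
        # processor ends up with about r/p clusters spread over the domain.
        assignment = {q: [] for q in range(p)}
        for i, cluster in enumerate(clusters):
            assignment[i % p].append(cluster)
        return assignment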
Recursive Bisection
Recursively divide the domain in
two pieces at each step.
3 Methods
Recursive Coordinate Bisection
Recursive Graph Bisection
Recursive Spectral Bisection
Recursive Coordinate Bisection
Divide the domain based on the physical coordinates of the nodes.
Pick a dimension and divide in half.
RCB uses no connectivity information
lots of edges crossing boundaries
partitions may be disconnected
Some new research based on graph
separators overcomes some problems.
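A rough Python sketch of RCB, assuming node coordinates are stored in a dict; splitting at the median of the dimension with the largest spread is one common choice, not something the slides prescribe. Note that connectivity never appears, which is exactly why edges get cut:

    def rcb(nodes, coords, nparts):
        # Recursive coordinate bisection: cut the node set in half at the
        # median of the dimension with the largest spread, then recurse.
        if nparts == 1:
            return [set(nodes)]
        dims = range(len(coords[nodes[0]]))
        d = max(dims, key=lambda k: max(coords[v][k] for v in nodes)
                                  - min(coords[v][k] for v in nodes))
        ordered = sorted(nodes, key=lambda v: coords[v][d])
        half = len(ordered) // 2
        return (rcb(ordered[:half], coords, nparts // 2) +
                rcb(ordered[half:], coords, nparts - nparts // 2))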
Inertial Bisection
Often, coordinate bisection is susceptible to the orientation of the mesh
Solution: find the principal axis of the communication graph
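One way to realize this is to take the principal axis of the node coordinates: project onto the dominant eigenvector of the inertia (covariance) matrix and cut at the median projection. A sketch using NumPy; the function name and the use of coordinates are assumptions of this illustration:

    import numpy as np

    def inertial_bisect(nodes, coords):
        # Split along the principal axis: project each node onto the
        # largest-eigenvalue eigenvector of the inertia matrix and cut
        # the sorted projections at the median.
        X = np.array([coords[v] for v in nodes], dtype=float)
        X -= X.mean(axis=0)                   # center on the centroid
        _, vecs = np.linalg.eigh(X.T @ X)     # eigenvalues in ascending order
        axis = vecs[:, -1]                    # principal axis
        proj = X @ axis
        order = np.argsort(proj)
        half = len(nodes) // 2
        left = {nodes[i] for i in order[:half]}
        return left, set(nodes) - left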
Graph Theory Based Algorithms
Geometric algorithms are generally low quality
they don't take into account connectivity
Graph theory algorithms apply
what we know about generalized
graphs to the partitioning problem
Hopefully, they reduce the cut size
Greedy Bisection
Start with a vertex of the smallest degree
least number of edges
Mark all its neighbors
Mark all its neighbors' neighbors, etc.
The first n/p marked vertices form one subdomain
Apply the algorithm to the remaining vertices
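A short sketch of one greedy step, assuming an adjacency-dict graph; re-apply it to the unmarked remainder to carve out the next subdomain (the name greedy_piece is illustrative):

    from collections import deque

    def greedy_piece(adj, target):
        # Grow one subdomain by breadth-first marking from a minimum-degree
        # vertex until it holds `target` (about n/p) vertices.
        start = min(adj, key=lambda v: len(adj[v]))   # smallest degree
        marked, frontier = {start}, deque([start])
        while frontier and len(marked) < target:
            v = frontier.popleft()
            for u in adj[v]:
                if u not in marked and len(marked) < target:
                    marked.add(u)
                    frontier.append(u)
        return marked, set(adj) - marked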
Recursive Graph Bisection
Based on graph distance rather than coordinate distance.
Determine the two furthest-separated nodes
Organize and partition nodes according to their distance from these extremities.
Computationally expensive
Can use approximation methods.
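A sketch of one bisection step, where a double breadth-first sweep stands in for finding the truly furthest-separated pair (the exact search is what makes the method expensive); the graph representation and names are assumptions:

    from collections import deque

    def bfs_distances(adj, src):
        # Hop-count distances from src.
        dist = {src: 0}
        frontier = deque([src])
        while frontier:
            v = frontier.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    frontier.append(u)
        return dist

    def rgb_bisect(adj):
        # Approximate an extremity with a double sweep, then order nodes by
        # graph distance from it and cut the ordering in half.
        start = next(iter(adj))
        a = max(bfs_distances(adj, start).items(), key=lambda kv: kv[1])[0]
        dist_a = bfs_distances(adj, a)
        ordered = sorted(dist_a, key=dist_a.get)
        half = len(ordered) // 2
        return set(ordered[:half]), set(ordered[half:])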
Recursive Spectral Bisection
Uses the discrete Laplacian
Let A be the adjacency matrix
Let D be the diagonal matrix where D[i,i] is the degree of node i
L_G = A - D
Recursive Spectral Bisection
L_G is negative semidefinite
Its largest eigenvalue is zero, and the corresponding eigenvector is all ones.
The magnitude of the second largest eigenvalue gives a measure of the connectivity of the graph.
Its corresponding eigenvector gives a measure of the distances between nodes.
Recursive Spectral Bisection
The eigenvector corresponding to the second largest eigenvalue is the Fiedler vector.
Calculation of the Fiedler vector is computationally intensive.
RSB yields connected partitions that are very well balanced.
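A dense-matrix sketch of one spectral bisection step with NumPy, following the definitions above (L_G = A - D, Fiedler vector = eigenvector of the second largest eigenvalue); a full implementation would typically use a sparse iterative eigensolver instead:

    import numpy as np

    def rsb_bisect(nodes, adj):
        # Build L_G = A - D, take the Fiedler vector, and split the nodes
        # at the median of its components.
        index = {v: i for i, v in enumerate(nodes)}
        n = len(nodes)
        A = np.zeros((n, n))
        for v in nodes:
            for u in adj[v]:
                A[index[v], index[u]] = 1.0
        L = A - np.diag(A.sum(axis=1))        # discrete Laplacian as defined above
        _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
        fiedler = vecs[:, -2]                 # eigenvector of the second largest eigenvalue
        order = np.argsort(fiedler)
        half = n // 2
        left = {nodes[i] for i in order[:half]}
        return left, set(nodes) - left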
Example
RSB: 299 edges cut
RCB: 529 edges cut
RGB: 618 edges cut
Global vs Local Partitioning
Global methods produce a good
partitioning
Local methods can then be used to
improve the partitioning
The Kernighan-Lin Algorithm
Swap pairs of nodes to decrease the cut
Allows intermediate increases in the cut size to avoid certain local minima
Loop
choose the pair of nodes with the largest benefit of swapping
logically exchange them (not for real)
lock those nodes
until all nodes are locked
Find the sequence of swaps that yields the largest accumulated benefit
Perform those swaps for real
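A compact single-pass sketch of the loop above, assuming an undirected graph stored as adjacency sets; for clarity it recomputes gains from scratch at every pick rather than keeping the gain-bucket data structures a real Kernighan-Lin implementation would use:

    def kl_pass(adj, part_a, part_b):
        # One Kernighan-Lin pass over two equal-sized parts.
        def d(v, mine, other):
            # external minus internal edges for v under the current (logical) partition
            return sum(u in other for u in adj[v]) - sum(u in mine for u in adj[v])

        a, b = set(part_a), set(part_b)
        unlocked_a, unlocked_b = set(a), set(b)
        swaps, gains = [], []
        while unlocked_a and unlocked_b:
            # pick the unlocked pair whose swap helps the cut most (gain may be negative)
            x, y = max(((x, y) for x in unlocked_a for y in unlocked_b),
                       key=lambda p: d(p[0], a, b) + d(p[1], b, a) - 2 * (p[1] in adj[p[0]]))
            gains.append(d(x, a, b) + d(y, b, a) - 2 * (y in adj[x]))
            a.remove(x); b.add(x); b.remove(y); a.add(y)      # logical exchange
            unlocked_a.discard(x); unlocked_b.discard(y)      # lock the pair
            swaps.append((x, y))
        # keep only the prefix of swaps with the largest accumulated benefit
        best_len, best_total, total = 0, 0, 0
        for i, g in enumerate(gains, 1):
            total += g
            if total > best_total:
                best_total, best_len = total, i
        new_a, new_b = set(part_a), set(part_b)
        for x, y in swaps[:best_len]:                         # perform those swaps for real
            new_a.remove(x); new_a.add(y)
            new_b.remove(y); new_b.add(x)
        return new_a, new_b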
Helpful-Sets
Two Steps
Find a set of nodes in one partition and move it to the other partition to decrease the cut size
Rebalance the load
The set of nodes moved must be helpful
The helpfulness of a node is the decrease in cut size when the node is moved
Helpful-Sets
All of these sets are 2-helpful
The Helpful-Sets Algorithm: Theory
If there is a bisection and its cut size is not too small, then there exists a small 4-helpful set on one side or the other
This 4-helpful set can be moved and will reduce the cut by 4
If the imbalance is not too large and the cut of the unbalanced partition is not too small, then it is possible to rebalance without increasing the cut size by more than 2
Apply the theory iteratively until one of the "too small" conditions is met.
Multi-level Hybrid Methods
For very large graphs, time to partition can be extremely costly
Reduce time by coarsening the graph
shrink a large graph to a smaller one that has similar characteristics
Coarsen by
heavy edge matching
simple partitioning heuristics
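A sketch of one coarsening level by heavy edge matching, assuming adjacency sets and a symmetric edge-weight dict keyed by (u, v) pairs; matched pairs (and leftover singletons) are mapped to consecutive coarse-vertex ids:

    import random

    def heavy_edge_matching(adj, weight):
        # Visit vertices in random order; match each unmatched vertex with the
        # unmatched neighbor joined by the heaviest edge.
        matched = {}
        order = list(adj)
        random.shuffle(order)
        for v in order:
            if v in matched:
                continue
            free = [u for u in adj[v] if u not in matched]
            if free:
                u = max(free, key=lambda u: weight[(v, u)])
                matched[v], matched[u] = u, v
            else:
                matched[v] = v                 # stays a singleton coarse vertex
        # collapse each pair (or singleton) into one coarse vertex id
        coarse, next_id = {}, 0
        for v in order:
            if v not in coarse:
                coarse[v] = coarse[matched[v]] = next_id
                next_id += 1
        return coarse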
Comparisons
Edge cut comparison; (x.xx) denotes run time in seconds
ML: Multilevel (spectral on the coarse graph, KL on intermediate levels)
IN: Inertial
Party: 5 or 6 different methods
(Table: edge cuts and run times for ML, Chaco IN, Chaco IN+KL, Metis PMetis, Party all, and Party all+HS on the test graphs airfoil (|v| = 4253, |e| = 12289), crack (10240, 30380), wave, mat, and DEBR)
Dynamic Load Balancing
Load is statically partitioned initially
Adjust load when an imbalance is
detected.
Objectives
rebalance the load
keep edge cut minimized (communication)
avoid having too much overhead
Dynamic Load Balancing
Consider adaptive algorithms
After an interval of computation, the mesh is adjusted according to an estimate of the discretization error
coarsened in some areas
refined in others
Mesh adjustment causes load imbalance
Dynamic Load Balancing
After refinement, node 1 ends up with more work
Centralized DLB
Control of the load is centralized
Two approaches
Master-worker (Task scheduling)
Tasks are kept in central location
Workers ask for tasks
Requires lots of tasks with weak locality requirements and no major communication between workers (see the sketch below)
Load Monitor
Periodically, monitor load on the processors
Adjust load to keep optimal balance
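A minimal master-worker sketch with Python's multiprocessing queues; the squaring of each task is just a stand-in for the real computation, and the worker and task counts are arbitrary:

    import multiprocessing as mp

    def worker(tasks, results):
        # Pull tasks from the central queue until a sentinel arrives.
        while True:
            task = tasks.get()
            if task is None:                  # sentinel: no more work
                break
            results.put(task * task)          # stand-in for the real computation

    if __name__ == "__main__":
        tasks, results = mp.Queue(), mp.Queue()
        workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
        for w in workers:
            w.start()
        for t in range(100):                  # many small, independent tasks
            tasks.put(t)
        for _ in workers:
            tasks.put(None)                   # one sentinel per worker
        answers = [results.get() for _ in range(100)]   # arrive in completion order
        for w in workers:
            w.join()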
Repartitioning
Consider: the dynamic situation is simply a sequence of static situations
Solution: repartition the load after each change
some partitioning algorithms are very quick
Issues
scalability problems
how different are the current and the new load distributions?
data dependencies
Decentralized DLB
Generally focused on a work pool
Two approaches
Hierarchy
Fully distributed
Fully Distributed DLB
Lower overhead than centralized schemes
No global information
Load is locally optimized
Propagation is slow
Load balance may not be as good as with a centralized scheme
Three steps
Flow calculation (How much to move)
Mesh node selection (Which work to move)
Actual mesh node migration
Flow calculation
View it as a network flow problem on the processor communication graph
Add source and sink nodes
Connect the source to all nodes
edge value is the current load
Connect the sink to all nodes
edge value is the mean load
Flow calculation
Many network flow algorithms exist, but they are
more intensive than necessary
not parallel
Use simpler, more scalable algorithms
Random Matchings
pick random neighboring processes
exchange some load
eventually you may get there
Diffusion
Each processor balances its load with all its neighbors
How much work should I have?
w_p^{t+1} = w_p^t − Σ_{q : {p,q} ∈ F} α_{pq} (w_p^t − w_q^t)
How much to send on an edge?
l_{pq}^{t+1} = α_{pq} (w_p^t − w_q^t)
Repeat until all load is balanced
O( log(1/ε) / (1 − λ_2) ) steps
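A sequential simulation of first-order diffusion, assuming the processor communication graph is an adjacency dict and using the common uniform choice α = 1/(max degree + 1); the function and parameter names are illustrative:

    def diffuse(load, adj, tol=1e-3, max_steps=10000):
        # Each step, processor p exchanges alpha * (w_p - w_q) with every
        # neighbor q, so load flows from heavier to lighter processors.
        alpha = 1.0 / (1 + max(len(adj[p]) for p in adj))   # safe uniform alpha_pq
        w = dict(load)
        for _ in range(max_steps):
            delta = {p: 0.0 for p in w}
            for p in w:
                for q in adj[p]:
                    delta[p] -= alpha * (w[p] - w[q])       # matches the update above
            for p in w:
                w[p] += delta[p]
            if max(w.values()) - min(w.values()) < tol:     # balanced enough
                break
        return w

For example, diffuse({0: 12, 1: 0, 2: 0, 3: 0}, {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}) gradually spreads the 12 units along a chain of four processors toward 3 units each.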
Diffusion
Convergence to load balance can be slow
Can be improved with over-relaxation
Monitor what is sent in each step
Determine how much to send based on current imbalance and how much was sent in previous steps
Diffuses load in O( log(1/ε) / √(1 − λ_2) ) steps
Dimension Exchange
Rather than communicate with all neighbors each round, only communicate with one
Comes from the dimensions of a hypercube
Use edge coloring for general graphs
Exchange load with the neighbor along a dimension
l = (l_i + l_j) / 2
Will converge in d steps on a hypercube
Some graphs may need a different factor to converge faster
l = l_i * a + l_j * (1 − a)
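A sequential simulation of the hypercube case, assuming P = 2^d processors indexed 0..P-1 so that the partner in round k is found by flipping bit k of the processor id:

    def dimension_exchange(loads):
        # In round k each processor pairs with the neighbor whose id differs
        # in bit k and they average their loads: l = (li + lj) / 2.
        p = len(loads)
        d = p.bit_length() - 1                # hypercube dimension (P = 2**d assumed)
        for k in range(d):
            for i in range(p):
                j = i ^ (1 << k)              # partner along dimension k
                if i < j:                     # handle each pair once
                    avg = (loads[i] + loads[j]) / 2
                    loads[i] = loads[j] = avg
        return loads

After the d rounds every entry equals the global mean, which is the "converges in d steps" claim above.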
Diffusion & Dimension Exchange
Can view
diffusion as a Jacobi method
dimension exchange as Gauss-Seidel
Can use multi-level variants
Divide the processor communication graph in half
Determine the load to shift across the cut
Recursively rebalance each half
Mesh node selection
Must identify which mesh nodes to migrate
minimize edge cut and overhead
Very dependent on the problem
Shape & size of the partition may play a role in accuracy
Aspect ratio maintenance
Move items that are farther away from the center of gravity.
Load Balancing Schemes
(Who do I request work from?)
Asynchronous Round Robin
each processor maintains its own target
Ask the target, then increment the target
Global Round Robin
the target is maintained by a master node
Random Polling
randomly select a donor
each processor has an equal probability of being selected
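A sketch of donor selection under the three schemes, with an illustrative state dict standing in for the per-processor target (asynchronous round robin) and the master-held counter (global round robin):

    import random

    def pick_donor(my_rank, nprocs, scheme, state):
        # Decide which processor to ask for work.
        if scheme == "arr":                            # asynchronous round robin
            target = state["target"][my_rank]
            state["target"][my_rank] = (target + 1) % nprocs
        elif scheme == "grr":                          # global round robin (shared counter)
            target = state["counter"]
            state["counter"] = (target + 1) % nprocs
        else:                                          # random polling
            target = random.randrange(nprocs)
        if target == my_rank:                          # never poll yourself
            target = (target + 1) % nprocs
        return target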