EE382V Fall 2006
VLSI Physical Design Automation
Lecture 4. Circuit Partitioning (II)
Prof. David Pan
[email protected] Office: ACES 5.434
10/18/08 1
Recap of Kernighan-Lin’s Algorithm
❁ Pair-wise exchange of nodes to reduce cut size
❁ Allow cut size to increase temporarily within a pass
  Repeat
    Compute the gain of each possible swap;
    Perform a feasible swap of max gain;
    Mark the swapped nodes “locked”;
    Update swap gains;
  Until no feasible swap;
  Find max prefix partial sum in gain sequence g1, g2, …, gm;
  Make the corresponding swaps permanent.
❁ Start another pass if the current pass reduces the cut size
  (usually converges after a few passes)
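The pass above can be sketched in Python. This is an illustrative graph (not hypergraph) version with unit edge weights; all names (`kl_pass`, `cut_size`, the adjacency-set representation) are assumptions for the sketch, not from the lecture.

```python
def cut_size(adj, A, B):
    """Number of graph edges crossing the (A, B) partition."""
    return sum(1 for u in A for v in adj[u] if v in B)

def D(adj, v, own, other):
    """D(v) = external cost - internal cost of vertex v."""
    return (sum(1 for u in adj[v] if u in other)
            - sum(1 for u in adj[v] if u in own))

def kl_pass(adj, A, B):
    """One KL pass: tentatively swap pairs, then keep the best prefix."""
    A0, B0 = set(A), set(B)
    A, B = set(A0), set(B0)
    locked, gains, swaps = set(), [], []
    for _ in range(min(len(A0), len(B0))):
        # gain of swapping (a, b) = D(a) + D(b) - 2*w(a, b)
        g, a, b = max((D(adj, a, A, B) + D(adj, b, B, A) - 2 * (b in adj[a]), a, b)
                      for a in A - locked for b in B - locked)
        A.remove(a); B.add(a)              # tentative swap
        B.remove(b); A.add(b)
        locked |= {a, b}
        gains.append(g); swaps.append((a, b))
    # keep the prefix of swaps with the maximum partial gain sum
    prefix = [sum(gains[:i + 1]) for i in range(len(gains))]
    k = max(range(len(prefix)), key=lambda i: prefix[i]) + 1
    if prefix[k - 1] <= 0:
        k = 0                              # no improving prefix: keep original
    A, B = set(A0), set(B0)
    for a, b in swaps[:k]:
        A.remove(a); A.add(b)
        B.remove(b); B.add(a)
    return A, B
```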
Fiduccia-Mattheyses Algorithm
“A Linear-Time Heuristic
for Improving Network Partitions”
19th DAC, pages 175-181, 1982.
Features of FM Algorithm
• Modification of KL Algorithm:
– Can handle non-uniform vertex weights (areas)
– Allow unbalanced partitions
– Extended to handle hypergraphs
– Clever way to select vertices to move; runs much faster.
Problem Formulation
• Input: A hypergraph with
– A set of vertices V. (|V| = n)
– A set of hyperedges E. (total # of pins in netlist = p)
– Area a_u for each vertex u in V.
– Cost c_e for each hyperedge e in E.
– An area ratio r.
• Output: 2 partitions X & Y such that
– Total cost of hyperedges cut is minimized.
– area(X) / (area(X) + area(Y)) is about r.
• This problem is NP-complete!
Ideas of FM Algorithm
• Similar to KL:
– Work in passes.
– Lock vertices after moved.
– Actually, only move those vertices up to the maximum partial
sum of gain.
• Differences from KL:
– Does not exchange pairs of vertices; moves only one vertex at a time.
– Uses the gain bucket data structure.
Gain Bucket Data Structure
[Figure: an array of gain buckets indexed from +pmax down to −pmax; each bucket holds a doubly-linked list of cells, a MAX GAIN pointer tracks the highest non-empty bucket, and a cell array 1..n gives direct access to each cell’s list node.]
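The bucket structure can be sketched as follows. This is a hedged simplification in which Python sets stand in for the doubly-linked lists of the original paper; sets still give O(1) insert and delete per cell, which is the property the structure needs.

```python
class GainBuckets:
    """Simplified FM gain-bucket structure (sets replace linked lists)."""

    def __init__(self, pmax):
        self.pmax = pmax
        # one bucket per possible gain value in [-pmax, +pmax]
        self.bucket = {g: set() for g in range(-pmax, pmax + 1)}
        self.gain = {}              # cell -> current gain
        self.max_gain = -pmax - 1   # tracks the highest non-empty bucket

    def insert(self, cell, g):
        self.bucket[g].add(cell)
        self.gain[cell] = g
        self.max_gain = max(self.max_gain, g)

    def update(self, cell, delta):
        """Move a cell to a new bucket when its gain changes by delta."""
        g = self.gain.pop(cell)
        self.bucket[g].discard(cell)
        self.insert(cell, g + delta)

    def pop_best(self):
        """Remove and return (cell, gain) with the highest current gain."""
        # walk max_gain down to the first non-empty bucket
        while self.max_gain >= -self.pmax and not self.bucket[self.max_gain]:
            self.max_gain -= 1
        cell = self.bucket[self.max_gain].pop()
        del self.gain[cell]
        return cell, self.max_gain
```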
FM Partitioning:
Moves are made based on object gain.
Object Gain: the amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition.
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the larger of the two sides is selected and moved
- the moved object is “locked”
- gains of “touched” objects are recomputed
- gain lists are resorted
[Figure: example circuit with each object labelled by its current gain]
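The object-gain definition above can be made concrete for hypergraphs. In FM terms the gain of a cell c is FS(c) − TE(c): the number of nets that would leave the cut minus the number that would enter it when c moves. The function and data layout below (nets as lists of cells, a side map) are illustrative assumptions.

```python
def fm_gain(cell, nets, side):
    """FM gain of moving `cell` to the other side: FS(cell) - TE(cell)."""
    fs = te = 0
    for net in nets:
        if cell not in net:
            continue
        same = sum(1 for c in net if side[c] == side[cell])
        if same == 1:            # cell is alone on its side: net leaves the cut
            fs += 1
        if same == len(net):     # net is uncut now, becomes cut after the move
            te += 1
    return fs - te
```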
FM Partitioning:
[Figures, slides 8–22: the FM bisection example continued step by step — at each step the highest-gain object is moved to the other side and locked, and the gains of its neighbouring objects are recomputed.]
Time Complexity of FM
• For each pass,
– Constant time to find the best vertex to move.
– After each move, time to update gain buckets is proportional
to degree of vertex moved.
– Total time is O(p), where p is total number of pins
• Number of passes is usually small.
Extension by Krishnamurthy
“An Improved Min-Cut Algorithm for
Partitioning VLSI Networks”,
IEEE Trans. Computers,
33(5):438-446, 1984.
Tie-Breaking Strategy
• For each vertex, instead of a single gain value, a gain vector is used.
• A gain vector is a sequence of potential gain values corresponding to numbers of possible moves into the future; the rth entry looks r moves ahead.
• Time complexity is O(p·r), where r is the max # of look-ahead moves stored in the gain vector.
• If ties still occur, some researchers observe that LIFO order improves solution quality.
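The tie-breaking rule amounts to comparing gain vectors lexicographically: ties on the immediate gain g1 fall through to the look-ahead entries. The helper below is a toy illustration of that ordering (the real algorithm keeps r levels of bucket lists so the best cell is found without scanning); all names are assumptions.

```python
def best_cell(gain_vectors):
    """Pick the cell with the lexicographically largest gain vector.

    gain_vectors: dict mapping cell -> tuple (g1, ..., gr), where g1 is
    the ordinary FM gain and gk estimates the gain k moves ahead.
    Python tuple comparison is lexicographic, which is exactly the
    ordering Krishnamurthy's r-level buckets implement.
    """
    return max(gain_vectors, key=lambda c: gain_vectors[c])
```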
Ratio Cut Objective
by Wei and Cheng
“Towards Efficient Hierarchical Designs by
Ratio Cut Partitioning”,
ICCAD, pages 298-301, 1989.
Ratio Cut Objective
• It is not desirable to have some pre-defined ratio on
the partition sizes.
• Wei and Cheng proposed the Ratio Cut objective.
• Tries to locate natural clusters in the circuit and, at the same time, force the partitions to be of similar sizes.
• Ratio Cut: R_XY = C_XY / (|X| × |Y|), where C_XY is the cost of the hyperedges cut between X and Y.
• A heuristic based on FM was proposed.
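A minimal sketch of evaluating the objective (adjacency-set graph representation assumed). Note how the denominator works: among partitions with equal cut size, R_XY favours the more balanced one, so natural clusters of similar size score best.

```python
def ratio_cut(adj, X, Y):
    """Ratio-cut value R_XY = C_XY / (|X| * |Y|) for bipartition (X, Y)."""
    c_xy = sum(1 for u in X for v in adj[u] if v in Y)  # cut size C_XY
    return c_xy / (len(X) * len(Y))
```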
Sanchis Algorithm
“Multiple-way Network Partitioning”,
IEEE Trans. Computers,
38(1):62-81, 1989.
Multi-Way Partitioning
• Dividing into more than 2 partitions.
• An algorithm extending the ideas of FM and Krishnamurthy to multiple partitions.
Partitioning:
Simulated Annealing
State Space Search Problem
• Combinatorial optimization problems (like partitioning) can be thought of as State Space Search Problems.
• A State is just a configuration of the combinatorial objects involved.
• The State Space is the set of all possible states (configurations).
• A Neighbourhood Structure is also defined (which states can be reached in one step).
• There is a cost corresponding to each state.
• Search for the min (or max) cost state.
Greedy Algorithm
• A very simple technique for State Space Search
Problem.
• Start from any state.
• Always move to a neighbor with the min cost (assume
minimization problem).
• Stop when all neighbors have a higher cost than the
current state.
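The procedure can be sketched generically; `cost` and `neighbors` are caller-supplied functions, and all names here are illustrative assumptions.

```python
def greedy(state, cost, neighbors):
    """Greedy descent: repeatedly move to the cheapest neighbor."""
    while True:
        best = min(neighbors(state), key=cost, default=None)
        if best is None or cost(best) >= cost(state):
            return state             # no cheaper neighbor: local minimum
        state = best
```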
Problem with Greedy Algorithms
• Easily get stuck at local minimum.
• Will obtain non-optimal solutions.
[Figure: cost vs. state curve — greedy descent stops at a local minimum]
• Optimal only for convex (or concave, for maximization) functions.
Greedy Nature of KL & FM
• KL and FM are almost greedy algorithms.
[Figure: cut value vs. partitions, with Pass 1 and Pass 2 each descending to a local minimum]
• Purely greedy if we consider a pass as a “move”.
[Figure: the same curve with each pass collapsed into a single move between states A and B]
Simulated Annealing
• Very general search technique.
• Try to avoid being trapped in local minimum by making
probabilistic moves.
• Popularized as a heuristic for optimization by:
– Kirkpatrick, Gelatt and Vecchi, “Optimization by Simulated
Annealing”, Science, 220(4598):671-680, May 1983.
Basic Idea of Simulated Annealing
• Inspired by the Annealing Process:
– The process of carefully cooling molten metals in order to
obtain a good crystal structure.
– First, metal is heated to a very high temperature.
– Then slowly cooled.
– By cooling at a proper rate, atoms will have an increased
chance to regain proper crystal structure.
• Attaining a min cost state in simulated annealing is
analogous to attaining a good crystal structure in
annealing.
The Simulated Annealing Procedure
Let t be the initial temperature.
Repeat
  Repeat
    – Pick a neighbor of the current state randomly.
    – Let c = cost of the current state.
      Let c’ = cost of the neighbor picked.
    – If c’ < c, move to the neighbor (downhill move).
    – If c’ > c, move to the neighbor with probability e^(−(c’−c)/t)
      (uphill move).
  Until equilibrium is reached.
  Reduce t according to the cooling schedule.
Until the freezing point is reached.
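The procedure translates almost line for line into Python. The cooling parameters and the fixed move count per temperature (standing in for “until equilibrium”) are illustrative assumptions, not values from the lecture.

```python
import math
import random

def simulated_annealing(state, cost, random_neighbor,
                        t=10.0, alpha=0.95, t_freeze=1e-3, moves_per_t=100):
    """Generic SA sketch; returns the best state seen."""
    best = state
    while t > t_freeze:                      # "until freezing point"
        for _ in range(moves_per_t):         # "until equilibrium" (fixed count)
            nxt = random_neighbor(state)
            delta = cost(nxt) - cost(state)
            # accept downhill always; uphill with probability e^(-delta/t)
            if delta < 0 or random.random() < math.exp(-delta / t):
                state = nxt
                if cost(state) < cost(best):
                    best = state
        t *= alpha                           # cooling schedule t <- alpha * t
    return best
```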
Things to decide when using SA
• When solving a combinatorial problem,
we have to decide:
– The state space
– The neighborhood structure
– The cost function
– The initial state
– The initial temperature
– The cooling schedule (how to change t)
– The freezing point
Common Cooling Schedules
• Initial temperature, Cooling schedule, and freezing
point are usually experimentally determined.
• Some common cooling schedules:
– t ← α·t, where α is typically around 0.95
– t ← e^(−β·t)·t, where β is typically around 0.7
– ......
Paper by Johnson, Aragon, McGeoch and
Schevon on Bisectioning using SA
“Optimization by Simulated Annealing:
An Experimental Evaluation Part I,
Graph Partitioning”,
Operations Research, 37:865-892, 1989.
The Work of Johnson, et al.
• An extensive empirical study of Simulated Annealing
versus Iterative Improvement Approaches.
• Conclusion: SA is a competitive approach, getting better solutions than KL for random graphs.
• Remarks:
– Netlists are not random graphs, but sparse graphs with local structure.
– SA is too slow, so KL/FM variants are still the most popular.
– Multiple runs of KL/FM variants with random initial solutions may be preferable to SA.
The Use of Randomness
• For any partitioning problem:
[Figure: the state space A of all solutions, containing a small subset G of good solutions]
• Suppose solutions are picked randomly.
• If |G|/|A| = r, then Pr(at least 1 good solution in 5/r trials) = 1 − (1 − r)^(5/r).
• If |G|/|A| = 0.001, then Pr(at least 1 good solution in 5000 trials) = 1 − (1 − 0.001)^5000 ≈ 0.9933.
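The arithmetic on this slide can be checked directly (function name is an assumption for illustration):

```python
def p_at_least_one_good(r, trials):
    """Probability that at least one of `trials` random picks is good,
    when a random pick is good with probability r."""
    return 1 - (1 - r) ** trials

r = 0.001
p = p_at_least_one_good(r, round(5 / r))   # 5/r = 5000 trials
```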
Adding Randomness to KL/FM
• In fact, the number of good states is extremely small, so r is extremely small.
• An extremely long time is needed if states are just picked randomly (without running KL/FM).
• Running KL/FM variants several times with random initial solutions is a good idea.
[Figure: cut value vs. partitions — good initial states are those from which KL/FM descends into the good states]
Some Other Approaches
• KL/FM-SA Hybrid: Use KL/FM variant to find a good
initial solution for SA, then improve that solution by SA
at low temperature.
• Tabu Search
• Genetic Algorithm
• Spectral Methods (finding Eigenvectors)
• Network Flows
• Quadratic Programming
• ......
Partitioning:
Multi-Level Technique
Multi-Level Partitioning
Multilevel Hypergraph Partitioning:
Applications in VLSI Domain
G. Karypis, R. Aggarwal, V. Kumar and S. Shekhar,
DAC 1997.
Coarsening Phase
• Edge Coarsening
• Hyper-edge Coarsening (HEC)
• Modified Hyperedge Coarsening (MHEC)
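Hyperedge coarsening (HEC) can be sketched as follows. The contraction rule is: a hyperedge is collapsed into one coarse vertex only if none of its vertices is already matched. The visiting order used here (smallest hyperedges first) and all names are assumptions for illustration.

```python
def hyperedge_coarsen(vertices, nets):
    """One HEC level: contract whole hyperedges into coarse vertices."""
    matched = set()
    clusters = []
    for net in sorted(nets, key=len):       # visiting order is an assumption
        if all(v not in matched for v in net):
            clusters.append(list(net))      # contract the whole hyperedge
            matched |= set(net)
    # unmatched vertices survive as singleton coarse vertices
    clusters += [[v] for v in vertices if v not in matched]
    return clusters
```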
Uncoarsening and Refinement Phase
• FM:
– Based on FM with two simplifications:
– Limit the number of passes to 2
– Early-Exit FM (FM-EE): stop each pass if k vertex moves do not improve the cut
• HER (Hyperedge Refinement):
– Move a group of vertices between partitions so that an entire hyperedge is removed from the cut
hMETIS Algorithm
• Software implementation available for free download
from Web
• hMETIS-EE20
– 20 random initial partitions
– 10 runs using HEC for coarsening
– 10 runs using MHEC for coarsening
– FM-EE for refinement
• hMETIS-FM20
– 20 random initial partitions
– 10 runs using HEC for coarsening
– 10 runs using MHEC for coarsening
– FM for refinement
Experimental Results
• Compared with five previous algorithms
• hMETIS-EE20 is:
– 4.1% to 21.4% better than the five previous algorithms
– On average 0.5% better than the best of the 5 algorithms
– Roughly 1 to 15 times faster
• hMETIS-FM20 is:
– On average 1.1% better than hMETIS-EE20
– Improves the best-known bisections for 9 of the 23 test circuits
– About twice as slow as hMETIS-EE20