ECE 6775
High-Level Digital Design Automation
Fall 2024
Scheduling
Announcements
▸Lab 2 due tomorrow
▸Lab 3 will be released soon
▸Tuesday lecture (Oct 1) rescheduled to Friday,
Oct 4 at 4:30pm, Rhodes 310
▸Neural network tutorial on Thursday (Oct 3) by
Jordan Dotzel
– Useful for Lab 4 and final project
LLVM: A Foundational IR for Modern Compilers
▸ Formerly Low Level Virtual Machine
– Brainchild of Chris Lattner and Vikram Adve back in 2000
– ACM Software System Award in 2012
– LLVM is widely used in industry and academia (including the HLS tool in
our class)
▸ The core of LLVM is the SSA-based IR
– Language independent, target independent, easy to use
– RISC-like virtual instructions, unlimited registers, exception handling, etc.
▸ Provides modular & reusable components for building compilers
– Components are ideally language/target independent
– Allows choice of the right component for the job
– Many high-quality libraries (components) with clean interfaces
• Optimizations, analyses, modular code generator, profiling, link time optimization,
ARM/X86/PPC/SPARC code generator …
• Tools built from the libraries: C/C++/ObjC compiler, modular optimizer, linker, debugger,
LLVM JIT …
Example: An LLVM Loop
C source:
  for (i = 0; i < N; ++i)
    foo(A[i], &P);

Corresponding LLVM IR:
  loop:
    %i.1 = phi i32 [ 0, %bb0 ], [ %i.2, %loop ]
    %AiAddr = getelementptr float* %A, i32 %i.1
    call void %foo(float %AiAddr, %pair* %P)
    %i.2 = add i32 %i.1, 1
    %tmp = icmp eq i32 %i.1, 16
    br i1 %tmp, label %outloop, label %loop
▸ High-level information exposed in the code
– Explicit dataflow through SSA form
– Explicit control-flow graph
– Explicit language-independent type-information
– Explicit typed pointer arithmetic
• Preserve array subscript and structure indexing
source: http://llvm.org
Agenda
▸ Unconstrained scheduling
– ASAP and ALAP
▸ Constrained scheduling
– Resource constrained scheduling (RCS)
– Exact formulations with integer linear programming (ILP)
Recap: A Typical HLS Flow
High-level programming languages (C/C++, OpenCL, SystemC, ...)
  → Parsing → Intermediate Representation (IR) → Transformations
  → Allocation → Scheduling & Binding → RTL generation

Example input:
  if (condition) {
    ...
  } else {
    t1 = a + b;
    t2 = c * d;
    t3 = e + f;
    t4 = t1 * t2;
    z = t4 - t3;
  }

[Figure: the input is parsed into a control data flow graph (CDFG) with basic
blocks BB1-BB4; the else-branch operations are then scheduled and bound into
states S0-S2, i.e., 3 cycles, yielding a finite state machine with datapath
(FSMD)]
Importance of Scheduling
▸ Scheduling is a central problem in HLS
– Introduces clock boundaries to untimed (or partially timed)
input specification
– Has significant impact on the quality of results
• Frequency
• Latency
• Throughput
• Area
• Power
…
Scheduling: Untimed to Timed
▸ Untimed CDFG: out1 = f(in1, in2, in3, in4), a chain of three additions
▸ Three timed implementations of the same CDFG:
  – Combinational (for latency): three adders chained in one cycle
      t_clk ≈ 3·d_add,   T1 = 1/t_clk,   A1 = 3·A_add
  – Sequential (for area): one adder reused over three cycles
      t_clk ≈ d_add + d_setup,   T2 = 1/(3·t_clk),   A2 = A_add + 2·A_reg
  – Pipelined (for throughput): three adders separated by pipeline registers
      t_clk ≈ d_add + d_setup,   T3 = 1/t_clk,   A3 = 3·A_add + 6·A_reg
Scheduling Input
▸ Control data flow graph (CDFG)
  – Generated by a compiler front end from a high-level specification
  – Nodes: basic blocks & operations
  – Directed edges: data & control dependencies
▸ Without control flow, the basic structure is a data flow graph (DFG)

Example specification:
  xl = x + dx;
  ul = u - 3*x*u*dx - 3*y*dx;
  yl = y + u*dx;
  c  = xl < a;
  x = xl; u = ul; y = yl;

[Figure: the corresponding DFG, with multiplications (×), additions (+),
subtractions (−), and a comparison (<) as operation nodes]
Scheduling Output
▸ Scheduling: map operations to states
▸ Each clock cycle corresponds to a state in the FSM
– Commonly referred to as control step (c-step)
Example schedule (DFG on the left, state transition graph (STG), i.e., the FSM,
on the right — one state per c-step):

  c-step 1 / state s1: v1 (×), v2 (×)
  c-step 2 / state s2: v3 (×), v6 (×), v10 (+)
  c-step 3 / state s3: v4 (−), v7 (×), v8 (×)
  c-step 4 / state s4: v5 (−), v9 (+), v11 (<)
Unconstrained Scheduling
▸ Only consideration: dependence
▸ As soon as possible (ASAP)
– Schedule an operation to the earliest possible step
▸ As late as possible (ALAP)
– Schedule an operation to the latest possible step, without
increasing the total latency
ASAP Schedule
Assumption for simplicity: no combinational chaining of multiple operations in
one cycle (each operation occupies the full cycle)

Example DFG: Y = ((a*b) + c) + (d*e) - (f + g), scheduled over control steps 1-4
  – ASAP places a*b, d*e, and f+g in step 1, (a*b)+c in step 2, the second
    addition in step 3, and the final subtraction in step 4

ASAP(G(V, E)):
  V' = Topological_Sort(G)
  foreach vi in V':
    // Primary inputs (PIs) go to the first cycle
    if vi ∈ PIs: ti = 1
    // Assume no chaining & single-cycle operations
    else: ti = max over j ∈ pred(i) of { tj + 1 }    // (vj, vi) ∈ E

The start time of each operation is the earliest one allowed by the
dependencies.
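Below is a minimal Python sketch of this procedure, assuming the DFG is given
as predecessor/successor adjacency lists; the encoding and names are
illustrative, not taken from any particular HLS tool.

  from collections import deque

  def asap_schedule(preds, succs):
      """ASAP schedule of a DAG, assuming no chaining and single-cycle ops.
      preds/succs map each operation to its predecessor/successor lists."""
      indeg = {v: len(preds[v]) for v in preds}
      ready = deque(v for v, deg in indeg.items() if deg == 0)  # primary inputs
      t = {}
      while ready:                                              # topological order
          v = ready.popleft()
          # PIs go to step 1; others start right after the latest predecessor
          t[v] = 1 if not preds[v] else max(t[u] + 1 for u in preds[v])
          for w in succs[v]:
              indeg[w] -= 1
              if indeg[w] == 0:
                  ready.append(w)
      return t

  # Y = ((a*b) + c) + (d*e) - (f + g)
  preds = {'a*b': [], 'd*e': [], 'f+g': [], '+c': ['a*b'],
           '+': ['+c', 'd*e'], '-': ['+', 'f+g']}
  succs = {'a*b': ['+c'], 'd*e': ['+'], 'f+g': ['-'], '+c': ['+'],
           '+': ['-'], '-': []}
  print(asap_schedule(preds, succs))
  # {'a*b': 1, 'd*e': 1, 'f+g': 1, '+c': 2, '+': 3, '-': 4}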
ALAP Schedule
Assumption for simplicity: no combinational chaining of multiple operations in
one cycle (each operation occupies the full cycle)

Example DFG: Y = ((a*b) + c) + (d*e) - (f + g), scheduled over control steps 1-4
  – With latency bound L = 4, ALAP places a*b in step 1, (a*b)+c and d*e in
    step 2, the second addition and f+g in step 3, and the final subtraction
    in step 4

ALAP(G(V, E), L):   // L is the latency bound
  V' = Reverse_Topological_Sort(G)
  foreach vi in V':
    // Primary outputs (POs) go to the last cycle
    if vi ∈ POs: ti = L
    // Assume no chaining & single-cycle operations
    else: ti = min over k ∈ succ(i) of { tk } − 1    // (vi, vk) ∈ E

The end time of each operation is the latest one allowed by the dependencies
and the latency constraint.
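A matching Python sketch of ALAP, using the same assumed graph encoding as the
ASAP sketch above (illustrative only):

  from collections import deque

  def alap_schedule(preds, succs, L):
      """ALAP schedule of a DAG under latency bound L, assuming no chaining
      and single-cycle ops. preds/succs are predecessor/successor lists."""
      outdeg = {v: len(succs[v]) for v in succs}
      ready = deque(v for v, deg in outdeg.items() if deg == 0)  # primary outputs
      t = {}
      while ready:                                   # reverse topological order
          v = ready.popleft()
          # POs go to the last step L; others must finish just before their
          # earliest successor starts
          t[v] = L if not succs[v] else min(t[w] for w in succs[v]) - 1
          for u in preds[v]:
              outdeg[u] -= 1
              if outdeg[u] == 0:
                  ready.append(u)
      return t

  # For the Y = ((a*b) + c) + (d*e) - (f + g) DFG with L = 4, this yields
  # {'a*b': 1, '+c': 2, 'd*e': 2, '+': 3, 'f+g': 3, '-': 4}.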
Constrained Scheduling in HLS
▸ Constrained scheduling
– General case NP-hard
– Resource-constrained scheduling (RCS)
• Minimize latency given constraints on area or resources
– Time-constrained scheduling (TCS)
• Minimize resources subject to bound on latency
▸ Exact methods
– Integer linear programming (ILP)
– Hu’s algorithm for a very restricted problem
▸ Heuristics
– List scheduling
– Force-directed scheduling
– SDC-based scheduling
…
Linear Programming
▸ Linear programming (LP) solves the problem of
maximizing or minimizing a linear objective function
subject to linear constraints
– Efficiently solvable both in theory and in practice
▸ Integer linear programming (ILP): in addition to linear
constraints and objective, the values for the variables
must be integer
– NP-hard in general (this includes 0-1 ILP, the special case where every variable is binary)
– Modern ILP solvers can handle problems with nontrivial size
▸ Enormous number of problems can be expressed in LP
or ILP
Canonical Form of ILP
maximize   c1x1 + c2x2 + … + cnxn    // objective function
subject to                           // linear constraints
           a11x1 + a12x2 + … + a1nxn ≤ b1
           a21x1 + a22x2 + … + a2nxn ≤ b2
           …
           am1x1 + am2x2 + … + amnxn ≤ bm
           xi ≥ 0                    x's are the variables (to be solved);
           xi ∈ Z                    a's, b's, and c's are constants

Vector form
maximize   cᵀx          // c = (c1, c2, …, cn), x = (x1, x2, …, xn)
subject to (s.t.)
           Ax ≤ b       // A is an m×n matrix; b = (b1, b2, …, bm)
           x ≥ 0
           xi ∈ Z
Example: Course Selection Problem
▸ A student is about to finalize course selection for the
coming semester, given the following requirement
– Minimum credits per semester: 8
   Course                        Schedule           Credits   Est. workload (per week)
   1. Metaverse                  MW 2:00-3:30pm     3         8 hrs
   2. How to start a start-up    TT 2:00-3:00pm     2         4 hrs
   3. Linear programming (LP)    MW 9:00-11:00am    4         10 hrs
   4. Analog circuits            TT 1:00-3:00pm     4         12 hrs
Question: Which courses should this student choose to
minimize workload?
ILP Formulation for Course Selection
▸ Define binary decision variables xi (i = 1, 2, 3, 4):
    xi = 1 if course i is taken, 0 otherwise
▸ Total expected work hours: 8x1 + 4x2 + 10x3 + 12x4
▸ Total credits taken: 3x1 + 2x2 + 4x3 + 4x4
▸ Account for the schedule conflict (courses 2 and 4 overlap on TT afternoons):
  x2 + x4 ≤ 1
▸ Complete ILP formulation (in canonical form):
    minimize   8x1 + 4x2 + 10x3 + 12x4
    s.t.       3x1 + 2x2 + 4x3 + 4x4 ≥ 8
               x2 + x4 ≤ 1
               xi ∈ {0, 1}
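As a sanity check, here is a small sketch of this ILP using the PuLP modeling
package (the choice of PuLP is an assumption for illustration; any ILP
solver/modeler would do):

  from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum, value

  workload = {1: 8, 2: 4, 3: 10, 4: 12}   # est. hours per week
  credits  = {1: 3, 2: 2, 3: 4, 4: 4}

  prob = LpProblem("course_selection", LpMinimize)
  x = {i: LpVariable(f"x{i}", cat=LpBinary) for i in workload}

  prob += lpSum(workload[i] * x[i] for i in x)       # objective: total workload
  prob += lpSum(credits[i] * x[i] for i in x) >= 8   # minimum credits
  prob += x[2] + x[4] <= 1                           # TT schedule conflict

  prob.solve()
  print({i: int(value(x[i])) for i in x}, value(prob.objective))
  # An optimal selection has a total workload of 22 hrs/week
  # (e.g., courses 3 and 4, or courses 1, 2, and 3).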
Resource Constrained Scheduling (RCS)
▸ When functional units are limited
– Each functional unit can only perform one operation at each
clock cycle
• e.g., if there are only K adders, no more than K additions can
be executed in the same c-step
▸ A typical resource-constrained scheduling problem for a DFG
– Given the number of functional units of each type, minimize
latency (in cycles)
– NP-hard
ILP Formulation of RCS: Binary Variables
▸ Binary decision variables x_{i,k}
  – x_{i,k} = 1 if operation i starts at step k, otherwise x_{i,k} = 0
    • 1 ≤ i ≤ N, 1 ≤ k ≤ L
    • N is the total number of operations
    • L is the given upper bound on latency
ILP Formulation of RCS: Derived Variables
▸ Binary decision variables x_{i,k}
  – x_{i,k} = 1 if operation i starts at step k, otherwise x_{i,k} = 0
    • 1 ≤ i ≤ N, 1 ≤ k ≤ L
    • N is the total number of operations
    • L is the given upper bound on latency
▸ Derived integer variables t_i

      t_i = Σ_{k=1}^{L} k · x_{i,k}

  t_i indicates the actual start time of operation i
ILP Formulation of RCS: Constraints (1)
▸ Unique start times: an operation must start at one and only one of the
  available steps

      ∀i ∈ [1, N]:   Σ_k x_{i,k} = 1

▸ Dependence must be satisfied (assuming no chaining)

      ∀(i, j) ∈ E:   t_j ≥ t_i + d_i   →   Σ_k k·x_{j,k} ≥ Σ_k k·x_{i,k} + d_i

  Operation j must not start before i completes if j depends on i
  d_i : latency of operation i
    • d_i = 1 means a single-cycle operation
    • d_i > 1 indicates a multi-cycle operation
Start Time vs. Active Time(s)
▸ When d_i = 1, the following are the same
  – Does operation i start at step k?
  – Is operation i actively running at step k?
  – Same equality check: is x_{i,k} = 1?
▸ When d_i > 1, the two questions are different
  – Does operation i start at step k? Simply check if x_{i,k} = 1
  – Is operation i active at step k? Check whether the following holds

      Σ_{l=k−d_i+1}^{k} x_{i,l} = 1
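A one-function sketch of this active-at-step check (the dict-of-start-indicators
encoding is an assumption for illustration):

  def is_active(x, i, k, d_i):
      """True if operation i is running at step k, given its latency d_i and
      binary start indicators x[(i, l)] (missing entries count as 0)."""
      return sum(x.get((i, l), 0) for l in range(k - d_i + 1, k + 1)) == 1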
Is Operation 𝒊 Still Active at Step 𝒌 ?
▸ Is operation 9 active (running) at step 6, assuming d_9 = 3?
  – If and only if x_{9,6} + x_{9,5} + x_{9,4} = 1, i.e., Σ_{l=6−3+1}^{6} x_{9,l} = 1

[Figure: the three possible cases — v9 starts at step 6 (x_{9,6} = 1), at step 5
(x_{9,5} = 1), or at step 4 (x_{9,4} = 1) — and in each case v9 occupies step 6]

▸ Notes
  – Only one (if any) of the above three cases can happen
  – To meet resource constraints, we must check the same equality for ALL
    steps, and ALL operations of that type
ILP Formulation of RCS: Constraints (2)
▸ Physical resource limits
  – R denotes the total number of different resource types (RT)
  – RT(i) ∈ [1, R] is the resource type of operation i
  – a_r is the number of physically available resources of type r

  At any step k, the total number of active operations of the same type r
  must not exceed a_r, the number of physically available resources of type r

      ∀k ∈ [1, L], ∀r ∈ [1, R]:   Σ_{i: RT(i)=r}  Σ_{l=k−d_i+1}^{k} x_{i,l} ≤ a_r

  The outer sum ranges over the operations of resource type r; the inner sum
  equals 1 exactly when operation i is active at step k. The double summation
  therefore counts the operations active at step k that use resource type r.
ILP Formulation of RCS: Putting It Together
▸ Unique start times

      ∀i ∈ [1, N]:   Σ_k x_{i,k} = 1

▸ Dependence must be satisfied (assuming no chaining)

      ∀(i, j) ∈ E:   t_j ≥ t_i + d_i   →   Σ_k k·x_{j,k} ≥ Σ_k k·x_{i,k} + d_i

▸ Physical resource limits

      ∀k ∈ [1, L], ∀r ∈ [1, R]:   Σ_{i: RT(i)=r}  Σ_{l=k−d_i+1}^{k} x_{i,l} ≤ a_r
ILP Formulation of RCS: Objective Function
▸ For simplicity, we introduce a pseudo node v_sink to serve as a unique sink
  of the DFG
  – This node depends on all the original primary output nodes
▸ To minimize the overall latency, we simply minimize the start time of the
  sink node

      min t_sink   ⇔   min Σ_{k=1}^{L} k · x_{sink,k}

[Figure: the example DFG (v1-v11) with the added v_sink node attached to its
primary outputs]
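Putting the variables, constraints, and objective together, here is a compact
PuLP sketch of the RCS ILP (PuLP, the graph encoding, and the helper names are
assumptions for illustration; the formulation itself is solver-agnostic):

  from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum, value

  def rcs_ilp(ops, edges, rtype, avail, d, L):
      """ops: operation ids; edges: (i, j) dependence pairs; rtype[i]: resource
      type of op i; avail[r]: available units of type r; d[i]: latency of op i;
      L: latency upper bound. Returns the start time of every op and the sink."""
      sink = "sink"
      pos = [i for i in ops if not any(e[0] == i for e in edges)]  # primary outputs
      deps = list(edges) + [(i, sink) for i in pos]

      steps = {i: range(1, L + 1) for i in ops}
      steps[sink] = range(1, L + 2)              # sink marks overall completion

      prob = LpProblem("rcs", LpMinimize)
      x = {(i, k): LpVariable(f"x_{i}_{k}", cat=LpBinary)
           for i in steps for k in steps[i]}
      start = {i: lpSum(k * x[i, k] for k in steps[i]) for i in steps}

      prob += start[sink]                                        # objective
      for i in steps:                                            # unique start
          prob += lpSum(x[i, k] for k in steps[i]) == 1
      for (i, j) in deps:                                        # dependences
          prob += start[j] >= start[i] + d.get(i, 1)
      for r, a_r in avail.items():                               # resource limits
          for k in range(1, L + 1):
              prob += lpSum(x[i, l] for i in ops if rtype[i] == r
                            for l in range(max(1, k - d.get(i, 1) + 1), k + 1)) <= a_r

      prob.solve()
      return {i: int(value(start[i])) for i in steps}

In this sketch the returned start["sink"] minus 1 gives the schedule latency in
cycles, since the sink can start only after all primary outputs complete.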
Use of ASAP and ALAP
▸ In general, the following helps the ILP solver run faster
– Minimize # of variables and constraints
– Simplify the constraints
▸ We can write the ILP without ASAP/ALAP, but using
ASAP and ALAP can simplify the constraints
  ASAP schedule:                         ALAP schedule:
    step 1: ×v1, ×v2, ×v6, ×v8, +v10       step 1: ×v1, ×v2
    step 2: ×v3, ×v7, +v9, <v11            step 2: ×v3, ×v6
    step 3: −v4                            step 3: −v4, ×v7, ×v8, +v10
    step 4: −v5                            step 4: −v5, +v9, <v11
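A small sketch of how these bounds shrink the ILP (the asap/alap dicts below
just transcribe the two schedules above; names are illustrative): create x_{i,k}
only for k inside each operation's mobility window [ASAP(i), ALAP(i)], and fix
everything outside the window to 0.

  def mobility_windows(asap, alap):
      """Per-operation step range [ASAP(i), ALAP(i)] for the ILP variables."""
      return {i: range(asap[i], alap[i] + 1) for i in asap}

  # The 11-operation DFG above with L = 4:
  asap = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 1, 7: 2, 8: 1, 9: 2, 10: 1, 11: 2}
  alap = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 3, 8: 3, 9: 4, 10: 3, 11: 4}
  windows = mobility_windows(asap, alap)
  num_vars = sum(len(w) for w in windows.values())   # 21 variables vs. 11*4 = 44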
ILP Formulation: Unique Start Time Constraints
  Using ASAP and ALAP bounds, x_{i,l} = 0 for l < t_i^S and for l > t_i^L
  (where t_i^S = ASAP(v_i) and t_i^L = ALAP(v_i))

▸ Without using ASAP and ALAP (assume L = 4):
    x_{1,1} + x_{1,2} + x_{1,3} + x_{1,4} = 1
    x_{2,1} + x_{2,2} + x_{2,3} + x_{2,4} = 1
    …
    x_{6,1} + x_{6,2} + x_{6,3} + x_{6,4} = 1
    …
    x_{9,1} + x_{9,2} + x_{9,3} + x_{9,4} = 1
    …
▸ Using ASAP and ALAP:
    x_{1,1} = 1
    x_{2,1} = 1
    …
    x_{6,1} + x_{6,2} = 1
    …
    x_{9,2} + x_{9,3} + x_{9,4} = 1
    …

(Assume L = 4; the ASAP and ALAP schedules are the ones shown earlier.)
ILP Formulation: Dependence Constraints
▸ Using ASAP and ALAP, the non-trivial inequalities are
  (assuming no chaining and single-cycle ops):

    2·x_{7,2} + 3·x_{7,3} − x_{6,1} − 2·x_{6,2} ≥ 1
    4·x_{5,4} − 2·x_{7,2} − 3·x_{7,3} ≥ 1
    2·x_{9,2} + 3·x_{9,3} + 4·x_{9,4} − x_{8,1} − 2·x_{8,2} − 3·x_{8,3} ≥ 1
    2·x_{11,2} + 3·x_{11,3} + 4·x_{11,4} − x_{10,1} − 2·x_{10,2} − 3·x_{10,3} ≥ 1

(Assume L = 4; the ASAP and ALAP schedules are the ones shown earlier.)
ILP Formulation: Resource Constraints
▸ Resource constraints (assuming 2 multipliers and 1 ALU)

  Multipliers (at most 2 active per step):
    x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} ≤ 2
    x_{3,2} + x_{6,2} + x_{7,2} + x_{8,2} ≤ 2
    x_{7,3} + x_{8,3} ≤ 2

  ALU (at most 1 active per step):
    x_{10,1} ≤ 1
    x_{9,2} + x_{10,2} + x_{11,2} ≤ 1
    x_{4,3} + x_{9,3} + x_{10,3} + x_{11,3} ≤ 1
    x_{5,4} + x_{9,4} + x_{11,4} ≤ 1

(Assume L = 4; the ASAP and ALAP schedules are the ones shown earlier.)
ILP Summary
▸ Pros: versatile modeling ability
– Can be extended to handle almost every design aspect
• Resource allocation
• Module selection
• Area, power, etc.
▸ Cons: computationally expensive
– #variables = O(#nodes * #steps)
– 0-1 variables: need extensive search to find optimal solution
Next Lecture
▸More scheduling algorithms
Acknowledgements
▸These slides contain/adapt materials developed
by
– Ryan Kastner (UCSD)