Lecture10 - High-Level Digital Design Automation

The document outlines the course ECE 6775 on High-Level Digital Design Automation for Fall 2024, including scheduling techniques such as ASAP and ALAP, and introduces LLVM as a foundational intermediate representation for compilers. It discusses the importance of scheduling in high-level synthesis (HLS) and covers both unconstrained and constrained scheduling methods, including resource-constrained scheduling (RCS) and integer linear programming (ILP) formulations. Additionally, it provides examples of scheduling input and output, emphasizing the significance of optimizing resource usage and minimizing latency in digital design automation.


ECE 6775

High-Level Digital Design Automation


Fall 2024

Scheduling
Announcements

▸ Lab 2 due tomorrow
▸ Lab 3 will be released soon
▸ Tuesday lecture (Oct 1) rescheduled to Friday, Oct 4 at 4:30pm, Rhodes 310
▸ Neural network tutorial on Thursday (Oct 3) by Jordan Dotzel
  – Useful for Lab 4 and final project
LLVM: A Foundational IR for Modern Compilers
▸ Formerly Low Level Virtual Machine
– Brainchild of Chris Lattner and Vikram Adve back in 2000
– ACM Software System Award in 2012
– LLVM is widely used in industry and academia (including the HLS tool in
our class)

▸ The core of LLVM is the SSA-based IR
– Language independent, target independent, easy to use
– RISC-like virtual instructions, unlimited registers, exception handling, etc.
▸ Provides modular & reusable components for building compilers


– Components are ideally language/target independent
– Allows choice of the right component for the job
– Many high-quality libraries (components) with clean interfaces
• Optimizations, analyses, modular code generator, profiling, link time optimization,
ARM/X86/PPC/SPARC code generator …
• Tools built from the libraries: C/C++/ObjC compiler, modular optimizer, linker, debugger,
LLVM JIT …

Example: An LLVM Loop

C source:

  for (i = 0; i < N; ++i)
    foo(A[i], &P);

LLVM IR:

  loop:
    %i.1 = phi i5 [ 0, %bb0 ], [ %i.2, %loop ]
    %AiAddr = getelementptr float* %A, i32 %i.1
    call void %foo(float %AiAddr, %pair* %P)
    %i.2 = add i5 %i.1, 1
    %tmp = icmp eq i5 %i.1, 16
    br i1 %tmp, label %loop, label %outloop

▸ High-level information exposed in the code
– Explicit dataflow through SSA form
– Explicit control-flow graph
– Explicit language-independent type information
– Explicit typed pointer arithmetic
• Preserves array subscript and structure indexing

source: http://llvm.org
Agenda

▸ Unconstrained scheduling
– ASAP and ALAP

▸ Constrained scheduling
– Resource constrained scheduling (RCS)
– Exact formulations with integer linear programming (ILP)

Recap: A Typical HLS Flow

High-level programming languages (C/C++, OpenCL, SystemC, ...)
  → Parsing → Intermediate Representation (IR) → Transformations
  → Allocation → Scheduling / Binding → RTL generation

[Figure: the example code

  if (condition) {
    ...
  } else {
    t1 = a + b;
    t2 = c * d;
    t3 = e + f;
    t4 = t1 * t2;
    z = t4 - t3;
  }

shown as a control data flow graph (CDFG) over basic blocks BB1-BB4, and
its 3-cycle schedule (states S0-S2) as a finite state machine with datapath.]
Importance of Scheduling

▸ Scheduling is a central problem in HLS
– Introduces clock boundaries to an untimed (or partially timed) input specification
– Has significant impact on the quality of results
• Frequency
• Latency
• Throughput
• Area
• Power
Scheduling: Untimed to Timed

[Figure: a CDFG computing out1 = f(in1, in2, in3, in4) with three adders,
scheduled three ways: combinational (for latency), sequential (for area),
and pipelined (for throughput).]

              Combinational      Sequential              Pipelined
              (for latency)      (for area)              (for throughput)
Clock period  tclk ≈ 3·d_add     tclk ≈ d_add + d_setup  tclk ≈ d_add + d_setup
Throughput    T1 = 1/tclk        T2 = 1/(3·tclk)         T3 = 1/tclk
Area          A1 = 3·A_add       A2 = A_add + 2·A_reg    A3 = 3·A_add + 6·A_reg
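Plugging representative numbers into the formulas above makes the trade-off concrete. A minimal sketch; the delay and area values are illustrative assumptions, not from the slides:

```python
d_add, d_setup = 2.0, 0.2   # adder delay and register setup time (ns), assumed
A_add, A_reg = 100, 20      # adder and register area (units), assumed

# Combinational: three chained adders in one long cycle
t1 = 3 * d_add;        T1 = 1 / t1;        A1 = 3 * A_add
# Sequential: one adder reused over three cycles (plus holding registers)
t2 = d_add + d_setup;  T2 = 1 / (3 * t2);  A2 = A_add + 2 * A_reg
# Pipelined: three adders with pipeline registers, one result per cycle
t3 = d_add + d_setup;  T3 = 1 / t3;        A3 = 3 * A_add + 6 * A_reg

for name, T, A in [("comb", T1, A1), ("seq", T2, A2), ("pipe", T3, A3)]:
    print(f"{name}: throughput {T:.3f}/ns, area {A}")
```

With these numbers the sequential design is smallest, the pipelined one has the highest throughput, and the combinational one sits in between on throughput but pays no register area.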
Scheduling Input

▸ Control data flow graph (CDFG)
– Generated by a compiler front end from a high-level specification, e.g.:

  xl = x + dx;
  ul = u - 3*x*u*dx - 3*y*dx;
  yl = y + u*dx;
  c = xl < a;
  x = xl; u = ul; y = yl;

– Nodes: basic blocks & operations
– Directed edges: data & control dependencies

▸ Without control flow, the basic structure is a data flow graph (DFG)

[Figure: the DFG of the code above, built from ×, +, −, and < operation nodes.]
Scheduling Output

▸ Scheduling: map operations to states
▸ Each clock cycle corresponds to a state in the FSM
– Commonly referred to as a control step (c-step)

[Figure: a scheduled DFG and the corresponding state transition diagram
(STG), i.e., the FSM:
  c-step 1 / state s1: v1 (×), v2 (×)
  c-step 2 / state s2: v3 (×), v6 (×), v10 (+)
  c-step 3 / state s3: v4 (−), v7 (×), v8 (×)
  c-step 4 / state s4: v5 (−), v9 (+), v11 (<)]
Unconstrained Scheduling

▸ Only consideration: dependences

▸ As soon as possible (ASAP)
– Schedule each operation to the earliest possible step

▸ As late as possible (ALAP)
– Schedule each operation to the latest possible step, without increasing
the total latency
ASAP Schedule

Assumption for simplicity: no combinational chaining of multiple operations
in one cycle (each operation occupies the full cycle).

[Figure: ASAP schedule of Y = ((a*b) + c) + (d*e) - (f+g) over control
steps 1-4, with inputs a-g.]

ASAP(G(V, E)):
  V' = Topological_Sort(G)
  foreach vi in V':
    // Primary inputs (PIs) to first cycle
    if vi ∈ PIs: ti = 1
    // Assume no chaining & single-cycle operations
    else: ti = max_{j ∈ pred(i)} { tj + 1 }   // (vj, vi) ∈ E

The start time for each operation is the least one allowed by the dependencies.
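The ASAP pseudocode maps almost line for line onto Python; `graphlib` (standard library, Python 3.9+) supplies the topological sort. The dict-based DFG encoding and the node names are illustrative assumptions:

```python
from graphlib import TopologicalSorter

def asap_schedule(preds):
    """ASAP: start each operation at the earliest step its predecessors allow.

    preds maps each operation to the operations it depends on.
    Assumes single-cycle operations and no chaining, as on the slide.
    """
    t = {}
    for v in TopologicalSorter(preds).static_order():
        # Primary inputs (no predecessors) go to c-step 1
        t[v] = 1 if not preds[v] else max(t[j] + 1 for j in preds[v])
    return t

# DFG of Y = ((a*b)+c)+(d*e)-(f+g):
# m1 = a*b, a1 = m1+c, m2 = d*e, a2 = a1+m2, a3 = f+g, s1 = a2-a3
dfg = {"m1": [], "m2": [], "a3": [],
       "a1": ["m1"], "a2": ["a1", "m2"], "s1": ["a2", "a3"]}
print(asap_schedule(dfg))
# m1, m2, a3 start at step 1; a1 at step 2; a2 at step 3; s1 at step 4
```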
ALAP Schedule

Assumption for simplicity: no combinational chaining of multiple operations
in one cycle (each operation occupies the full cycle).

[Figure: ALAP schedule of Y = ((a*b) + c) + (d*e) - (f+g) over control
steps 1-4, with inputs a-g.]

ALAP(G(V, E), L):  // L is the latency bound
  V' = Reverse_Topological_Sort(G)
  foreach vi in V':
    // Primary outputs (POs) to last cycle
    if vi ∈ POs: ti = L
    // Assume no chaining & single-cycle operations
    else: ti = min_{k ∈ succ(i)} { tk } − 1   // (vi, vk) ∈ E

The end time of each operation is the latest one allowed by the dependencies
and the latency constraint.
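The ALAP pseudocode can be sketched the same way, walking the DFG in reverse topological order against a latency bound L. The encoding and node names are illustrative assumptions:

```python
from graphlib import TopologicalSorter

def alap_schedule(preds, L):
    """ALAP: start each operation at the latest step allowed by its
    successors and the latency bound L (single-cycle ops, no chaining)."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    t = {}
    # Walk the DFG in reverse topological order
    for v in reversed(list(TopologicalSorter(preds).static_order())):
        # Primary outputs (no successors) go to the last c-step
        t[v] = L if not succs[v] else min(t[k] for k in succs[v]) - 1
    return t

# DFG of Y = ((a*b)+c)+(d*e)-(f+g):
# m1 = a*b, a1 = m1+c, m2 = d*e, a2 = a1+m2, a3 = f+g, s1 = a2-a3
dfg = {"m1": [], "m2": [], "a3": [],
       "a1": ["m1"], "a2": ["a1", "m2"], "s1": ["a2", "a3"]}
print(alap_schedule(dfg, L=4))
# m2 and a3 slide down to steps 2 and 3; the critical path m1-a1-a2-s1 stays put
```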
Constrained Scheduling in HLS

▸ Constrained scheduling
– The general case is NP-hard
– Resource-constrained scheduling (RCS)
• Minimize latency given constraints on area or resources
– Time-constrained scheduling (TCS)
• Minimize resources subject to a bound on latency

▸ Exact methods
– Integer linear programming (ILP)
– Hu's algorithm for a very restricted problem

▸ Heuristics
– List scheduling
– Force-directed scheduling
– SDC-based scheduling
Linear Programming

▸ Linear programming (LP) solves the problem of maximizing or minimizing a
linear objective function subject to linear constraints
– Efficiently solvable both in theory and in practice

▸ Integer linear programming (ILP): in addition to the linear constraints
and objective, the values of the variables must be integers
– NP-hard in general (even for the special case of 0-1 ILP)
– Modern ILP solvers can handle problems of nontrivial size

▸ An enormous number of problems can be expressed in LP or ILP
Canonical Form of ILP

maximize    c1x1 + c2x2 + … + cnxn     // objective function
subject to                             // linear constraints
            a11x1 + a12x2 + … + a1nxn ≤ b1
            a21x1 + a22x2 + … + a2nxn ≤ b2
            …
            am1x1 + am2x2 + … + amnxn ≤ bm
            xi ≥ 0                     // x's are the variables (to be solved);
            xi ∈ Z                     // a's, b's, and c's are constants

Vector form:
maximize    cTx        // c = (c1, c2, …, cn), x = (x1, x2, …, xn)
subject to (s.t.)
            Ax ≤ b     // A is an m×n matrix; b = (b1, b2, …, bm)
            x ≥ 0
            xi ∈ Z
Example: Course Selection Problem

▸ A student is about to finalize course selection for the coming semester,
given the following requirement
– Minimum credits per semester: 8

Course                        Schedule         Credits  Est. workload (per week)
1. Metaverse                  MW 2:00-3:30pm   3        8 hrs
2. How to start a start-up    TT 2:00-3:00pm   2        4 hrs
3. Linear programming (LP)    MW 9:00-11:00am  4        10 hrs
4. Analog circuits            TT 1:00-3:00pm   4        12 hrs

Question: Which courses should this student choose to minimize workload?
ILP Formulation for Course Selection

▸ Define decision variables (i = 1, 2, 3, 4):
  xi = 1 if course i is taken, 0 otherwise

▸ Total expected work hours: 8x1 + 4x2 + 10x3 + 12x4
▸ Total credits taken: 3x1 + 2x2 + 4x3 + 4x4
▸ Account for the schedule conflict (courses 2 and 4 overlap on TT
afternoons): x2 + x4 ≤ 1

▸ Complete ILP formulation (in canonical form):

  minimize  8x1 + 4x2 + 10x3 + 12x4
  s.t.      3x1 + 2x2 + 4x3 + 4x4 ≥ 8
            x2 + x4 ≤ 1
            xi ∈ {0, 1}
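This 0-1 ILP has only 16 possible assignments, so it can be solved by exhaustive enumeration rather than a real ILP solver. A minimal sketch using the slide's data:

```python
from itertools import product

# Course data from the slide: workload (hrs/week) and credits per course
work    = [8, 4, 10, 12]
credits = [3, 2, 4, 4]

best = None
for x in product((0, 1), repeat=4):                  # all 0-1 assignments
    if sum(c * xi for c, xi in zip(credits, x)) < 8:
        continue                                     # minimum-credit constraint
    if x[1] + x[3] > 1:
        continue                                     # TT conflict: x2 + x4 <= 1
    cost = sum(w * xi for w, xi in zip(work, x))
    if best is None or cost < best[0]:
        best = (cost, x)

print(best)   # (minimum workload, (x1, x2, x3, x4))
```

The minimum workload is 22 hrs/week; note there are two optimal selections (courses {3, 4} and courses {1, 2, 3} both satisfy the constraints at that cost), and the loop keeps whichever it finds first.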
Resource-Constrained Scheduling (RCS)

▸ When functional units are limited
– Each functional unit can perform only one operation in each clock cycle
• e.g., if there are only K adders, no more than K additions can be
executed in the same c-step

▸ A typical resource-constrained scheduling problem for a DFG
– Given the number of functional units of each type, minimize latency (in cycles)
– NP-hard
ILP Formulation of RCS: Binary Variables

▸ Binary decision variables x_{i,k}
– x_{i,k} = 1 if operation i starts at step k, otherwise x_{i,k} = 0
• 1 ≤ i ≤ N, 1 ≤ k ≤ L
• N is the total number of operations
• L is the given upper bound on latency
ILP Formulation of RCS: Derived Variables

▸ Binary decision variables x_{i,k}
– x_{i,k} = 1 if operation i starts at step k, otherwise x_{i,k} = 0
• 1 ≤ i ≤ N, 1 ≤ k ≤ L

▸ Derived integer variables t_i

  t_i = Σ_{k=1..L} k · x_{i,k}

  t_i indicates the actual start time of operation i (the single step k
  for which x_{i,k} = 1)
ILP Formulation of RCS: Constraints (1)

▸ Unique start times: an operation must start at one and only one of the
available steps

  ∀i ∈ [1, N]:  Σ_k x_{i,k} = 1

▸ Dependence must be satisfied (assuming no chaining)

  ∀(i, j) ∈ E:  t_j ≥ t_i + d_i  →  Σ_k k·x_{j,k} ≥ Σ_k k·x_{i,k} + d_i

  Operation j must not start before i completes if j depends on i.

  d_i: latency of operation i
  • d_i = 1 means a single-cycle operation
  • d_i > 1 indicates a multi-cycle operation
Start Time vs. Active Time(s)

▸ When d_i = 1, the following are the same
– Does operation i start at step k?
– Is operation i actively running at step k?
– Same equality check: is x_{i,k} = 1?

▸ When d_i > 1, the two questions are different
– Does operation i start at step k? Simply check if x_{i,k} = 1
– Is operation i active at step k? Check if the following holds:

  Σ_{l = k−d_i+1 .. k} x_{i,l} = 1
Is Operation i Still Active at Step k?

▸ Is operation 9 active (running) at step 6, assuming d_9 = 3?
– If and only if x_{9,6} + x_{9,5} + x_{9,4} = 1, i.e.

  Σ_{l = 6−3+1 .. 6} x_{9,l} = 1

[Figure: three cases, with v9 starting at step 6 (x_{9,6} = 1), step 5
(x_{9,5} = 1), or step 4 (x_{9,4} = 1); with d_9 = 3, operation 9 covers
step 6 in every case.]

▸ Notes
– Only one (if any) of the above three cases can happen
– To meet resource constraints, we must check the same equality for ALL
steps, and ALL operations of that type
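The active-step check translates into a one-line predicate. A small sketch; representing each operation's one-hot start indicators as a dict `x_row` is an assumed encoding:

```python
def is_active(x_row, k, d):
    """Operation is active at step k iff it started within the last d steps:
    the slide's check, sum of x_{i,l} for l in [k-d+1, k] equals 1.
    x_row[l] = 1 iff the operation starts at step l (one-hot)."""
    return sum(x_row.get(l, 0) for l in range(k - d + 1, k + 1)) == 1

# Operation 9 with d_9 = 3, starting at step 5 (x_{9,5} = 1)
x9 = {5: 1}
print([k for k in range(1, 10) if is_active(x9, k, 3)])  # active at steps 5, 6, 7
```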
ILP Formulation of RCS: Constraints (2)

▸ Physical resource limits
– R denotes the total number of different resource types (RT)
– RT(i) ∈ [1, R] is the resource type of operation i
– a_r is the number of physically available resources of type r

At any step k, the total number of active operations of the same type r
must not exceed a_r, the number of physically available resources for type r:

  ∀k ∈ [1, L], ∀r ∈ [1, R]:  Σ_{i: RT(i)=r}  Σ_{l = k−d_i+1 .. k}  x_{i,l}  ≤  a_r

The outer sum runs over operations of resource type r; the inner sum equals 1
exactly when operation i is active at step k. The double summation thus counts
the operations active at step k that use resource type r.
ILP Formulation of RCS: Putting It Together

▸ Unique start times

  ∀i ∈ [1, N]:  Σ_k x_{i,k} = 1

▸ Dependence must be satisfied (assuming no chaining)

  ∀(i, j) ∈ E:  t_j ≥ t_i + d_i  →  Σ_k k·x_{j,k} ≥ Σ_k k·x_{i,k} + d_i

▸ Physical resource limits

  ∀k ∈ [1, L], ∀r ∈ [1, R]:  Σ_{i: RT(i)=r}  Σ_{l = k−d_i+1 .. k}  x_{i,l}  ≤  a_r
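The three constraint groups translate directly into a checker for a concrete schedule. A minimal sketch under the slide's model; the dict encoding (with `start[i]` standing for the single k where x_{i,k} = 1) and the tiny example DFG are assumptions:

```python
def feasible(start, preds, dur, rtype, avail, L):
    """Check the three RCS constraint groups for a concrete schedule.

    start[i]: chosen start step of op i (the k with x_{i,k} = 1);
    preds[j]: ops that j depends on; dur[i] = d_i; rtype[i] = RT(i);
    avail[r] = a_r; L = latency bound.
    """
    # (1) Unique start times, within [1, L]
    if not all(1 <= start[i] <= L for i in start):
        return False
    # (2) Dependences: t_j >= t_i + d_i for every edge (i, j)
    for j, ps in preds.items():
        for i in ps:
            if start[j] < start[i] + dur[i]:
                return False
    # (3) Resource limits: ops of type r active at step k must not exceed a_r
    for k in range(1, L + 1):
        for r in avail:
            active = sum(1 for i in start
                         if rtype[i] == r and start[i] <= k < start[i] + dur[i])
            if active > avail[r]:
                return False
    return True

# Two multiplications feeding one addition
preds = {"m1": [], "m2": [], "a1": ["m1", "m2"]}
dur = {"m1": 1, "m2": 1, "a1": 1}
rtype = {"m1": "mul", "m2": "mul", "a1": "alu"}
sched = {"m1": 1, "m2": 1, "a1": 2}
print(feasible(sched, preds, dur, rtype, {"mul": 2, "alu": 1}, 2))  # feasible
print(feasible(sched, preds, dur, rtype, {"mul": 1, "alu": 1}, 2))  # too few multipliers
```

An ILP solver searches over all such schedules for one that is feasible and minimizes latency; this sketch only verifies a given candidate.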
ILP Formulation of RCS: Objective Function

▸ For simplicity, we introduce a pseudo node v_sink to serve as a unique
sink of the DFG
– This node depends on all the original primary output nodes

▸ To minimize the overall latency, we simply minimize the start time of
the sink node:

  min t_sink,  i.e.,  min Σ_k k · x_{sink,k}

[Figure: the example DFG (v1-v11) with v_sink added below the primary
output nodes v5, v9, and v11.]
Use of ASAP and ALAP

▸ In general, the following helps the ILP solver run faster
– Minimize the number of variables and constraints
– Simplify the constraints

▸ We can write the ILP without ASAP/ALAP, but using ASAP and ALAP can
simplify the constraints

[Figure: ASAP and ALAP schedules of the example DFG:
  ASAP – step 1: v1, v2, v6, v8, v10; step 2: v3, v7, v9, v11; step 3: v4; step 4: v5
  ALAP – step 1: v1, v2; step 2: v3, v6; step 3: v4, v7, v8, v10; step 4: v5, v9, v11]
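As a sketch of the savings: running ASAP and ALAP restricts each x_{i,k} to k ∈ [ASAP(i), ALAP(i)], so every variable outside that window can be fixed to 0. The edge list below is read off the example figure and is an assumption:

```python
from graphlib import TopologicalSorter

# DFG edges inferred from the example figure (v1..v11):
# v1, v2 -> v3 -> v4 -> v5;  v6 -> v7 -> v5;  v8 -> v9;  v10 -> v11
preds = {1: [], 2: [], 6: [], 8: [], 10: [],
         3: [1, 2], 4: [3], 5: [4, 7], 7: [6], 9: [8], 11: [10]}
L = 4

order = list(TopologicalSorter(preds).static_order())
succs = {v: [] for v in preds}
for v, ps in preds.items():
    for p in ps:
        succs[p].append(v)

asap, alap = {}, {}
for v in order:                       # earliest start allowed by predecessors
    asap[v] = 1 if not preds[v] else max(asap[p] + 1 for p in preds[v])
for v in reversed(order):             # latest start allowed by successors and L
    alap[v] = L if not succs[v] else min(alap[s] for s in succs[v]) - 1

# x_{i,k} can be nonzero only for k in [ASAP(i), ALAP(i)]
n_vars = sum(alap[v] - asap[v] + 1 for v in preds)
print(n_vars, "free variables instead of", len(preds) * L)
```

Here the mobility windows shrink the formulation from 44 binary variables to 21, and operations on the critical path (e.g. v1 through v5) have a window of width 1, so their start times are fixed outright.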
ILP Formulation: Unique Start Time Constraints

x_{i,l} = 0 for l < t_i^S and l > t_i^L
(t_i^S = ASAP(v_i), t_i^L = ALAP(v_i))

▸ Without using ASAP and ALAP (assume L = 4):

  x_{1,1} + x_{1,2} + x_{1,3} + x_{1,4} = 1
  x_{2,1} + x_{2,2} + x_{2,3} + x_{2,4} = 1
  …
  x_{6,1} + x_{6,2} + x_{6,3} + x_{6,4} = 1
  …
  x_{9,1} + x_{9,2} + x_{9,3} + x_{9,4} = 1
  …

▸ Using ASAP and ALAP:

  x_{1,1} = 1
  x_{2,1} = 1
  …
  x_{6,1} + x_{6,2} = 1
  …
  x_{9,2} + x_{9,3} + x_{9,4} = 1
  …

(ASAP and ALAP schedules as on the previous slide.)
ILP Formulation: Dependence Constraints

▸ Using ASAP and ALAP, the non-trivial inequalities are
(assuming no chaining and single-cycle ops; L = 4):

  2·x_{7,2} + 3·x_{7,3} − x_{6,1} − 2·x_{6,2} ≥ 1
  4·x_{5,4} − 2·x_{7,2} − 3·x_{7,3} ≥ 1
  2·x_{9,2} + 3·x_{9,3} + 4·x_{9,4} − x_{8,1} − 2·x_{8,2} − 3·x_{8,3} ≥ 1
  2·x_{11,2} + 3·x_{11,3} + 4·x_{11,4} − x_{10,1} − 2·x_{10,2} − 3·x_{10,3} ≥ 1

(Each is Σ_k k·x_{j,k} − Σ_k k·x_{i,k} ≥ 1 for an edge (i, j), restricted to
the steps allowed by ASAP/ALAP; edges between fixed-time operations need no
constraint.)
ILP Formulation: Resource Constraints

▸ Resource constraints (assuming 2 multipliers and 1 ALU; L = 4)

Multiplier constraints, one per step:

  x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} ≤ 2
  x_{3,2} + x_{6,2} + x_{7,2} + x_{8,2} ≤ 2
  x_{7,3} + x_{8,3} ≤ 2

ALU constraints, one per step:

  x_{10,1} ≤ 1
  x_{9,2} + x_{10,2} + x_{11,2} ≤ 1
  x_{4,3} + x_{9,3} + x_{10,3} + x_{11,3} ≤ 1
  x_{5,4} + x_{9,4} + x_{11,4} ≤ 1
ILP Summary

▸ Pros: versatile modeling ability
– Can be extended to handle almost every design aspect
• Resource allocation
• Module selection
• Area, power, etc.

▸ Cons: computationally expensive
– #variables = O(#nodes × #steps)
– 0-1 variables: need extensive search to find the optimal solution
Next Lecture

▸ More scheduling algorithms
Acknowledgements

▸ These slides contain/adapt materials developed by
– Ryan Kastner (UCSD)
