York University
CSE
CSE4210 Architecture and Hardware for DSP
Chapter 5 Unfolding
York University
CSE
Unfolding
Unfolding is a transformation technique to change the program into another program such that one iteration in the new program describes more than one iteration in the original program. Unfolding, AKA loop unrolling in CSE4201. Unfolding factor of j means that one iteration in the new program describes j iterations in the old one.
York University
CSE
Unfolding
Also used to design bit parallel and word parallel architectures from bit serial and word serial architecture.
y (n) = ay (n 9) + x(n) y (2k ) = ay (2k 9) + x(2k ) y (2k + 1) = ay (2k 8) + x(2k + 1)
y (2k ) = ay (2k 9) + x(2k ) = ay (2(k 5) + 1) + x(2k ) y (2k + 1) = ay (2k 8) + x(2k + 1) = ay (2(k 4) + 0) + x(2k + 1)
York University
CSE
unfolding
y (n) = ay (n 9) + x(n)
y (2k ) = ay (2(k 5) + 1) + x(2k ) y (2k + 1) = ay (2(k 4) + 0) + x(2k + 1)
York University
CSE
Unfolding
In the unfolded system, each delay is Jslow. That means if the input to a delay element is the signal x(kJ+m), the output is x((k-1)J+m) = x(kJ-J+m)
York University
CSE
Algorithm for Unfolding
For each node U in the original DFG, draw the J nodes U0, U1, UJ-1 For each edge U V with w delays in the original DFG, draw the J edges Ui i + w V(i+w)%J with delays for i = 0,1,..,J-1 J For the input nodes, Ai corresponds to input signal x(jk+i) For J>w, an edge with w delays will result in J-w edges with zero delay and w edges with 1 delay
York University
CSE
Examples
U V
York University
CSE
Example
U V
York University
CSE
Unfolding
Prove that the unfolded graph preserves dependence of the DSP program
York University
CSE
Properties
Unfolding preserves the number of delays in a DFG w + w + 1 + ... + w + J 1 = w
J J J
York University
CSE
Properties
J-unfolding of a loop with wl delays in the original DFG leads to
gcd(wl , J) loops in the unfolded DFG each loop in J-unfolded DFG contains J/gcd(wl ,J) copies of each node that appears in loop l each loop in J-unfolded DFG contains wl /gcd(wl ,J) delays
York University
CSE
Properties
Unfolding a DFG with iteration bound T results in a J-unfolded DFG with iteration bound T = JT
York University
CSE
Properties
Property 5.4.1
Consider a path with w delays in the original DFG. J-unfolding of this path leads to (J-w) paths with no delays and w path with 1 delay each, when w<J Any path in the original DFG containing J or more delays leads to J paths with 1 or more delays in each path. A path in the original DFG with J or more delays cannot create a critical path in the J-unfolded DFG.
Corollary 5.4.1
York University
CSE
Properties
Any feasible clock cycle period that can be obtained by retiming the J-unfolded DFG, GJ, can be achieved by retiming the original DFG, G, directly and then unfolding it by unfolding factor J. i.e. (Gu)r = (Gr)u Proof:
York University
CSE
Applications
Unfolding can be used in
Sample period reduction Parallel processing
York University
CSE
Sample Period Reduction
Any implementation of a DSP program can never achieve an iteration period less than T Some times we can not achieve that lower bound (2 reasons)
When one node has a computation time greater than T (can not be split) T is a fraction
York University
CSE
Example
York University
CSE
Example
York University
CSE
Word Level Parallel Processing
We start with a DSP at the word level We can use unfolding to replicate the design J times (J unfolding). Example
York University
CSE
10
York University
CSE
Bit Level Parallel Processing
Bit level parallel processing can increase the speed (reduce the sample time) be processing more than one bit at a time. Digit serial parallel processing is when we process W bits at a time where W is the word length. Usually involves switching (multiplexers)
York University
CSE
Bit Serial Adder
Consider the following adder, W=4
11
York University
CSE
Edges with Switches
Assumptions
The word-length W is a multiple of the unfolding factor J, i.e., W=WJ All edges into and out of the switches have no delays
Write the switching instance as
u Wl + u = J (W ' l + ) + (u % J ) J
Step2: Draw an edge with no delays in the unfolded graph from the node Uu%J to the node Vu%J, which is switched at time instance (W ' l + u ) .
York University
CSE
Example
12
York University
CSE
Example
York University
CSE
ExampleBinary adder
J=4
J=2
13
York University
CSE
Example Binary adder
14