Register-Transfer Level (RTL) Design
Recall
Chapter 2: Combinational Logic Design
First step: Capture behavior (using equation or truth table). Remaining steps: Convert to circuit.
Chapter 3: Sequential Logic Design
First step: Capture behavior (using FSM). Remaining steps: Convert to circuit.
RTL Design (the method for creating custom processors)
First step: Capture behavior (using high-level state machine, to be introduced). Remaining steps: Convert to circuit.
RTL Design Method
Step 1: Laser-Based Distance Measurer
[Figure: a laser and a sensor aimed at an object of interest a distance D away; the pulse's round trip takes T seconds, so 2D = T sec * 3*10^8 m/sec]
Example of how to create a high-level state machine to describe desired processor behavior.
Laser-based distance measurement: pulse the laser, then measure the time T until the reflection is sensed.
Laser light travels at the speed of light, 3*10^8 m/sec. Distance is thus D = T sec * 3*10^8 m/sec / 2.
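As a quick check of the formula, here is a minimal Python sketch, assuming the slide's approximation of 3*10^8 m/sec for the speed of light; the names `distance_m` and `SPEED_OF_LIGHT` are illustrative, not part of the design.

```python
SPEED_OF_LIGHT = 3e8  # m/sec, the approximation used above

def distance_m(t_sec: float) -> float:
    """Distance D (meters) to the object, given round-trip time T (seconds).

    The pulse travels to the object and back, so 2D = T * 3*10^8 m/sec.
    """
    return t_sec * SPEED_OF_LIGHT / 2

# A reflection sensed 2 microseconds after the pulse: roughly 300 m away
print(distance_m(2e-6))
```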
Step 1: Laser-Based Distance Measurer
[Figure: block diagram of the laser-based distance measurer: B from button, L to laser, S from sensor, 16-bit D to display]
Inputs/outputs:
B: bit input, from button to begin measurement
L: bit output, activates laser
S: bit input, senses laser reflection
D: 16-bit output, displays computed distance
Step 1: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
S0: L = 0 (laser off), D = 0 (distance = 0)
Step 1: Create high-level state machine. Begin by declaring inputs and outputs. Create initial state, name it S0.
Initialize laser to off (L=0). Initialize displayed distance to 0 (D=0).
Step 1: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
S0: L=0, D=0, then go to S1
Add another state, call it S1, that waits for a button press.
!B (button not pressed): stay in S1, keep waiting. B (button pressed): go to a new state S2.
Q: What should S2 do?
A: Turn on the laser.
Step 1: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits)
S0: L=0, D=0
S1: wait for B
S2: L=1 (laser on)
S3: L=0 (laser off)
Add a state S2 that turns on the laser (L=1). Then turn off the laser (L=0) in a state S3.
Q: What to do next? A: Start a timer and wait to sense the reflection.
Step 1: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits); Local Registers: Dctr (16 bits)
S0: L=0, D=0
S1: Dctr = 0 (reset cycle count)
S2: L=1
S3: L=0, Dctr = Dctr + 1 (count cycles); !S (no reflection): stay in S3; S (reflection): ?
Stay in S3 until a reflection is sensed (S). To measure time, count the cycles spent in S3.
To count, declare local register Dctr. Increment Dctr each cycle in S3. Initialize Dctr to 0 in S1 (S2 would have been O.K. too).
Step 1: Laser-Based Distance Measurer
Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits); Local Registers: Dctr (16 bits)
S0: L=0, D=0
S1: Dctr = 0 (!B: stay; B: go to S2)
S2: L=1
S3: L=0, Dctr = Dctr + 1 (!S: stay; S: go to S4)
S4: D = Dctr / 2 (calculate D)
Once the reflection is detected (S), go to new state S4.
Calculate distance: assuming the clock frequency is 3*10^8 Hz (300 MHz), light travels 1 meter per clock cycle, so Dctr holds the round-trip distance in meters and D = Dctr/2.
After S4, go back to S1 to wait for the button again.
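The finished high-level state machine can be sketched in Python as a cycle-by-cycle simulation. This is an illustrative model, not part of the design method: the function name `measure`, the stimulus (button held from cycle `press_cycle` on, reflection sensed on the `echo_cycle`-th cycle in S3), and the early return after S4 are all assumptions made for the example.

```python
def measure(echo_cycle: int, press_cycle: int = 2, max_cycles: int = 100_000) -> int:
    """Simulate states S0..S4 for one measurement and return D.

    Hypothetical stimulus: B reads 1 from cycle `press_cycle` on, and S reads 1
    on the `echo_cycle`-th cycle spent in S3.
    """
    state = "S0"
    L = D = Dctr = 0
    cycles_in_s3 = 0
    for cycle in range(max_cycles):
        B = int(cycle >= press_cycle)
        if state == "S0":            # laser off, display 0
            L, D = 0, 0
            state = "S1"
        elif state == "S1":          # reset cycle count, wait for button
            Dctr = 0
            state = "S2" if B else "S1"
        elif state == "S2":          # laser on for one cycle
            L = 1
            state = "S3"
        elif state == "S3":          # laser off, count cycles until reflection
            L = 0
            Dctr = (Dctr + 1) & 0xFFFF   # 16-bit register wraps around
            cycles_in_s3 += 1
            S = int(cycles_in_s3 >= echo_cycle)
            state = "S4" if S else "S3"
        else:                        # S4: calculate D, then (normally) back to S1
            D = Dctr // 2
            return D                 # stop after one measurement for this sketch
    raise RuntimeError("no reflection sensed within max_cycles")

# At 300 MHz, light covers 1 m per cycle: a 600-cycle round trip means 300 m.
print(measure(echo_cycle=600))
```

The cycle count in S3 directly gives the round-trip distance in meters only because of the assumed 300 MHz clock; a different clock would need a different scale factor than /2.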
Step 2: Create a Datapath
Datapath must:
Implement data storage
Implement data computations
Look at high-level state machine, do three substeps:
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Instantiate: to introduce a new component into a design.
Step 2: Laser-Based Distance Measurer
(a) Make data inputs/outputs be datapath inputs/outputs. (b) Instantiate declared registers into the datapath (also instantiate a register for each data output). (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations.
[Figure: the high-level state machine from Step 1 (Inputs: B, S (1 bit each); Outputs: L (bit), D (16 bits); Local Registers: Dctr (16 bits); states S0 to S4) beside the datapath so far: Dctr, a 16-bit up-counter with control inputs Dctr_clr (clear) and Dctr_cnt (count); Dreg, a 16-bit register for data output D, with control inputs Dreg_clr (clear) and Dreg_ld (load) and 16-bit output Q driving D]
Step 2: Laser-Based Distance Measurer
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
[Figure: the same high-level state machine beside the completed datapath: Dctr's 16-bit output Q passes through a shift-right-by-one component (>>1), which computes Dctr / 2, into Dreg's 16-bit input I; Dreg's output Q drives D; control inputs Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt]
Step 3: Connecting the Datapath to a Controller
[Figure: controller with inputs B (from button) and S (from sensor) and outputs L (to laser), Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt, connected to the datapath, whose 16-bit output D goes to the display; both driven by a 300 MHz clock]
Laser-based distance measurer example: easy, just connect all control signals between controller and datapath.
Step 4: Deriving the Controller's FSM
FSM has the same structure as the high-level state machine.
Inputs: B, S; Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
Inputs/outputs are all bits now. Replace data operations by bit operations using the datapath.
S0: L=0, Dreg_clr=1, Dreg_ld=0, Dctr_clr=0, Dctr_cnt=0 (laser off) (clear D reg)
S1: L=0, Dreg_clr=0, Dreg_ld=0, Dctr_clr=1, Dctr_cnt=0 (clear count)
S2: L=1, Dreg_clr=0, Dreg_ld=0, Dctr_clr=0, Dctr_cnt=0 (laser on)
S3: L=0, Dreg_clr=0, Dreg_ld=0, Dctr_clr=0, Dctr_cnt=1 (laser off) (count up)
S4: L=0, Dreg_clr=0, Dreg_ld=1, Dctr_clr=0, Dctr_cnt=0 (load D reg with Dctr/2) (stop counting)
Transitions (same as the high-level state machine): S0 -> S1; S1 -> S2 on B (else stay); S2 -> S3; S3 -> S4 on S (else stay); S4 -> S1.
Step 4: Deriving the Controller's FSM
Using shorthand: outputs not assigned are implicitly assigned 0.
Inputs: B, S; Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
S0: L=0, Dreg_clr=1 (laser off) (clear D reg)
S1: Dctr_clr=1 (clear count)
S2: L=1 (laser on)
S3: L=0, Dctr_cnt=1 (laser off) (count up)
S4: Dreg_ld=1, Dctr_cnt=0 (load D reg with Dctr/2) (stop counting)
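The shorthand can be made concrete with a small Python sketch that expands each state's explicitly assigned outputs into a full assignment, filling the rest with 0; the names `OUTPUTS`, `SHORTHAND`, and `expand` are hypothetical.

```python
OUTPUTS = ("L", "Dreg_clr", "Dreg_ld", "Dctr_clr", "Dctr_cnt")

# Only the outputs written explicitly in each state of the shorthand FSM
SHORTHAND = {
    "S0": {"L": 0, "Dreg_clr": 1},
    "S1": {"Dctr_clr": 1},
    "S2": {"L": 1},
    "S3": {"L": 0, "Dctr_cnt": 1},
    "S4": {"Dreg_ld": 1, "Dctr_cnt": 0},
}

def expand(shorthand: dict) -> dict:
    """Give every output a value in every state; unassigned outputs become 0."""
    return {
        state: {name: assigned.get(name, 0) for name in OUTPUTS}
        for state, assigned in shorthand.items()
    }

for state, outs in expand(SHORTHAND).items():
    print(state, outs)
```

Expanding the shorthand reproduces the fully specified per-state table, which is why the zeros can safely be omitted.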
Step 4
[Figure: the shorthand controller FSM (Inputs: B, S; Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt) connected to the datapath (Dctr up-counter, >>1, Dreg register driving the 16-bit D to display), with a 300 MHz clock]
Implement the FSM as a state register and logic (Ch3) to complete the design.
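To see the controller and datapath working together, here is a hedged Python sketch that models Dctr and Dreg as clocked components and drives them with the FSM's control signals. Assumptions not in the slides: clear takes priority over count/load, both registers update on the same clock edge, B is held at 1, and the sensor stimulus `echo_cycle` is hypothetical, as are all class and function names. If the design is right, D should come out as Dctr / 2, matching the high-level state machine.

```python
class UpCounter16:
    """16-bit up-counter Dctr (clear assumed to take priority over count)."""
    def __init__(self):
        self.q = 0
    def clock(self, clear: int, count: int) -> None:
        if clear:
            self.q = 0
        elif count:
            self.q = (self.q + 1) & 0xFFFF

class Register16:
    """16-bit register Dreg (clear assumed to take priority over load)."""
    def __init__(self):
        self.q = 0
    def clock(self, clear: int, load: int, i: int) -> None:
        if clear:
            self.q = 0
        elif load:
            self.q = i & 0xFFFF

# Controller outputs per state (unassigned outputs are 0)
OUTPUTS = {
    "S0": {"L": 0, "Dreg_clr": 1, "Dreg_ld": 0, "Dctr_clr": 0, "Dctr_cnt": 0},
    "S1": {"L": 0, "Dreg_clr": 0, "Dreg_ld": 0, "Dctr_clr": 1, "Dctr_cnt": 0},
    "S2": {"L": 1, "Dreg_clr": 0, "Dreg_ld": 0, "Dctr_clr": 0, "Dctr_cnt": 0},
    "S3": {"L": 0, "Dreg_clr": 0, "Dreg_ld": 0, "Dctr_clr": 0, "Dctr_cnt": 1},
    "S4": {"L": 0, "Dreg_clr": 0, "Dreg_ld": 1, "Dctr_clr": 0, "Dctr_cnt": 0},
}

def next_state(state: str, B: int, S: int) -> str:
    if state == "S0":
        return "S1"
    if state == "S1":
        return "S2" if B else "S1"
    if state == "S2":
        return "S3"
    if state == "S3":
        return "S4" if S else "S3"
    return "S1"  # S4 -> S1

def run(echo_cycle: int, max_cycles: int = 100_000) -> int:
    """Run until one pass through S4 completes; return Dreg's value (D)."""
    dctr, dreg = UpCounter16(), Register16()
    state, cycles_in_s3 = "S0", 0
    for _ in range(max_cycles):
        out = OUTPUTS[state]
        B = 1  # button assumed held
        S = 0
        if state == "S3":
            cycles_in_s3 += 1
            S = int(cycles_in_s3 >= echo_cycle)
        i_in = dctr.q >> 1  # the >>1 component: Dreg's input is Dctr / 2
        # Both registers share one clock edge, so i_in uses Dctr's old value
        dreg.clock(out["Dreg_clr"], out["Dreg_ld"], i_in)
        dctr.clock(out["Dctr_clr"], out["Dctr_cnt"])
        if state == "S4":
            return dreg.q
        state = next_state(state, B, S)
    raise RuntimeError("no reflection sensed within max_cycles")

print(run(echo_cycle=600))
```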
RTL Example: Video Compression (Sum of Absolute Differences)
[Figure: Frame 1 and Frame 2 are identical except for a moving ball. (a) Sending digitized frame 1 and digitized frame 2 takes 1 Mbyte each; (b) sending digitized frame 1 (1 Mbyte) and then just the difference of 2 from 1 takes only 0.01 Mbyte for the second frame]
Video is a series of frames (e.g., 30 per second). Most frames are similar to the previous frame.
Compression idea: just send the difference from the previous frame.
RTL Example: Video Compression (Sum of Absolute Differences)
[Figure: corresponding blocks of Frame 1 and Frame 2 are compared]
Assume each pixel is represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for the intensity of the red, green, and blue components of the pixel).
Need to quickly determine whether two frames are similar enough to just send the difference for the second frame.
Compare corresponding 16x16 blocks.
Treat each 16x16 block as a 256-byte array.
Compute the absolute value of the difference of each pair of array items, and sum those differences. If the sum is above a threshold, send the complete frame for the second frame; if below, the difference method can be used (using another technique, not described).
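In software, the check described above is a simple loop. A minimal Python sketch, assuming blocks are passed as `bytes` objects; the threshold value and the names `sad` and `use_difference_method` are hypothetical, and the slides do not say what happens when the sum exactly equals the threshold (this sketch sends the complete frame in that case).

```python
def sad(block_a: bytes, block_b: bytes) -> int:
    """Sum of absolute differences of two equal-size pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same size")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def use_difference_method(block_a: bytes, block_b: bytes, threshold: int) -> bool:
    """True if the blocks are similar enough to send only the difference."""
    return sad(block_a, block_b) < threshold

# Two 16x16 blocks (256 bytes each) that differ by 1 in every pixel
frame1_block = bytes(256)
frame2_block = bytes([1] * 256)
print(sad(frame1_block, frame2_block))
print(use_difference_method(frame1_block, frame2_block, threshold=1000))
```

Even the worst case for a 256-byte block, 256 * 255 = 65280, fits easily in a 32-bit sum.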
RTL Example: Video Compression (Sum of Absolute Differences)
[Figure: SAD component with inputs A and B (256-byte arrays) and go, and integer output sad]
Want a fast sum-of-absolute-differences (SAD) component.
When go=1, it sums the absolute differences of element pairs in arrays A and B and outputs that sum.
RTL Example: Video Compression (Sum of Absolute Differences)
Inputs: A, B (256 byte memory); go (bit); Outputs: sad (32 bits); Local registers: sum, sad_reg (32 bits); i (9 bits)
S0: wait for go (!go: stay in S0; go: go to S1)
S1: initialize sum and index: sum = 0, i = 0
S2: check if done: i<256 goes to S3; !(i<256) goes to S4
S3: add difference to sum, increment index: sum = sum + abs(A[i]-B[i]), i = i + 1; back to S2
S4: done, write to output: sad_reg = sum
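A cycle-by-cycle Python sketch of this high-level state machine makes the cycle count concrete. Assumptions for illustration only: go is already 1 in the first simulated cycle, each state takes exactly one clock cycle, and the simulation stops in S4 rather than continuing; the name `sad_hlsm` is hypothetical.

```python
def sad_hlsm(A: bytes, B: bytes) -> tuple[int, int]:
    """Simulate states S0..S4; return (sad_reg, clock cycles used)."""
    go = 1                       # assumed: go is already asserted
    state = "S0"
    total = i = 0                # 'total' plays the role of register sum
    cycles = 0
    while True:
        cycles += 1
        if state == "S0":        # wait for go
            state = "S1" if go else "S0"
        elif state == "S1":      # sum = 0, i = 0
            total, i = 0, 0
            state = "S2"
        elif state == "S2":      # check if done
            state = "S3" if i < 256 else "S4"
        elif state == "S3":      # sum = sum + abs(A[i]-B[i]); i = i + 1
            total = (total + abs(A[i] - B[i])) & 0xFFFFFFFF  # 32-bit register
            i += 1
            state = "S2"
        else:                    # S4: sad_reg = sum
            return total, cycles

result, cycles = sad_hlsm(bytes(range(256)), bytes(256))
print(result, cycles)
```

The loop body (S2 and S3) accounts for 256 * 2 = 512 of the cycles; the remaining four are S0, S1, the final S2 check, and S4.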
RTL Example: Video Compression (Sum of Absolute Differences)
Inputs: A, B (256 byte memory); go (bit); Outputs: sad (32 bits); Local registers: sum, sad_reg (32 bits); i (9 bits)
[Figure: the high-level state machine beside its datapath: 9-bit register i (controls i_clr, i_inc) drives AB_addr and a <256 comparator whose output is i_lt_256; the 8-bit A_data and B_data values are subtracted and passed through an abs unit; the result is added to the 32-bit register sum (controls sum_clr, sum_ld); sum feeds the 32-bit register sad_reg (control sad_reg_ld), whose output is sad]
Step 2: Create datapath
RTL Example: Video Compression (Sum of Absolute Differences)
Controller FSM (each data operation replaced by datapath control signals):
S0: wait for go
S1: sum=0 becomes sum_clr=1; i=0 becomes i_clr=1
S2: i<256 becomes a test of i_lt_256 (i_lt_256: go to S3; !i_lt_256: go to S4)
S3: sum=sum+abs(A[i]-B[i]) becomes sum_ld=1, AB_rd=1; i=i+1 becomes i_inc=1
S4: sad_reg=sum becomes sad_reg_ld=1
Step 3: Connect to controller. Step 4: Replace high-level state machine by FSM.
RTL Example: Video Compression (Sum of Absolute Differences)
Comparing software and custom circuit SAD
Circuit: two states (S2 & S3) for each i, so 256 items take 256*2 = 512 clock cycles.
Software: loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, and increment i; say about 6 cycles per array item, so 256*6 = 1536 cycles.
Circuit is about 3 times faster (1536 / 512 = 3).
Control vs. Data Dominated RTL Design
Designs are often categorized as control-dominated or data-dominated.
Control-dominated design: controller contains most of the complexity.
Data-dominated design: datapath contains most of the complexity.
General, descriptive terms: there is no hard rule that separates the two types of designs.
Laser-based distance measurer: control-dominated. SAD circuit: mix of control and data.
Now let's do a data-dominated design.
Data Dominated RTL Design Example: FIR Filter
Filter concept
Suppose X is data from a temperature sensor, and a particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle). That 240 is probably wrong! It could be electrical noise.
[Figure: digital filter with 12-bit input X, 12-bit output Y, and clock clk]
The filter should remove such noise in its output Y. Simple filter: output the average of the last N values.
Small N: less filtering. Large N: more filtering, but less sharp output.
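The simple averaging filter can be sketched in Python. This is an illustration only: it averages over however many samples have arrived when fewer than N are available, and it uses floating-point division, whereas a 12-bit hardware filter would use integer arithmetic; `moving_average` is a hypothetical name.

```python
from collections import deque

def moving_average(samples, n):
    """Output the average of the last n input values, one output per input."""
    window = deque(maxlen=n)     # oldest value drops out automatically
    outputs = []
    for x in samples:
        window.append(x)
        outputs.append(sum(window) / len(window))
    return outputs

readings = [180, 180, 181, 240, 180, 181]
print([round(y, 1) for y in moving_average(readings, 3)])
```

With n = 3, the 240 spike shows up as about 200 in three consecutive outputs instead of appearing at full size; a larger n would flatten it further but also blur genuine changes.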
Data Dominated RTL Design Example: FIR Filter
FIR filter
Finite Impulse Response: simply a configurable weighted sum of past input values, y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2).
The above is known as 3-tap; tens of taps are more common. Very general filter: the user sets the constants (c0, c1, c2) to define a specific filter.
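The weighted sum can be written directly in Python. A minimal sketch, assuming inputs before time 0 are 0 (as if the registers started cleared); `fir` is a hypothetical name, and the coefficients are supplied by the caller, just as the user sets c0, c1, c2.

```python
def fir(samples, coeffs):
    """y(t) = coeffs[0]*x(t) + coeffs[1]*x(t-1) + ... ; x before t=0 taken as 0."""
    outputs = []
    for t in range(len(samples)):
        y = 0
        for k, c in enumerate(coeffs):
            if t - k >= 0:
                y += c * samples[t - k]
        outputs.append(y)
    return outputs

# 3-tap example with hypothetical integer coefficients c0, c1, c2 = 1, 2, 1
print(fir([180, 181, 240], [1, 2, 1]))
```

Setting all three coefficients to 1/3 turns this into the 3-value average described for the simple filter.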
RTL design
Step 1: Create high-level state machine.
But there really is none! Data-dominated indeed.
Go straight to Step 2.
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath
Begin by creating a chain of xt registers to hold past values of X. Suppose the sequence is: 180, 181, 240.
[Figure: register chain xt0 -> xt1 -> xt2 clocked by clk, with X entering xt0; after successive clock edges the registers hold 180; then 181, 180; then 240, 181, 180]
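The register chain behaves like a shift register. A small Python sketch, assuming the registers start cleared to 0 (the slides do not say); `shift_chain` is a hypothetical name.

```python
def shift_chain(samples, taps=3):
    """Return (xt0, xt1, ...) after each clock edge of the xt register chain."""
    regs = [0] * taps                 # assumed initial contents
    history = []
    for x in samples:
        # On each clock edge: xt0 <- X, xt1 <- xt0, xt2 <- xt1 (all at once)
        regs = [x] + regs[:-1]
        history.append(tuple(regs))
    return history

for snapshot in shift_chain([180, 181, 240]):
    print(snapshot)
```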
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.)
Instantiate registers for c0, c1, c2. Instantiate multipliers to compute the c*x values.
[Figure: 3-tap FIR filter datapath: X enters register xt0 (x(t)), which feeds xt1 (x(t-1)) and then xt2 (x(t-2)); multipliers form c0*x(t), c1*x(t-1), and c2*x(t-2)]
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.)
Instantiate adders.
[Figure: the 3-tap FIR datapath with two adders summing the three products c0*x(t), c1*x(t-1), c2*x(t-2)]
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.)
Add circuitry to allow loading of a particular c register.
[Figure: 3-tap FIR filter datapath in which input C, address bits Ca1 Ca0, and load signal CL drive a 2x4 decoder (enable e) whose outputs 0 to 2 select which of c0, c1, c2 loads C; the summed products go into output register yreg, which drives Y]
Data Dominated RTL Design Example: FIR Filter
Step 3 & 4: Connect to controller, Create FSM
No controller needed: an extreme data-dominated example. (An example of an extreme control-dominated design: an FSM with no datapath.)
Comparing the FIR circuit to a software implementation
Circuit
Assume an adder has a 2-gate delay and a multiplier has a 20-gate delay. The longest path goes through one multiplier and two adders:
20 + 2 + 2 = 24-gate delay
A 100-tap filter, following the design on the previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on the longest path (20 + 7*2 = 34).
Software
100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication and 2 per addition, and say a 10-gate delay per instruction: (100*2 + 100*2)*10 = 4000 gate delays.
Circuit is more than 100 times faster (4000 / 34 ≈ 118).
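The delay arithmetic above can be checked with a short Python sketch. It uses the slides' assumed gate delays and assumes the adders form a balanced tree, which is what 7 adders for 100 taps implies (7 = ceil(log2(100))); for 3 taps the tree and the two-adder chain coincide. The function names are hypothetical.

```python
import math

MULT_DELAY = 20   # gate delays per multiplier (assumed above)
ADD_DELAY = 2     # gate delays per adder (assumed above)

def circuit_delay(taps: int) -> int:
    """One multiplier followed by a balanced adder tree of ceil(log2(taps)) levels."""
    return MULT_DELAY + ADD_DELAY * math.ceil(math.log2(taps))

def software_delay(taps: int, instr_per_op: int = 2, gates_per_instr: int = 10) -> int:
    """taps multiplications plus taps additions, as counted above."""
    return (taps * instr_per_op + taps * instr_per_op) * gates_per_instr

for taps in (3, 100):
    hw, sw = circuit_delay(taps), software_delay(taps)
    print(taps, hw, sw, round(sw / hw, 1))
```

The circuit delay grows only logarithmically with the number of taps, while the software estimate grows linearly, which is why the gap widens for longer filters.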