ELEC-H-473
Logic Circuits
Dragomir Milojevic
Université libre de Bruxelles
2023
Microprocessors & computers: top-down view
• Distributed system – many different & independent sub-systems
that process information concurrently (in parallel); various parts
of the system will produce some information based on inputs, and
exchange this information with other sub-systems
• Combinatorial & sequential digital logic circuits are used to
materialize Boolean algebra according to Shannon switching
theory and to practically implement the above sub-systems
• Complementary Metal Oxide Semiconductor (CMOS) process
technology is used to implement logic circuits and Application
Specific Integrated Circuit (ASIC) (transistors used as switches)
• Very Large Scale of Integration (VLSI) is enabled with scaled
CMOS technologies; today CMOS logic functional density allows
to implement millions of logic gates per cm2 of IC area
• System-on-Chip (SoC) paradigm: ASICs are seen as assembly of
many functional blocks to cope with complexity of today’s CPUs
ELEC-H-473 Th01 2/63
Challenges today
• In the past, individual ingredients: system architecture, design
(HW) and CMOS processing technology (top-down) were kept
well isolated one from the other in “divide & conquer” approach
• Today academia & industry are unifying theses views: to better
optimize systems we can’t isolate different ingredients any more
• Check for following concepts currently in vogue in the field:
. Device-Technology Co-Optimization (DTCO) – How to concurrently
optimize: a) the device, i.e. basic transistor architecture & b)
processing technology used to effectively manufacture this device
. System-Technology Co-Optimization (STCO) – How to
concurrently optimize system architecture (i.e. computer
organization) & processing technology (how do you effectively
manufacture device & the complete circuit)
. Advanced IC packaging & heterogeneous system integration – how
to enable and design IC packages with multiple dies using
optimized CMOS process for each functionality
ELEC-H-473 Th01 3/63
Our focus
• Tomorrow computing systems will be designed holistically, i.e., by
taking all ingredients simultaneously into account: from CMOS
processing technology, device (transistor) & logic components
design (gates), functional block micro-architecture, SoC assembly
& implementation, up to SW design, so algorithm, program
development in some language and generation of the executable
• This is extremely ambitious ... but there is no other way out!
• Main driver of this industry in the past, CMOS technology, is
today reaching it’s limits (that good old Moore’s law)
• To still enable system performance improvements, we need to
chase sub-optimality at all levels of system design
• To enable holistic design of computer systems, we require in-depth
view of all layers above: and this is what we will try to do here!
ELEC-H-473 Th01 4/63
Today
1. Boolean logic
2. Combinatorial circuits
3. Sequential circuits
4. Typical logic circuits in microprocessors
5. Modeling and implementation of logic circuits today
1. Boolean logic
ELEC-H-473 Th01 6/63
Formal definition
• B – is a set of 2 elements, TRUE & FALSE or {0, 1}
• 0 – or, prime symbol (used in this notes) denotes complement
operator (changes 0 to 1 and inversely); other notation include a
horizontal bar over the element of B, so 0 = 1 and 1 = 0
• · – dot denotes AND operator
• + – plus denotes OR operator
• Boolean algebra is then a quadruple {B,0 , ·, +}
• Important to keep in mind that + and · are not arithmetic
operators, but Boolean operators AND and OR
• Variable x is Boolean variable if it takes only values from B, so it
can be either 0 or 1
. Note that in the above “or” is exclusive, a Boolean variable can’t
be 0 and 1 at the same time – law of excluded middle – either this
proposition or its negation is true, but not both
ELEC-H-473 Th01 7/63
Boolean function
• Boolean function f is a function whose k arguments are Boolean
variables
• Result of function f evaluation belongs to set B, noted as:
f : {0, 1}k → 0, 1
• The number of arguments k in the above is a non-negative
integer called arity of the logic function; for small k it is possible
to enumerate all possible operators
. 0-ary Nullary – a Boolean constant that can be 0 or 1
. 1-ary Unary – complement or logical NOT operator
. 2-ary Binary – AND, OR that are not the only binary operators;
◦ Can you explain why we can enumerate all operators?
◦ How many different operators you could define for each case?
◦ How many more operators we get for k + 1 compared to k?
ELEC-H-473 Th01 8/63
Boolean operators
• Boolean operators can be defined using truth tables in which
function is evaluated for all possible combinations of arguments:
thus the truth table has k columns, and 2k lines
• For AND, OR and XOR table has 2 columns and 4 lines:
AND OR XOR
x y x ·y x +y x ⊕y
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0
• Note that OR operator is inclusive, the result is also true when
both variables are true; in exclusive or (XOR) the result is true only
if exclusively one of the variables is true
ELEC-H-473 Th01 9/63
Axioms and single argument theorems
• Using definitions of three Boolean operators, we define 4 axioms
of Boolean algebra; index a is used for · (AND) & b for + (OR):
A1a 0·0 = 0 A1b 1+1 = 1
A2a 1·1 = 1 A2b 0+0 = 0
A3a 0·1 = 1·0 = 0 A3b 1+0 = 0+1 = 1
A4a 00 = 1 A4b 10 = 0
• Following theorems hold for one Boolean variable x:
T1a x ·0 = 0 T1b x +1 = 1 Null element
T2a x ·1 = 1·x = x T2b x +0 = 0+x = x Identity
T3a x ·x = x T3b x +x = x Idempotent
T4a (x 0 )0 = x T4b NA Double inv
T5a x · x0 = 0 T5b x + x0 = 1 Inverse
• Above theorems can be proven by simply substituting the values
of the Boolean variable x in different expressions
ELEC-H-473 Th01 10/63
Theorems with 2 and 3 variables
• Following theorems hold for 2 & 3 variables x, y , z
T6a x ·y = y ·x T6b x +y = y +x Commutative
T7a (x · y ) · z = x · (y · z) T7b (x + y ) + z = x + (y + z) Associative
T8a x · (y + z) = (x · y ) + (x · z) T8b x + (y · z) = (x + y ) · (x + z) Distributive
T9a x · (x + y ) = x T9b x + (x · y ) = x Absorption
T10a (x · y ) + (x · y 0 ) = x T10b (x + y ) · (x + y 0 ) = x Combining
T11a (x · y )0 = x 0 + y 0 T11b (x + y )0 = x 0 · y 0 DeMorgan’s
• Above theorems can be proven using Truth Tables, axioms or
already proven theorems; there could be multiple & valid ways to
prove a given theorem, although some may be simpler than other
• Example:
x + (x · y ) = (x · 1) + (x · y ) Th2a
= x · (1 + y ) Th8a
= x · (1) Th1b
= x Th2a
ELEC-H-473 Th01 11/63
Duality Principle
• Dual of a logic expression is obtained by changing all + operators
with · operators, and by changing all 0 with 1 (and vice versa, so
we replace · with +, and 1 with 0)
• For example:
(x · y 0 · z) + (x · y · z 0 ) + (y · z) + 0
has a dual:
(x + y 0 + z) · (x + y + z 0 ) · (y + z) · 1
• Attention !!! The above does not state that a Boolean expression
is equivalent to its dual; example:
. F1 = x · 0 = 0 is true
. So dual of F1 , the F2 = x + 1 = 1 is true too
. But F1 6= F2 since F1 = 0 and F2 = 1 and 0 6= 1
ELEC-H-473 Th01 12/63
Functional completeness
• One can enumerate all Boolean operators; if we do this for 2
arguments we get 16 possible operators (why 16?)
• A Functionally Complete Set (FCS) of Boolean operators can
express all possible truth tables (Boolean functions) only by
combining its members; examples of FCS:
. FCS1(AND, NOT) – 3rd operator OR can be calculated as a
function of these two
. Even more interesting – singleton FCS – composed of a single
element; known singleton FCSs: FCS2(NAND) & FCS3(NOR)
• FCS are great in practical digital electronics: build few gates (and
even only one in case of singletons!), and all other gates and or
logic functions could be built using only these; this can seriously
simplify digital circuit design
ELEC-H-473 Th01 13/63
Logic circuits with singleton FCS
• Following table indicates how it is possible to build basic logic
operators NOT, AND and OR from only binary NAND or NOR gates
Source op. Dest op. Wiring
NOT NAND x 0 = (xx)0 = (x1)0
NOT NOR x 0 = (x + x)0 = (x + 0)0
AND NAND xy = ((xy )0 )0
AND NOR xy = ((xy )0 )0 = (x 0 + y 0 )0
OR NAND x + y = ((x + y )0 )0 = (x 0 y 0 )0
OR NOR x + you = ((x + y )0 )0
ELEC-H-473 Th01 14/63
Boolean logic & logic circuits
• Two valued Boolean algebra describe switching circuits operation
• Switch can be open (no current flow) or closed (current flows);
ideal switches change from open to close & inversely instantly; 2
types of switches: Normally Open (NO) & Normally Closed (NC)
NO NC
• NO switch is associated with Boolean variable x, & NC with
inverted Boolean x 0 ; switches are assembled in a circuit to reflect
Boolean equation; what is the Boolean equation of the circuit?
x y
u v
ELEC-H-473 Th01 15/63
2. Combinatorial circuits
ELEC-H-473 Th01 16/63
Definition of combinatorial systems
• Outputs Oi are Boolean functions of inputs Ij only:
Oi = fi (Ij ) i ∈ 0, 1, . . . , n − 1 j ∈ 0, 1, . . . , m − 1
• Inputs are random variables, they can change at any moment;
however if two or more inputs need to change at the same time,
they will do so instantly, and at the same time (for now)
• Multiple (hopefully different!) logic functions are independent
one from the other, even if they use the same arguments (inputs)
• When implemented with electronic circuits, fi logic functions will
be concurrent, i.e., they will run in parallel
• Main property of combinatorial circuits:
Same input combination will always produce same output
Combinatorial systems are simplest form of automatons in theory of automata
ELEC-H-473 Th01 17/63
Representation of combinatorial systems
• Most of the time: Truth Tables, Logic Functions, & Schematics
• We call Minterm a combination of all input variables for which
O = 1; Maxterm is the combination of all inputs for which O = 0
. For truth table of function F (a, b):
I10 a b F
0 0 0 1
1 0 1 1
2 1 0 0
3 1 1 0
. a0 b 0 , a0 b are Minterms; ab 0 , ab are Maxterms; 1st column is the
decimal equivalent (I10 base 10) of an input combination, a is Most
Significant Bit (MSB), & b Least Significant Bit (LSB)
. The choice is arbitrary but must be respected once fixed
• Logic function is then represented as:
Sum of Minterms Product of Maxterms
or
F = a0 b 0 + a0 b F = (a0 + b)(a0 + b 0 )
ELEC-H-473 Th01 18/63
Four input truth table, logic expression & circuit
Dec. a b c d F a b c d
0 0 0 0 0 1
1 0 0 0 1 1
2 0 0 1 0 0
3 0 0 1 1 0
4 0 1 0 0 0
5 0 1 0 1 0
6 0 1 1 0 1
7 0 1 1 1 0 F
8 1 0 0 0 1
9 1 0 0 1 1
10 1 0 1 0 0
11 1 0 1 1 0
12 1 1 0 0 0
13 1 1 0 1 0
14 1 1 1 0 1
15 1 1 1 1 0
F = a0 b 0 c 0 d 0 + a0 b 0 c 0 d + a0 bcd 0 + ab 0 c 0 d 0 + ab 0 c 0 d + abcd 0
ELEC-H-473 Th01 19/63
Logic function transformation
• Expressing logic functionality using sum of Minterms (or product
of Maxterms) is not optimal
• Expression on the previous slide can be simplified using axioms &
theorems from previous section (mostly Inverse and Null element):
F1 = a0 b 0 c 0 d 0 + a0 b 0 c 0 d + a0 bcd 0 + ab 0 c 0 d 0 + ab 0 c 0 d + abcd 0
F2 = a0 b 0 c 0 (d 0 + d) + (a0 + a)bcd 0 + ab 0 c 0 (d 0 + d)
F3 = a0 b 0 c 0 + bcd 0 + ab 0 c 0
F4 = (a0 + 1)b 0 c 0 + bcd 0
F5 = b 0 c 0 + bcd 0
• Final expression uses significantly less logic gates!, thus saving
significant circuit resources → 2, instead of 6 AND gates, plus
one 2-inputs, instead of one 6-inputs OR gate
ELEC-H-473 Th01 20/63
Logic optimization & forms
• Expressions F1 and F5 are logically equivalent, so :
a0 b 0 c 0 d 0 + a0 b 0 c 0 d + a0 bcd 0 + ab 0 c 0 d 0 + ab 0 c 0 d + abcd 0 = b 0 c 0 + bcd 0
• Sum-of-Minterms (F1 ) is called Canonical Disjunctive Normal
Form (CDNF); CDNF is unique expression for a given truth table
• Transformed expression F5 called Sum-of-Products (SoP), or
Product-of-Sums (PoS), is still a sum, but minterms are replaced
with simple Boolean terms using arbitrary number of variables;
different SoPs could be derived from a given Truth Table
. Products & sums in the above are Boolean operators AND & OR, and
not standard arithmetic operators; don’t be confused
• Optimized SoPs, have better Performance, Power, Area and Cost
(PPAC) parameters, key goal in digital electronics
• Logic optimization is crucial for practical circuit implementation
ELEC-H-473 Th01 21/63
Practical logic optimization methods
• Application of axioms & theorems may be applicable to simple
expressions, for more inputs this could be error prone & time
consuming → we need better methods
• Karnaugh maps – graphical method, efficient up to 5 variables
cd
00 01 11 10
00 1 1 0 0
E.g. using K-Maps:
01 0 0 0 1
ab
11 0 0 1 F = b 0 c 0 + bcd 0
10 1 1 0 0
• Exhaustive methods such as Quinne-McCluskey, Espresso, etc.
have been proposed to allow logic optimization automation
ELEC-H-473 Th01 22/63
Two-level logic 1/3
• We know that any SoP can be implemented using a set of AND and
one OR logic gate (inverse for an PoS)
• Each AND gate correspond to 1 SoP term; if there is no logic
redundancy, only 1 AND gate is active at a time for a given input
combination; note that in optimized circuits multiple input
combinations could affect the same AND gate: b 0 c 0 gate will set output
to 1 for a = 1 and a = 0 as long as b = c = 0
a b c d
F
• Input signals will “travel” through one AND and one OR gate; because
of this, SoP (PoS) circuits are called two-level logic, “levels” referring
to number of gates that signal has to cross
ELEC-H-473 Th01 23/63
Delay in two-level logic circuits 2/3
• If we consider practical circuit, implemented using actual electronic
components that have non negligible RLC values for transistors &
wires, there will be some delay from input to output
a b c d
F
AND delay OR delay
• If we neglect wire delays & keep only logic gate delays, when input
changes, the new value for the output will appear 2 gate delays later
• Important remark – no matter the logic function complexity (i.e. the
number of inputs), the SoP performance should be the same: 2 gate
delays → this would be GREAT news ... if only it could be true (you
know that in life and math “there is no free lunch”!)
ELEC-H-473 Th01 24/63
Two-level logic pitfalls and solution 3/3
• Where is the problem? Logic gate delay, power & area are
non-linear function of number of inputs (many other parameters
too, out of scope for now); illustration using 45nm CMOS tech
• Comparing AND gates with 2, 3 & 4 Timing: 2, 3 & 4 inputs AND, X1 drive strength
inputs, 4 being maximum number of 1
inputs for a gate in this library 0.8
• Timing for single load & different 0.6
input transition times normalized to
0.4
max load → limited number of inputs
controls the increase in gate delay 0.2
• Area of 3, 4 inputs is respectively 1.6 0
1 2 3 4 5 6 7
AND2_X1 AND3_X1 AND4_X1
& 1.9 bigger than 2-inputs AND
• Consequence: SoPs can be efficiently implemented only for VERY
simple logic circuits with maximum 4 – 5 inputs
ELEC-H-473 Th01 25/63
Multi-level logic – logic factoring
• What to do then with more complex logic expressions? Rather than
using big, power hungry & slow gates, we decompose logic function
into smaller functions that use gates with limited number of inputs
(say < 5) and connect these logic functions in series
O = f (Ij ) = fz (z, . . . , f2 (d, (f1 (c, f0 (a, b))))
• Logic computation now has multiple levels since input signal will have
to travel through more than 2 logic gates, hence the name
f1
f f2
fz
delay delay delay
• Delay in logic circuit now increases linearly with the number of levels:
computation is “stretched in time” with smaller individual delays
ELEC-H-473 Th01 26/63
Complete combinatorial logic system view
• Each output correspond to a specific Boolean logic function;
different logic functions work in parallel (concurrently)
• Individual logic functions are a b c d
optimized, and may contain
multi-level logic, the number of f1 O1
logic levels depend on function
complexity on optimization targets f2 O2
• In real circuits the output of the
system as a whole is valid only
after the logic function with
longest delay computes the output; fn On
observing the output before will
give us previous (old) output value
ELEC-H-473 Th01 27/63
3. Sequential circuits
ELEC-H-473 Th01 28/63
Sequential logic circuits
• Outputs (Oi ) depend on inputs (Ij ) and system state (Sk );
multiple Boolean values/expressions
• Future system states (Sk+ ) are calculated using logic functions of
inputs AND previous states (Sk− ) represented using internal (or
state) variables; hence two sets of Boolean equations:
Oi = f 1i (Ij , Sk− ) Inputs
Sk+ = f 2k (Ij , Sk− ) CL
i ∈ 0, 1, . . . , n − 1
j ∈ 0, 1, . . . , m − 1
delay
k ∈ 0, 1, . . . , o − 1 Present Future
• Sequential logic circuit can be seen as a Combinatorial Logic (CL)
circuit with one (or more) feedback(s) for state variables, all
sequential systems have this feedback!
ELEC-H-473 Th01 29/63
Properties
• In sequential logic systems the outputs can be different for same input
combination → they can express more complex behavior
• This is possible thanks to the notion of state “stored” in the feedback
wire that exhibits some kind of memorizing behavior
• While all possible input values can be enumerated for a fixed number
of inputs (combinatorial logic), we can’t do the same with sequential
systems, even for a small number of inputs; example:
. Design a system that will identify a given sequence of bits, serialized
on a single-bit input line
. The number of states is at least the number of bits in the sequence
(plus eventually one state to indicate that sequence is wrong)
. For large bit sequences, the system will have many states
• Number of states must be bounded, you should know why?
• Sequential logic circuits are often abstracted using Finite State
Machine (FSM) model of computation, next in line in complexity in
theory of automata (so, after combinatorial systems)
ELEC-H-473 Th01 30/63
System outputs
• Figure on slide p. 29 omitted to represent outputs on purpose
• This is because we distinguish 2 sequential logic circuits classes:
. Moore machines – the output is calculated as function of state
variables only (present); we thus need to only decode the state
. Mealy machines – the output is calculated as function of state
variables and the inputs; same state can have different outputs
• Mealy machines potentially enable better state optimization,
resulting in simpler circuits; not given, since system dependent
Inputs Inputs
Outputs
CL CL Outputs
delay delay
Present Future Present Future
Moore Mealy
ELEC-H-473 Th01 31/63
Sequential logic circuits representation 1/3
• State tables – define future state as function of current state
(first column in the table) and new input combination
• State tables differentiate stable states (in bold), & transitions; the
system remains in stable state as long as input doesn’t change;
transitions occur when system is in stable state & input changes
Moore Mealy
Sk+ ab Sk+ ab
Sk− 00 01 11 10 Z Sk− 00 01 11 10
1 1 1 3 2 0 1 1/1 1/0 3 2
2 2 1 4 2 1 2 2/0 1 4 2/1
3 1 3 3 4 1 3 1 3/1 3/0 4
4 2 3 4 4 0 4 2 3 4/1 4/0
• Important hypothesis: transitions are instantaneous; in real
systems this can’t be true; however same behavior can be achieved
if the system forbids input variables to change during transitions
ELEC-H-473 Th01 32/63
Sequential logic circuits representation 2/3
• State diagrams – derived from graph theory; stable states are
represented using graph vertices; edges represent transitions with
logical conditions on inputs marked as edge weights; used rarely
outside textbooks, since cumbersome for complex systems
00, 01
00 10
11 01
01, 11
3 2 00, 10
10 00
01 11
10, 11
ELEC-H-473 Th01 33/63
Sequential logic circuits representation 3/3
• Boolean logic equations – derived from the state table that uses
an arbitrary alphabet, for example Sk+ , Sk− ∈ Σ{a, b, c, d, . . .}
• Because we want to implement a sequential system with Boolean
logic, we need to map states into binary equivalents – state
encoding where unique binary code is assigned to each state
. We need at least log2 m bits to encode m states; each bit will
correspond to one state (Boolean) variable
• Designer picks one out of four possible (standardized) memory
elements (D,T,SR,JK flip-flops) and then derives logic equations
that control these memories; in the example below a system with
two state variables, JK memory & output logic function
J1 = ab, J2 = y1 a0 + y10 a
K 1 = a0 b 0 K 2 = y1 b 0 + y10 b
Z = b 0 y1 y2 + ay10 y2
ELEC-H-473 Th01 34/63
Race conditions
• In systems with two state variables or more, it is possible that
more than one state variable change at the same time
• In real circuits, propagation delays in gates & wires will cause
different arrival times for state variables at the input of the Logic
Circuit, on the left side of the circuit → race conditions
• If one state variable arrives before the
other, LC may “interpret” input wrongly,
& end in wrong destination state Logic
0→0 0→1
circuit
• Example: state transition from 00→11
0→1 0→1
results in 01, and not in11 as it should
• Race conditions are major problem in
systems with feedbacks since they will
cause malfunction, they must be avoided
at any price! in practical systems
ELEC-H-473 Th01 35/63
Asynchronous circuits
• One way of dealing with race conditions on state variable
feedbacks is to build asynchronous logic circuits
• In such circuits no state transition between two stable states will
ever involve more than one state variable to change at a time
• This can be done in different ways: using different state encoding,
state table modifications (but by still preserving the machine
behavior) and with adding supplementary state variables
• In the above, first two are not always possible and the third adds
extra resources; all this constraints the system design; worse,
asynchronous systems scale extremely poorly with the number of
inputs and the number of states
• This is the reason why majority of digital systems are not
implemented in this way → we need another solution to problem
of race conditions in state variables – synchronous systems
ELEC-H-473 Th01 36/63
Synchronous logic circuits
• Synchronous circuits solve race conditions by inserting 1 memory
bit per state variable (feedback loop) to split present from future
• Present state is updated only at a given moment allowing
transients in wires & Logic Circuit to settle down
• In another words races will be “filtered out” at memory input
• Consequence: fastest paths will have to
wait for the slower paths; i.e. we
downgrade system performance to the
Logic
worst in class – something that does not circuit
sound very optimal from the start
• If differences between delays are not
significant, sacrifice is not huge (make M1
sure you can explain this) but benefits are
M2
BIG since arbitrary number of state
variables can now change simultaneously
ELEC-H-473 Th01 37/63
Bistables, Latches & Flip-Flops 1/2
How to build memory elements for use in sequential logic circuits?
• Bistable – generic electronic circuit with two states, generally
associated to Boolean values of 0 and 1
• Latches – bistables with a control signal C, whatever is found on
input D is copied to the output Q as long as C=1
. Latches are sensitive to level of C, hence they are “open” as long as
C=1; any variation on D will be seen on the output
. This is why they are also often called transparent latches
. Typically banned from usage, unless in some very specific cases ...
• Flip-Flops (FFs) – bistables with a control signal C: input D is
copied to output Q only during transition of C from 0 to 1 rising
edge of C (or, falling edge when transition of C is from 1 to 0)
. In principal control signal could take any form, but if it is periodic,
we will speak of clock (Clk); the period P and frequency F will
have to be decided for each circuit depending on the needs
ELEC-H-473 Th01 38/63
Bistables, Latches & Flip-Flops 2/2
• In digital logic wee speak of different FFs: SR, JK, T & D
• They differ in the way how their control inputs are used to set the
FF state; first two have two control inputs, and last two only one
S+ SR S+ JK
S− 00 01 11 10 S− 00 01 11 10
0 0 0 – 1 0 0 0 1 1
1 1 0 – 1 1 1 0 0 1
S+ T S+ D
S− 0 1 S− 0 1
0 0 1 0 0 1
1 1 0 1 0 1
• Note: different FFs can be derived from a single D-FF, using
extra combinatorial logic, D-FF is used as a basic building block
ELEC-H-473 Th01 39/63
D-FF: mother of all Flip-Flops
• Input D is sampled & stored on the rising edge of a Clk; any
input variation on D outside the rising edge is ignored → this is
why D-FF is sometimes also called sample & hold FF
• Output value Q indicates the last D stored
• In case of simultaneous variations of D & Clk, Q will take the
previous value, the one just before the Clk rising edge (this is
very important hypothesis, see next slide)
Simultaneous Clk &D
Clk
D Q
DFF D
Clk
Q
Previous Previous
value is 1 value is 0
ELEC-H-473 Th01 40/63
D-FF: the real thing
• Physical reality imposes following things:
. Inputs are random variables and they could occur at any time,
including simultaneous changes with the rising edge of the Clk
. FF circuitry will have some RLC values preventing instantaneous
switching, and some transition times on both inputs & outputs
. Clock signal is generated with some physical circuit & will have jitter
• All of the above contribute to metastability – situations in which FFs
cant’t guarantee that the right value is stored; we need to avoid this!
• For D-FF, the spec says that if we have simultaneous change of D &
Clk the device uses the value before; how to achieve this?
. Data must arrive before & should remain valid for some time with
respect to the clock edge → set-up & hold times
Clk
D
Setup Hold
ELEC-H-473 Th01 41/63
D-FF in the real circuit
• Typically input data will come from a FF, so at first rising
(launch) edge of the Clk, data in the source FF (FF1) will appear
on Q output after some small delay – clock to output delay t1
1
2 4
3
• Signal will have to travel through some wires – t2 , some
combinatorial logic (and wires) – t3 and some wires t4
• Destination FF (FF2) can safely store the data at the next
(capturing) edge of the Clk, only if it arrives on time:
i=4
1
∑ ti < FClk − tsetup
i
ELEC-H-473 Th01 42/63
Impact on circuits & microprocessors
• Microprocessors are sequential logic circuits – in fact a collection of
many independent sequential circuits that exchange processed
information (CPU is a highly distributed system as said in the begging)
• To make all these sequential circuits operational, we need to avoid
race conditions, and use FFs to synchronize different microprocessor
functional blocks and state machines
• Thus to improve the performance of a computing system, we need to
improve the performance of the logic circuit by reducing the period,
i.e. allow increase of the Clk operating frequency F
• Assuming same logic circuit, to increase F we need improved electrical
properties of components (transistors, wires etc.) used to implement a
circuit; in ICs this is possible thank’s to CMOS scaling
• Need for synchronization in sequential logic circuits and CMOS scaling
are two key concepts that are behind the Moore’s law, one of the most
important paradigms in microprocessor design & implementation
ELEC-H-473 Th01 43/63
4. Typical logic circuits in microprocessors
ELEC-H-473 Th01 44/63
Multiplexers/Demultiplexers
• Multiplexers select one input signal I among n signals (typically n
is power of 2) based on input word S of log2 n bits
• Truth table of 4:1 MUX is given below; note that a given select
combination makes the corresponding input bit to “pass” to the
output; other bits are ignored as long as S is active
S1 S0 O I0
0 0 I0 I1
4:1 O
0 1 I1 I2
I3
1 0 I2
1 1 I3
s0 s1
• Multiple multiplexers/demultiplexers circuits could be put in
parallel – crossbars – to work with words: each word bit will have
it’s own MUX; this is heavily used for communication in CPUs;
however MUX/DEMUX circuitry PPAC scale quite poorly with n
ELEC-H-473 Th01 45/63
Decoder/Encoders
• Decoders – assert one out of On output lines depending on the
value of Im input, that is m-bit binary; m-to-n decoder that has m
input, n output lines, with n = 2m
• Circuit can be controlled using enable signal EN: if EN=0 all output
lines are 0 (no matter values of I1 & I0 ); when EN=1 the output
line whose index is equal to the value of the input binary data is
asserted (truth table below uses don’t care for I1 , I0 and EN=0)
EN I1 I0 O3 O2 O1 O0
O0
0 – – 0 0 0 0
I0 O1
1 0 0 0 0 0 1 I1 O2
1 0 1 0 0 1 0 03
1 1 0 0 1 0 0
1 1 1 1 0 0 0 EN
• Encoders do the opposite: from 2m input lines they build n-bit
output word (n = log2 m)
ELEC-H-473 Th01 46/63
Adders
• Adders are possibly the most used arithmetic circuits in any IC,
including microprocessors (and not only those used in ALUs)
• Assuming two operands a, b of arbitrary bit-length the sum
o = a + b will be calculated as follows:
ai oi
oi = ai ⊕ bi ⊕ ci , i ∈ 0, n − 1 bi FADD ci+1
ci+1 = ai · bi + ci (ai + bi ) ci
where oi is the sum bit and ci is the cary bit; note that the result
word has one bit more than argument, because of the sum
. Expressions above are implemented in Full-ADDder circuit (FADD)
• We see that to compute the last bit of the sum i = n − 1 we need
ci to be propagated all the way from c0
. Circuit delay is proportional to n, so for big n (64 or 128 bits,
not uncommon these days) performance of the circuit is low
ELEC-H-473 Th01 47/63
Advanced adders 1/2
• Carry-lookahead (CLA) tend to improve adder PPAC, by explicitly
calculating the cary bit as a function of inputs only (we stretch
computation in space, opposite to computation serialization)
• Looking at ci+1 = xi yi + ci (xi + yi ), we say:
gi = xi yi
pi = xi + yi
• Then ci+1 = gi + pi ci can be re-written as:
c3 = g2 + p2 c2
c1 = g0 + p0 c0 = g2 + p2 (g1 + p1 g0 + p1 p0 c0 )
c2 = g1 + p1 c1 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
= g1 + p1 (g0 + p0 c0 ) c4 = g3 + p3 c3
= g1 + p1 g0 + p1 p0 c0 = g3 + p3 (g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0 )
= g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0 + p3 p2 p1 p0 c0
ELEC-H-473 Th01 48/63
Advanced adders 2/2
• Note that in the above expressions pi and gi depend on inputs xi
and yi only, there is no computational serialization
• Efficiency of a circuit could be better, however for larger x, y
words (>8 bits so for typical operand sizes these days) we may
experience quickly combinatorial explosion: expressions will
become more complex, and thus less performant
• Parallel prefix adders – improve computation of ci+1 by
introducing parallelization; depending on the approach adopted
trade-off is made for key circuit parameters: area, speed, routing
resources; many different variants, some key implementations:
. Brent-Kung – minimum area (low-power), but poor performance
. Kogge-Stone – more area, but better performance
. Sklansky – minimal depth (best performance)
ELEC-H-473 Th01 49/63
ALUs
• ALUs are main components in microprocessors: they perform
arithmetic (addition, subtraction) & logic (AND, OR, NOT)
• Note the absence of multiplications or divisions, generally
performed using dedicated logic circuit for performance (HW)
• Assuming 3-bit opcodes, operands A & B table below indicates
possible encoding of operations (choice is arbitrary):
i2 i1 i0 Name Operation i0 i1 i2
0 0 0 Pass A O← A
0 0 1 A AND B O← A & B
0 1 0 A OR B O← A | B
A
0 1 1 A’ O ← NOT(A)
O
1 0 0 A+B O ← A+B ALU
1 0 1 A-B O ← A-B
1 1 0 A+1 O ← A+1 B
1 1 1 A–1 O ← A-1
ELEC-H-473 Th01 50/63
Counters
• Circuit examples we have seen so far are combinatorial, thus with
limited power of expressivity; counters are sequential circuits, and
like adders are very common in many different logic circuits
including CPUs
• Many different implementations exist, depending on functional
requirements: max count value, count direction, encoding for
output values, etc.
• Generic approach uses half-adder & D-FFs; the use of half
adder/subtractor circuit will enable up/down counter
ai ci+1 ai ci+1 ai ci+1 ai ci+1
Count
bi HADD bi HADD bi HADD bi HADD
oi oi oi oi
D Q D Q D Q D Q
O0 O1 O2 O3
Clk Clk Clk Clk
Clk
ELEC-H-473 Th01 51/63
Registers
• Multiple D-FFs can be grouped together to form words of data: n-bit
word can be stored using n D-FFs that share some inputs
• Various D-FFs are independent circuits, rising clock edge will “push”
data to the output in parallel, complete word appears on the data bus
• Other than D there could be other control inputs (common for all
D-FFs): obviously Clk, but also EN, RST etc.
• Example below shows 4-bit register with EN and RST:
D0 D1 D2 D3
D Q D Q D Q D Q
EN EN EN EN
Clk Clk Clk Clk
Rst
Clk
Q0 Q1 Q2 Q3
• From above, different register variants may be implemented: shift
registers (serial IO), serial-parallel IO for data (de)serialization, etc.
ELEC-H-473 Th01 52/63
Register files
• Multiple registers can be bundled together for storage of arbitrary
number of data words in an array – Register Files (RF)
• Since D-FFs are quite big circuits, best PPAC for RFs is obtained
for small word count (few dozens at most); thus, the number of
address bits is quite low (other technologies will be used for
memories with high capacity)
• RFs are typically used to provide operands to ALU; figure below
indicates a top-level schematic block of 8×8-bit words RF:
. EN – activates the RF (Clk –
EN 8
6 A
not shown)
Add
O
RF ALU . R/W’ – decides on RF access
8
DATA 8 direction (exclusive R or W)
B
R/W'
. Upper 3 bits of Add address
output A, & lower output B
ELEC-H-473 Th01 53/63
5. Modeling and implementation of logic
circuits today
ELEC-H-473 Th01 54/63
Motivation and solution for complex IC modeling
• Logic circuits synthesis using standard manual methods can’t be
applied for implementation of complex logic circuits composed of
millions of gates enabled by CMOS technology in recent years ICs
• We need to enable logic circuit abstraction using behavioral models
and use design automation for implementation process; design
automation relies on highly parallel computers (with loads of memory)
to enable reasonable run-time
• Design modeling & implementation use Hardware Description
Languages (HDLs) and Electronic Design Automation (EDA) SW tools
• HDLs enable formal specification of logic circuits & automated
transformation of these models into descriptions that can be used for
practical circuit implementation (e.g. ASIC or FPGA)
• Formal specifications imply that the HDL model is complete, precise
and without any ambiguities – this is crucial since in this step we
translate verbal specifications (likely to be inaccurate since given in
human language) to logic circuit models (that must be accurate)
ELEC-H-473 Th01 55/63
Practical HDLs
• Most commonly used HDLs are VHDL, Verilog, System Verilog
(the extension of the previous); here we focus on VHDL (maybe
you have followed ELEC-H409)
• VHDL stands for VHSIC HDL where VHSIC = Very High Speed
Integrated Circuits and it has been developed by the Department
of Defense of USA in early ’80 to improve digital circuit design
• HDLs enable behavioral synthesis as opposed to traditional
manual logic synthesis that is lengthy, error prone & limited to
circuits of a small size (dozens of variables max)
• Designer describes the circuit behavior, rather than how it should
be implemented (gates), and the software will do the rest
• Do not misunderstand this: designer specifications need to be
good to get a good circuit out of it → in HW there is no magic;
components that are key for performance (e.g. arithmetic ops)
will be carefully designed by “hand” even if synthesis is automated
ELEC-H-473 Th01 56/63
Practical EDA
• High-level behavioral models are pushed through a series of
software tools that transform initial system specification into
physically implementable integrated circuit
• EDA tools implement sophisticated algorithms to create a “good”
circuit; a good circuit provides desired trade-off among many
parameters that are often condensed in Performance, Power, Area
& Cost (PPAC) metrics
• Note that the use of “good”, and not “optimal” circuit, reflects a
very important fact: these days optimization of one objective will
necessarily compromise the other, so multi-criteria &
multi-objective optimization paradigm are in game
• Typical example: you can’t aim for performance & low-power at
the same time; these two objectives are orthogonal
• Circuit implementation, or design flow is typically split into two
phases: synthesis and Place & Route
ELEC-H-473 Th01 57/63
Circuit synthesis and functional simulation
• Synthesis transforms abstracted, high-level, behavioral HDL
models into logic, described in a netlist – a text file that contains
all primitive Boolean logic operators (gates) & their connections
• Main goal of the synthesis is to obtain correct HDL specification
of the circuit, that will behave as expected
• To confirm logic correctness and validate the HDL model
designers use functional simulation to calculate outputs for a given
set of input vectors (this is of course automated using computers)
• During synthesis some electrical parameters will be taken into
account, but in a approximated way, since the circuit is still at
higher abstraction level
• Initial PPA parameters are assessed with respect to initial
constraints to evaluate the circuit feasibility; if at this stage the
circuit doesn’t meet constraints, there is no point in continuing;
later implementation steps generally make things worse
ELEC-H-473 Th01 58/63
Circuit implementation
• Placement decides on (x, y ) position of each gate in the design,
assuming rectangular (often squared) Silicon real-estate
• Clock tree synthesis connects Clk pins of all FFs in the design;
the idea is to provide clock signal with minimum skew, jitter and
interconnect length; not simple if you assume realistic FF count
• Routing implement wires that connect all gates in the design
• Throughout different steps initial netlist will be transformed into
implementable design database with detailed views that include:
. Electrical views – used to checking timing (combinatorial, wire,
set-up/hold delays, etc.), currents (fan-out, density, max currents
etc.), power dissipation and delivery, temperature (average &
hotspots), mechanical etc. etc. etc.
. Geometrical views – all shapes need to be conform for the CMOS
manufacturing process as indicated by the IC foundry
• These models are exhaustively analyzed for final system validation,
before the design can be shipped for IC manufacturing (sign-off)
ELEC-H-473 Th01 59/63
Digital system seen as RTL
• In general a circuit is composed of combinatorial logic (even in
sequential circuits) and Flip-Flops (even in combinatorial circuits)
• Register Transfer Level or Logic (RTL) – is a system view in
which we consider data flow of signals going through an arbitrary
number of combinatorial circuits separated with Flip-Flops (FFs)
sharing the same (but sometimes different) clock sources
Inputs Outputs
CL CL CL
FF FF FF FF
Clk
• Each “sub-system” appears as independent circuit, and all circuits
work concurrently (in parallel) → important for microprocessors
. What needs to be fulfilled to maximize the system performance?
ELEC-H-473 Th01 60/63
Illustration: from RTL to netlist
ELEC-H-473 Th01 61/63
Illustration: placed & routed design
ELEC-H-473 Th01 62/63
Microprocessors design as logic circuit
• Today all microprocessors, from tiny micro-controllers to super
powerful CPUs are designed using RTL and HDLs
• Complex software tools are used to transform initial HDL models
into implantable design databases that can be shipped to a
manufacturer who fabricates the Application Specific Integrated
Circuit (ASICs) using CMOS technology (we will see this)
• Huge teams of engineers work on a very specific task in the whole
process for significant amount of time; typical IC system design
cycle is ∼year, for approximately ∼20% of new functionality, this
means that ∼80% of system specification is legacy HDL (RTL
re-use from previous generation products)
• All this is quite expensive, hence requires large markets (sale
volumes) to compensate for investments; at the end of the day
the activity could generating significant profits (for some!)
• For us, circuit logic circuit design & implantation will have a
strong impact on processor & computer systems architecture
ELEC-H-473 Th01 63/63