Digital Testing:
Scan-Path Design
07/03/16
Based on text by S. Mourad
"Principles of Electronic Systems"
Outline
Problems with sequential testing
What is scan
Types of scan
Types of storage devices
Scan Architectures
Cost of Scan
Partial Scan
Problems with Sequential Machine Testing
Complexity of testing sequential circuits
due to
feedback loops
placement of the circuit in a known state
high chance for hazard, essential hazard
Timing problems
The Scan-path Technique
Scan-path design reduces test-generation
complexity for circuits that contain storage devices
and feedback paths through combinational logic.
The philosophy is divide and conquer, with the
purpose to:
1. Set any internal state easily
2. Observe any state through a distinguishing
sequence
What is Scan Design
This DFT technique is used mainly for
synchronous circuits.
We assume the use of D flip-flops only.
A mux is placed at the input of each flip-flop
so that all flip-flops are connected as a shift
register for one mux selection and work in
normal mode for the other.
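The mux-in-front-of-a-flip-flop arrangement can be sketched behaviorally as follows. This is a minimal Python model, not a circuit description; the names ScanFlipFlop, d, sd, and tc are illustrative assumptions, and tc = 1 is assumed to select the scan input.

```python
class ScanFlipFlop:
    """Behavioral model of a D flip-flop with a 2-to-1 mux on its input."""

    def __init__(self):
        self.q = 0  # stored state

    def clock(self, d, sd, tc):
        """Apply one clock edge.

        d  : value from the combinational logic (normal mode)
        sd : value from the previous flip-flop in the chain (scan mode)
        tc : test control; assumed 1 = scan mode, 0 = normal mode
        """
        self.q = sd if tc else d
        return self.q


ff = ScanFlipFlop()
ff.clock(d=1, sd=0, tc=0)   # normal mode: the flip-flop captures d
normal_q = ff.q
ff.clock(d=1, sd=0, tc=1)   # scan mode: the flip-flop captures sd
scan_q = ff.q
```

Chaining such cells by wiring each q to the next cell's sd gives the shift register used in scan mode.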
Adding Scan Structure
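[Figure: combinational logic with primary inputs (PI) and primary outputs (PO); its state is held in scan flip-flops (SFFs) chained from SCANIN to SCANOUT and controlled by TC or TCK. CK is not shown.]
Connect the SFFs in a shift register and test
them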
Testing strategy
Principle of scan path
Each input to a flip-flop is treated as an output of the
combinational circuit.
Each output of a flip-flop is treated as an input to
the circuit.
Testing strategy
Example:
A realization for the double-throw switch
Internal state can be determined via the scan-out
output.
The procedure for circuit testing:
1. Set c = 1 to switch the circuit to shift register mode
2. Check operation as a shift register by using the
scan-in input, the scan-out output, and the clock
3. Set the initial state of the shift register
4. Set c = 0 to return to normal mode
5. Apply test input pattern to the combinational logic
6. Set c=1 to return to shift register mode
7. Shift out the final state while setting the starting
state for the next test
8. Go to step 4.
Testing the Combinational Part
Repeat until all patterns are applied.
a. Set SE = 1 and shift in the initial values of the flip-flops.
(These are the signals at the outputs of the
latches for the first test pattern.)
b. Set SE = 0 and apply a pattern at the primary inputs.
c. Clock the circuit once and observe the results
at the primary outputs.
d. Set SE = 1 and clock the circuit M times (M flip-flops in the chain)
to shift out the captured state.
End repeat.
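The repeat loop above can be sketched as a small simulation. This is a hedged illustration only: the function names shift and scan_test, the list-based chain model, and the 2-bit counter used as the circuit under test are all assumptions made for the example. Shifting in the next state and shifting out the previous capture are overlapped, as in step 7 of the earlier procedure.

```python
def shift(chain, scan_in_bits):
    """Scan mode (SE = 1): shift bits in, return the bits seen at scan-out."""
    out = []
    for b in scan_in_bits:
        out.append(chain[-1])          # the last flip-flop drives scan-out
        chain[:] = [b] + chain[:-1]    # each flip-flop takes its neighbour's value
    return out


def scan_test(comb, patterns, m):
    """Apply (pi, state) patterns to a full-scan circuit with m flip-flops.

    comb(pi, state) -> (po, next_state) models the combinational logic.
    Returns one (po, captured_state) response per pattern.
    """
    chain = [0] * m
    responses = []
    pending_po = None
    for pi, state in patterns:
        # Shift in the new state; the previous capture is shifted out meanwhile.
        # Bits are reversed so that state[i] lands in flip-flop i.
        out = shift(chain, list(reversed(state)))
        if pending_po is not None:
            responses.append((pending_po, list(reversed(out))))
        # Normal mode (SE = 0): apply PI, observe PO, clock once to capture.
        po, nxt = comb(pi, tuple(chain))
        chain[:] = list(nxt)
        pending_po = po
    out = shift(chain, [0] * m)        # flush the last captured state
    responses.append((pending_po, list(reversed(out))))
    return responses


def counter_logic(pi, state):
    """Hypothetical circuit: 2-bit counter, pi = enable, po = 1 when count is 3."""
    value = state[0] + 2 * state[1]
    nxt = (value + pi) % 4
    return (1 if value == 3 else 0), (nxt & 1, nxt >> 1)


results = scan_test(counter_logic, [(1, (1, 1)), (0, (0, 1)), (1, (0, 0))], m=2)
```

Each response pairs the primary output observed in step c with the state captured by the single clock and recovered at scan-out.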
An Example
[Figure: example sequential circuit: a combinational block (gates G1, G2, G3; signals F, y1, y2) with two D flip-flops, 1 and 2, clocked by Clk.]
Inserting the Muxes
[Figure: the same circuit after scan insertion: a 2-to-1 mux (inputs 0 and 1) precedes each D flip-flop, and the flip-flops are chained between Scan-in and Scan-out; output Z.]
Types of Storage Devices
Multiplexed input flip-flop
Two-port flip-flop: works with two
non-overlapping clocks
Latch-based scan design requires either
two latches clocked with non-overlapping clocks, or
three latches clocked with three phases
Scan Design Architectures
Several architectures:
Multiplexed flip-flops design
Level-sensitive scan design
Scan set scan design
Derivative of scan design:
Parallel scan chains
Partial scan
Level Sensitive Scan Design LSSD (IBM)
Level sensitive means that state changes in the FSM
are independent of circuit delays and of the order of changes
in input signals (once inputs are set to new values).
Scan is the ability to shift into or out of any state.
All internal storage is implemented using hazard-free
polarity-hold latches.
Level-sensitive Latch
The latch works with the 3 phases A, B and C.
For normal operation, use clocks C and B.
For shift operation, use clocks A and B.
The two-port flip-flop works with two non-overlapping clocks.
[Figure: latch L1 with inputs D, C, and SCAN-IN, feeding latch L2 clocked by B.]
Polarity Hold Latch (IBM)
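[Figure: (a) polarity-hold latch L1 with data input D and clock C, followed by L2 clocked by B; (b) the same pair with a second L1 port for Scan-in clocked by A.]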
LSSD design rules
Storage is implemented by hazard free
polarity-hold latches
latches controlled by non-overlapping clocks
clock inputs to SRLs must be easily controlled
clock feeds clock inputs only
all SRL must be connected into shift registers
input sensitizing conditions
Level-Sensitive Scan-Design Latch
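[Figure: LSSD scan cell: a D flip-flop (master latch clocked by MCK, slave latch clocked by SCK, output Q) plus overhead logic that adds the scan data input SD clocked by TCK. Waveforms show MCK, TCK, and SCK in normal mode and scan mode.]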
LSSD: Testing
Repeat until all patterns are applied.
a. Apply a pattern at the primary inputs.
b. Clock C; then clock B once and observe the results at
the primary outputs.
c. Shift out the response
Apply the initialization for the next pattern at SI.
Clock A, then clock B, M times.
Observe at the primary outputs and the SO pins.
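The clocking order in this procedure can be illustrated with a behavioral model of a shift-register latch (SRL). This is a sketch under stated assumptions: the class SRL, its pulse_a, pulse_b, and pulse_c methods, and the helper lssd_shift are hypothetical names; L1 is assumed to load D on clock C or scan-in on clock A, and L2 to copy L1 on clock B.

```python
class SRL:
    """Shift-register latch: a two-port L1 latch feeding an L2 latch."""

    def __init__(self):
        self.l1 = 0
        self.l2 = 0

    def pulse_c(self, d):
        self.l1 = d            # system clock C: L1 takes the data input

    def pulse_a(self, scan_in):
        self.l1 = scan_in      # shift clock A: L1 takes the scan input

    def pulse_b(self):
        self.l2 = self.l1      # clock B: L2 copies L1


def lssd_shift(chain, bits):
    """Shift bits through a chain of SRLs: clock A, then clock B, per bit."""
    out = []
    for bit in bits:
        out.append(chain[-1].l2)                   # SO is L2 of the last SRL
        scan_ins = [bit] + [s.l2 for s in chain[:-1]]
        for srl, si in zip(chain, scan_ins):       # clock A
            srl.pulse_a(si)
        for srl in chain:                          # clock B
            srl.pulse_b()
    return out


chain = [SRL() for _ in range(3)]
lssd_shift(chain, [1, 0, 1])                       # initialization via SI
loaded = [s.l2 for s in chain]

for srl, d in zip(chain, [0, 1, 1]):               # clock C captures the response
    srl.pulse_c(d)
for srl in chain:                                  # then clock B once
    srl.pulse_b()
response = lssd_shift(chain, [0, 0, 0])            # clock A then B, M = 3 times
```

Because A and B never overlap, every L2 holds still while the L1 latches load, which is why the shift needs no edge-triggered storage.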
Multiple Scan Chains
Instead of stringing all the flip-flops or
latches into one shift register,
partition them into several chains
The advantages are:
compatible with multiple clock designs
Shorten test application time
Simplify the stitching of the flip-flops
But, may require extra pins
The Cost of Scan Design
Area (muxes, extra routing)
Additional I/Os
Performance, delays within the flip-flops
Heat when testing at speed
Multiple Scan Registers
Scan flip-flops can be distributed among any number
of shift registers, each having a separate scanin and
scanout pin.
Test sequence length is determined by the longest
scan shift register.
Just one test control (TC) pin is essential.
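[Figure: combinational logic with scan flip-flops (SFFs) split across several scan registers; scan-in pins are shared with primary inputs (PI/SCANIN) and a MUX shares the primary output pins with scan-out (PO/SCANOUT); TC and CK are common.]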
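Since the longest scan register sets the test sequence length, splitting the flip-flops evenly shortens the shift time per pattern. A minimal sketch follows, assuming perfectly balanced chains; the function names are hypothetical, and 179 is the S5378 flip-flop count quoted elsewhere in these notes.

```python
def balanced_chains(n_flip_flops, n_chains):
    """Split the flip-flops into chains whose lengths differ by at most one."""
    base, extra = divmod(n_flip_flops, n_chains)
    return [base + 1 if i < extra else base for i in range(n_chains)]


def shift_clocks_per_pattern(chain_lengths):
    """All chains shift in parallel, so the longest chain sets the shift time."""
    return max(chain_lengths)


single = shift_clocks_per_pattern(balanced_chains(179, 1))
four = shift_clocks_per_pattern(balanced_chains(179, 4))
```

With four chains the per-pattern shift drops from 179 clocks to 45, at the cost of three more scan-in/scan-out pin pairs unless the pins are shared with PI and PO.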
Scan Overhead
IO pins: One pin necessary.
Area overhead:
Gate overhead = [4 nf/(ng+10nf)] x 100%,
where ng = comb. gates; nf = flip-flops;
Example
ng = 100k gates, nf = 2k flip-flops, overhead = 6.7%.
More accurate estimate must consider scan wiring
and layout area.
Performance overhead:
Multiplexer delay added in combinational path;
approx. two gate-delays.
Flip-flop output loading due to one additional
fanout; approx. 5-6%.
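The gate-overhead formula can be checked numerically. The sketch below assumes, as the formula implies, that a non-scan flip-flop counts as 10 gates and a scan flip-flop adds 4 more; the function name is hypothetical.

```python
def gate_overhead_percent(ng, nf):
    """Gate overhead of full scan: 4 extra gates per flip-flop.

    ng : number of combinational gates
    nf : number of flip-flops (each counted as 10 gates before scan)
    """
    return 100.0 * 4 * nf / (ng + 10 * nf)


example = gate_overhead_percent(ng=100_000, nf=2_000)   # the slide example
s5378 = gate_overhead_percent(ng=2_781, nf=179)         # S5378 gate and flip-flop counts
```

The second call uses the S5378 counts that appear later in these notes and gives the 15.66% listed there for full scan.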
Hierarchical Scan
Scan flip-flops are chained within
subnetworks before chaining subnetworks.
Advantages:
Automatic scan insertion in netlist
Circuit hierarchy preserved helps in debugging and
design changes
Disadvantage: Non-optimum chip layout.
[Figure: the same four scan flip-flops (SFF1 to SFF4) chained from Scanin to Scanout, shown once in the hierarchical netlist and once in the flat layout.]
Scan Area Overhead
Linear dimensions of active area (primed quantities are after scan insertion):
$$X = \frac{C + S}{r}$$
$$X' = \frac{C + S + aS}{r}$$
$$Y' = Y + ry = Y + \frac{Y(1-b)}{T}$$
Area overhead:
$$\frac{X'Y' - XY}{XY} \times 100\% = \left[(1 + as)\left(1 + \frac{1-b}{T}\right) - 1\right] \times 100\% \approx \left(as + \frac{1-b}{T}\right) \times 100\%$$
The last step drops the small product term $as(1-b)/T$.
y = track dimension, wire width + separation
C = total comb. cell width
S = total non-scan FF cell width
s = fractional FF cell area = S/(C+S)
a = SFF cell width fractional increase
r = number of cell rows or routing channels
b = routing fraction in active area
T = cell height in track dimension y
Example: Scan Layout
2,000-gate CMOS chip
Fractional area under flip-flop cells, s = 0.478
Scan flip-flop (SFF) cell width increase, a = 0.25
Routing area fraction, b = 0.471
Cell height in routing tracks, T = 10
Calculated overhead = 17.24%
Actual measured data:
Scan implementation   Area overhead   Normalized clock rate
______________________________________________________________________
None                  0.0             1.00
Hierarchical          16.93%          0.87
Optimum layout        11.90%          0.91
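The calculated overhead in this example can be reproduced from the area model, using the symbols s, a, b, and T defined with it. The sketch below is only a numerical check; the function name is hypothetical. It returns both the full product form and the simplified sum.

```python
def scan_area_overhead(s, a, b, T):
    """Return (product_form, sum_form) area overhead in percent.

    s : fractional flip-flop cell area
    a : fractional width increase of a scan flip-flop cell
    b : routing fraction of the active area
    T : cell height in routing tracks
    """
    product_form = (1 + a * s) * (1 + (1 - b) / T) - 1
    sum_form = a * s + (1 - b) / T
    return 100 * product_form, 100 * sum_form


product_pct, sum_pct = scan_area_overhead(s=0.478, a=0.25, b=0.471, T=10)
```

The simplified sum gives the 17.24% quoted in the example; the full product is slightly larger, about 17.87%. Both are close to the measured 16.93% for hierarchical scan.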
ATPG Example: S5378
                                               Original    Full-scan
Number of combinational gates                  2,781       2,781
Number of non-scan flip-flops (10 gates each)  179         0
Number of scan flip-flops (14 gates each)      0           179
Gate overhead                                  0.0%        15.66%
Number of faults                               4,603       4,603
PI/PO for ATPG                                 35/49       214/228
Fault coverage                                 70.0%       99.1%
Fault efficiency                               70.9%       100.0%
CPU time on SUN Ultra II, 200MHz processor     5,533 s     5 s
Number of ATPG vectors                         414         585
Scan sequence length                           414         105,662
Automated Scan Design
[Figure: automated scan design flow. Design and verification at the behavior, RTL, and logic levels produces a gate-level netlist. Scan design rule audits report rule violations back to the designer. Scan hardware insertion produces the scan netlist; combinational ATPG produces combinational vectors; scan sequence and test program generation produces the test program. Chip layout (scan-chain optimization, timing verification) returns the scan chain order and produces mask data. The design and test data go to manufacturing.]
Timing and Power
Small delays in scan path and clock skew
can cause race condition.
Large delays in scan path require slower
scan clock.
Dynamic multiplexers: Skew between TC
and its complement can cause momentary
shorting of D and SD inputs.
Random signal activity in combinational
circuit during scan can cause excessive
power dissipation.
Scan Summary
Scan is the most popular DFT technique:
Rule-based design
Automated DFT hardware insertion
Combinational ATPG
Advantages:
Design automation
High fault coverage; helpful in diagnosis
Hierarchical scan-testable modules are easily combined
into large scan-testable systems
Moderate area (~10%) and speed (~5%) overhead
Disadvantages:
Large test data volume and long test time
Basically a slow speed (DC) test
Overview: Partial-Scan & Scan Variations
Definition
Partial-scan architecture
Scan flip-flop selection methods
Cyclic and acyclic structures
Partial-scan by cycle-breaking
Scan variations
Scan-hold flip-flop (SHFF)
Summary
Partial-Scan Definition
A subset of flip-flops is scanned.
Objectives:
Minimize area overhead and scan sequence
length, yet achieve required fault coverage
Exclude selected flip-flops from scan:
Improve performance
Allow limited scan design rule violations
Allow automation:
In scan flip-flop selection
In test generation
Shorter scan sequences
Partial-Scan Architecture
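[Figure: combinational circuit with PI and PO; non-scan flip-flops (FF) and scan flip-flops (SFF) on separate clocks CK1 and CK2; the SFFs are controlled by TC and chained from SCANIN to SCANOUT.]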
What & Why Partial Scan Design
To scan only a subset of the flip-flops
The circuit is easier to test by the sequential
ATPG.
The area overhead is minimized.
The placement of the flip-flops is such that
the interconnects are minimized.
The delays are shortened.
Partial Scan Design
To scan only a subset of the flip-flops
How to select this subset?
It is an NP-complete problem
Heuristics on graph model to select the
minimum feedback vertex set (MFVS) to
transform the FSM into an acyclic graph
Scan Flip-Flop Selection Methods
Testability measure based
Use of SCOAP: limited success.
Structure based:
Cycle breaking
Balanced structure
Sometimes requires high scan percentage
ATPG based:
Use of combinational and sequential TG
Cycle Breaking
Difficulties in ATPG
S-graph construction and
MFVS problem
Test generation and test
statistics
Partial vs. full scan
Partial-scan flip-flop
Difficulties in Seq. ATPG
Poor initializability.
Poor controllability/observability of state
variables.
Gate count, number of flip-flops, and sequential
depth do not explain the problem.
Cycles are mainly responsible for complexity.
An ATPG experiment:
Circuit   Number of gates   Number of flip-flops   Sequential depth   ATPG CPU s   Fault coverage
TLC       355               21                     14*                1,247        89.01%
Chip A    1,112             39                     14                 269          98.80%
* Maximum number of flip-flops on a PI to PO path
Benchmark Circuits
Circuit                        s1196       s1238       s1488    s1494
PI                             14          14          8        8
PO                             14          14          19       19
FF                             18          18          6        6
Gates                          529         508         653      647
Structure                      Cycle-free  Cycle-free  Cyclic   Cyclic
Sequential depth               4           4           -        -
Total faults                   1242        1355        1486     1506
Detected faults                1239        1283        1384     1379
Potentially detected faults    0           0           2        2
Untestable faults              3           72          26       30
Abandoned faults               0           0           76       97
Fault coverage (%)             99.8        94.7        93.1     91.6
Fault efficiency (%)           100.0       100.0       94.8     93.4
Max. sequence length           3           3           24       28
Total test vectors             313         308         525      559
Gentest CPU s (Sparc 2)        10          15          19941    19183
Relevant Results
Theorem 8.1: A cycle-free circuit is always
initializable. It is also initializable in the
presence of any non-flip-flop fault.
Theorem 8.2: Any non-flip-flop fault in a cycle-free
circuit can be detected by at most dseq + 1
vectors (dseq is the sequential depth).
ATPG complexity: To determine that a fault is
untestable in a cyclic circuit, an ATPG program
using nine-valued logic may have to analyze
9^Nf time-frames, where Nf is the number of
flip-flops in the circuit.
Cycle-Free Example
[Figure: a cycle-free circuit with flip-flops F1, F2, and F3, and its s-graph with the levels of the vertices marked.]
dseq = 3
All faults are testable. See Example 8.6.
A Partial-Scan Method
Select a minimal set of flip-flops for
scan to eliminate all cycles.
Alternatively, to keep the overhead low
only long cycles may be eliminated.
In some circuits with a large number of
self-loops, all cycles other than self-loops
may be eliminated.
The MFVS Problem
For a directed graph find a set of vertices with
smallest cardinality such that the deletion of this
vertex-set makes the graph acyclic.
The minimum feedback vertex set (MFVS) problem
is NP-complete; practical solutions use heuristics.
A secondary objective of minimizing the depth of
acyclic graph is useful.
[Figure: a 6-flip-flop circuit and its s-graph, with cycles of length L = 1, L = 2, and L = 3 marked.]
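One simple heuristic for the feedback vertex set problem is sketched below: trim vertices that cannot lie on any cycle, then remove the vertex with the largest in-degree times out-degree, and repeat. This is an illustrative greedy method chosen for this example, not necessarily the heuristic used in the course text, and it does not guarantee a minimum set. The s-graph edges are hypothetical, since the figure's edges are not recoverable here.

```python
def has_cycle(graph):
    """DFS cycle check. graph maps each vertex to the set of its successors."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


def remove_vertex(g, v):
    g.pop(v)
    for succ in g.values():
        succ.discard(v)


def greedy_feedback_vertex_set(graph):
    """Return flip-flops to scan so that the remaining s-graph is acyclic."""
    g = {v: set(s) for v, s in graph.items()}
    chosen = []
    while has_cycle(g):
        indeg = {v: 0 for v in g}
        for succ in g.values():
            for w in succ:
                indeg[w] += 1
        # Vertices with no predecessor or no successor lie on no cycle.
        trivial = [v for v in g if indeg[v] == 0 or not g[v]]
        if trivial:
            for v in trivial:
                remove_vertex(g, v)
            continue
        best = max(g, key=lambda v: indeg[v] * len(g[v]))
        chosen.append(best)
        remove_vertex(g, best)
    return chosen


# Hypothetical 6-vertex s-graph: cycles 1-2-3 and 3-4-5 share vertex 3.
s_graph = {1: {2}, 2: {3}, 3: {1, 4}, 4: {5}, 5: {3, 6}, 6: set()}
fvs = greedy_feedback_vertex_set(s_graph)

remaining = {v: {w for w in s if w not in fvs}
             for v, s in s_graph.items() if v not in fvs}
```

Scanning only the flip-flop at vertex 3 breaks both cycles in this example; a self-loop, if kept, would simply be left out of the edge set before running the heuristic.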
Test Generation
Scan and non-scan flip-flops are controlled from
separate clock PIs:
Normal mode: both clocks active
Scan mode: only scan clock active
Seq. ATPG model:
Scan flip-flops replaced by PI and PO
Seq. ATPG program used for test generation
Scan register test sequence, 001100, of length nsf + 4
applied in the scan mode
Each ATPG vector is preceded by a scan-in sequence to set
scan flip-flop states
A scan-out sequence is added at the end of each vector
sequence
Test length = (nATPG + 2) nsf + nATPG + 4 clocks
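The test length formula can be evaluated directly. In the sketch below the function name is hypothetical, and the breakdown in the comments is one reading of how the terms arise from the steps listed above: the scan register test, one scan-in per vector, one capture clock per vector, and a final scan-out.

```python
def partial_scan_test_length(n_atpg, n_sf):
    """Clocks needed to apply n_atpg vectors with n_sf scan flip-flops."""
    register_test = n_sf + 4          # 001100... sequence through the chain
    scan_ins = n_atpg * n_sf          # scan-in before each vector
    captures = n_atpg                 # one normal-mode clock per vector
    final_scan_out = n_sf             # last response shifted out
    return register_test + scan_ins + captures + final_scan_out


full_scan = partial_scan_test_length(n_atpg=585, n_sf=179)
partial_scan = partial_scan_test_length(n_atpg=1117, n_sf=30)
```

The two calls use the S5378 vector and scan flip-flop counts tabulated later in these notes and reproduce the scan sequence lengths listed there, 105,662 and 34,691 clocks.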
Partial Scan Example
Circuit: TLC, 355 gates, 21 flip-flops
Scan        Max. cycle  Depth*  ATPG    Fault sim.  Fault     ATPG     Test seq.
flip-flops  length              CPU s   CPU s       cov.      vectors  length
?           ?           14      1,247   61          89.01%    805      805
?           ?           10      157     11          95.90%    247      1,249
?           ?           ?       32      ?           99.20%    136      1,382
10          ?           ?       13      ?           100.00%   112      1,256
21          ?           ?       ?       ?           100.00%   52       1,190
* Cyclic paths ignored
? Value missing from the source
Partial vs. Full Scan: S5378
                                               Original    Partial-scan   Full-scan
Number of combinational gates                  2,781       2,781          2,781
Number of non-scan flip-flops (10 gates each)  179         149            0
Number of scan flip-flops (14 gates each)      0           30             179
Gate overhead                                  0.0%        2.63%          15.66%
Number of faults                               4,603       4,603          4,603
PI/PO for ATPG                                 35/49       65/79          214/228
Fault coverage                                 70.0%       93.7%          99.1%
Fault efficiency                               70.9%       99.5%          100.0%
CPU time on SUN Ultra II, 200MHz processor     5,533 s     727 s          5 s
Number of ATPG vectors                         414         1,117          585
Scan sequence length                           414         34,691         105,662
Flip-flop for Partial Scan
Normal scan flip-flop (SFF) with multiplexer of
the LSSD flip-flop is used.
Scan flip-flops require a separate clock control:
[Figure: scan flip-flop (SFF): a MUX controlled by TC selects D or SD into a master latch followed by a slave latch, clocked by CK. Waveforms show TC and CK in normal mode and scan mode.]
Scan Variations
Integrated and isolated scan methods:
Scan path: NEC 1968
Serial scan: 1973
LSSD: IBM 1977
Scan set: Univac 1977
RAS: Fujitsu/Amdahl 1980
Random-Access Scan (RAS)
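[Figure: random-access scan. Combinational logic with PI and PO; the nff flip-flops are organized as a RAM of nff bits with CK, TC, SCANIN, and SCANOUT; an address decoder drives SEL from an address scan register of log2 nff bits, loaded through ADDRESS and clocked by ACK.]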
RAS Flip-Flop (RAM Cell)
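[Figure: RAS flip-flop (RAM cell): a scan flip-flop (SFF) with D from the combinational logic, SD from SCANIN, Q to the combinational logic, clock CK, and test control TC; SEL selects the cell for writing and for SCANOUT.]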
RAS Applications
Logic test:
reduced test length.
Delay test:
Easy to generate single-input-change (SIC) delay tests.
Advantage:
RAS may be suitable for certain architecture, e.g.,
where memory is implemented as a RAM block.
Disadvantages:
Not suitable for random logic architecture
High overhead: gates added to SFF, address
decoder, address register, extra pins and routing
Scan-Hold Flip-Flop (SHFF)
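[Figure: scan-hold flip-flop: an SFF with SD, TC, and CK inputs plus a HOLD control; output Q, with a connection to the SD input of the next SHFF.]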
The control input HOLD keeps the output
steady at the previous state of the flip-flop.
Applications:
Reduce power dissipation during scan
Isolate asynchronous parts during scan test
Delay testing
Partial Scan Summary
Partial-scan is a generalized scan method;
scan can vary from 0 to 100%.
Elimination of long cycles can improve
testability via sequential ATPG.
Elimination of all cycles and self-loops
allows combinational ATPG.
Partial-scan has lower overheads (area
and delay) and reduced test length.
Partial-scan allows limited violations of
scan design rules, e.g., a flip-flop on a
critical path may not be scanned.