Design and Implementation of Signal Processing Systems: An Introduction
Outline
What is signal processing?
Implementation options and design issues:
General purpose (micro) processor (GPP)
Multimedia enhanced extension (native signal processing)
Systolic array and design methodologies
Mapping algorithms to array structures
Low power design
Native signal processing and multimedia extension
Programmable DSPs
Very Long Instruction Word (VLIW) architecture
Re-configurable computing and FPGA
Signal processing arithmetic: CORDIC and distributed arithmetic
Applications: video, audio, communication
What is Signal?
A SIGNAL is a measurement of a physical quantity in a certain medium.
Examples of signals:
Visual patterns (written documents, pictures, video, gestures, facial expressions)
Audio patterns (voice, speech, music)
Change patterns of other physical quantities: temperature, EM waves, etc.
Modality:
Different modes of signals over the same or different media. Examples: voice, facial expression and gesture.
Transformation
Filtering
Detection
Estimation
Recognition and classification
Coding (compression)
Synthesis and reproduction
Recording, archiving
Analyzing, modeling
Speech:
Coding
Recognition
Synthesis
Translation
Imaging:
Digital camera, scanner
HDTV, DVD
A/D
Requirements:
Real time
Processing must be done before a pre-specified deadline.
Application-Specific Integrated Circuits (ASIC)
Re-configurable computing with field-programmable gate array (FPGA)
High throughput
Fast data input/output
Fast manipulation of data
Faster algorithms
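Reduce the number of arithmetic operations
Reduce the number of bits used to represent each data value
Most important example: the discrete Fourier transform (DFT) pair
$$X(k) = \sum_{n=0}^{N-1} x(n) \exp\left[-\frac{j 2\pi n k}{N}\right]$$
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \exp\left[\frac{j 2\pi n k}{N}\right]$$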
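As a rough illustration of how a faster algorithm reduces the number of arithmetic operations, the sketch below evaluates the DFT sum directly (about N*N complex multiplications) and with a recursive radix-2 FFT (about N log2 N). The function names `dft` and `fft`, the test signal, and the negative-exponent sign convention for the forward transform are assumptions made for this example.

```python
import cmath

def dft(x):
    """Direct evaluation of the DFT sum: roughly N*N complex multiplications."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
direct = dft(signal)
fast = fft(signal)
max_err = max(abs(a - b) for a, b in zip(direct, fast))
print(max_err < 1e-9)
```

Both routines return the same spectrum up to rounding error; only the operation count differs.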
Evolution of Micro-Processor
Microprocessors implement a central processing unit on a single chip.
Performance improved from 1 MFLOPS (1983) to 1 GFLOPS or above.
Word length (number of bits for registers, data bus, address space, etc.) increased from 4 bits to 64 bits today.
Clock frequency increased from 100 kHz to 1 GHz.
Number of transistors increased from 1K to 50M.
Power consumption increased much more slowly thanks to lower supply voltage: 5 V dropped to 1.5 V.
MMX (multimedia extension instructions): special instructions for accelerating multimedia tasks.
May share the same data-path with other instructions, or work on special hardware modules.
Make use of sub-word parallelism to improve numerical calculation speed.
Implement DSP-specific arithmetic operations, e.g., saturation arithmetic operations.
Reduce hardware cost!
May not be feasible for extremely high throughput tasks.
Interferes with other tasks because the GPP is tied up with NSP tasks.
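Sub-word parallelism and saturation arithmetic can be sketched in plain Python by treating a 32-bit word as four independent unsigned 8-bit lanes. This is only a behavioral model, not an actual MMX instruction; the helper names (`pack4`, `unpack4`, `packed_add_saturate`, `packed_add_wrap`) and the sample pixel values are assumptions for illustration.

```python
def pack4(lanes):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word

def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def packed_add_saturate(wa, wb):
    """Add lane by lane; results above 255 clamp to 255 (saturation)."""
    return pack4([min(a + b, 255) for a, b in zip(unpack4(wa), unpack4(wb))])

def packed_add_wrap(wa, wb):
    """Add lane by lane with ordinary modulo-256 wrap-around, for comparison."""
    return pack4([(a + b) & 0xFF for a, b in zip(unpack4(wa), unpack4(wb))])

pixels_a = pack4([200, 10, 255, 128])
pixels_b = pack4([100, 20, 1, 128])
print(unpack4(packed_add_saturate(pixels_a, pixels_b)))  # [255, 30, 255, 255]
print(unpack4(packed_add_wrap(pixels_a, pixels_b)))      # [44, 30, 0, 0]
```

With wrap-around, a bright pixel overflows to a dark one; saturation keeps it at the maximum value, which is usually what image and audio code wants.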
PDSPs were developed to fill a market segment between GPP and ASIC:
GPP: flexible, but slow
ASIC: fast, but inflexible
Main applications:
Video signal processing, MPEG, H.324, H.263, etc.
3D surround sound
Graphic engine for 3D rendering
Features:
Multi-processing system with a GPP core plus multiple function modules
VLIW-like instructions to promote instruction level parallelism (ILP)
Dedicated I/O and memory management units
Use of FPGA
Rapid prototyping: runs at a fraction of ASIC speed, but without fabrication delay.
Hardware accelerator: reuses the same hardware to realize different function modules, saving hardware.
Low quantity system deployment
Characteristics
High density:
Reduced feature size: 0.25 µm -> 0.16 µm
Percentage of wire/routing area increases
Impacts:
Design methodology
Performance
Power
High complexity:
Increased transistor count: 10M transistors and higher
Shortened time-to-market delay: 6-12 months
Design Issues
Given a DSP application, which implementation option should be chosen?
For a particular implementation option, how to achieve optimal design? Optimal in terms of what criteria?
Software design:
NSP/MMX, PDSP/MSP
Algorithms are implemented as programs.
Often still requires manual programming at the assembly level.
Hardware design:
ASIC, FPGA
Algorithms are directly implemented in hardware modules.
Implementation
Assignment: Each operation can be realized with
One or more instructions (software)
One or more function modules (hardware)
A Design Example
Consider the algorithm:
$$y = \sum_{k=1}^{n} a(k)\, x(k)$$
Operations:
Multiplication
Addition
Program:
y(0) = 0; y(k) = y(k-1) + a(k) x(k) for k = 1, ..., n; y = y(n)
Dependency
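y(k) depends on y(k-1)
Dependence Graph:
[Figure: dependence graph. For k = 1, ..., n, inputs a(k) and x(k) feed a multiplier (*); the products are accumulated along a chain from y(0) to y(n).]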
Hardware Implementation:
Map each * op. to a multiplier, and each + op. to an adder.
Interconnect them according to the dependence graph:
[Figure: hardware realization with inputs a(n) and x(n), a multiplier (*), and the accumulation path from y(0) to y(n).]
Observations
Eventually, every implementation is realized in hardware. However, by using the same hardware to realize different operations at different times (scheduling), we have a software program!
Bottom line: hardware/software codesign.
There is a continuum between hardware and software implementation.
A design must explore both simultaneously to achieve the best performance/cost tradeoff.
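The two ends of this tradeoff can be sketched for the sum-of-products example. In the first function a single multiplier and adder are reused over n time steps (the software-style schedule); in the second, every product has its own multiplier and only the additions are chained. The function names and the step and resource counters are hypothetical bookkeeping added for illustration.

```python
def inner_product_sequential(a, x):
    """One multiplier and one adder, reused once per time step."""
    y = 0
    steps = 0
    for ak, xk in zip(a, x):
        y = y + ak * xk   # y(k) = y(k-1) + a(k) * x(k)
        steps += 1
    return y, steps

def inner_product_parallel(a, x):
    """One multiplier per term: all products are formed at the same time."""
    products = [ak * xk for ak, xk in zip(a, x)]  # n multipliers side by side
    y = 0
    for p in products:                            # chain of adders
        y = y + p
    return y, len(products)

a = [1, 2, 3]
x = [4, 5, 6]
y_seq, time_steps = inner_product_sequential(a, x)
y_par, multipliers = inner_product_parallel(a, x)
print(y_seq, time_steps)   # 32 3
print(y_par, multipliers)  # 32 3
```

Both schedules compute the same result; they differ in how many function modules are needed and how many time steps each module is busy.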
Algorithm Reformulation
Matching algorithm to architectural features
Similar to optimizing assembly code
Exploiting equivalence between different operations
Reformulation methods
Equivalent ordering of execution:
(a+b)+c = a+(b+c)
For regular iterative algorithms and regular processor arrays --> algebraic mapping.
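A small sketch of why equivalent orderings matter: associativity lets a chain of additions be regrouped into a balanced tree, which gives the same sum with a much shorter critical path when several adders are available. The names `chain_sum` and `tree_sum` and the depth counters are assumptions for this example. Integers are used because floating-point addition is only approximately associative.

```python
def chain_sum(values):
    """((a + b) + c) + ... : each addition waits for the previous one."""
    total, depth = values[0], 0
    for v in values[1:]:
        total += v
        depth += 1          # one more adder on the critical path
    return total, depth

def tree_sum(values):
    """(a + b) + (c + d) ... : pairs are added in parallel, level by level."""
    level, depth = list(values), 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(chain_sum(data))  # (36, 7)
print(tree_sum(data))   # (36, 3)
```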
Arithmetic
CORDIC
Compute elementary functions
Distributed arithmetic
ROM based implementation
Redundant representation
eliminate carry propagation
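As one example of these arithmetic techniques, the sketch below uses CORDIC in rotation mode to compute sine and cosine. It is written in floating point for readability; in hardware each multiplication by 2**-i would be a shift and the arctangent values would come from a small table. The function name `cordic_sin_cos`, the iteration count, and the input range (roughly |theta| up to 1.74 rad) are assumptions for this illustration.

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC: returns (sin(theta), cos(theta))."""
    # Elementary rotation angles atan(2**-i); a hardware design stores these in a table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # The micro-rotations stretch the vector by a known gain; pre-divide to compensate.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x

s, c = cordic_sin_cos(0.6)
print(abs(s - math.sin(0.6)) < 1e-6, abs(c - math.cos(0.6)) < 1e-6)
```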
[Figure: two four-stage shift register diagrams, each with flip-flops D1/Q1 through D4/Q4, input In, output Out, and clock Clk.]
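Polynomial: 1 + x^3 + x^4
[Figure: four-stage shift register (D1/Q1 through D4/Q4) with input In, output Out, and clock Clk, implementing the polynomial above.]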
Setup time: the feedback for D1 has to pass through N XORs before arriving. N logic delays slow down circuit performance (the circuit may need to run at speed).
Solution: use a many-input XOR feeding the D1 input (1 logic level).
The all-zeros state (e.g., 000) is illegal. When the FFs power up, they must be initialized with valid data.
Solution: use XNORs instead. The register still produces a PRBS, but all zeros is a valid state.
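The behavior described above can be checked with a small simulation of a four-stage LFSR for the polynomial 1 + x^3 + x^4. The tap placement (Q3 and Q4 fed back into D1), the function name `lfsr_states`, and the seeds are assumptions for this sketch rather than details taken from the original figures.

```python
def lfsr_states(seed, use_xnor=False):
    """Return the states visited, starting at seed, until the register returns to seed."""
    start = tuple(seed)
    state = start
    states = []
    while True:
        states.append(state)
        q1, q2, q3, q4 = state
        feedback = q3 ^ q4            # taps for 1 + x^3 + x^4 (assumed placement)
        if use_xnor:
            feedback ^= 1             # XNOR is the inverted XOR
        state = (feedback, q1, q2, q3)  # shift right, feedback enters D1
        if state == start:
            return states

print(len(lfsr_states((1, 0, 0, 0))))                 # 15: maximal-length sequence
print(len(lfsr_states((0, 0, 0, 0))))                 # 1: XOR version is stuck at all zeros
print(len(lfsr_states((0, 0, 0, 0), use_xnor=True)))  # 15: all zeros is valid with XNOR
print(len(lfsr_states((1, 1, 1, 1), use_xnor=True)))  # 1: all ones becomes the stuck state
```

Switching to XNOR does not remove the lock-up state; it moves it from all zeros to all ones, which flip-flops are less likely to power up in.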
[Figure: two four-stage shift registers (D1/Q1 through D4/Q4) with input In, clock Clk, and a Feedback path.]
Refer to Dr. Perkowski's Built-In Self-Test presentation in the Test class for more information.
[Figure: four-stage shift register (D1/Q1 through D4/Q4) with clock Clk, output select Out_sel[0:1], and output Out.]
[Figure: array multiplier. Half adders (HA) and full adders (FA) sum the product terms a_i b_j (a1b0 through a3b3 shown) to form the product bits c7 through c0.]
Digital Systems: Principles and Applications, Ronald J. Tocci, Prentice Hall 1995, p. 280
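The idea behind the array multiplier, that the product is the weighted sum of all one-bit product terms a_i AND b_j, can be sketched as follows. This is a behavioral model only: the half and full adders of the array are replaced by ordinary integer addition, and the function name `array_multiply` and the 4-bit default width are assumptions for illustration.

```python
def array_multiply(a, b, width=4):
    """Multiply two unsigned width-bit numbers by summing product terms a_i AND b_j."""
    product = 0
    for i in range(width):
        for j in range(width):
            term = ((a >> i) & 1) & ((b >> j) & 1)  # one AND gate per product term
            product += term << (i + j)              # term a_i b_j has weight 2**(i+j)
    return product

print(array_multiply(13, 11))  # 143
```

For 4-bit operands there are 16 product terms and the result fits in 8 bits, matching the outputs c7 through c0.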
Types of Multipliers
Standard Binary Multiplier (one's complement, two's complement, universal, etc.)
Multiplier Applications
General Purpose Computing
Digital Signal Processing
Finite Impulse Response Filters
Convolution
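[Figure: constant-coefficient multiplier built from ROM look-up tables. Labels: 8-bit input x[7:0]; look-up table entries 0, 1k, 2k, 3k, ..., 15k with 12-bit outputs; 0000 padding to 16 bits; adder (ADD); 16-bit output Y[15:0].]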
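A look-up-table multiplier of this kind can be modeled in a few lines. The sketch assumes the 8-bit input is split into two 4-bit halves that each address a 16-entry table of multiples of the constant, with the upper half's result shifted left by four bits before the final addition. The constant `K = 57` and the names `LUT` and `kcm_multiply` are hypothetical choices for this example.

```python
K = 57                              # hypothetical 8-bit constant coefficient
LUT = [K * i for i in range(16)]    # ROM contents: 0, 1k, 2k, ..., 15k (each fits in 12 bits)

def kcm_multiply(x):
    """Multiply an 8-bit input by K using two table look-ups and one addition."""
    low = LUT[x & 0xF]              # partial product for x[3:0]
    high = LUT[(x >> 4) & 0xF]      # partial product for x[7:4]
    return (high << 4) + low        # shifting by 4 appends the 0000 padding

print(kcm_multiply(200) == 200 * K)
```

No general multiplier is needed: the only arithmetic at run time is one shift (free in hardware wiring) and one 16-bit addition.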
References
Digital Systems: Principles and Applications, Ronald J. Tocci, Prentice Hall 1995, pp. 278-282.
Xilinx Application Note (XAPP 054). Constant Coefficient Multipliers for XC4000E. http://www.xilinx.com/xapp/xapp054.pdf
Altera Application Note (AN 82). Highly Optimized 2-D Convolvers in FLEX Devices. http://www.altera.com/document/an/an082_01.pdf
Computer Arithmetic: Principles, Architecture, and Design, Kai Hwang, John Wiley & Sons, Inc. 1979, pp. 129-212.
References
Dr. Perkowski. Design for Testability Techniques (Built-In Self-Test) presentation. http://www.ee.pdx.edu/~mperkows/CLASS_TEST_99/BIST.PDF
Digital Communications: Fundamentals and Applications, Bernard Sklar, Prentice Hall 1988, pp. 290-296, 546-555.
Xilinx Application Note (XAPP 052). Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators. http://www.xilinx.com/xapp/xapp052.pdf
Sun Microsystems sponsored EDAcafe.com website. Chapter 14 Test. http://www.dacafe.com/Book/CH14/CH14.htm
Sources
Yu Hen Hu
Andrew Iverson, ECE 572