Lec08 DSP
Vector Summary
Vector is alternative model for exploiting ILP
If code is vectorizable, then simpler hardware, more energy efficient, and better real-time model than out-of-order machines
Design issues include number of lanes, number of functional units, number of vector registers, length of vector registers, exception handling, conditional operations
Will multimedia popularity revive vector architectures?
DAP Spr.98 UCB 2
Microcontrollers
Extremely cost sensitive
Small word size - 8 bit common
Highest volume processors by far
Automobiles, toasters, thermostats, ...
Increasing Volume
DSP Outline
Intro
Sampled Data Processing and Filters
Evolution of DSP
DSP vs. GP Processor
Lecture material based on the "Introduction to Architectures for Digital Signal Processing" lecture by Bob Brodersen
www.cs.berkeley.edu/~pattrsn/152F97/slides/CS152_dsp.pdf
Will refer to page i of his lecture as RB: i
DSP Introduction
Digital Signal Processing: application of mathematical operations to digitally represented signals
Signals represented digitally as sequences of samples
Digital signals obtained from physical signals via transducers (e.g., microphones) and analog-to-digital converters (ADC)
Digital signals converted back to physical signals via digital-to-analog converters (DAC)
Digital Signal Processor (DSP): electronic system that processes digital signals
Algorithms
Frequency domain filtering - FIR and IIR
Frequency-time transformations - FFT
Correlation
The audio data streams from the source (computer) through the digital analysis and synthesis
Hard real-time requirement - the processing must be done at the sample rate
Who Cares?
DSP is a key enabling technology for many types of electronic products
DSP-intensive tasks are the performance bottleneck in many computer applications today
Computational demands of DSP-intensive tasks are increasing very rapidly
In many embedded applications, general-purpose microprocessors are not competitive with DSP-oriented processors today
1997 market for DSP processors: $3 billion
Different history and different applications led to different terms, different metrics, some new inventions
Increasing markets leading to cultural warfare
Most demand good performance
All demand low cost
Many demand high energy efficiency
Trends are towards better support for these (and similar) major applications.
[Figure: block diagram notation. "+" is an adder; a delay/storage element is drawn as "Delay", z^-1, or D]
CS 252 Administrivia
Selected projects last week
Upcoming events in CS 252
  20-Feb DSP/Multimedia Processors #2 (Fri)
  25-Feb Memory Hierarchy: Caches; Meeting signup
  25-Feb Project Survey due (Wed)
  26-Feb HW #2 due by 5:00 PM (Thu)
  27-Feb Memory Hierarchy Example; 6 minute Proj. Meetings 3:40-5:40
  4-Mar Quiz 1 (5:30PM-8:30PM, 306 Soda) (Wed)
  Pizza at LaVals 8:30-10PM
M most recent samples in the delay line (Xi)
New sample moves data down delay line
Tap is a multiply-add
Each tap (M+1 taps total) nominally requires:
  Two data fetches
  Multiply
  Accumulate
  Memory write-back to update delay line
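The per-tap work listed above can be sketched in software. This is a minimal illustration, not code from the lecture: the names `fir_step` and `fir_filter` are hypothetical, and the delay line is a plain Python list that is physically shifted on every sample, which is the write-back cost the slide counts.

```python
def fir_step(delay_line, coeffs, new_sample):
    """Process one input sample through a direct-form FIR filter.

    delay_line holds the M most recent samples, newest first.
    coeffs holds the M+1 tap weights.
    Returns the output sample and updates delay_line in place.
    """
    acc = coeffs[0] * new_sample                  # tap 0 uses the incoming sample
    for k in range(1, len(coeffs)):
        # Each tap: fetch a coefficient and a sample, multiply, accumulate.
        acc += coeffs[k] * delay_line[k - 1]
    # Write-back: the new sample moves the data down the delay line.
    delay_line.insert(0, new_sample)
    delay_line.pop()
    return acc


def fir_filter(samples, coeffs):
    """Filter a whole sequence, starting from an all-zero delay line."""
    delay_line = [0.0] * (len(coeffs) - 1)
    return [fir_step(delay_line, coeffs, x) for x in samples]


# Example: a 3-tap smoothing filter (M = 2).
outputs = fir_filter([4, 8, 4, 0], [0.25, 0.5, 0.25])
```

The inner loop body is the multiply-accumulate that DSP hardware aims to complete in one cycle per tap.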
[Figure: multiplier and P-register with the radix point marked; fractional operands satisfy -1 ≤ x < 1]
Blocked Floating Point: single exponent for a group of fractions
Floating point support simplifies development
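As a rough illustration of "single exponent for a group of fractions", the sketch below quantises a block of values to signed integer mantissas that share one power-of-two exponent. The function names, the 16-bit default, and the rounding and clamping choices are assumptions made for this example, not details from the lecture.

```python
import math


def to_block_float(values, n_bits=16):
    """Return (mantissas, exponent) for a block sharing one exponent.

    Each value is approximated by mantissa / 2**(n_bits - 1) * 2**exponent,
    where every mantissa is a signed n_bits integer.
    """
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values), 0
    # frexp gives peak = m * 2**exponent with 0.5 <= m < 1, so every
    # scaled fraction in the block lies in [-1, 1).
    exponent = math.frexp(peak)[1]
    hi = (1 << (n_bits - 1)) - 1
    lo = -(1 << (n_bits - 1))
    mantissas = []
    for v in values:
        m = round(math.ldexp(v, n_bits - 1 - exponent))
        mantissas.append(max(lo, min(hi, m)))   # clamp in case rounding overflows
    return mantissas, exponent


def from_block_float(mantissas, exponent, n_bits=16):
    """Reconstruct approximate values from the shared-exponent form."""
    return [math.ldexp(m, exponent - (n_bits - 1)) for m in mantissas]


mantissas, exponent = to_block_float([0.5, -0.25, 3.0], n_bits=8)
```

The largest value in the block sets the exponent, so small values in the same block lose precision; that is the trade against full floating point.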
On overflow, set to most positive (2^(N-1) - 1) or most negative (-2^(N-1)) value: saturation
Many algorithms were developed in this model
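The difference between saturating and ordinary two's-complement arithmetic can be shown with a short sketch. The function names and the 16-bit default word size are assumptions for illustration; real DSPs implement this in the ALU rather than in software.

```python
def saturate(value, n_bits=16):
    """Clamp value to the signed n_bits range [-2**(n_bits-1), 2**(n_bits-1) - 1]."""
    hi = (1 << (n_bits - 1)) - 1
    lo = -(1 << (n_bits - 1))
    return max(lo, min(hi, value))


def sat_add(a, b, n_bits=16):
    """Add with saturation instead of wraparound."""
    return saturate(a + b, n_bits)


def sat_shl(a, shift, n_bits=16):
    """Arithmetic left shift that saturates instead of discarding high bits."""
    return saturate(a << shift, n_bits)


def wrap_add(a, b, n_bits=16):
    """Ordinary two's-complement add, shown for contrast: overflow wraps."""
    mask = (1 << n_bits) - 1
    s = (a + b) & mask
    return s - (1 << n_bits) if s >= (1 << (n_bits - 1)) else s


clipped = sat_add(30000, 10000)    # pinned at the most positive value
wrapped = wrap_add(30000, 10000)   # wraps around to a negative number
```

For signal data, clipping at full scale is usually a much smaller error than the sign flip produced by wraparound.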
DSP Memory
FIR tap implies multiple memory accesses
DSPs want multiple data ports
Some DSPs have ad hoc techniques to reduce memory bandwidth demand
  Instruction repeat buffer: do 1 instruction 256 times
  Often disables interrupts, thereby increasing interrupt response time
DSP Addressing
Have standard addressing modes: immediate, displacement, register indirect
Want to keep MAC datapath busy
Assumption: any extra instructions imply clock cycles of overhead in inner loop => complex addressing is good => don't use datapath to calculate fancy address
Autoincrement/Autodecrement register indirect
  lw r1,0(r2)+ => r1 <- M[r2]; r2 <- r2+1
  Option to do it before addressing, positive or negative
What can be done to avoid the overhead of address checking instructions for FFT?
Have an optional bit reverse addressing mode for use with autoincrement addressing
Many DSPs have bit reverse addressing for radix-2 FFT
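To make the bit reverse mode concrete, the sketch below computes in software the address sequence that such hardware would generate as a side effect of autoincrement. The function names are hypothetical, and the table size is assumed to be a power of two, as in a radix-2 FFT.

```python
def bit_reverse(index, n_bits):
    """Reverse the low n_bits bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)   # move the lowest bit to the other end
        index >>= 1
    return result


def bit_reversed_order(n):
    """Addresses visited when stepping 0..n-1 through a bit-reversing address unit."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    n_bits = n.bit_length() - 1
    return [bit_reverse(i, n_bits) for i in range(n)]


order = bit_reversed_order(8)   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Without the addressing mode, this reordering costs several instructions per element in the inner loop, which is the overhead the slide wants to avoid.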
DSP Instructions
May specify multiple operations in a single instruction
Must support Multiply-Accumulate (MAC)
Need parallel move support
Usually have special loop support to reduce branch overhead
  Loop an instruction or sequence
  0 value in register usually means loop maximum number of times
  When a loop count is calculated, must be sure a computed 0 is not meant as zero iterations
May have saturating shift left arithmetic
May have conditional execution to reduce branches
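The loop count pitfall can be modelled with a small sketch. It assumes a hypothetical 8-bit repeat register in which 0 encodes the maximum count of 256; the register width and the function names are illustrative assumptions, not a description of any specific DSP.

```python
def hardware_iterations(count_reg, reg_bits=8):
    """Iterations a hypothetical repeat counter performs; 0 encodes the maximum."""
    if not 0 <= count_reg < (1 << reg_bits):
        raise ValueError("count does not fit in the repeat register")
    return count_reg if count_reg != 0 else 1 << reg_bits


def repeat(n, body, reg_bits=8):
    """Run body exactly n times, guarding the computed-zero case in software."""
    max_count = 1 << reg_bits
    if not 0 <= n <= max_count:
        raise ValueError("n out of range for a single hardware loop")
    if n == 0:
        return 0                      # branch around the loop entirely
    count_reg = n % max_count         # n == max_count is encoded as 0
    done = hardware_iterations(count_reg, reg_bits)
    for _ in range(done):
        body()
    return done


calls = []
repeat(0, lambda: calls.append(1))    # without the guard this would run 256 times
```

The explicit test for `n == 0` is the extra check a programmer must add whenever the count comes from a calculation rather than a constant.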
DSPs that fail are often claimed to be good for something other than the highest volume application, but that's just designers fooling themselves.
Very recently conventional wisdom has changed so that you try to do everything you can digitally at low voltage so as to save energy.
3 years ago people thought doing everything in analog reduced power, but advances in lower power digital design flipped that bit.
Generations of DSPs
JB Slides 19, 21, 25, 29, 31, 32, 33
www.cs.berkeley.edu/~pattrsn/152F97/slides/slides.evolution.pdf
Zero overhead loops and repeat instructions
I/O support
Serial and parallel ports
Weird things
Circular addressing
Reverse addressing
Special instructions
shift left and saturate (arithmetic left-shift)
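Circular addressing is easiest to see as a delay line that never moves its data: only a pointer advances and wraps around. The class below is a minimal software sketch of that idea; the class and method names are hypothetical, and the modulo operation stands in for what the address hardware does for free.

```python
class CircularBuffer:
    """Fixed-size delay line addressed modulo its length."""

    def __init__(self, size):
        self.data = [0] * size
        self.head = 0                 # index of the newest sample

    def push(self, sample):
        """Overwrite the oldest sample; no other data is moved."""
        self.head = (self.head - 1) % len(self.data)
        self.data[self.head] = sample

    def tap(self, k):
        """Return the sample delayed by k steps (k = 0 is the newest)."""
        return self.data[(self.head + k) % len(self.data)]


buf = CircularBuffer(3)
for x in [1, 2, 3, 4]:
    buf.push(x)
newest, middle, oldest = buf.tap(0), buf.tap(1), buf.tap(2)
```

Compared with shifting every sample on each input, this replaces the per-tap memory write-back with a single write per new sample.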
Conclusions
DSP processor performance has increased by a factor of about 150x over the past 15 years (~40%/year)
Processor architectures for DSP will be increasingly specialized for applications, especially communication applications
General-purpose processors will become viable for many DSP applications
Users of processors for DSP will have an expanding array of choices
Selecting processors requires a careful, application-specific analysis
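The first conclusion's two figures can be checked against each other with a compound-growth calculation. The helper name below is hypothetical; the inputs (150x, 15 years, 40%) are the ones quoted on the slide.

```python
def annual_growth(total_factor, years):
    """Constant yearly growth rate that compounds to total_factor over years."""
    return total_factor ** (1.0 / years) - 1.0


rate = annual_growth(150, 15)     # a little under 0.40
compounded = 1.40 ** 15           # a little over 150
```

Both directions agree: 150x over 15 years corresponds to just under 40% per year, and 40% per year compounds to slightly more than 150x.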