DSP Processors
Comparison
• Since a fixed point DSP processor operates
using integer format, the range of numbers is
limited, which leads to overflow problems. More
coding effort is needed to deal with such
problems.
– Choice for ASIC DSPs (performance & small slice
area)
• Floating point offers a wide range of data, but
requires complex circuitry, hence it is more
expensive and slower than fixed point.
– Choice for prototyping or proof-of-concept
development
• Most floating point processors perform
automatic normalization so that numbers are
properly shifted & aligned. The programmer
only needs to take care of the overflow problem.
• Because of the enormous dynamic range, scaling is
rarely needed.
• Floating point processors are easier to use than
fixed point processors but are more expensive.
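The overflow problem of fixed point arithmetic can be made concrete with a small model. The Python sketch below (an illustration, not TI code) emulates 16-bit two's-complement addition, as a fixed point datapath without any overflow protection would perform it, next to ordinary floating point addition. The helper names `to_int16` and `add_int16_wrap` are hypothetical.

```python
def to_int16(x: int) -> int:
    """Wrap an integer into the 16-bit two's-complement range [-32768, 32767]."""
    x &= 0xFFFF                          # keep only the low 16 bits
    return x - 0x10000 if x & 0x8000 else x


def add_int16_wrap(a: int, b: int) -> int:
    """16-bit fixed point addition with no overflow protection."""
    return to_int16(a + b)


# The largest positive 16-bit value plus one wraps to the most negative value,
# while the floating point sum simply grows (its dynamic range is far larger).
print(add_int16_wrap(32767, 1))          # -32768
print(32767.0 + 1.0)                     # 32768.0
```

The 16-bit word length is only one of the fixed point options listed below (16 or 24 bit devices); the effect is the same for any fixed word length.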
Comparison between fixed & floating point
processors
• Fixed point processors
– 16 or 24 bit devices
– Limited dynamic range
– Overflow & quantization errors must be resolved
– Poorer C compiler efficiency; normally programmed
in assembly
– Long product development time
– Faster clock rate
– Less silicon area is required
– Cheaper
– Low power consumption
• Floating point processors
– 32 bit devices
– Large dynamic range
– Easier to program since no scaling is required
– Better C compiler efficiency; can be developed in C
– Quick time to market
– Slower clock rate
– More silicon area is required as functional units
are complex
– More expensive
– High power consumption
Applications of fixed & floating point
processors
• Fixed point processors
– Disc drive and motor control
– Consumer audio applications such as MP3
players, multimedia gaming and digital cameras
– Speech coding/decoding and channel coding
– Communication devices such as modems & cellular
phones
• Floating point processors
– Radar, sonar & seismic applications
– High-end audio applications
– Sound synthesis in professional audio
– Video coding/decoding
Sources of Error in DSPs
• DSP System: ADC, DSP device, DAC
• Accuracy depends on a number of factors: those
contributed by the ADC & DAC, and how the
calculations are done in the DSP device
• Errors in ADC & DAC: quantization errors
(limited by the number of bits)
• Errors in DSP calculations: finite word length
(can be reduced by using a larger word length
& by rounding instead of truncation)
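Why rounding beats truncation can be checked numerically. The sketch below quantizes values to a fixed number of fractional bits in two ways and measures the worst-case error; the function names and the choice of 4 fractional bits (a step of 1/16) are assumptions made only for illustration.

```python
import math


def quantize_truncate(x: float, frac_bits: int) -> float:
    """Keep frac_bits fractional bits by discarding the rest (rounds toward minus infinity)."""
    scale = 1 << frac_bits
    return math.floor(x * scale) / scale


def quantize_round(x: float, frac_bits: int) -> float:
    """Keep frac_bits fractional bits by rounding to the nearest step (halves round up)."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale


def max_errors(frac_bits: int, samples):
    """Largest absolute error of truncation and of rounding over the given samples."""
    trunc = max(abs(x - quantize_truncate(x, frac_bits)) for x in samples)
    rnd = max(abs(x - quantize_round(x, frac_bits)) for x in samples)
    return trunc, rnd


samples = [i / 1000 for i in range(-1000, 1000)]   # test inputs in [-1, 1)
trunc_err, round_err = max_errors(4, samples)       # 4 fractional bits: step = 1/16
print(trunc_err, round_err)
```

Truncation errors approach one full quantization step, while rounding errors stay within half a step; each extra bit of word length halves both.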
Comparison between DSP and GPP
• DSP
– Used for embedded applications
– Low power requirement
– Has features required for DSP applications (FFT,
convolution, correlation etc.)
– Real time I/O
– High speed on-chip memories
– Deals with an infinite, continuous stream of data
– Slow
– Has a typical MAC unit
• GPP
– Desktop computing/servers
– High power consumption
– Bursty in nature
– High speed
Digital Signal Processors
• Application specific
– Designed to perform one function more accurately,
faster and more cost effectively
– Ex.: digital filter and FFT chips
• Programmable
– Can be programmed for different applications
– More cost effective than a GPP
– Architecture is designed for the repetitive nature
of signal processing, using pipelining & parallelism
– Performs certain operations like MAC faster than
a GPP
• DSP Architectures
– General architectures
• Architectural aspects
– H/W and S/W aspects
– RISC, CISC
– Endianness
Architecture models
• Von-Neumann
– Single memory space
– Inefficient for memory
intensive operations
Architecture models
• Harvard
– Split memory space,
separate prog. & data
buses
– Faster
Tripathi,ASE, Bangalore
Architecture models
• Modified Harvard
– Split memory space,
separate buses
– Parallel memory access
(using DARAM/DPRAM)
– Used in TMS320C54x,
1 program & 3 data buses
VLIW Architecture
• Very Long Instruction Word architecture
• Ex. TMS320C6x
(Figure: VLIW block diagram with a multiported register file and an instruction cache)
Hardware Aspects
• CPU
– MAC
– ALU
– Shifter
– Pipelining and parallelism
– Buses
– Data address generator
• Memory
– DARAM/DPRAM/SARAM
• Multiport memories are costlier than multiple-access memories because of the
larger number of pins and larger chip area, but they permit parallel
access to memory locations
– Cache
– ROM
Hardware Aspects…
• Peripherals and Input Output
– Serial port
• Standard serial port
• Buffered serial port
• TDM serial port
• Multi-channel buffered serial port (auto-buffering unit supports high
speed transfers & reduces the overhead of servicing interrupts)
– Host port interface
– DMA controller
– Parallel port
– Hardware timer
– Power management
• Clock frequency control
• Power-down mode
• Disabling of unused peripherals
Software Aspects
• Instruction set
– CISC: Complex Instruction Set Computing
– RISC: Reduced Instruction Set Computing
• Programming languages
– Assembly programs
– C programs
• Software development tools
– C compiler
– Assembler
– Linker
– Simulator
– Code Composer Studio (CCS)
RISC Vs CISC
• RISC
– Instruction set is simple (typically <100 instructions)
– Simple opcodes (ADD, SUB)
– Compilers for HLLs are shorter & simpler. Control
unit is small (hardwired).
• CISC
– Instruction set is complex (>1000 instructions)
– Instructions are tailored to DSP (FIR, CONV, MACD)
– Compilers for HLLs are costly. Control unit is
large (microprogrammed).
Eg: 12345678h can be stored in 4 x 8-bit locations as follows:
Location   Big endian   Little endian
1000       12           78
1001       34           56
1002       56           34
1003       78           12
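The two byte orders in the table can be reproduced with Python's standard `struct` module, where the format prefix `>` selects big endian and `<` selects little endian. The location labels 1000 to 1003 are reused from the table; this is only a model of the memory layout, not DSP code.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # big endian: most significant byte at the lowest address
little = struct.pack("<I", value)   # little endian: least significant byte at the lowest address

# Print the same layout as the table: location, big-endian byte, little-endian byte.
for offset in range(4):
    print(1000 + offset, f"{big[offset]:02X}", f"{little[offset]:02X}")
```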
Endians in DSPs
Programmable DSP
• A programmable DSP device should provide
instructions similar to those of microprocessors
• The computational capabilities provided by these
instructions should include:
– Arithmetic operations like add, subtract &
multiply
– Logic operations like AND, OR, XOR & NOT
– MAC operations
– Signal scaling operations before & after DSP
Programmable DSP Cont..
• The supporting architecture should include:
– On-chip registers for storage of
intermediate results
– On-chip memories for signal
samples (RAM)
– On-chip program memory for programs &
fixed data such as filter coefficients
Computational building blocks
• Key issue: speed and accuracy
• DSP computational building blocks
– Multiplier
– Shifter
– MAC Unit
– ALU
Computational Building Blocks of DSP
Shifter
• Required to scale down or scale up operands &
Amrita School of Engineering, Bangalore
• Saturation logic:
– Overflow occurs when the accumulator result becomes
larger than the largest representable value (a result
smaller than the smallest negative value results in
underflow)
– Accumulator contents are limited to the most +ve
or most –ve value to avoid the error known as
wrap-around error.
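Saturation logic versus wrap-around can be compared side by side. The sketch below uses a 16-bit result for brevity (a real accumulator is wider, 40 bits on the C54x, but the principle is the same); the names `add_wrap` and `add_saturate` are hypothetical.

```python
INT16_MAX = 0x7FFF     # most positive 16-bit value (+32767)
INT16_MIN = -0x8000    # most negative 16-bit value (-32768)


def add_wrap(a: int, b: int) -> int:
    """Two's-complement addition that silently wraps around on overflow."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s


def add_saturate(a: int, b: int) -> int:
    """Addition with saturation logic: clamp to the most +ve / most -ve value."""
    s = a + b
    if s > INT16_MAX:
        return INT16_MAX
    if s < INT16_MIN:
        return INT16_MIN
    return s


print(add_wrap(30000, 10000))       # -25536: a large positive sum turns negative
print(add_saturate(30000, 10000))   # 32767: clipped to the most positive value
```

Saturation still introduces an error, but it is bounded and keeps the sign correct, whereas wrap-around flips the sign of the result.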
TI DSP IC
TMS 320 C 5X
TMX: Experimental device
TMP: Prototype
TMS: Qualified device
5: Generation
X: Version number (0, 1, 2, 3, 4x, 5, 6, 7)
TI DSP Types
• Fixed point DSPs (16 bit DSPs)
– TMS320C2000 (C24x)
– TMS320C5000 (C54x & C55x)
– TMS320C6000 (C62x & C64x)
• Floating point DSPs (32 bit DSPs)
– TMS320C3x
– TMS320C4x
– TMS320C67x
Applications
Architecture of
TMS320C54x
Digital Signal Processing
Digital Representation of Signals &
Processing of these signals
Digital Representation: Conversion of natural
analog signals to digital by sampling &
quantization.
Signal Types: Analog & Digital
Ex. Speech, image & video, biomedical, music,
radar, seismic signals (low frequency) etc.
Processing: Analyze, modify or extract information
from signals.
Key operations:
Convolution, correlation, filtering, transformation &
modulation ……
All these involve the MAC operation.
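As an example of how these operations reduce to MACs, the sketch below computes a direct-form FIR filter (a convolution) where each inner-loop step is one multiply-accumulate. It is a plain Python model of the arithmetic, not DSP code; the function name `fir_mac` is hypothetical and samples before x[0] are assumed to be zero.

```python
def fir_mac(x, h):
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n - k], built from MAC steps."""
    y = []
    for n in range(len(x)):
        acc = 0                              # accumulator cleared for each output sample
        for k, coeff in enumerate(h):
            if n - k >= 0:                   # samples before x[0] are taken as zero
                acc += coeff * x[n - k]      # one multiply-accumulate (MAC)
        y.append(acc)
    return y


print(fir_mac([1, 2, 3], [1, 1]))   # [1, 3, 5]
```

A filter with L taps needs L MACs per output sample, which is why DSPs provide a single-cycle MAC unit.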
Applications:
Image processing (enhancement, edge detection,
denoising, animation etc.), data compression, speech
recognition & analysis, communication, music, home
appliances…..
Advantages of DSP
• Flexible (programmable)
• Less sensitive to tolerance of components
– The memory & processor are fairly independent of
temperature & aging
• Cheaper, more compact and with better performance
• Cascaded easily
• Easy storage
• Low power consumption
• Single chip processors possible
• Some signal processing operations are impossible
to implement using analog technology
– Low frequency signal processing possible
• ……..
Disadvantages of DSP
• Increased system complexity due to additional pre and
post processing devices like ADC, DAC and complex
digital circuitry
• Only a limited range of frequencies is available for
processing. Large bandwidth designs are too
expensive. Bandwidths in the range of 100 MHz are still
processed by analog methods.
• Design time is longer
• Finite word length problems exist
Reference:
TMS320C54x manual available with
CCS Simulator
Architecture of C54x
• Fixed Point processor
• Advanced Harvard Architecture, CISC Processor
– Separate memory bus structures for program & data.
• High degree of parallelism
– Multiply, load/store, add/sub to/from ACC and new address
generation can be done simultaneously.
• Powerful instruction set; most operations execute
in a single cycle
• Targeted for portable devices (cellular phones, MP3
players, digital cameras …)
Bus structure
• Has several address/data buses:
1. Program Bus (PB): carries instruction codes &
immediate operands from program memory to CPU.
2. Program Address Bus (PAB): provides addresses to
program memory for both read/write operations.
3. Data Bus (DB): carries data between data memory
space and CPU.
4. Data Address Bus (DAB): provides addresses to
access data memory.
Buses in C54x
• 8 major 16-bit buses
– 4 program / data buses
• 1 Program bus, PB
• 3 Data buses
• CB & DB for READ
• EB for Write
– 4 address buses
• PAB, CAB, DAB & EAB
• All CPU registers, peripheral registers
and I/O ports occupy data memory space
(Figure: C54x CPU block diagram with the barrel shifter, ALU, MAC unit and CSSU)
Central Processing Unit
• CPU registers
– IMR, IFR
– ST0 & ST1
– PMST
– AR0 – AR7 (GPRs)
– SP register
– Circular-buffer size register (BK)
– Block-repeat registers (BRC, RSA and REA)
– PC extension register (XPC)
• 40-bit ALU
• Two 40-bit accumulator registers (A and B), each split
into guard, high and low parts: AG AH AL, BG BH BL
• Barrel shifter supporting a 0-31 bit left shift
& 0-16 bit right shift range
• MAC block
• 16-bit temporary register (T)
• 16-bit transition register (TRN)
• Compare, Select and Store Unit (CSSU)
• Exponent encoder
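The guard, high and low parts of a 40-bit accumulator are just bit fields: guard bits 39-32 (AG/BG), high word 31-16 (AH/BH) and low word 15-0 (AL/BL). A minimal sketch of the split, with the hypothetical helper name `split_acc`:

```python
def split_acc(acc: int):
    """Split a 40-bit accumulator into guard (bits 39-32), high (31-16) and low (15-0) parts."""
    acc &= (1 << 40) - 1                 # 40-bit two's-complement image of the value
    guard = (acc >> 32) & 0xFF           # AG / BG
    high = (acc >> 16) & 0xFFFF          # AH / BH
    low = acc & 0xFFFF                   # AL / BL
    return guard, high, low


g, hi_word, lo_word = split_acc(-1)
print(f"{g:02X} {hi_word:04X} {lo_word:04X}")   # FF FFFF FFFF
```

The 8 guard bits let intermediate sums exceed 32 bits without overflowing, which matters when many products are accumulated.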
ST0 register
Bit:    15-13  12  11  10   9    8-0
Field:  ARP    TC  C   OVA  OVB  DP
ST0 register Cont..
• C: Carry bit
– Set to 1 if an addition generates a carry
– Cleared to 0 if a subtraction generates a borrow
– Otherwise it is 0 after an addition & 1 after a
subtraction
Status Register (ST1)
Bit:    15    14   13  12  11    10  9    8    7    6     5     4-0
Field:  BRAF  CPL  XF  HM  INTM  0   OVM  SXM  C16  FRCT  CMPT  ASM
are enabled
• OVM: Overflow mode; enables (1) / disables (0)
saturation of the accumulator on overflow.
• SXM: Sign extension mode; enables / disables sign
extension of an arithmetic operation.
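The effect of SXM when a 16-bit operand is loaded into a 40-bit accumulator can be modelled as below. This is a sketch of a plain load into bits 15-0 only; any operand shift a real load instruction may apply is ignored, and the name `load_to_acc` is hypothetical.

```python
ACC_MASK = (1 << 40) - 1     # the accumulator is 40 bits wide


def load_to_acc(operand16: int, sxm: bool) -> int:
    """Load a 16-bit operand into bits 15-0 of a 40-bit accumulator.

    With SXM = 1 the sign bit (bit 15) is copied into bits 39-16;
    with SXM = 0 bits 39-16 are filled with zeros.
    """
    operand16 &= 0xFFFF
    if sxm and operand16 & 0x8000:
        return operand16 | (ACC_MASK ^ 0xFFFF)   # set bits 39-16
    return operand16


print(f"{load_to_acc(0x8000, True):010X}")    # FFFFFF8000 (-32768 as a 40-bit value)
print(f"{load_to_acc(0x8000, False):010X}")   # 0000008000 (+32768)
```

The same 16-bit pattern 8000h is therefore read as a negative number with SXM = 1 and as an unsigned number with SXM = 0.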
Processor Mode Status Register
15-7 6 5 4 3 2 1 0
MAC Unit
ALU
Barrel Shifter
• Used for scaling operations such as:
– Prescaling an input data memory operand or the
Acc value before an ALU operation
– Performing a logical / arithmetic shift of the Acc
– Normalizing the Acc
– Postscaling the Acc before storing the Acc value
into data memory
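An arithmetic accumulator shift of the kind used for prescaling and postscaling can be sketched as follows. The shift range (-16 to 31, negative meaning right) follows the 0-31 left / 0-16 right range quoted for the C54x barrel shifter; the model assumes sign-preserving (arithmetic) shifts and ignores logical shifts and overflow handling. The helper names are hypothetical.

```python
MASK40 = (1 << 40) - 1


def to_signed40(v: int) -> int:
    """Interpret the low 40 bits of v as a two's-complement number."""
    v &= MASK40
    return v - (1 << 40) if v & (1 << 39) else v


def shift_acc(acc: int, shift: int) -> int:
    """Arithmetic shift of a 40-bit accumulator: shift > 0 is left, shift < 0 is right."""
    if not -16 <= shift <= 31:
        raise ValueError("shift count outside the -16..31 range")
    value = to_signed40(acc)
    # Python's >> on negative integers is arithmetic, so the sign bit is preserved.
    result = value << shift if shift >= 0 else value >> -shift
    return result & MASK40               # bits shifted beyond bit 39 are discarded


print(hex(shift_acc(0x0001, 16)))        # 0x10000: a low-word value moved into the high word
print(hex(shift_acc(-256, -4)))          # 0xfffffffff0: -256 / 16 = -16, sign kept
```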
Addressing Modes of TMS320c54x
Data Addressing Modes
• Provide various ways to access operands to
execute instructions and place results in
the memory or the registers.
• C54x has 7 addressing modes
– Immediate Addressing
– Absolute Addressing
– Accumulator Addressing
– Direct Addressing
– Indirect Addressing
– Memory-Mapped Register Addressing
– Stack Addressing
Immediate addressing
• Value encoded in the instruction.
• Two types of values:
– Short immediate (3/5/8/9 bits)
– Long immediate (16 bits)
• # indicates immediate.
Example
• LD #5, ARP ; 3-bit constant
• LD #1000h, A ; 16-bit constant
Absolute Addressing
• Complete address is specified
• Address is always of 16-bits
• So, instruction is of 2 words
• 4 types:
– dmad addressing
– pmad addressing
– PA addressing
– *(lk) addressing
Example
• MVKD SAMPLE, *AR5 ; dmad addressing
• Accumulator addressing is used by two instructions:
– READA Smem
– WRITA Smem
Direct Addressing
• The 7-bit dma field of the instruction is an address offset
• Types:
– DP-referenced direct addressing
• Can access up to 128 locations (7 bits) in each of 512 pages (9
bits of DP) of data memory
– SP-referenced direct addressing
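For DP-referenced direct addressing, the 9-bit DP selects the page and the 7-bit dma selects the location within it, so the full 16-bit address is their concatenation. A minimal sketch (the function name is hypothetical):

```python
def dp_direct_address(dp: int, dma: int) -> int:
    """16-bit data-memory address = 9-bit DP (page) concatenated with 7-bit dma (offset)."""
    if not (0 <= dp < 512 and 0 <= dma < 128):
        raise ValueError("DP must fit in 9 bits and dma in 7 bits")
    return (dp << 7) | dma


print(hex(dp_direct_address(0x020, 0x05)))   # 0x1005: page 20h, offset 5
```

512 pages of 128 words each cover the full 64K-word data space.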
Indirect Addressing
• Uses 8 ARs; AR0-AR7
• Used to step through sequential locations in
memory in fixed-size steps
• AR modified by:
– Increment / Decrement
– Offset
– Index
• Special modes:
– Circular addressing
– Bit-reversed addressing
28/ 02/ © Dr.Shikha Tripathi,ASE, Bangalore
14
Offset Address Modification
• 16-bit offset added to AR
• Two types:
– AR not updated
• Useful in accessing an element in array / structure
– AR updated to new address
• Useful in stepping through an array in fixed step sizes.
Circular addressing
• Circular buffer: sliding window containing most
recent data
• Uses decrement/increment by 1 or by index
• BK: Circular buffer size register
• A circular buffer of size R must start at an N-bit
boundary, where N is the smallest integer such that
2^N > R (the N LSBs of the base address must be
zero, giving the Effective Base address (EFB))
• End of Buffer (EOB) is obtained by replacing the N
LSBs of ARx with the N LSBs of BK
• Index into the circular buffer: the N LSBs of ARx;
the step is added to or subtracted from it
Circular Addressing Block Diagram
(Figure: circular address generation, operating on the N LSBs of ARx)
Example
• Let AR3 = 1020h and BK = 40h. Determine the
start & end addresses of the buffer. What will be
the content of AR3 after LD *AR3+0%, A
if AR0 = 0025h?
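The example can be checked with a small Python model. It assumes the usual C54x index update: add the step to the index formed by the N LSBs of ARx, then subtract or add BK once if the result leaves the range 0 to BK-1 (this assumes the step magnitude is smaller than BK). Under these assumptions N = 7, the buffer occupies 1000h to 103Fh (the EOB value 1040h, obtained by replacing the 7 LSBs of AR3 with those of BK, lies one location past the last element used), LD *AR3+0%, A reads from 1020h, and AR3 is post-modified to 1005h. The helper names are hypothetical.

```python
def circular_bounds(arx: int, bk: int):
    """Return (N, EFB, last address) for a circular buffer of size BK addressed through ARx."""
    n = bk.bit_length()                      # smallest N with 2**N > BK
    efb = arx & ~((1 << n) - 1) & 0xFFFF     # clear the N LSBs of ARx
    return n, efb, efb + bk - 1


def circular_post_modify(arx: int, step: int, bk: int) -> int:
    """Update ARx by +/-step, keeping the index (N LSBs of ARx) in 0..BK-1 (assumes |step| < BK)."""
    n = bk.bit_length()
    mask = (1 << n) - 1
    index = (arx & mask) + step
    if index >= bk:
        index -= bk                          # wrapped past the end of the buffer
    elif index < 0:
        index += bk                          # wrapped past the start of the buffer
    return (arx & ~mask & 0xFFFF) | index


n, start, end = circular_bounds(0x1020, 0x40)
print(n, hex(start), hex(end))                        # 7 0x1000 0x103f
print(hex(circular_post_modify(0x1020, 0x25, 0x40)))  # 0x1005
```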
Circular addressing
• Rules to be followed:
Bit reversed addressing
• Enhances the execution speed for FFT
algorithm
• AR0 specifies one-half the size of the FFT,
2^(N-1), where N is an integer (2^N is the FFT size)
• To generate the next address, add AR0 to the AR
that points to the data, in bit-reversed fashion
– Carry propagates from left to right (reverse carry)
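Reverse-carry addition can be modelled by reversing the bit order of both operands, adding normally and reversing the result back: a carry that moves left in the reversed word moves right in the original one. The sketch assumes 16-bit address registers and a 16-point FFT (AR0 = 16 / 2 = 8) with the data buffer starting at address 0; the helper names are hypothetical.

```python
def reverse16(v: int) -> int:
    """Reverse the bit order of a 16-bit value (bit 0 <-> bit 15, ...)."""
    return int(f"{v & 0xFFFF:016b}"[::-1], 2)


def reverse_carry_add(ar: int, ar0: int) -> int:
    """AR + AR0 with the carry propagating from the MSB side toward the LSB."""
    # Reversing both operands turns reverse-carry addition into ordinary addition;
    # a carry out of bit 0 of the original word is discarded (the & 0xFFFF).
    return reverse16((reverse16(ar) + reverse16(ar0)) & 0xFFFF)


# 16-point FFT: AR0 = 8, buffer assumed to start at address 0.
addr, order = 0, []
for _ in range(16):
    order.append(addr)
    addr = reverse_carry_add(addr, 8)
print(order)   # [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

The generated sequence is exactly the bit-reversed order of the indices 0 to 15, which is the order an FFT needs for its input or output samples.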
Example
MMR Addressing Mode
CPU MMRs (Table 10.3)
Memory Mapped Register addressing
• Modifies MMRs without affecting DP and SP
• In addition to registers, any scratch-pad RAM on
data page 0 can be modified
• 2 modes
– Direct: forces the 9 MSBs of the data-memory address to 0 (page 0)
– Indirect: uses the 7 LSBs of the current AR
If AR1 points to an MMR and contains FF25h, then AR1
points to the timer period register (PRD), whose address
is 25h (the 7 LSBs of FF25h). After execution AR1 = 25h.
• Example
LDM MMR, dst (direct)
STM #lk, *ARx (indirect)
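In both MMR addressing modes the 9 MSBs of the address are forced to zero, so only the 7 LSBs (from the dma field or from the current AR) select a location on data page 0. A one-line model, with the hypothetical name `mmr_address`:

```python
def mmr_address(value: int) -> int:
    """Address used by MMR addressing: the 9 MSBs are forced to 0,
    so only the 7 LSBs survive (page 0, addresses 0000h-007Fh)."""
    return value & 0x007F


print(hex(mmr_address(0xFF25)))   # 0x25: the PRD register in the example above
```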
Example 1: LDM AR4, A ; Direct
        Before execution    After execution
A       00 0000 1111        00 0000 FFFF
AR4     FFFF                FFFF
• Arithmetic operations
• Logical operations
• Program control operations
• Special operations
Arithmetic operations:
– Absolute value
– Addition
– Subtraction
– Multiplication
Fields of an assembly statement (separated by spaces):
• Label
• Assembler directive
• Mnemonic field
• Operand list
• Comment (after ;)
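        .mmregs                 ; define symbolic names for the memory-mapped registers
        .text                   ; start of the code section
        stm   #1000h, ar2       ; AR2 -> first operand at 1000h
        stm   #1100h, ar3       ; AR3 -> second operand at 1100h
        stm   #1250h, ar4       ; AR4 -> result location at 1250h
        ld    #0h, a            ; clear accumulator A (ADD below overwrites it anyway)
        add   *ar2, *ar3, a     ; A = (*AR2 << 16) + (*AR3 << 16); result stored in AH
        sth   a, *ar4           ; store AH (bits 31-16 of A) at 1250h
        .end                    ; end of the source file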
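What the program above leaves in memory can be traced with a small Python model of the two key instructions: ADD Xmem, Ymem, dst (both operands shifted left by 16 and added, which is why the result lands in AH) and STH (store the high word). The model assumes SXM = 1 and ignores overflow flags, OVM and FRCT; the data values 3 and 4 are made up for illustration.

```python
MASK40 = (1 << 40) - 1


def sign16(v: int) -> int:
    """Interpret a 16-bit word as a signed value (SXM = 1 assumed)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def add_xmem_ymem(x: int, y: int) -> int:
    """Model of ADD Xmem, Ymem, dst: both operands shifted left by 16 and added."""
    return ((sign16(x) << 16) + (sign16(y) << 16)) & MASK40


def sth(acc: int) -> int:
    """Model of STH: return bits 31-16 (the high part, AH) of the accumulator."""
    return (acc >> 16) & 0xFFFF


# Hypothetical contents of data memory for the program above.
data = {0x1000: 0x0003, 0x1100: 0x0004}
a = add_xmem_ymem(data[0x1000], data[0x1100])
data[0x1250] = sth(a)
print(hex(a), hex(data[0x1250]))   # 0x70000 0x7
```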
18/ 03/ © Dr.Shikha Tripathi,ASE, Bangalore 10
14