DSP Note
Digital signals processing
11. Digital Signal Processors
11.1 Overview of Digital Signal Processors
The programmable digital signal processors (PDSPs) are general purpose microprocessors designed specifically for digital signal processing applications. They contain special architecture and instruction sets so as to execute computation-intensive DSP algorithms more efficiently. The programmable DSPs can be divided into two broad categories: i) general purpose digital signal processors and ii) special purpose digital signal processors.
i) General purpose digital signal processors: These are basically high speed microprocessors with architecture and instruction sets optimized for DSP operations. They include fixed point processors such as the Texas Instruments TMS320C54x and Motorola DSP563x, and floating point processors such as the Texas Instruments TMS320C4x and TMS320C67xx and the Analog Devices ADSP21xxx.
ii) Special purpose digital signal processors: These processors consist of i) hardware designed for specific DSP algorithms such as the FFT, and ii) hardware designed for specific applications such as PCM and filtering. Examples of special purpose DSPs are Mitel's multi-channel telephony voice echo canceller (MT93001), FFT processors (PDSP16515A, TM-44, TM-66) and programmable FIR filters (UPDSP16256, Model 3092).
A number of PDSPs appeared in the commercial market in the 1980s. In 1979, Intel introduced the first digital signal processor (the Intel 2920), featuring on-chip ADC and DACs. Texas Instruments introduced the TMS32010, the first-generation fixed-point DSP in the TMS320 family. Later it introduced the TMS320C20. Note that the TMS32010 does not have a C in its name because it was originally designed in NMOS technology, whereas all the other DSPs are designed with CMOS technology.
The TMS320C3x generation is the first of Texas Instruments' 32-bit floating-point digital signal processors. The 'C3x devices provide a high-performance architecture and can be used in a wide variety of areas, including automation and control.
The TMS320C4x devices are 32-bit floating-point digital signal processors optimized for parallel processing. The family combines a high-performance CPU and DMA controller with up to six communication ports to meet the needs of multiprocessor and I/O-intensive applications. Each device contains an on-chip analysis module that supports hardware breakpoints for parallel processing development and debugging. The 'C4x family accepts source code from the TMS320C3x family of floating-point DSPs. Key applications of the 'C4x family include 3-dimensional graphics, image processing, networking, and telecommunications base stations.
The TMS320C5x accepts source code from the 'C1x, 'C2x, and 'C2xx generations. Faster cycle times, on-chip memories, a parallel logic unit (PLU), zero-overhead context switching, and block repeats differentiate the 'C5x. There is an ANSI C compiler available for the 'C5x, which translates the widely used ANSI C language directly into highly optimized assembly language for the 'C5x.
The Texas Instruments TMS320C54x is a family of 16-bit fixed-point DSPs. TMS320C54x processors target high-volume, low-power applications. The first family members were introduced in Japan in 1994. The fastest processor in the family runs at 160 MHz with a 1.6-volt core supply voltage.
The TMS320C55x is a 16-bit fixed-point packaged DSP processor family from Texas Instruments. It can execute up to two instructions in parallel, with instruction widths varying from 8 to 48 bits. It is based on the earlier TMS320C54x family but adds significant enhancements to the older processor architecture and instruction set. The TMS320C55x is partially assembly-code compatible with the TMS320C54x. The TMS320C55x is intended for use in applications that include cellular telephones, modems and telecom infrastructure applications. The TMS320C55x interfaces directly to SDRAM, making it well suited for use in products where large memory buffers are required, e.g., digital cameras, CD-ROM-based products and digital audio players.
The TMS320C62xx is another fixed-point DSP processor from Texas Instruments. It is based on a VLIW-like architecture. The first member of the TMS320C62xx family, the TMS320C6201, is available at 200 MHz. It uses a 1.8-volt core (with 3.3-volt I/O), and executes up to 400 million MACs per second. Later, two more family members, the TMS320C6202 and the TMS320C6211, were introduced. Texas Instruments also offers the TMS320C67xx family, which extends the TMS320C62xx architecture with support for floating-point arithmetic and 64-bit data.
The TMS320C64x is a 16-bit fixed-point family of packaged DSP processors from Texas Instruments. The TMS320C64x is an extension to Texas Instruments' earlier TMS320C62x architecture. The TMS320C64x family targets high-performance applications such as wireless base stations, digital subscriber loops, multi-line modems, ISDN modems, imaging, 3D imaging applications, video applications, and radar and sonar systems. The fastest TMS320C64x family members execute at clock rates on the order of 1 GHz, with a 1.2-volt core supply and 3.3-volt I/O.
The TMS320C67x family is the floating-point version of Texas Instruments' TMS320C62x family of fixed-point DSPs. Like the TMS320C62x, the TMS320C67x is based on a VLIW architecture which allows it to execute up to eight RISC-like instructions per clock cycle. It is capable of executing all TMS320C62x instructions and has added support for floating-point arithmetic and 64-bit data. The TMS320C67x family currently includes the TMS320C6701, the TMS320C6711, the TMS320C6712, and the TMS320C6713. The fastest TMS320C67x family member, the TMS320C6713, operates at 300 MHz and uses a 1.4-volt core supply.
The TMS320C67x is upward compatible with the TMS320C62x: the TMS320C67x can execute TMS320C62x object code unmodified, but the TMS320C62x cannot execute all TMS320C67x instructions. The TMS320C67x is only partly compatible with the TMS320C64x, Texas Instruments' next generation of the fixed-point TMS320C6xxx architecture, since the TMS320C64x extends the TMS320C62x instruction set with instructions that are not supported by the TMS320C67x.
The ADSP-21xx is the first DSP processor family from Analog Devices. The family consists of a large number of processors based on a common 16-bit fixed-point architecture with a 24-bit instruction word. Analog Devices has the ADSP-219x series, which offers speeds of up to 300 MIPS, as well as architectural enhancements. ADSP-21xx processors are targeted at modem, audio, PC multimedia, and digital cellular applications.
The Analog Devices ADSP-219x and ADSP-2199x are 16-bit fixed-point DSPs with 24-bit instructions. The ADSP-219x core used in both families is based on the ADSP-218x architecture, but has a number of architectural enhancements. Most important among these are the addition of new addressing modes, an expanded address space, the addition of an instruction cache, and a deeper pipeline (six stages, compared to three on the ADSP-218x) to enable faster clock speeds. The ADSP-219x is mostly, but not completely, assembly source code upward compatible with the ADSP-218x.
The ADSP-219x and ADSP-2199x target low-cost applications; the ADSP-2199x is particularly focused on motor control applications. There are three members in the ADSP-219x family and three members in the ADSP-2199x family.
The ADSP-21xxx "SHARC" is the successor to the original ADSP-2106x SHARC family. Like the ADSP-2106x, the ADSP-21xxx is a 32-bit floating-point DSP processor with 48-bit instruction words. The ADSP-21xxx targets a variety of applications including consumer, automotive, professional audio, industrial, and medical imaging applications. The ADSP-21xxx is very similar to the ADSP-2106x, but compared to the ADSP-2106x, the ADSP-21xxx has a duplicated data path and widened on-chip buses to support SIMD (single-instruction, multiple-data) processing. The ADSP-21xxx can execute ADSP-2106x assembly source code without modification, but to take advantage of the SIMD features, software written for the ADSP-2106x must be modified.
The newest ADSP-21xxx family members, the ADSP-2136x, use a slightly longer pipeline and a slightly different memory system than previous ADSP-21xxx family members. Due to these differences, software optimized for the older ADSP-21xxx family members must be modified to obtain optimal performance on the ADSP-2136x.
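The Motorola DSP560xx family consists of several 24-bit fixed-point DSPs based on a common core architecture. The family is popular in digital audio applications, where its 24-bit word width improves dynamic range and reduces quantization noise compared to 16-bit fixed-point DSPs.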
The DSP56000 and DSP56001 were the first members of the DSP560xx family, and were introduced in 1987. In 1992, Motorola introduced the DSP56002. In 1996, Motorola introduced the DSP56011, targeted specifically at audio decoding for DVD (Digital Versatile Disc) players.
The DSP560xx has a 24-bit, fixed-point data path that features an integrated MAC/ALU with a 24 × 24 → 48-bit multiplier, a 56-bit ALU, and two 56-bit accumulators that each provide eight guard bits. The DSP560xx data path uses fractional arithmetic in all operations. Because the DSP560xx does not have an integer multiply instruction, performing an integer multiply requires programmers to convert the result of a fractional multiply to integer format by shifting a sign bit into the accumulator MSB.
The data path can shift values one bit left or right. The data path provides support for 48-bit double-precision arithmetic. The DSP560xx provides a carry bit which is updated by shifting and ALU operations.
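The fractional-to-integer conversion described above can be sketched in a few lines. This is an illustrative model only, not Motorola code: the function names are hypothetical, operands are assumed to be 24-bit two's complement values, and the fractional multiply is modelled as the raw product shifted left by one bit (Q23 × Q23 → Q47).

```python
def frac_mul_q23(a: int, b: int) -> int:
    """Model of a 24 x 24 -> 48-bit fractional multiply (assumed Q23 operands).

    The raw integer product has two sign bits; a fractional multiplier
    shifts it left by one so that the result is aligned as Q47.
    """
    assert -(1 << 23) <= a < (1 << 23)
    assert -(1 << 23) <= b < (1 << 23)
    return (a * b) << 1


def int_mul_via_frac(a: int, b: int) -> int:
    """Recover the integer product from the fractional result.

    An arithmetic right shift by one bit undoes the alignment; the sign
    bit is shifted back into the most significant bit.
    """
    return frac_mul_q23(a, b) >> 1


# 0.5 * 0.5 = 0.25: in Q23, 0.5 is 2**22; in Q47, 0.25 is 2**45.
quarter_q47 = frac_mul_q23(1 << 22, 1 << 22)
product = int_mul_via_frac(-3, 5)
```

The one corner case this sketch ignores is (-1) × (-1) in fractional format, which does not fit in 48 bits and is normally handled by saturation.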
The Motorola DSP561xx family is based on Motorola's DSP56100 16-bit fixed-point DSP core and has an architecture, instruction set, and development environment similar to that of Motorola's DSP560xx 24-bit fixed-point processors. DSP561xx processors execute at speeds of up to 30 MIPS and are targeted at applications such as digital cellular telephones and pagers. The two processors in the DSP561xx family both offer on-chip voiceband A/D and D/A converters.
The DSP561xx family's data path is based on a 16 × 16 → 32-bit multiplier integrated with a 40-bit accumulator providing eight guard bits. Multiply-accumulate operations execute in a single cycle. The multiply-accumulate unit supports signed, signed/unsigned, and unsigned multiplication. Although the data path does not include a barrel shifter, a shifting unit provides accumulator shifts of one or four bits left, or one, four, or 16 bits right. The data path supports both convergent and biased rounding as well as saturation and output shifting.
The Motorola DSP96002 is a 32-bit IEEE standard 754 floating-point processor with 32-bit integer support. It has an overall architecture similar to that of the Motorola DSP560xx family. The fastest versions of the processor execute at 20 MIPS.
The DSP96002 has achieved popularity in some scientific and military appli-
cations (especially those involving the fast Fourier transform), but has not found
widespread use elsewhere.
11.2 Selecting Digital Signal Processors
The factors that influence the selection of a DSP processor for a given application are
architectural features, execution speed, type of arithmetic and word length:
1. Architectural features - Though most of the digital signal processors available today have good architectural features, the key features of interest include size of on-chip memory, special instructions and I/O capability. In applications where large memory is required, on-chip memory is essential. It helps in accessing the data at high speeds and executing the program rapidly. For memory hungry applications (e.g. digital audio - Dolby AC-2, FAX/Modem, MPEG coding/decoding), the size of internal RAM should be high. For applications that require fast and efficient communication or data flow with the outside world, I/O features such as interfaces to ADCs and DACs, DMA capability and support for multiprocessing may be important. Depending on the application, a rich set of special instructions to support DSP operations is important, e.g. zero-overhead looping capability, dedicated DSP instructions, and circular addressing.
2. Execution speed - The execution speed of digital signal processors plays an important role in selecting the processor. The execution speed is measured in terms of the clock speed of the processor, in MHz, and the number of instructions performed, in millions of instructions per second (MIPS) or, in the case of floating point digital signal processors, in millions of floating point operations per second (MFLOPS). Comparison of execution speed of processors based on such measures may not be meaningful. For example, the C62x family of processors can execute as many as eight instructions in a cycle. The number of operations performed in each cycle also differs from processor to processor. Thus, an alternative measure is based on the execution speed of benchmark algorithms such as FFT, FIR and IIR filters.
3. Type of arithmetic - The two most common types of arithmetic used in modern digital signal processors are fixed and floating point arithmetic. Fixed point processors are favoured in low cost, high volume applications (e.g. cell phones and computer disk drives). Floating point arithmetic is the natural choice for applications with wide and variable dynamic range requirements (dynamic range may be defined as the difference between the largest and smallest signal levels that can be represented). In general, floating point processors are more expensive than fixed point processors.
4. Word length - Processor data word length is an important parameter in DSP, as it can have a significant impact on signal quality. In general, the longer the data word, the lower the errors that are introduced by digital signal processing. Fixed point digital signal processors aimed at telecommunications markets tend to use a 16-bit word length (e.g. TMS320C54x), whereas those aimed at high quality audio applications tend to use 24 bits (e.g. DSP56300). In fixed point audio processing, for example, a processor word length of at least 24 bits is required. In most floating point DSP processors, a 32-bit data size (24-bit mantissa and 8-bit exponent) is used for single-precision arithmetic. Most floating point DSP processors also have fixed point arithmetic capability and often support variable data size, fixed point arithmetic.
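The effect of word length on dynamic range can be put into numbers with a short sketch. It assumes the usual rule of thumb that an N-bit fixed point word spans a ratio of 2^N between its largest and smallest representable levels, i.e. about 6.02 dB per bit; the function name is hypothetical.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range of an N-bit fixed point word, in dB.

    Assumes the ratio of largest to smallest level is 2**bits,
    which gives roughly 6.02 dB per bit.
    """
    return 20.0 * math.log10(2 ** bits)


range_16 = dynamic_range_db(16)   # telecom-class word length
range_24 = dynamic_range_db(24)   # audio-class word length
gain_db = range_24 - range_16     # benefit of the 8 extra bits
```

Under this assumption the eight extra bits of a 24-bit processor buy roughly 48 dB of additional dynamic range over a 16-bit one.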
11.3 Applications of PDSPs
In this section, we will study the applications of PDSPs in both real-world and prototyping applications. The applications are divided into three categories: communication systems, multimedia, and control/data acquisition.
11.3.1 Communications systems
PDSPs have been applied to implement various communication systems. Examples include Caller ID and cordless handsets, using processors such as the TMS320C2x and TMS320C50. A low bit rate (1.4 Kbps), real-time voice coder has been implemented with a 16-bit fixed-point TMS320C5x PDSP. Another example is a speech recognition system. PDSPs are also suitable for modems and digital communication. Digital baseband signal processing is another application of PDSPs. System prototyping can be accomplished using PDSPs because of their low cost and ease of programming. Navigation using the Global Positioning System (GPS) has been widely accepted for commercial applications such as electronic direction finding. The C30 is in charge of signal processing tasks such as correlation, FFT, digital filtering, decimation, and demodulation. For defense system applications, a linear array of TMS320C30 processors as the front-end and a Transputer processor array as the back-end have been developed for programmable radar signal processing. The PDSP front-end performs pulse compression, moving target indication (MTI), and constant false alarm rate (CFAR) detection.
11.3.2 Audio Signal Processing
The audible signals cover the frequency range from 20 to 20,000 Hz. PDSP applications to audio signal processing can be classified into three categories according to the qualities and audible range of the signal: professional audio products, consumer audio products, and computer audio multimedia systems.
Table 11.1 shows some of the applications of PDSPs in audio signal processing.
Table 11.1
Professional Audio Products | DSP Algorithms Used
Digital Audio Tape (DAT) | Compression techniques: MPEG
Graphic and Parametric Equalizers | Digital FIR/IIR filters
Multichannel Digital Audio Recorders | ADPCM, AC-3
Digital Audio Effects Processors | Delay-Line Modulation
CD Players and Recorders | PCM
Digital Amplifiers/Speakers | Digital Filtering
Digital Versatile Disk (DVD) Players | AC-3, MPEG
Satellite (DBS) Broadcasting | AC-3, MPEG
Home Theater Systems | AC-3, THX
11.3.3 Control and Data acquisition
PDSPs have found numerous applications in modern control and data acquisition as well. Several control applications are implemented using Motorola DSP56000 PDSPs, which function both as powerful microcontrollers and as fast digital signal processors. The 56-bit accumulator (hence the code name 56xxx) provides 8-bit extension registers in conjunction with saturation arithmetic to allow 256 successive additions without the need to check for overflow conditions or limit cycles. The output noise power due to roundoff noise of the 24-bit DSP56001 is 65,536 times less than that for 16-bit PDSPs. Examples include a PID (proportional, integral, derivative) controller, an adaptive controller, and the use of the TMS320C40 PDSP for distributed control.
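Both figures quoted above (256 additions and a factor of 65,536) can be checked with a small sketch. It rests on two standard assumptions that are not spelled out in the text: each guard bit doubles the number of worst-case additions an accumulator can absorb, and roundoff noise power for a B-bit word is q²/12 with step size q = 2^-(B-1). All names are hypothetical.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's complement number of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def safe_additions(guard_bits: int) -> int:
    """Worst-case additions an accumulator with this many guard bits can absorb."""
    return 2 ** guard_bits


def roundoff_noise_power(bits: int) -> float:
    """Roundoff noise power q^2 / 12 for a B-bit fractional word (assumed model)."""
    q = 2.0 ** -(bits - 1)
    return q * q / 12.0


# Largest positive 48-bit product accumulated in a 56-bit accumulator.
worst_product = (1 << 47) - 1
sum_256 = 256 * worst_product
sum_257 = 257 * worst_product

noise_ratio = roundoff_noise_power(16) / roundoff_noise_power(24)
```

With 8 guard bits the sum of 256 worst-case products still fits in 56 bits, while the 257th addition can overflow; the noise ratio works out to 2^16.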
11.3.4 Biometric Information Processing
PDSPs also find applications in bioinformatics. Handwritten signature verification, one of the biometric authentication techniques, is cheap, reliable and non-intrusive to the person being authorized. This verification method can be used in a variety of entrance monitoring and security systems.
11.3.5 Image/Video processing
Existing image and video compression standards such as JPEG and MPEG are based on the DCT (discrete cosine transform) algorithm. The JPEG 2000 image coding standard is based on the discrete wavelet transform (DWT). These standards are often implemented in modern digital cameras and digital camcorders, where PDSPs will play an important role. Medical imaging has become another fast growing application area of PDSPs. A PDSP can be used as an on-line data processor for magnetic resonance imaging (MRI). It can perform real-time dynamic imaging such as cardiac imaging, angiography (examination of the blood vessels using X-rays following the injection of a radio-opaque substance), and abdominal imaging.
Other applications of PDSPs include: digital cellular phones, automated inspection, vehicle collision avoidance, Voice-over-Internet, motor control, voice mail, navigation equipment, video conferencing, toys, games consoles, music synthesis, satellite communications, seismic analysis, secure communications, tapeless answering machines, sonar, modems (POTS, ISDN, cable, ...), noise cancellation, medical ultrasound and patient monitoring.
11.4 Von Neumann Architecture
John Von Neumann developed the first computer architecture that allowed the computer to be programmed by codes residing in memory. In this architecture, program instructions may be stored in Read Only Memory (ROM). The Von Neumann architecture is the most widely used in the majority of microprocessors. The CPU can be either reading an instruction or reading/writing data from/to the memory. Both cannot occur at the same time, since instructions and data use the same signal pathways and memory.
Data bus: Transports data between the CPU and its peripherals. It is bidirectional: the CPU can read or write data in the peripherals.
Address bus: The CPU uses the address bus to indicate which peripheral it wants to access and, within each peripheral, which specific register. The address bus is unidirectional. The CPU always writes the address, which is read by the peripherals.
Control bus: This bus carries signals that are used to manage and synchronize the exchanges between the CPU and its peripherals, as well as the signal that indicates whether the CPU wants to read or write the peripheral.
Fig. 11.1 Von Neumann Architecture (blocks: Clock, CPU, Memory holding program and data, Peripherals)
The main characteristic of the Von Neumann architecture is that it possesses only one bus system. The same bus carries all the information exchanged between the CPU and the peripherals, including the instruction codes as well as the data processed by the CPU.
11.5 Harvard Architecture
The term Harvard originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape and data in relay latches. The Harvard architecture physically separates the memories for instructions and data, requiring dedicated buses for each of them. Instructions and operands can therefore be fetched simultaneously.
Most DSP processors use a modified Harvard architecture with two or three memory buses, allowing access to filter coefficients and input signals in the same cycle.
Since it possesses two independent bus systems, the Harvard architecture is capable of simultaneously reading an instruction code and reading or writing a memory or peripheral as part of the execution of the previous instruction. Since it has two memories, it is not possible for the CPU to mistakenly write codes into the program memory and therefore corrupt the code while it is executing.
However, it is less flexible. It needs two independent memory banks. These two resources are not interchangeable.
Fig. 11.2 Harvard architecture (blocks: CPU, Program memory, Data memory, Peripherals)
The modified Harvard architecture used in DSPs has multiport memory, with separate bus systems for program memory, for data memory and for input/output peripherals. It may also have multiple bus systems for program memory alone or for data memory alone. These multiple bus systems increase the complexity of the CPU, but allow it to access several memory locations simultaneously, thereby increasing the data throughput between memory and CPU.
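The throughput argument can be illustrated with a deliberately simplified cycle count. The model below is an assumption made for illustration, not a description of any particular processor: each filter tap is taken to need one instruction fetch and two data accesses (a coefficient and a sample), and each bus is taken to carry one access per cycle. The function and parameter names are hypothetical.

```python
from math import ceil


def cycles_per_tap(program_buses: int = 1, data_buses: int = 1,
                   shared_bus: bool = False,
                   instr_fetches: int = 1, data_accesses: int = 2) -> int:
    """Memory cycles needed per filter tap under a toy bus model.

    shared_bus=True models a Von Neumann machine: every access is
    serialised on the single bus. Otherwise program and data accesses
    proceed in parallel on their own buses.
    """
    if shared_bus:
        return instr_fetches + data_accesses
    program_cycles = ceil(instr_fetches / program_buses)
    data_cycles = ceil(data_accesses / data_buses)
    return max(program_cycles, data_cycles)


von_neumann = cycles_per_tap(shared_bus=True)               # one shared bus
harvard = cycles_per_tap(program_buses=1, data_buses=1)     # two buses
modified_harvard = cycles_per_tap(program_buses=1, data_buses=2)  # three buses
```

In this toy model the per-tap cost falls from three cycles to two to one as buses are added, which is the sense in which extra bus systems raise throughput at the price of CPU complexity.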
11.6 VLIW Architecture
A newer architecture that has attracted a great deal of attention in the DSP community is the Very Long Instruction Word (VLIW). Very long instruction word processing increases the number of instructions that are processed per cycle. A VLIW is essentially a concatenation of several short instructions and requires multiple execution units, running in parallel, to carry out the instructions in a single cycle. The architecture makes use of extensive parallelism whilst retaining some of the good features of previous DSP processors. VLIW architectures execute multiple instructions per cycle and use simple, regular instruction sets.
The Very Long Instruction Word (VLIW) processor reads a relatively large group of instructions and executes them at the same time. The VLIW processor combines many simple instructions into a single long instruction word that uses different registers. A language compiler or pre-processor separates program instructions into basic operations that are performed by the processor in parallel. These operations are placed into a "very long instruction word" that the processor can then disassemble, transferring each operation to an appropriate execution unit. For example, the group might contain four instructions, and the compiler ensures that those four instructions are not dependent on each other so they can be executed simultaneously. Otherwise, it places "no-ops" (blank instructions) in the group where necessary.
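The grouping step described above can be sketched as a small in-order packer. This is a minimal illustration under stated assumptions, not how any real VLIW compiler schedules code: instructions are hypothetical (text, destination, sources) tuples, a bundle holds four slots, and an instruction may not share a bundle with an earlier instruction that writes a register it reads or writes.

```python
from typing import List, Sequence, Tuple

# (assembly text, destination register, source registers) - hypothetical format
Instr = Tuple[str, str, Tuple[str, ...]]


def pack_vliw(instrs: Sequence[Instr], width: int = 4) -> List[List[str]]:
    """Greedily pack instructions, in program order, into fixed-width bundles.

    A bundle is closed and padded with "nop" when it is full or when the
    next instruction depends on a register written inside the bundle.
    """
    bundles: List[List[str]] = []
    current: List[str] = []
    written = set()
    for text, dest, sources in instrs:
        depends = dest in written or any(s in written for s in sources)
        if depends or len(current) == width:
            bundles.append(current + ["nop"] * (width - len(current)))
            current, written = [], set()
        current.append(text)
        written.add(dest)
    if current:
        bundles.append(current + ["nop"] * (width - len(current)))
    return bundles


program = [
    ("add r1", "r1", ("r2", "r3")),
    ("mul r4", "r4", ("r5", "r6")),
    ("sub r7", "r7", ("r1", "r4")),   # needs the results of add and mul
    ("ld r8", "r8", ("r9",)),
]
bundles = pack_vliw(program)
```

Here the subtraction cannot share a bundle with the add and multiply it depends on, so the first bundle is padded with two no-ops; this padding is one source of the increased memory use associated with VLIW code.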
Fig. 11.3 VLIW architecture (multiple execution units operating in parallel)
Advantages of VLIW architecture
. Increased performance
. Better compiler targets
. Potentially easier to program
. Potentially scalable
. Can add more execution units, allowing more instructions to be packed into the VLIW instruction.
Disadvantages of VLIW architecture
. New kind of programmer/compiler complexity
. Program must keep track of instruction scheduling
. Increased memory use
. High power consumption
. Misleading MIPS ratings
11.7 Multiply Accumulate Unit (MAC)
The Multiply-Accumulate (MAC) operation is the basis of many digital signal processing algorithms, notably digital filtering. The term "digital filter" refers to an algorithm by which a digital signal or sequence of numbers is transformed into another sequence of numbers termed the output digital signal. Digital filters involve signals in the digital domain (discrete-time signals) and are used extensively in applications such as digital image processing, pattern recognition, and spectral analysis.
FIR filters are preferred in lower order solutions, and since they do not employ feedback, they exhibit naturally bounded response. They are simple to implement, and require one RAM location and one coefficient for each order.
For FIR filters the output of the filter is given by
$$ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \qquad (11.1) $$
where x(n) is the input to the filter, h(n) is the impulse response of the filter and y(n) is the output of the filter. The output of an FIR filter is simply a finite length weighted sum of the present and previous inputs to the filter. Hence, to perform filtering through the above equation, the minimum requirement is to quickly multiply two values and add the result. To make this possible, a fast dedicated hardware MAC, using either fixed point or floating point arithmetic, is mandatory. Characteristics of a typical fixed point MAC include
1. 16 x 16 bit 2's complement inputs
2. 16 x 16 bit multiplier with 32-bit product in 25 ns
3. 32/40 bit accumulator
In the TMS320C50, for example, the FIR equation can be efficiently implemented using the instruction pair:
    RPT   NM1
    MACD  HNM1, XNM1
RPT repeats the instruction that follows it, so that each repetition of MACD (multiply-accumulate with data move):
1. Multiplies the data sample, x(n - k), in the data memory by the coefficient, h(k), in the program memory;
2. Adds the previous product to the accumulator;
3. Implements the unit delay, symbolized by z^-1, by shifting the data sample, x(n - k), up to update the tapped delay line.
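The three actions of the repeated multiply-accumulate-with-data-move step can be mimicked in software. The sketch below is a behavioural model only, not TMS320C50 code: the function name and the list-based delay line are assumptions, and ordinary Python integers stand in for the hardware accumulator.

```python
from typing import List


def fir_macd(h: List[int], delay_line: List[int], new_sample: int) -> int:
    """Compute one output y(n) of equation (11.1) and update the delay line.

    delay_line[k] holds x(n - k). The loop walks from the oldest sample
    to the newest; after each product is accumulated the sample is copied
    one place up, which implements the unit delay for the next call.
    """
    n_taps = len(h)
    delay_line[0] = new_sample
    acc = 0
    for k in range(n_taps - 1, -1, -1):
        acc += h[k] * delay_line[k]            # multiply and accumulate
        if k + 1 < n_taps:
            delay_line[k + 1] = delay_line[k]  # data move: x(n-k) -> x(n-k-1)
    return acc


coeffs = [1, 2, 3]
state = [0, 0, 0]
# Feeding a unit impulse should return the coefficients in order.
impulse_response = [fir_macd(coeffs, state, x) for x in [1, 0, 0, 0]]
```

Walking from the oldest tap towards the newest is what lets the data move happen in place: each sample is read for its product before it is overwritten by its younger neighbour.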
Multiply-Accumulate (MAC) Function
MAC speed applies both to finite impulse response (FIR) and infinite impulse response (IIR) filters. The complexity of the filter response dictates the number of MAC operations required per sample period.
A multiply-accumulate step performs the following:
- Reads a 16-bit sample data (pointed to by a register)
- Increments the sample data pointer by 2
- Reads a 16-bit coefficient (pointed to by another register)
- Increments the coefficient register pointer by 2
- Sign multiplies the (16-bit) data and coefficient to yield a 32-bit result
- Adds the result to the contents of a 32-bit register pair for accumulation
The TMS320C54x multiply-accumulate (MAC) unit performs a 16 × 16 → 32-bit fractional multiply-accumulate operation in a single instruction cycle. The multiplier supports signed/signed multiplication, signed/unsigned multiplication, and unsigned/unsigned multiplication. These operations allow efficient extended-precision arithmetic. Many instructions using the MAC unit can optionally specify automatic round-to-nearest rounding.
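Why the three multiplication modes matter for extended precision can be seen by building a 32 × 32-bit product from 16 × 16-bit pieces. The following is a sketch of the arithmetic only, with hypothetical function names; it is not TMS320C54x code. Each 32-bit operand is split into a signed upper half and an unsigned lower half, so the partial products are signed/signed, signed/unsigned and unsigned/unsigned respectively.

```python
from typing import Tuple


def split32(value: int) -> Tuple[int, int]:
    """Split a signed 32-bit value into (signed high half, unsigned low half)."""
    assert -(1 << 31) <= value < (1 << 31)
    low = value & 0xFFFF      # unsigned 16-bit
    high = value >> 16        # arithmetic shift keeps the sign
    return high, low


def mul32_from_16(a: int, b: int) -> int:
    """64-bit product of two signed 32-bit values from 16 x 16 partial products."""
    ah, al = split32(a)
    bh, bl = split32(b)
    high_high = ah * bh              # signed x signed
    cross = ah * bl + al * bh        # signed x unsigned (twice)
    low_low = al * bl                # unsigned x unsigned
    return (high_high << 32) + (cross << 16) + low_low


wide_product = mul32_from_16(-123456789, 987654321)
```

A multiplier limited to signed/signed operation would need extra correction steps for the cross and low terms, which is why hardware support for all three modes makes extended-precision arithmetic efficient.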
11.8 Pipelining
Most of the early microprocessors execute instructions entirely sequentially: after the execution of the first instruction, the next one starts. The problem with this is that it is extremely inefficient, since the second instruction has to wait until all the steps of the first instruction are completed. To improve the efficiency, advanced microprocessors and digital signal processors use an approach called pipelining, in which different phases of operation and execution of instructions are carried out in parallel. That is, in modern processors the first step of execution is performed on the first instruction, and then when the instruction passes to the next step, a new instruction is started. The steps in the pipeline are often called stages.
The basic action of any microprocessor can be broken down into a series of four simple steps. They are
1. The fetch phase (F), in which the next instruction is fetched from the address stored in the program counter.
2. The decode phase (D), in which the instruction in the instruction register is decoded and the address in the program counter is incremented.
3. The memory read phase (R), which reads the data from the data buses and also writes data to the data buses.
4. The execute phase (X), which executes the instruction currently in the instruction register and also completes the write process.
In a modern processor, the above four steps get repeated over and over again until the program is finished executing. These are, in fact, the four stages in a classic RISC pipeline. Each of these stages could be said to represent one phase in the "lifecycle" of an instruction. An instruction starts out in the fetch phase, moves to the decode phase, then to the memory read phase, and finally to the execute phase. Each phase takes a fixed, but by no means equal, amount of time.
Pipelining a processor means breaking down its instruction into a series of discrete pipeline stages which can be completed in sequence by specialized hardware. Because an instruction's lifecycle consists of four fairly distinct phases, the instruction execution process is divided into a sequence of four discrete pipeline stages, where each pipeline stage corresponds to a phase in the standard instruction lifecycle. Note that the number of pipeline stages is referred to as the pipeline depth. So a four-stage pipeline has a pipeline depth of four.
To understand pipelining better, let us assume that the number of stages is four and the execution time of an instruction is four nanoseconds. If we assume the time taken for each stage is equal, then each stage takes one nanosecond. So our original single-cycle processor's four-nanosecond execution process is now broken down into four discrete, sequential pipeline stages of one nanosecond each. At the beginning of the first nanosecond, the first instruction enters the fetch stage. After that nanosecond is complete, the second nanosecond begins and the first instruction moves on to the decode stage while the second instruction enters the fetch stage. At the start of the third nanosecond, the first instruction advances to the memory read stage, the second instruction advances to the decode stage, and the third instruction enters the fetch stage. At the fourth nanosecond, the first instruction advances to the execute stage, the second to the memory read stage, the third to the decode stage, and the fourth to the fetch stage. After the fourth nanosecond has fully elapsed and the fifth nanosecond starts, the first instruction has passed from the pipeline and is now finished executing. Thus we can say that at the end of four nanoseconds (= four clock cycles) the pipelined processor depicted below has completed one instruction. At the start of the fifth nanosecond, the pipeline is now full and the processor can begin completing instructions at a rate of one instruction per nanosecond. This 1 instruction/ns completion rate is a four-fold improvement over the single-cycle processor's completion rate of 0.25 instructions/ns (or four instructions every 16 nanoseconds).
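The timing in this walk-through can be reproduced with a short sketch. It assumes an ideal pipeline (equal stage times, no stalls or hazards), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def unpipelined_time_ns(n_instr: int, stages: int = 4, stage_ns: float = 1.0) -> float:
    """Each instruction runs all stages before the next one starts."""
    return n_instr * stages * stage_ns


def pipelined_time_ns(n_instr: int, stages: int = 4, stage_ns: float = 1.0) -> float:
    """The first instruction takes `stages` steps; each later one finishes one step after."""
    return (stages + n_instr - 1) * stage_ns


def schedule(n_instr: int,
             stages: Sequence[str] = ("F", "D", "R", "X")) -> List[List[Tuple[int, str]]]:
    """For each instruction, list the (cycle, stage) pairs it occupies; cycles start at 1."""
    return [[(i + s + 1, name) for s, name in enumerate(stages)]
            for i in range(n_instr)]


first_done = pipelined_time_ns(1)    # pipeline fill time
four_piped = pipelined_time_ns(4)
four_serial = unpipelined_time_ns(4)
second_instr = schedule(2)[1]
```

For long instruction streams the ratio of the two times approaches the pipeline depth, which is the four-fold improvement quoted above; for four instructions it is 16 ns against 7 ns because of the fill time.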
The pipelining stages for different DSPs are shown in table nia. nowt #
7
Pipelining a processor means br
pipeline stages which can be com
Es
The TMS320C54x has two additional phases: the pre-fetch (PF) phase, which provides the address of the instruction to be fetched, and the access (A) phase, which reads the address of the operand and modifies the auxiliary registers and stack pointer if required.
Instruction 1 | F | D | R | X |   |   |
Instruction 2 |   | F | D | R | X |   |
Instruction 3 |   |   | F | D | R | X |
Instruction 4 |   |   |   | F | D | R | X
Fig. 11.4 Four stages of TMS320C54x
Table 11.2 Pipeline in different TMS320 Processors
DSP processor | Pipeline phases
TMS320C2000 | F-D-R-X (4 levels)
TMS320C3x | F-D-R-X (4 levels)
TMS320C5x | F-D-R-X (4 levels)
TMS320C54x | PF-F-D-A-R-X (6 levels)
Pipelining leads to dramatic improvements in system performance. The more stages that we can break the pipeline into, the more theoretical speed we can get from it. For example, let's suppose it takes 12 clock cycles to handle all the steps to process an instruction. In theory, if you use a 4-stage pipeline, your maximum throughput is 1 instruction every 3 cycles. But if you use a 6-stage pipeline, maximum throughput is 1 instruction every 2 cycles.
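The arithmetic in this example is simply the total cycle count divided by the number of stages. A minimal sketch, assuming perfectly balanced stages and a pipeline that never stalls (the function name is hypothetical):

```python
from fractions import Fraction


def cycles_between_completions(total_cycles, n_stages):
    """Ideal steady-state spacing, in clock cycles, between two completed
    instructions when `total_cycles` of work is split evenly over `n_stages`."""
    if n_stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return Fraction(total_cycles, n_stages)
```

For the 12-cycle instruction above, `cycles_between_completions(12, 4)` is 3 and `cycles_between_completions(12, 6)` is 2. Real pipelines fall short of this bound whenever stages are unbalanced or instructions conflict.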
11.9 Architecture of TMS320C50
The TMS320C50, a member of the TMS320C5x generation of Texas Instruments digital signal processors, is fabricated with CMOS integrated circuit technology. It is a fixed point, 16-bit processor running at 40 MHz. The single instruction execution time is 50 nsec. Its architectural design is based on the combination of an advanced Harvard architecture, on-chip peripherals and on-chip memory. Moreover, the TMS320C50 has a highly specialized instruction set. These features provide the operational flexibility and the device speed which, together with cost effectiveness, make the signal processor a suitable device for a wide range of applications.
The TMS320C50 has a programmable memory map (address range is 224K x 16-bit words), which can vary for each application. On-chip memory includes 10K words of RAM and 2K words of ROM. All 'C5x DSPs have the same CPU design; however, they have different on-chip memory configurations and on-chip peripherals. The table below provides a comparison of the devices in the 'C5x generation.
On-chip memory (16-bit words) and I/O ports
Device | DARAM (Data) | DARAM (Data+Prog) | SARAM (Data+Prog) | ROM (Prog) | Serial ports | Parallel ports
TMS320C50 | 544 | 512 | 9K | 2K* | 2 | 64K
'C51 | 544 | 512 | 1K | 8K* | 2 | 64K
'C52 | 544 | 512 | - | 4K | 1 | 64K
'C53 | 544 | 512 | 3K | 16K | 2 | 64K
'C57S | 544 | 512 | 6K | 2K | 2© | 64K+HPI
* ROM boot loader available
# TDM serial port not available
© Includes auto-buffered serial port (BSP) but TDM serial port not available
HPI - Host Port Interface
The functional block diagram of the TMS320C5x is shown in Fig. 11.5. It can be divided into four sub-blocks. They are 1. Bus structure 2. Central Processing Unit 3. On-chip memory and 4. On-chip peripherals.
11.9.1 Bus Structure
Separate program and data buses in the advanced Harvard architecture of the 'C5x maximize the processing power and provide a high degree of parallelism. Many DSP applications are accomplished using single-cycle multiply/accumulate instructions with a data move option. For example, while data is multiplied, a previous product can be loaded into, added to or subtracted from the accumulator and, at the same time, a new address can be generated. In addition, the 'C5x includes the control mechanisms to manage interrupts, repeated operations and function calling.
The ‘C5x architecture has four buses:
(i) Program bus (PB)
(ii) Program address bus (PAB)
(iii) Data read bus (DB)
(iv) Data read address bus (DAB)
The program bus carries the instruction code and immediate operands from program memory to the CPU.
The program address bus provides addresses to program memory space for both read and write.
Fig. 11.5 The functional block diagram of TMS320C50 (Texas Instruments)
The data read bus interconnects various elements of the CPU to data memory space.
The data read address bus provides the address to access the data memory space.
11.9.2 Central Processing Unit
The Central Processing Unit consists of the following elements:
(i) Central arithmetic logic unit (CALU)
(ii) Parallel logic unit (PLU)
(iii) Auxiliary register arithmetic unit (ARAU)
(iv) Memory mapped registers
(v) Program controller
11.9.2.1 Central Arithmetic Logic Unit (CALU)
The CPU uses the CALU to perform 2's complement arithmetic. It consists of the following: a 16-bit x 16-bit parallel multiplier, a 32-bit accumulator (ACC), a 32-bit accumulator buffer (ACCB), a product register (PREG), and additional shifters at the output of both the accumulator and the product register (PREG).
The 16 x 16 bit hardware multiplier is capable of computing a signed or an unsigned 32-bit product in a single machine cycle. All multiply instructions except the MPYU (multiply unsigned) instruction perform a signed multiply operation in the multiplier. The 16-bit temporary register 0 (TREG0) holds one of the operands for the multiplier, and the other input is from the data bus or the program bus. The product register holds the product.
The LT (load TREG0) instruction normally loads TREG0 to provide one operand from the data bus, and the MPY instruction provides the second operand for multiplication operations. A multiplication can also be performed with a short or long immediate operand by using the MPY instruction with an immediate operand.
The 32-bit ALU and accumulator implement a wide range of arithmetic and logic functions, the majority of which execute in a single cycle. One input to the ALU comes from the accumulator, and the other input can be furnished from the product register of the multiplier, the accumulator buffer (ACCB) or the output of the scaling shifter. The results of operations performed in the ALU are stored in the accumulator.
The scaling shifter has a 16-bit input connected to the data bus and a 32-bit output connected to the ALU. This scaling shifter produces a left shift of 0 to 16 bits on the input data. Either a 5-bit register, TREG1, specifies the number of bits by which the scaling shifter should shift, or the shift count is specified by a constant embedded in the instruction word.
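The data path described above (signed 16 x 16 multiply into a 32-bit product, a scaling shifter feeding a 32-bit ALU, and a 32-bit accumulator) can be modelled in a few lines. This is a minimal sketch, not the device's exact behaviour: it assumes the shifter input is always sign-extended and that accumulator overflow wraps around, and all function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply(treg0, operand):
    """Signed 16 x 16 multiply; the result is the 32-bit product (PREG)."""
    return to_signed(to_signed(treg0, 16) * to_signed(operand, 16), 32)


def scaling_shift(data16, count):
    """Left-shift a 16-bit input by 0..16 bits onto a 32-bit output.
    Assumption: the input is sign-extended before shifting."""
    if not 0 <= count <= 16:
        raise ValueError("shift count must be between 0 and 16")
    return to_signed(to_signed(data16, 16) << count, 32)


def accumulate(acc, value):
    """32-bit add into the accumulator. Assumption: overflow wraps around."""
    return to_signed(acc + value, 32)
```

A multiply/accumulate step is then `acc = accumulate(acc, multiply(treg0, operand))`, with `treg0` playing the role of the operand loaded by LT.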
11.9.2.2 Parallel Logic Unit (PLU)
The parallel logic unit (PLU) is a second logic unit that executes logic operations on data without affecting the contents of the accumulator. It can directly set, clear, test or toggle multiple bits in a status/control register or any data memory location. After executing a logical operation on the two operands as defined by the instruction, the PLU writes the result to the same data memory location from which the first operand was fetched.
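The read-modify-write behaviour of the PLU can be pictured with a small model. This is a sketch only: data memory is represented by a Python dictionary, and the function name and the operation labels "and", "or" and "xor" are hypothetical, chosen for illustration.

```python
def plu_update(memory, address, mask, op):
    """Apply a logic operation to the 16-bit word at `address` and write the
    result back to the same location. No accumulator is involved."""
    first = memory[address]          # first operand comes from data memory
    if op == "and":
        result = first & mask        # clears the bits that are 0 in mask
    elif op == "or":
        result = first | mask        # sets the bits that are 1 in mask
    elif op == "xor":
        result = first ^ mask        # toggles the bits that are 1 in mask
    else:
        raise ValueError("unsupported operation: " + op)
    memory[address] = result & 0xFFFF
    return memory[address]
```

Because the result goes straight back to the source location, a control register can be modified without disturbing a calculation in progress in the accumulator.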
11.9.2.3 Auxiliary Register Arithmetic Unit (ARAU)
The 'C5x contains a register file of eight auxiliary registers (AR0-AR7), each 16 bits long, a 3-bit auxiliary register pointer (ARP) and an unsigned 16-bit ALU. The auxiliary register file is connected to the auxiliary register arithmetic unit (ARAU). The auxiliary registers are used for indirect addressing of the data memory or for temporary data storage. The ARs and the ARP can be loaded from data memory, the ACC or the PREG, or by an immediate operand defined in the instruction. The contents of the ARs can be stored in the data memory or used as inputs to the CALU.
11.9.2.4 Index Register (INDX)
The 16-bit index register (INDX) is used by the ARAU as a step value to modify the address in the ARs during indirect addressing. The INDX can be added to or subtracted from the current AR on any AR update cycle. The INDX can be used to increment or decrement the address in steps larger than 1.
11.9.2.5 Auxiliary Register Compare Register (ARCR)
The 16-bit ARCR is used for address boundary comparison. It limits blocks of data and supports logical comparisons between the current AR and the ARCR.
11.9.2.6 Block Move Address Register (BMAR)
The 16-bit BMAR holds an address value of the source or destination space of a block move. The BMAR can also hold the address of an operand in program memory for a multiply accumulate operation.
11.9.2.7 Block Repeat Registers (RPTC, BRCR, PASR, PAER)
RPTC: The repeat counter register is 16 bits long. It holds the repeat count in a repeat single operation and is loaded by the RPT and RPTZ instructions.
BRCR: The 16-bit block repeat counter register holds the count value for the block repeat feature. This value is loaded before a block repeat operation is initiated.
PASR: The block repeat program address start register indicates the 16-bit address where the repeated block of code starts.
PAER: The block repeat program address end register indicates the 16-bit address where the repeated block of code ends.
11.9.2.8 Auxiliary Registers (AR0-AR7)
The eight 16-bit auxiliary registers (AR0-AR7) can be accessed by the CALU and modified by the ARAU or the PLU. The primary function of the ARs is to provide a 16-bit address for indirect addressing to data space. The ARs can also be used as general purpose registers or counters.
11.9.2.9 Instruction Register (IREG)
The 16-bit instruction register (IREG) holds the opcode of the instruction being executed.
11.9.2.10 Interrupt Registers (IMR, IFR)
The 16-bit interrupt mask register (IMR) individually masks specific interrupts at the required time. The 16-bit interrupt flag register (IFR) indicates the current status of the interrupts.
11.9.2.11 Status Register
The two 16-bit status registers contain status and control bits for the CPU.
11.9.2.12 Memory Mapped Registers
The 'C5x has 96 registers mapped into page 0 of the data memory space (00h-5Fh). This memory mapped register space contains various control and status registers, including those for the CPU, serial port, timer and software wait state generator. Additionally, the first 16 I/O port locations are mapped into this data memory space, allowing them to be accessed either as data memory using single-word instructions or as I/O locations with two-word instructions.
11.9.2.13 Program Controller
The program controller contains logic circuitry that decodes the operational instructions, manages the CPU pipeline, stores the status of CPU operations and decodes the conditional operations. It consists of the following elements:
(i) Program counter
(ii) Status and control registers
(iii) Hardware stack
(iv) Address generation logic
(v) Instruction register
11.9.2.14 Program Counter
The 'C5x has a 16-bit program counter (PC) which contains the address of the internal or external program memory used to fetch instructions. The PC addresses program memory, either on-chip or off-chip, via the program address bus. Through the program address bus, an instruction is loaded into the instruction register (IREG). The PC is then ready to start the next instruction fetch cycle.
11.9.2.15 Hardware Stack
The stack is 16 bits wide and 8 levels deep and is accessible via the PUSH and POP instructions. The stack is used during interrupts and subroutines to save and restore the PC contents.
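A fixed-depth stack of this kind can be modelled as follows. This is a sketch under stated assumptions: it assumes that pushing onto a full stack discards the oldest entry, and it simply raises an error when an empty stack is popped. The class name `HardwareStack` is hypothetical.

```python
from collections import deque


class HardwareStack:
    """Model of a 16-bit wide, 8-level deep return-address stack."""

    DEPTH = 8

    def __init__(self):
        # Assumption: with maxlen set, a push on a full stack drops the
        # oldest (bottom) entry.
        self._levels = deque(maxlen=self.DEPTH)

    def push(self, pc):
        """Save a 16-bit program counter value on top of the stack."""
        self._levels.append(pc & 0xFFFF)

    def pop(self):
        """Restore the most recently saved program counter value."""
        return self._levels.pop()

    def __len__(self):
        return len(self._levels)
```

The limited depth is the practical point: subroutine calls and interrupts nested more than eight deep lose return addresses unless software saves stack entries to data memory.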
11.9.2.16 Program Memory Address Generation
The program memory contains the code for applications and holds table information and immediate operands. The program memory is accessed only by the program address bus. The address for this bus is generated by the program counter when instructions and long immediate operands are accessed.
11.9.2.17 Status and Control Registers
The 'C5x has four status and control registers. They are the circular buffer control register, the process mode status register, and the status registers ST0 and ST1.
11.9.2.18 Circular Buffer Registers (CBSR1, CBSR2, CBER1, CBER2, CBCR)
The 'C5x supports two concurrent circular buffers. The registers CBSR1 and CBSR2 are 16-bit registers that hold the addresses where the circular buffers start. The registers CBER1 and CBER2 indicate the addresses where the circular buffers end. The 8-bit circular buffer control register (CBCR) controls the operation of these circular buffers.
11.9.2.19 Process Mode Status Register (PMST)
The PMST resides in the memory mapped register space of data memory page 0 and can be saved in the same way as any other data memory location. The PMST can be acted upon directly by the CALU and the PLU.
11.9.2.20 Status Registers (ST0 and ST1)
The status registers can be stored into data memory and loaded from data memory, thereby allowing the 'C5x status to be saved and restored for subroutines. The LST instruction writes to ST0 and ST1 and the SST instruction reads from them.
11.9.3 On-chip Memory
The 'C5x architecture has a total memory address range of 224K words x 16 bits. The memory space is divided into four memory segments:
64K-word program memory space
64K-word local data memory space
64K-word input/output ports
32K-word global data memory space
The 64K-word program space contains the instructions to be executed. The 64K-word local data memory space stores data used by the instructions. The 64K-word I/O port space interfaces to external memory mapped peripherals. The 32K-word global data space can share data with other processors within the system.
The on-chip memory of the 'C5x includes
(i) Program read only memory
(ii) Data/program single access RAM (SARAM)
(iii) Data/program dual access RAM (DARAM)
11.9.3.1 Program Memory
The 'C5x DSPs carry a 16-bit on-chip maskable programmable ROM. The program memory can reside both on and off chip. If the pin MP/MC is high, the device is configured as a microprocessor and it starts running from off-chip memory. If the pin MP/MC is low, the device is configured as a microcomputer and it starts running from on-chip ROM. Once the program is running, the device configuration can be changed by setting or clearing the MP/MC bit in the PMST.
11.9.3.2 Data/Program Dual Access RAM (DARAM)
All the 'C5x devices have 1056 words of DARAM. The DARAM is divided into three individually selectable memory blocks: 512-word data or program DARAM block B0, 512-word data DARAM block B1, and 32-word data DARAM block B2. DARAM block B0 can be configured by software as data or program memory; it is configured into program space by setting the CNF bit in ST1. DARAM blocks B1 and B2 are always configured as data memory. The DARAM can be read from and written to in the same machine cycle.
11.9.3.3 Data/Program Single Access RAM (SARAM)
All 'C5x DSPs except the 'C52 carry a 16-bit on-chip single access RAM of various sizes, which is divided into 2K-word and 1K-word blocks that are contiguous in program or data memory space. The SARAM can be configured by software in one of three ways:
(i) All SARAM configured as data memory
(ii) All SARAM configured as program memory
(iii) SARAM configured as both data memory and program memory
The SARAM requires a full machine cycle to perform a read or a write.
11.9.3.4 On-chip Memory Protection
The program memory protection feature prevents an instruction fetched from off-chip memory from reading or writing on-chip program memory. This feature can be used with the on-chip ROM to secure program code that is stored in off-chip memory.
11.9.4 On-chip Peripherals
The on-chip peripherals connected to the 'C5x CPU include
(i) Clock generator
(ii) Hardware timer
(iii) Software programmable wait state generators
(iv) General purpose I/O pins
(v) Parallel I/O ports
(vi) Serial port interface
(vii) Buffered serial port
(viii) Time-division multiplexed (TDM) serial port
(ix) Host port interface
(x) User maskable interrupts
11.9.4.1 Clock Generator
The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock generator is driven by a crystal oscillator circuit or by an external clock source. When the PLL option is selected, the clock source is multiplied by a specific factor to generate the CPU clock, so a clock source with a lower frequency than that of the CPU can be used.
11.9.4.2 Hardware Timer
The timer is an on-chip down counter that can be used to periodically generate CPU interrupts. It can be stopped, restarted, reset or disabled by specific status bits. The timer operation is controlled via the timer control register (TCR), the timer counter register (TIM), and the timer period register (PRD). The timer is driven by a 4-bit prescaler. The timer clocks at a rate between 1/2 and 1/32 of the machine cycle rate, depending upon the timer's divide-down ratio.
11.9.4.3 Software Programmable Wait State Generators
The software programmable wait state generators can extend external bus cycles by up to seven machine cycles. This operation provides a convenient means to interface the 'C5x to external devices that do not satisfy the full speed access time requirement of the 'C5x. Devices that require more than seven wait states can be interfaced using the hardware READY line. When all external accesses are configured to zero wait states, the internal clocks to the wait state generators are shut off. Shutting off the internal clocks allows this circuitry to run with lower power consumption.
11.9.4.4 General Purpose I/O Pins
The 'C5x has two general purpose pins that are software controlled. They are the branch control input (BIO) pin and the external flag output (XF) pin. The BIO pin can be used to monitor peripheral device status. A branch can be conditionally executed depending upon the state of the BIO input.
The XF pin signals to external devices via software. It is set high by the SETC XF instruction and reset low by the CLRC XF instruction.
11.9.4.5 Parallel I/O Ports
The 'C5x has 64K parallel I/O ports. Sixteen of the 64K I/O ports are memory mapped in data page 0. Each of the I/O ports can be addressed by the IN or the OUT instruction, and the memory mapped ports can also be addressed by any instruction that reads or writes a location in data memory space. Accesses to memory mapped I/O space are distinguished from program and data accesses by the IS signal going low; the DS signal is not active, even though the I/O port is actually accessed through data space.
11.9.4.6 Serial Port Interface
The serial port interface operates through memory mapped registers together with the shift registers (RSR, XSR). The data receive register (DRR) holds the incoming serial data, and the data transmit register (DXR) holds the outgoing serial data to be loaded into the data transmit shift register.
The data receive shift register (RSR) holds the incoming serial data from the serial data receive (DR) pin and controls the transfer of the data to the DRR.
The data transmit shift register (XSR) controls the transfer of the outgoing data from the DXR and holds the data to be transmitted on the serial data transmit (DX) pin.
11.9.4.7 Buffered Serial Port (BSP)
The BSP is available on the 'C56 and 'C57 devices. It operates in either autobuffered or nonbuffered mode. When operated in nonbuffered mode, the BSP functions the same as the basic standard serial port. The autobuffered mode allows high speed data transfer and reduces interrupt latencies.
11.9.4.8 TDM Serial Port
The TDM serial port interface is implemented on the 'C50, 'C51 and 'C53 devices. It operates in either TDM or non-TDM mode. When operated in non-TDM mode, the TDM serial port also functions as the basic standard serial port. In TDM mode it allows the device to communicate serially with up to seven other devices. The TDM port, therefore, provides a simple and efficient interface for multiprocessing applications.
11.9.4.9 Host Port Interface (HPI)
The HPI is available on the 'LC57 and 'C57S devices. It is an 8-bit parallel port used to interface a host device or host processor to the 'C5x. Information is exchanged between the 'C5x and the host device through on-chip 'C5x memory that is accessible by both the host and the 'C5x.
11.9.4.10 User Maskable Interrupts
The 'C5x has four external, maskable user interrupts (INT1-INT4) that external devices can use to interrupt the processor, and one external nonmaskable interrupt (NMI). Internal interrupts are generated by the timer (TINT), the serial port (RINT, XINT, TRNT, BRNT, BXNT), the host port (HINT) and the software interrupt instructions (TRAP, NMI and INTR). Interrupt priorities are set so that reset (RS) has the highest priority and INT4 has the lowest priority. The NMI has the second highest priority.
11.10 Addressing modes
Addressing modes in TMS320C50 are
(i) Immediate addressing
(ii) Indirect addressing
(iii) Register addressing
(iv) Memory mapped register addressing
(v) Direct addressing
(vi) Circular addressing mode
11.10.1 Immediate addressing
Immediate addressing is used to handle constant data. It allows the instruction to operate on an actual value. The data can be either a 16-bit constant or a shorter constant of 7, 9 or 13 bits. Depending on the length of the data, the addressing mode is named long immediate or short immediate addressing mode. In short immediate addressing the data is contained in a portion of the bits in a single-word instruction. At the assembly code level, the developer uses a "#" prefix to specify immediate data.
Example
LD #80h, A: The instruction loads an immediate value 80h into the accumulator.
11.10.2 Indirect addressing
The indirect addressing mode uses the auxiliary registers (ARs) to hold the addresses of operands in memory. In indirect addressing, any location in the 64K-word data memory space can be accessed using a 16-bit address contained in an AR. The auxiliary registers (AR0-AR7) provide flexible and powerful indirect addressing. To select a specific auxiliary register, the auxiliary register pointer (ARP) is loaded with a value from 0 to 7 for AR0 through AR7 respectively. There are seven types of indirect addressing:
(i) Auto increment
(ii) Auto decrement
(iii) Post indexing by adding the contents of AR0
(iv) Post indexing by subtracting the contents of AR0
(v) Single indirect addressing with no increment
(vi) Single indirect addressing with no decrement
(vii) Bit reversed addressing
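Bit reversed addressing, the last type in the list, is the access order used to reorder data for a radix-2 FFT: each index is visited with its address bits written in reverse. The sketch below only illustrates that ordering in software; it does not model how the ARAU produces it, and the function names are hypothetical.

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest `n_bits` bits of `index`."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit across
        index >>= 1
    return result


def bit_reversed_order(n_points):
    """Indices 0..n_points-1 in bit reversed order (n_points a power of two)."""
    n_bits = n_points.bit_length() - 1
    if n_points < 1 or (1 << n_bits) != n_points:
        raise ValueError("n_points must be a power of two")
    return [bit_reverse(i, n_bits) for i in range(n_points)]
```

For an 8-point buffer, `bit_reversed_order(8)` returns `[0, 4, 2, 6, 1, 5, 3, 7]`.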
11.10.3 Register addressing
The register addressing mode uses operands in CPU registers either explicitly, as with a direct reference to a specific register, or implicitly, with instructions that intrinsically refer to certain registers. That is, in this addressing mode the address comes from one of two special purpose memory mapped registers in the CPU: the block move address register (BMAR) and the dynamic bit manipulation register (DBMR). In either case, operand reference is simplified because 16-bit values can be used without specifying a full 16-bit operand address or immediate value.
For example, the BLDD, BLDP, BLPD, MADD and MADS instructions use the BMAR to address an operand in program memory.
11.10.4 Memory mapped register addressing
Memory mapped register addressing is used to access the CPU and on-chip peripheral registers efficiently. It operates like direct addressing except that the upper 9 bits of the address that is accessed are assumed to be 0s. This allows us to address the memory mapped registers of data page 0 directly without the overhead of changing the DP or an auxiliary register. Only the seven lower bits of the address are used, so the complete code, including opcode and operand, can be represented using a single 16-bit word.
The following instructions operate in the memory mapped register addressing mode:
LAMM - Load Accumulator with Memory Mapped Register
LMMR - Load Memory Mapped Register
SAMM - Store Accumulator in Memory Mapped Register
SMMR - Store Memory Mapped Register
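11.10.5 Direct addressing mode
[Fig. 11.6: The data memory address is formed from the 9-bit DP and the 7-bit dma. The DP selects one of the data pages (page 0 to page n) and the dma specifies the actual location inside the 128-word page.]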
Direct addressing allows the CPU to access operands by specifying an offset from a base address that is defined in the data pointer. DP (data pointer) is a 9-bit field contained in the status register (ST0). In this mode the address of the operand is obtained by concatenating the 7-bit data memory address (dma) with the 9-bit data page pointer. The resulting address is placed on an internal data address bus. The data page pointer is a 9-bit field, so it points to one of 512 possible data memory pages, and the 7-bit address in the instruction points to one of 128 words within that page.
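The concatenation of the 9-bit DP and the 7-bit dma can be written as a shift and an OR. A minimal sketch (the function name is hypothetical):

```python
def direct_address(dp, dma):
    """Form the 16-bit data memory address from the 9-bit data page
    pointer (upper bits) and the 7-bit data memory address (lower bits)."""
    if not 0 <= dp < 512:
        raise ValueError("DP is a 9-bit field (0..511)")
    if not 0 <= dma < 128:
        raise ValueError("dma is a 7-bit field (0..127)")
    return (dp << 7) | dma
```

With `dp = 0` the result is just the dma, which is why the registers in data page 0 can be reached with a 7-bit offset; with `dp = 511` and `dma = 127` the result is FFFFh, the top of the 64K-word space.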
11.10.6 Circular addressing mode
Circular addressing is the most sophisticated 'C5x addressing mode. Many algorithms such as convolution, correlation and FIR filtering can use circular buffers in memory to implement a sliding window which contains the most recent data to be processed. Five dedicated registers are allocated for the implementation of circular addressing. They are
CBSR1 - Circular Buffer 1 Start Register
CBSR2 - Circular Buffer 2 Start Register
CBER1 - Circular Buffer 1 End Register
CBER2 - Circular Buffer 2 End Register
CBCR - Circular Buffer Control Register
The registers CBSR1 and CBSR2 are used to load the starting addresses of the circular buffers and the registers CBER1 and CBER2 are used to load the end addresses of the circular buffers. The 8-bit CBCR enables and disables circular buffer operation. Additionally, one of the auxiliary registers (ARs) is used as the pointer into the circular buffer.
To define a circular buffer, first we load the start and end addresses into the corresponding buffer registers. Next, a value between the start and end addresses of the circular buffer is loaded into an AR, and the corresponding circular buffer enable bit in the CBCR is set.
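The wrap-around behaviour of the AR pointer can be sketched as follows. This is a simplified model that assumes a step of one word per access and a wrap that happens when the pointer is at the end address; the function name and the `enabled` flag (standing in for the enable bit in the CBCR) are hypothetical.

```python
def advance(ar, cbsr, cber, enabled=True):
    """Return the next pointer value for a circular buffer that starts at
    `cbsr` and ends at `cber`. Assumes a post-increment of one word."""
    if enabled and ar == cber:
        return cbsr              # reached the end: wrap to the start address
    return (ar + 1) & 0xFFFF     # otherwise an ordinary 16-bit increment
```

Starting from `ar = 0x0200` with `cbsr = 0x0200` and `cber = 0x0203`, repeated calls step through 0x0201, 0x0202, 0x0203 and then return to 0x0200, so a four-word window is reused without any explicit bounds test in the filter loop.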
11.11 Instruction Set
In this section we will briefly study the instructions used in the TMS320C50.
11.11.1 Summary of instructions
Accumulator Memory Reference Instructions
Mnemonic | Description
ABS | Absolute value of ACC; zero carry bit
ADCB | Add ACCB and carry bit to ACC
ADD | Add data memory value, with left shift, to ACC
ADD | Add data memory value, with left shift of 16, to ACC