DSP Note

Digital signals processing

Uploaded by

jibin immanuel
11. Digital Signal Processors

11.1 Overview of Digital Signal Processors

The programmable digital signal processors (PDSPs) are general purpose microprocessors designed specifically for digital signal processing applications. They contain a special architecture and instruction set so as to execute computation-intensive DSP algorithms more efficiently. The programmable DSPs can be divided into two broad categories: i) general purpose digital signal processors and ii) special purpose digital signal processors.

i) General purpose digital signal processors: These are basically high speed microprocessors with architecture and instruction sets optimized for DSP operations. They include fixed point processors such as the Texas Instruments TMS320C54x and Motorola DSP563x, and floating point processors such as the Texas Instruments TMS320C4x and TMS320C67xx and the Analog Devices ADSP21xxx devices.

ii) Special purpose digital signal processors: These processors consist of i) hardware designed for specific DSP algorithms such as the FFT, and ii) hardware designed for specific applications such as PCM and filtering. Examples of special purpose DSPs are Mitel's multi-channel telephony voice echo canceller (MT93001), FFT processors (PDSP16515A, TM-44, TM-66) and programmable FIR filters (UPDSP16256, Model 3092).

A number of PDSPs appeared in the commercial markets in the 1980s. In 1979, Intel introduced the first digital signal processor (the Intel 2920), featuring on-chip ADC and DACs. Texas Instruments introduced the TMS32010, the first generation fixed point DSP in the TMS320 family. Later it introduced the TMS320C20. Note that the TMS32010 does not have a C in its name because it was originally designed in NMOS technology, whereas all the other DSPs are designed with CMOS technology. The TMS320C3x generation is the first of Texas Instruments' 32-bit floating-point digital signal processors.
The 'C3x devices provide an easy-to-use, high-performance architecture and can be used in a wide variety of areas including automotive applications, digital audio, industrial automation and control, data communication, and office equipment such as multifunction peripherals, copiers and laser printers. The CPU has an independent multiplier and ALU [...].

The TMS320C4x devices are 32-bit floating point digital signal processors optimized for parallel processing. The 'C4x family combines a high performance CPU and DMA controller with up to six communication ports to meet the needs of multiprocessor and I/O-intensive applications. Each device contains an on-chip analysis module that supports hardware breakpoints for parallel processing development and debugging. The 'C4x family accepts source code from the TMS320C3x family. Key applications of the 'C4x family include 3-dimensional graphics, image processing, networking, and telecommunications base stations.

The TMS320C5x accepts source code from the 'C1x, 'C2x, and 'C2xx generations. Faster cycle times, on-chip memories, a parallel logic unit (PLU), zero-overhead context switching, and block repeats differentiate the 'C5x. There is an ANSI C compiler designed for the 'C5x, which translates the widely used ANSI C language directly into highly optimized assembly language for the 'C5x.

The Texas Instruments TMS320C54x is a family of 16-bit fixed-point DSPs. TMS320C54x processors target high-volume, low-power applications. The first family members were introduced in Japan in 1994. The fastest processor in the family runs at 160 MHz with a 1.6-volt core supply voltage.

The TMS320C55x is a 16-bit fixed-point packaged DSP processor family from Texas Instruments.
It can execute up to two instructions in parallel, with instruction widths varying from 8 to 48 bits. It is based on the earlier TMS320C54x family but adds significant enhancements to the older processor architecture and instruction set. The TMS320C55x is partially assembly-source-code compatible with the TMS320C54x. The TMS320C55x is intended for use in applications that include cellular telephones, modems and telecom infrastructure applications. The TMS320C55x interfaces directly to SDRAM, making it well suited for use in products where large memory buffers are required, e.g. digital cameras and CD-ROM-based digital audio players.

The TMS320C62xx is another fixed-point DSP processor from Texas Instruments. It is based on a VLIW-like architecture. The first member of the TMS320C62xx family, the TMS320C6201, is available at 200 MHz. It uses a 1.8-volt core supply (with 3.3-volt I/O), and executes up to [...] million MACs per second. Later, two more family members, the TMS320C6202 and the TMS320C6211, were introduced. Texas Instruments also offers the TMS320C67xx family, which extends the TMS320C62xx architecture with support for floating-point arithmetic and 64-bit data.

The TMS320C64x is a 16-bit fixed-point family of packaged DSP processors from Texas Instruments. The TMS320C64x is an extension to Texas Instruments' earlier TMS320C62x architecture. The TMS320C64x family targets high-performance applications such as wireless base stations, digital subscriber loops, multi-line modems, ISDN modems, imaging, 3D imaging applications, video applications, and radar and sonar systems. The fastest TMS320C64x family members execute at 1 GHz with a 1.2-volt core supply and 3.3-volt I/O.

Texas Instruments' TMS320C67x family is the floating-point version of the TMS320C62x family of fixed-point DSPs. Like the TMS320C62x, the TMS320C67x is based on a VLIW architecture which allows it to execute up to eight RISC-like instructions per clock cycle.
It is capable of executing all TMS320C62x instructions and has added support for floating-point arithmetic and 64-bit data. The TMS320C67x family currently includes the TMS320C6701, the TMS320C6711, the TMS320C6712, and the TMS320C6713. The fastest TMS320C67x family member, the TMS320C6713, operates at 300 MHz and uses a 1.4-volt core supply. The TMS320C67x is upward compatible with the TMS320C62x: the TMS320C67x can execute TMS320C62x object code unmodified, but the TMS320C62x cannot execute all TMS320C67x instructions. The TMS320C67x is only partly compatible with the TMS320C64x, Texas Instruments' next generation of the fixed-point TMS320C6xxx architecture, since the TMS320C64x extends the TMS320C62x instruction set with instructions that are not supported by the TMS320C67x.

The ADSP-21xx is the first DSP processor family from Analog Devices. The family consists of a large number of processors based on a common 16-bit fixed-point architecture with a 24-bit instruction word. Analog Devices has the ADSP-219x series, which offers speeds of up to 300 MIPS, as well as architectural enhancements. ADSP-21xx processors are targeted at modem, audio, PC multimedia, and digital cellular applications.

The Analog Devices ADSP-219x and ADSP-2199x are 16-bit fixed-point DSPs with 24-bit instructions. The ADSP-219x core used in both families is based on the ADSP-218x architecture, but has a number of architectural enhancements. Most important among these are the addition of new addressing modes, an expanded address space, the addition of an instruction cache, and a deeper pipeline (six stages, compared to three on the ADSP-218x) to enable faster clock speeds. The ADSP-219x is mostly, but not completely, assembly source code upward compatible with the ADSP-218x. The ADSP-219x and ADSP-2199x target low-cost applications; the ADSP-2199x is particularly focused on motor control applications. There are three members in the ADSP-219x family and three members in the ADSP-2199x family.

The ADSP-21xxx "SHARC" family is the successor to the original ADSP-2106x SHARC family. Like the ADSP-2106x, the ADSP-21xxx is a 32-bit floating-point DSP processor with 48-bit instruction words. The ADSP-21xxx targets a variety of applications including consumer, automotive, professional audio, industrial, and medical imaging applications. The ADSP-21xxx is very similar to the ADSP-2106x, but compared to the ADSP-2106x, the ADSP-21xxx has a duplicated data path and widened on-chip buses to support SIMD (single-instruction, multiple-data) processing. The ADSP-21xxx can execute ADSP-2106x assembly source code without modification, but to take advantage of the SIMD features, software written for the ADSP-2106x must be modified. The newest ADSP-21xxx family members, the ADSP-2136x, use a slightly longer pipeline and a slightly different memory system than previous ADSP-21xxx family members. Due to these differences, software optimized for the older ADSP-21xxx family members must be modified to obtain optimal performance on the ADSP-2136x.

The Motorola DSP560xx family consists of several 24-bit fixed-point DSPs based on a common core architecture. The family is popular in digital audio applications, where its 24-bit word width improves dynamic range and reduces quantization noise compared to 16-bit fixed-point DSPs. The DSP56000 and DSP56001 were the first members of the DSP560xx family, and were introduced in 1987. In 1992, Motorola introduced the DSP56002. In 1996, Motorola introduced the DSP56011, targeted specifically at audio decoding for DVD (Digital Versatile Disc) players.

The DSP560xx has a 24-bit, fixed-point data path that features an integrated MAC/ALU with a 24×24→48-bit multiplier, a 56-bit ALU, and two 56-bit accumulators that each provide eight guard bits.
The DSP560xx data path uses fractional arithmetic in all operations. Because the DSP560xx does not have an integer multiply instruction, performing an integer multiply requires programmers to convert the result of a fractional multiply to integer format by shifting a sign bit into the accumulator MSB. The data path can shift values one bit left or right. The data path provides support for 48-bit double-precision arithmetic. The DSP560xx provides a carry bit which is updated by shifting and ALU operations.

The Motorola DSP561xx family is based on Motorola's DSP56100 16-bit fixed-point DSP core and has an architecture, instruction set, and development environment similar to that of Motorola's DSP560xx 24-bit fixed-point processors. DSP561xx processors execute at speeds up to 30 MIPS and are aimed at applications such as digital cellular telephones and pagers. The two processors in the DSP561xx family both offer on-chip voiceband A/D and D/A converters.

The DSP561xx family's data path is based on a 16×16→32-bit multiplier integrated with a 40-bit accumulator providing eight guard bits. Multiply-accumulate operations execute in a single cycle. The multiply-accumulate unit supports signed, signed/unsigned, and unsigned multiplication. Although the data path does not include a barrel shifter, a shifting unit provides accumulator shifts of one or four bits left, or one, four, or 16 bits right. The data path supports both convergent and biased rounding as well as saturation and output shifting.

The Motorola DSP96002 is a 32-bit IEEE standard 754 floating-point processor with 32-bit integer support.
It has an overall architecture similar to that of the Motorola DSP560xx family. The fastest versions of the processor execute at 20 MIPS. The DSP96002 has achieved popularity in some scientific and military applications (especially those involving the fast Fourier transform), but has not found widespread use elsewhere.

11.2 Selecting Digital Signal Processors

The factors that influence the selection of a DSP processor for a given application are architectural features, execution speed, type of arithmetic and word length:

1. Architectural features - Though most of the digital signal processors available today have good architectural features, the key features of interest include size of on-chip memory, special instructions and I/O capability. In applications where large memory is required, on-chip memory is essential. It helps in accessing the data at high speeds and executing the program rapidly. For memory hungry applications (e.g. digital audio - Dolby AC-2, FAX/Modem, MPEG coding/decoding), the size of internal RAM should be high. For applications that require fast and efficient communication or data flow with the outside world, I/O features such as interfaces to ADCs and DACs, DMA capability and support for multiprocessing may be important. Depending on the application, a rich set of special instructions to support DSP operations is important, e.g. zero-overhead looping capability, dedicated DSP instructions, and circular addressing.

2. Execution speed - The execution speed of digital signal processors plays an important role in selecting the processor. The execution speed is measured in terms of the clock speed of the processor, in MHz, and the number of instructions performed, in millions of instructions per second (MIPS) or, in the case of floating point digital signal processors, in millions of floating point operations per second (MFLOPS). Comparison of execution speed of processors based on such measures may not be meaningful.
For example, the C62x family of processors can execute as many as eight instructions in a cycle. The number of operations performed in each cycle also differs from processor to processor. Thus, an alternative measure is execution speed based on the performance of the processor on benchmark algorithms such as FFT, FIR and IIR filters.

3. Type of arithmetic - The two most common types of arithmetic used in modern digital signal processors are fixed and floating point arithmetic. Fixed point processors are favoured in low cost, high volume applications (e.g. cellular phones and computer disk drives). Floating point arithmetic is the natural choice for applications with wide and variable dynamic range requirements (dynamic range may be defined as the difference between the largest and smallest signal levels that can be represented). In general, floating point processors are more expensive than fixed point processors.

4. Word length - Processor data word length is an important parameter in DSP as it can have a significant impact on signal quality. In general, the longer the data word, the lower the errors that are introduced by digital signal processing. Fixed point digital signal processors aimed at telecommunications markets tend to use a 16-bit word length (e.g. TMS320C54x), whereas those aimed at high quality audio applications tend to use 24 bits (e.g. DSP56300). In fixed point audio processing, for example, a processor word length of at least 24 bits is required. In most floating point DSP processors, a 32-bit data size (24-bit mantissa and 8-bit exponent) is used for single-precision arithmetic. Most floating point DSP processors also have fixed point arithmetic capability and often support variable data size, fixed point arithmetic.

11.3 Applications of PDSPs

In this section, we will study the applications of PDSPs in both real-world and prototyping applications. The applications are divided into three categories: communication systems, multimedia, and control/data acquisition.

11.3.1 Communications systems

PDSPs have been applied to implement a range of communication systems. Examples include Caller ID and cordless handsets using the TMS320C2x and TMS320C50 [...]. A low bit rate (1.4 Kbps), real-time voice coder has been implemented with a 16-bit fixed-point TMS320C5x PDSP. PDSPs are also suitable for speech recognition systems. Modems for digital communication are another example. Digital baseband signal processing is another application of PDSPs. System prototyping can be accomplished using PDSPs because of their low cost and ease of programming.

Navigation using the Global Positioning System (GPS) has been widely accepted for commercial applications such as electronic direction finding. The C30 is in charge of signal processing tasks such as correlation, FFT, digital filtering, decimation, and demodulation. For defense system applications, a linear array of TMS320C30s as the front-end and a Transputer processor array as the back-end has been developed for programmable radar signal processing. The PDSP front-end performs pulse compression, moving target indication (MTI), and constant false alarm rate (CFAR) detection.
11.3.2 Audio Signal Processing

Audible signals cover the frequency range from 20 to 20,000 Hz. PDSP applications to audio signal processing can be classified into three categories according to the quality and audible range of the signal: professional audio products, consumer audio products, and computer audio multimedia systems. Table 11.1 shows some of the applications of PDSPs in audio signal processing.

Table 11.1

Professional Audio Products | DSP Algorithms Used
Digital Audio Tape (DAT) | Compression techniques: MPEG
Graphic and Parametric Equalizers | Digital FIR/IIR filters
Multichannel Digital Audio Recorders | ADPCM, AC-3
Digital Audio Effects Processors | Delay-Line Modulation
CD Players and Recorders | PCM
Digital Amplifiers/Speakers | Digital Filtering
Digital Versatile Disk (DVD) Players | AC-3, MPEG
Satellite (DBS) Broadcasting | AC-3, MPEG
Home Theater Systems | AC-3, THX

11.3.3 Control and Data acquisition

PDSPs have found numerous applications in modern control and data acquisition as well. Several control applications are implemented using Motorola DSP56000 PDSPs that function both as powerful microcontrollers and as fast digital signal processors. Its 56-bit accumulator (hence the code name 56xxx) provides 8-bit extension registers in conjunction with saturation arithmetic to allow 256 successive additions without the need to check for overflow conditions or limit cycles. The output noise power due to roundoff noise of the 24-bit DSP56000/DSP56001 is 65,536 times less than that for 16-bit PDSPs, since the 8 extra bits reduce the noise power by a factor of 2^16. Examples include a PID (proportional, integral, derivative) controller, an adaptive controller, and a TMS320C40-based distributed [...] system.

11.3.4 Biometric Information Processing

PDSPs also find applications in bioinformatics. Handwritten signature verification, one of the biometric authentication techniques, is cheap, reliable and non-intrusive to the person being authorized. This verification method can be applied to a variety of entrance monitoring and security systems.

11.3.5 Image/Video processing

Existing image and video compression standards such as JPEG and MPEG are based on the DCT (discrete cosine transform) algorithm. The JPEG 2000 image coding standard is based on the discrete wavelet transform (DWT). These standards are often implemented in modern digital cameras and digital camcorders, where PDSPs play an important role. Medical imaging has become another fast growing application area of PDSPs. A PDSP can be used as an on-line data processor for magnetic resonance imaging (MRI). It can perform real-time dynamic imaging such as cardiac imaging, angiography (examination of the blood vessels using x-rays following the injection of a radio-opaque substance), and abdominal imaging.

Other applications of PDSPs include: digital cellular phones, automated inspection, vehicle collision avoidance, Voice-over-Internet, motor control, voice mail, navigation equipment, video conferencing, toys, games consoles, music synthesis, satellite communications, seismic analysis, secure communications, tapeless answering machines, sonar, modems (POTS, ISDN, cable, ...), noise cancellation, medical ultrasound and patient monitoring.

11.4 Von Neumann Architecture

John Von Neumann developed the first computer architecture that allowed the computer to be programmed by codes residing in memory. In this architecture, [...] Read Only Memory (ROM). The Von Neumann architecture is the most widely used in the majority of microprocessors. Its main problem is that the CPU can be either reading an instruction or reading/writing data from/to the memory. Both cannot occur at the same time, since instructions and data use the same signal pathways and memory.

Data bus: Transports data between the CPU and its peripherals. It is bidirectional: the CPU can read or write data in the peripherals.

Address bus: The CPU uses the address bus to indicate which peripheral it wants to access and, within each peripheral, which specific register. The address bus is unidirectional. The CPU always writes the address, which is read by the peripherals.

Control bus: The bus carries signals that are used to manage and synchronize the exchanges between the CPU and its peripherals, as well as the signal that indicates whether the CPU wants to read or write the peripheral.

[Figure: CPU with clock, connected by a single bus system to memory (program and data) and peripherals]
Fig. 11.1 Von Neumann Architecture

The main characteristic of the Von Neumann architecture is that it only possesses one bus system. The same bus carries all the information exchanged between the CPU and the peripherals, including the instruction codes as well as the data processed by the CPU.

11.5 Harvard Architecture

The term Harvard originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape and data in relay latches. The Harvard architecture physically separates the memories for instructions and data, requiring dedicated buses for each of them. Instructions and operands can therefore be fetched simultaneously. Most DSP processors use a modified Harvard architecture with two or three memory buses, allowing access to filter coefficients and input signals in the same cycle.

Since it possesses two independent bus systems, the Harvard architecture is capable of simultaneously reading an instruction code and reading or writing a memory or peripheral as part of the execution of the previous instruction. Since it has two memories, it is not possible for the CPU to mistakenly write codes into the program memory and therefore corrupt the code while it is executing. However, it is less flexible. It needs two independent memory banks, and these two resources are not interchangeable.

[Figure: CPU connected by separate bus systems to data memory, peripherals and program memory]
Fig. 11.2 Harvard architecture

The modified Harvard architecture used in DSPs has multiport memory with separate bus systems for program memory, data memory and input/output peripherals. It may also have multiple bus systems for program memory alone or for data memory alone.
These multiple bus systems increase the complexity of the CPU, but allow it to access several memory locations simultaneously, thereby increasing the data throughput between memory and CPU.

11.6 VLIW Architecture

A newer architecture that has attracted a great deal of attention in the DSP community is the Very Long Instruction Word (VLIW). Very long instruction word processing increases the number of instructions that are processed per cycle. A very long instruction word is essentially a concatenation of several short instructions and requires multiple execution units, running in parallel, to carry out the instructions in a single cycle. The architecture makes use of extensive parallelism whilst retaining some of the good features of previous DSP processors. VLIW architectures execute multiple instructions per cycle and use simple, regular instruction sets.

The VLIW processor reads a relatively large group of instructions and executes them at the same time. It combines many simple instructions into a single long instruction word that uses different registers. A language compiler or pre-processor separates program instructions into basic operations that are performed by the processor in parallel. These operations are placed into a "very long instruction word" that the processor can then disassemble, transferring each operation to an appropriate execution unit. For example, the group might contain four instructions, and the compiler ensures that those four instructions are not dependent on each other so they can be executed simultaneously. Otherwise, it places "no-ops" (blank instructions) in the group where necessary.

[Figure: a long instruction word dispatched to several parallel execution units]
Fig. 11.3 VLIW architecture

Advantages of VLIW architecture
. Increased performance
. Better compiler targets
. Potentially easier to program
. Potentially scalable
. Can add more execution units, allowing more instructions to be packed into the VLIW instruction

Disadvantages of VLIW architecture
. New kind of programmer/compiler complexity
. Program must keep track of instruction scheduling
. Increased memory use
. High power consumption
. Misleading MIPS ratings

11.7 Multiply Accumulate Unit (MAC)

The Multiply-Accumulate (MAC) operation is the basis of many digital signal processing algorithms, notably digital filtering. The term "digital filter" refers to an algorithm by which a digital signal or sequence of numbers is transformed into another sequence of numbers termed the output digital signal. Digital filters involve signals in the digital domain (discrete-time signals) and are used extensively in applications such as digital image processing, pattern recognition, and spectral analysis.

FIR filters are preferred in lower order solutions, and since they do not employ feedback, they exhibit a naturally bounded response. They are simple to implement, and require one RAM location and one coefficient for each order. For FIR filters the output of the filter is given by

\[ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \qquad (11.1) \]

where x(n) is the input to the filter, h(n) is the impulse response of the filter and y(n) is the output of the filter. The output of an FIR filter is simply a finite length weighted sum of the present and previous inputs to the filter. Hence, to perform filtering through the above equation, the minimum requirement is to quickly multiply two values and add the result. To make this possible, a fast dedicated hardware MAC, using either fixed point or floating point arithmetic, is mandatory. Characteristics of a typical fixed point MAC include

1. 16 x 16 bit 2's complement inputs
2. 16 x 16 bit multiplier with 32-bit product in 25 ns
3. 32/40 bit accumulator

In the TMS320C50, for example, the FIR equation can be efficiently implemented using the instruction pair:

    RPT  NM1
    MACD HNM1, XNM1

The repeated MACD instruction:

1. Multiplies the data sample, x(n-k), in the data memory by the coefficient, h(k), in the program memory;
2. Adds the previous product to the accumulator;
3. Implements the unit delay, symbolized by z^-1, by shifting the data sample, x(n-k), up to update the tapped delay line.

Multiply-Accumulate (MAC) Function

MAC speed applies both to finite impulse response (FIR) and infinite impulse response (IIR) filters. The complexity of the filter response dictates the number of MAC operations required per sample period. A multiply-accumulate step performs the following:

. Reads a 16-bit sample data value (pointed to by a register)
. Increments the sample data pointer by 2
. Reads a 16-bit coefficient (pointed to by another register)
. Increments the coefficient register pointer by 2
. Sign multiplies the (16-bit) data and coefficient to yield a 32-bit result
. Adds the result to the contents of a 32-bit register pair for accumulation

The TMS320C54x multiply-accumulate (MAC) unit performs a 16 x 16 -> 32-bit fractional multiply-accumulate operation in a single instruction cycle. The multiplier supports signed/signed multiplication, signed/unsigned multiplication, and unsigned/unsigned multiplication. These operations allow efficient extended-precision arithmetic. Many instructions using the MAC unit can optionally specify automatic round-to-nearest rounding.

11.8 Pipelining

Most of the early microprocessors execute instructions entirely sequentially: after execution of the first instruction, the next one starts. The problem with this is that it is extremely inefficient, since the second instruction has to wait until all the steps of the first instruction are completed. To improve the efficiency, advanced microprocessors and digital signal processors use an approach called pipelining, in which different phases of operation and execution of instructions are carried out in parallel. That is, in modern processors the first step of execution is performed on the first instruction; then, when that instruction passes to the next step, a new instruction is started. The steps in the pipeline are often called stages.
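The overlap described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of any particular processor: the function names pipeline_schedule and sequential_cycles are hypothetical, every stage is assumed to take exactly one clock cycle, and stalls and hazards are ignored.

```python
def pipeline_schedule(num_instructions, depth):
    """Return, for each clock cycle, which instruction occupies each stage.

    Entry [c][s] is the 1-based instruction number sitting in stage s
    during cycle c, or None if that stage is idle. (Hypothetical helper,
    assuming one cycle per stage and no stalls.)
    """
    total_cycles = num_instructions + depth - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(depth):
            # Instruction i (0-based) is in stage s during cycle i + s.
            instr = cycle - stage
            row.append(instr + 1 if 0 <= instr < num_instructions else None)
        schedule.append(row)
    return schedule


def sequential_cycles(num_instructions, depth):
    """Cycles needed when each instruction finishes every step before the next starts."""
    return num_instructions * depth


if __name__ == "__main__":
    sched = pipeline_schedule(num_instructions=4, depth=4)
    for cycle, row in enumerate(sched, start=1):
        cells = " ".join("I%d" % i if i else "--" for i in row)
        print("cycle %d: %s" % (cycle, cells))
    print("pipelined cycles:", len(sched))
    print("sequential cycles:", sequential_cycles(4, 4))
```

With four instructions and a depth of four, the pipelined schedule needs 7 cycles, against 16 when each instruction runs to completion before the next one starts. The gap widens as the instruction count grows, because the pipeline fill time of depth - 1 cycles is paid only once.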
The basic action of any microprocessor can be broken down into a series of four simple steps. They are

1. The Fetch phase (F), in which the next instruction is fetched from the address stored in the program counter.
2. The Decode phase (D), in which the instruction in the instruction register is decoded and the address in the program counter is incremented.
3. The Memory read phase (R), which reads the data from the data buses and also writes data to the data buses.
4. The Execute phase (X), which executes the instruction currently in the instruction register and also completes the write process.

In a modern processor, the above four steps get repeated over and over again until the program is finished executing. These are, in fact, the four stages in a classic RISC pipeline. Each of these stages could be said to represent one phase in the "lifecycle" of an instruction. An instruction starts out in the fetch phase, moves to the decode phase, then to the memory read phase, and finally to the execute phase. Each phase takes a fixed, but by no means equal, amount of time.

Pipelining a processor means breaking down its instruction execution into a series of discrete pipeline stages which can be completed in sequence by specialized hardware. Because an instruction's lifecycle consists of four fairly distinct phases, the instruction execution process is divided into a sequence of four discrete pipeline stages, where each pipeline stage corresponds to a phase in the standard instruction lifecycle. Note that the number of pipeline stages is referred to as the pipeline depth. So a four-stage pipeline has a pipeline depth of four.

To understand pipelining in a better way, let us assume that the number of stages is four and the execution time of an instruction is four nanoseconds. If we assume the time taken for each stage in the instruction is equal, then the time taken for each stage is one nanosecond.
So our original single-cycle processor’ four-nanosecond execution process is now broken down into four discrete, ee tial pipeline stages of one nanosecond each in length. At the beginning of the first nanosecond, the first instruction enters the fetch stage. After that nanosecond is Con plete, the second nanosecond begins and the first instruction moves on to the decode stage while the second instruction enters the fetch stage. At the start of the third nsanosecond, the first instruction advances to the memory read stage. the second it- struction advances tothe decode stage, and the third green instruction enters the feet stage. At the fourth nanosecond, the first instruction advances to the execution stage the second to the memory read stage, the third to the decode stage, and the fo! the fetch oe After the fourth nanosecond has fully elapsed and the fifth ond starts, the first instruction has passed from the pipeline and is now fis ecuting. Thus we can say that atthe end of four nanoseconds (= four clock eyes) the pipelined processor depicted below has completed one i a araeee edaieimecsttticrivelines ipleted one instruction. £ |, the pipeline is now full and ompletine instructions at a rate of one instructis the processor can begin & instruction per Se jon/n5 ont nites io atoel oh nanosecond. This 1 instructi pl a four-fold improvement over the si 4 compli” Oe df 025 neructiodalia (Ordnance ee every 16 nanoseconds). The pipelining stages for different DSPs are shown in table nia. nowt # 7 Pipelining a processor means br pipeline stages which can be com Es ‘TMS320C54x has two additional phases : pre-fetch (PF) phi hic ase wi Digital Signal Processors 11.15 dress of the instruction to be fetched and the access phase (A) which reads the address ofthe operand and modify the auxiliary registers and stack pointer if required. Instruction 1 TR Px Instruction 2 dy R, | x, b Instruction 3 RTD] R | X% Instruction 4 BR [Do] R |X Fig. 
11.4 Four stages of TMS320C54x Table 11.2 Pipeline in different TMS320 Processors DSP processor | Pipeline phases i TMS320C2000 | F-D-R-X (4 levels) TMS320C3x__| F-D-R-X (4 levels) TMS320C5x__ | F-D-R-X (4 levels) ‘TMS320C54x_| PF-F-D-A-R-X ( levels) Pipelining leads to dramatic improvements in system performance. The more es that we can break the pipeline into, the more theoretical speed we can get from For example, let’s suppose it takes 12 clock cycles to handle all the steps to process instruction. In theory, if you use a 4-stage pipeline, your maximum throughput is struction every 3 cycles. But if you use a 6-stage pipeline, maximum throughput instruction every 2 cycles. 9 Architecture of TMS320C50 TMS320C5x generation of the Texas instruments TMS320CS0 digital signal ssor is fabricated with CMOS integrated circuit technology. It is a fixed point, bit processor running at 40 MHz. The single instruction execution time is 50 nsec. architectural design is based on the combination of advanced Harward architec. + onchip peripherals and onchip memory. Moreover the TMS320CS0 has a highly ialized instruction set. These features enable the operational flexibility and the ice speed, which together with the cost effectiveness make the signal processor as Suitable device for a wide range of applications. ‘The TM$320C50 has a programmable memory map (address range is 224K x 16 Words), which can vary for each application. Onchip memory include 10K words 11.16 Digital Signal Processing of the RAM and 2K words of the ROM. 
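Looking back at the pipelining discussion, its timing claims can be checked with a short Python sketch. This is only an idealized model, assuming equal stage times and no stalls; the function names are hypothetical and are not part of any TMS320 tool.

```python
def pipelined_completion_times(num_instructions, num_stages, stage_time_ns=1.0):
    """Times (ns) at which each instruction leaves an ideal, stall-free pipeline."""
    # Instruction i (0-based) enters the first stage at i * stage_time_ns and
    # needs num_stages stage times to pass through every stage.
    return [(i + num_stages) * stage_time_ns for i in range(num_instructions)]


def unpipelined_completion_times(num_instructions, num_stages, stage_time_ns=1.0):
    """Times (ns) at which each instruction finishes when instructions do not overlap."""
    instruction_time = num_stages * stage_time_ns
    return [(i + 1) * instruction_time for i in range(num_instructions)]


def cycles_between_completions(total_cycles, num_stages):
    """Steady-state cycles per completed instruction when the work is split evenly."""
    return total_cycles / num_stages


if __name__ == "__main__":
    print(pipelined_completion_times(4, 4))    # [4.0, 5.0, 6.0, 7.0]
    print(unpipelined_completion_times(4, 4))  # [4.0, 8.0, 12.0, 16.0]
    print(cycles_between_completions(12, 4))   # 3.0
    print(cycles_between_completions(12, 6))   # 2.0
```

In this model the first pipelined instruction completes after 4 ns and one more completes every nanosecond after that, while the non-overlapped processor needs 16 ns for four instructions. The last two calls reproduce the 12-cycle example: 3 cycles per instruction with four stages and 2 cycles per instruction with six.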
All 'C5x DSPs have the same CPU design; however, they have different on-chip memory configurations and on-chip peripherals. The table below provides a comparison of the devices in the 'C5x generation.

On-chip memory (16-bit words) and I/O ports

Device | DARAM (Data) | DARAM (Data+Prog) | SARAM (Data+Prog) | ROM (Prog) | Serial ports | Parallel ports
TMS320C50 | 544 | 512 | 9K | 2K* | 2 | 64K
'C51 | 544 | 512 | 1K | 8K* | 2 | 64K
'C52 | 544 | 512 | - | 4K | 1# | 64K
'C53 | 544 | 512 | 3K | 16K | 2 | 64K
'C57S | 544 | 512 | 6K | 2K | 2§ | 64K+HPI

* ROM boot loader available
# TDM serial port not available
§ Includes auto-buffered serial port (BSP) but TDM serial port not available
HPI - Host Port Interface

The functional block diagram of the TMS320C5x is shown in Fig. 11.5. It can be divided into four sub-blocks. They are
1. Bus structure
2. Central processing unit
3. On-chip memory
4. On-chip peripherals

11.9.1 Bus Structure

Separate program and data buses in the advanced Harvard architecture of the 'C5x maximize the processing power and provide a high degree of parallelism. Many DSP applications are accomplished using single-cycle multiply/accumulate instructions with a data move option. For example, while data is multiplied, a previous product can be loaded into, added to or subtracted from the accumulator and, at the same time, a new address can be generated. In addition, the 'C5x includes the control mechanisms to manage interrupts, repeated operations and function calling. The 'C5x architecture has four buses:
(i) Program bus (PB)
(ii) Program address bus (PAB)
(iii) Data read bus (DB)
(iv) Data read address bus (DAB)

The program bus carries the instruction code and immediate operands from program memory to the CPU. The program address bus provides the address to program memory for both reads and writes. The data read bus interconnects various elements of the CPU to data memory space. The data read address bus provides the address to access the data memory space.

[Figure: functional block diagram showing the program and data buses, on-chip memory, CALU, PLU, ARAU, program controller and on-chip peripherals]

Fig. 11.5 The functional block diagram of TMS320C50 (Texas Instruments)

11.9.2 Central Processing Unit

The Central Processing Unit consists of the following elements:
(i) Central arithmetic logic unit (CALU)
(ii) Parallel logic unit (PLU)
(iii) Auxiliary register arithmetic unit (ARAU)
(iv) Memory mapped registers
(v) Program controller

11.9.2.1 Central Arithmetic Logic Unit (CALU)

The CPU uses the CALU to perform 2's complement arithmetic. It consists of the following: a 16-bit x 16-bit parallel multiplier, a 32-bit accumulator (ACC), a 32-bit accumulator buffer (ACCB), a product register (PREG), and additional shifters at the outputs of both the accumulator and the product register (PREG).

The 16 x 16-bit hardware multiplier is capable of computing a signed or an unsigned 32-bit product in a single machine cycle. All multiply instructions except the MPYU (multiply unsigned) instruction perform a signed multiply operation in the multiplier. The 16-bit temporary register 0 (TREG0) holds one of the operands for the multiplier, and the other input is from the data bus or the program bus. The product register holds the product.

The LT (load TREG0) instruction normally loads TREG0 to provide one operand from the data bus, and the MPY instruction provides the second operand for multiplication operations. A multiplication can also be performed with a short or long immediate operand by using the MPY instruction with an immediate operand.

The 32-bit ALU and accumulator implement a wide range of arithmetic and logic functions, the majority of which execute in a single cycle. One input to the ALU comes from the accumulator, and the other input can be furnished from the product register of the multiplier, the accumulator buffer (ACCB) or the output of the scaling shifter. The results of operations performed in the ALU are stored in the accumulator. The scaling shifter has a 16-bit input connected to the data bus and a 32-bit output connected to the ALU. This scaling shifter produces a left shift of 0 to 16 bits on the input data. Either the 5-bit register TREG1 specifies the number of bits by which the scaling shifter should shift, or the shift count is specified by a constant embedded in the instruction word.

11.9.2.2 Parallel Logic Unit (PLU)

The parallel logic unit (PLU) is a second logic unit that executes logic operations on data without affecting the contents of the accumulator. It can directly set, clear, test or toggle multiple bits in a status/control register or in any data memory location. After executing a logical operation on the two operands as defined by the instruction, the PLU writes the result to the same data memory location from which the first operand was fetched.

11.9.2.3 Auxiliary Register Arithmetic Unit (ARAU)

The 'C5x has a register file containing eight auxiliary registers (AR0-AR7), each 16 bits long, a 3-bit auxiliary register pointer (ARP) and an unsigned 16-bit ALU. The auxiliary register file is connected to the auxiliary register arithmetic unit (ARAU). The auxiliary registers are used for indirect addressing of the data memory or for temporary data storage. The ARs and the ARP can be loaded from data memory, the ACC or the PREG, or by an immediate operand defined in the instruction. The contents of the ARs can be stored in the data memory or used as inputs to the CALU.

11.9.2.4 Index Register (INDX)

The 16-bit index register (INDX) is used by the ARAU as a step value to modify the address in the ARs during indirect addressing.
The INDX can be added to or subtracted from the current AR on any AR update cycle. The INDX can be used to increment or decrement the address in steps larger than 1.

11.9.2.5 Auxiliary Register Compare Register (ARCR)

The 16-bit ARCR is used for address boundary comparison. It limits blocks of data and, in conjunction with the CMPR instruction, supports logical comparisons between the current AR and the ARCR.

11.9.2.6 Block Move Address Register (BMAR)

The 16-bit BMAR holds an address value of the source or destination space of a block move. The BMAR can also hold the address of an operand in program memory for a multiply-accumulate operation.

11.9.2.7 Block Repeat Registers (RPTC, BRCR, PASR, PAER)

RPTC: The repeat counter register is 16 bits long. It holds the repeat count in a repeat single operation and is loaded by the RPT and RPTZ instructions.

BRCR: The 16-bit block repeat counter register holds the count value for the block repeat feature. This value is loaded before a block repeat operation is initiated.

PASR: The block repeat program address start register indicates the 16-bit address where the repeated block of code starts.

PAER: The block repeat program address end register indicates the 16-bit address where the repeated block of code ends.

11.9.2.8 Auxiliary Registers (AR0-AR7)

The eight 16-bit auxiliary registers (AR0-AR7) can be accessed by the CALU and modified by the ARAU or the PLU. The primary function of the ARs is to provide a 16-bit address for indirect addressing to data space. The ARs can also be used as general purpose registers or counters.

11.9.2.9 Instruction Register (IREG)

The 16-bit instruction register (IREG) holds the opcode of the instruction being executed.

11.9.2.10 Interrupt Registers (IMR, IFR)

The 16-bit interrupt mask register (IMR) individually masks specific interrupts at required times.
The 16-bit interrupt flag register (IFR) indicates the current status of the interrupts.

11.9.2.11 Status Registers

The two 16-bit status registers contain status and control bits for the CPU.

11.9.2.12 Memory Mapped Registers

The 'C5x has 96 registers mapped into page 0 of the data memory space (00h-5Fh). This memory mapped register space contains various control and status registers, including those for the CPU, serial port, timer and software wait state generator. Additionally, the first 16 I/O port locations are mapped into this data memory space, allowing them to be accessed either as data memory using single-word instructions or as I/O locations with two-word instructions.

11.9.2.13 Program Controller

The program controller contains logic circuitry that decodes the operational instructions, manages the CPU pipeline, stores the status of CPU operations and decodes the conditional operations. It consists of the following elements:
(i) Program counter
(ii) Status and control registers
(iii) Hardware stack
(iv) Address generation logic
(v) Instruction register

11.9.2.14 Program Counter

The program counter (PC) contains the address of internal or external program memory used to fetch instructions. The PC addresses program memory, either on-chip or off-chip, via the program address bus. An instruction is loaded into the instruction register (IREG), and the PC is then ready to start the next instruction fetch cycle.

11.9.2.15 Hardware Stack

The stack is 16 bits wide and 8 levels deep and is accessible via the PUSH and POP instructions. The stack is used during interrupts and subroutines to save and restore the PC contents.

11.9.2.16 Program Memory Address Generation

The program memory contains the code for applications and holds table information and immediate operands. The program memory is accessed only by the program address bus. The address for this bus is generated by the program counter when instructions and long immediate operands are accessed.

11.9.2.17 Status and Control Registers

The 'C5x has four status and control registers.
They are the circular buffer control register, the process mode status register, and the status registers ST0 and ST1.

11.9.2.18 Circular Buffer Registers (CBSR1, CBSR2, CBER1, CBER2, CBCR)

The 'C5x supports two concurrent circular buffers. The registers CBSR1 and CBSR2 are 16-bit registers that hold the addresses where the circular buffers start. The registers CBER1 and CBER2 indicate the addresses where the circular buffers end. The 8-bit circular buffer control register (CBCR) controls the operation of these circular buffers.

11.9.2.19 Process Mode Status Register (PMST)

The PMST resides in the memory mapped register space of data memory page 0 and can be saved in the same way as any other data memory location. The PMST can be acted upon directly by the CALU and the PLU.

11.9.2.20 Status Registers (ST0 and ST1)

The status registers can be stored into data memory and loaded from data memory, thereby allowing the 'C5x status to be saved and restored for subroutines. The LST instruction writes to ST0 and ST1 and the SST instruction reads from them.

11.9.3 On-chip Memory

The 'C5x architecture has a total memory address range of 224K words x 16 bits. The memory space is divided into four memory segments:
64K-word program memory space
64K-word local data memory space
64K-word input/output ports
32K-word global data memory space

The 64K-word program space contains the instructions to be executed. The 64K-word local data memory space stores data used by the instructions. The 64K-word I/O port space interfaces to external memory mapped peripherals. The 32K-word global data space can share data with other processors within the system. The on-chip memory of the 'C5x includes
(i) Program read only memory
(ii) Data/program single access RAM (SARAM)
(iii) Data/program dual access RAM (DARAM)

11.9.3.1 Program Memory

The 'C5x DSPs carry a 16-bit on-chip maskable programmable ROM. The program memory can reside both on and off chip.
If the MP/MC pin is high, the device is configured as a microprocessor and it starts running from off-chip memory. If the MP/MC pin is low, the device is configured as a microcomputer and it starts running from on-chip ROM. Once the program is running, the device configuration can be changed by setting or clearing the MP/MC bit in the PMST.

11.9.3.2 Data/Program Dual Access RAM (DARAM)

All the 'C5x devices have 1056 words of DARAM. The DARAM is divided into three individually selectable memory blocks: the 512-word data or program DARAM block B0, the 512-word data DARAM block B1, and the 32-word data DARAM block B2. DARAM block B0 can be configured by software as data or program memory; it is configured into program space by setting the CNF bit in ST1. DARAM blocks B1 and B2 are always configured as data memory. The DARAM can be read from and written to in the same machine cycle.

11.9.3.3 Data/Program Single Access RAM (SARAM)

All 'C5x DSPs except the 'C52 carry a 16-bit on-chip single access RAM of various sizes, which is divided into 2K-word and 1K-word blocks that are contiguous in program or data memory space. The SARAM can be configured by software in one of three ways:
(i) All SARAM configured as data memory
(ii) All SARAM configured as program memory
(iii) SARAM configured as both data memory and program memory

The SARAM requires a full machine cycle to perform a read or a write.

11.9.3.4 On-chip Memory Protection

The program memory protection feature prevents an instruction fetched from off-chip memory from reading or writing on-chip program memory. This feature can be used with the on-chip ROM to secure program code that is stored in off-chip memory.

11.9.4 On-chip Peripherals

The on-chip peripherals connected to the 'C5x CPU include
(i) Clock generator
(ii) Hardware timer
(iii) Software programmable wait state generators
(iv) General purpose I/O pins
(v) Parallel I/O ports
(vi) Serial port interface
(vii) Buffered serial port
(viii) Time-division multiplexed (TDM) serial port
(ix) Host port interface
(x) User maskable interrupts

11.9.4.1 Clock Generator

The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock generator is driven by a crystal oscillator circuit or by an external clock source. When the PLL option is selected, the clock source is multiplied by a specific factor to generate the CPU clock, so a clock source with a lower frequency than that of the CPU can be used.

11.9.4.2 Hardware Timer

The timer is an on-chip down counter that can be used to periodically generate CPU interrupts. It can be stopped, restarted, reset or disabled by specific status bits. The timer operation is controlled via the timer control register (TCR), the timer counter register (TIM), and the timer period register (PRD). The timer is driven by a 4-bit prescaler. The timer clocks at a rate between 1/2 and 1/32 of the machine cycle rate, depending upon the timer's divide-down ratio.

11.9.4.3 Software Programmable Wait State Generators

The software programmable wait state generators can extend external bus cycles by up to seven machine cycles. This operation provides a convenient means of interfacing the 'C5x to external devices that do not satisfy the full-speed access time requirement of the 'C5x. Devices that require more than seven wait states can be interfaced using the hardware READY line. When all external accesses are configured for zero wait states, the internal clocks to the wait state generators are shut off. Shutting off these internal clocks allows this circuitry to run with lower power consumption.

11.9.4.4 General Purpose I/O Pins

The 'C5x has two general purpose pins that are software controlled. They are the branch control input (BIO) pin and the external flag output (XF) pin. The BIO pin can be used to monitor peripheral device status. A branch can be conditionally executed depending upon the state of the BIO input. The XF pin signals to external devices via software. It is set high by the SETC XF instruction and reset low by the CLRC XF instruction.

11.9.4.5 Parallel I/O Ports

The 'C5x has 64K parallel I/O ports. Sixteen of the 64K I/O ports are memory mapped in data page 0. Each of the I/O ports can be addressed by the IN or the OUT instruction; the memory mapped ports can also be accessed by any instruction that reads or writes a location in data memory space. Accesses to memory mapped I/O space are distinguished from program and data accesses by the IS signal going low; the DS signal is not active, even though the I/O port is actually accessed through data space.

11.9.4.6 Serial Port Interface

The serial port interface operates through memory mapped registers together with the receive and transmit shift registers (RSR and XSR). The data receive register (DRR) holds the incoming serial data from the RSR, and the data transmit register (DXR) holds the outgoing serial data to be loaded into the data transmit shift register (XSR).

The data receive shift register (RSR) holds the incoming serial data from the serial data receive (DR) pin and controls the transfer of the data to the DRR. The data transmit shift register (XSR) controls the transfer of the outgoing data from the DXR and holds the data to be transmitted on the serial data transmit (DX) pin.

11.9.4.7 Buffered Serial Port (BSP)

The buffered serial port is available on the 'C56 and 'C57 devices. It operates in either autobuffering or nonbuffered mode. When operated in nonbuffered mode, the BSP functions the same as the basic standard serial port. The autobuffering mode allows high-speed data transfer and reduces interrupt latencies.

11.9.4.8 TDM Serial Port

The TDM serial port interface is implemented on the 'C50, 'C51 and 'C53 devices. It operates in either TDM or non-TDM mode. When operated in non-TDM mode, the TDM serial port also functions as the basic standard serial port. In TDM mode it allows the 'C5x to communicate serially with up to seven other devices. The TDM port, therefore, provides a simple and efficient interface for multiprocessing applications.

11.9.4.9 Host Port Interface (HPI)

The HPI is available on the 'LC57 and 'C57S devices.
It is an 8-bit parallel port used to interface a host device or host processor to the 'C5x. Information is exchanged between the 'C5x and the host device through on-chip 'C5x memory that is accessible to both the host and the 'C5x.

11.9.4.10 User Maskable Interrupts

The 'C5x has four external, maskable user interrupts (INT1-INT4) that external devices can use to interrupt the processor, and one external nonmaskable interrupt (NMI). Internal interrupts are generated by the timer (TINT), the serial port (RINT, XINT, TRNT, TXNT, BRNT, BXNT), the host port (HINT) and the software interrupt instructions (TRAP, NMI and INTR). Interrupt priorities are set so that reset (RS) has the highest priority and INT4 has the lowest priority. The NMI has the second highest priority.

11.10 Addressing Modes

The addressing modes in the TMS320C50 are
(i) Immediate addressing
(ii) Indirect addressing
(iii) Register addressing
(iv) Memory mapped register addressing
(v) Direct addressing
(vi) Circular addressing mode

11.10.1 Immediate addressing

Immediate addressing is used to handle constant data. It allows the instruction to operate on an actual value. The data can be either a 16-bit constant or an 8-, 9- or 13-bit constant. Depending on the length of the data, the addressing mode is named long immediate or short immediate addressing mode. In short immediate addressing the data is contained in a portion of the bits in a single-word instruction; in long immediate addressing the 16-bit data occupies the second word of a two-word instruction. At the assembly code level, the developer uses a "#" prefix to specify immediate data.

Example: LD #80h, A
The instruction loads the immediate value 80h into the accumulator.

11.10.2 Indirect addressing

The indirect addressing mode uses the auxiliary registers (ARs) to hold the addresses of operands in memory. In indirect addressing, any location in the 64K-word data memory space can be accessed using a 16-bit address contained in an AR. The eight auxiliary registers (AR0-AR7) provide flexible and powerful indirect addressing. To select a specific auxiliary register, the auxiliary register pointer (ARP) is loaded with a value from 0 to 7 for AR0 through AR7 respectively.
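The ARP selection just described, together with the INDX step value from Section 11.9.2.4, can be pictured with a small Python model. This is a simplified sketch, not TI's implementation: the class name, the method names and the dictionary used as data memory are hypothetical, and only a post-modification by a signed step with 16-bit wraparound is modelled.

```python
MASK16 = 0xFFFF  # auxiliary registers are 16 bits wide


class IndirectAddressModel:
    """Toy model of AR0-AR7, the 3-bit ARP and the INDX step register."""

    def __init__(self):
        self.ar = [0] * 8  # AR0-AR7
        self.arp = 0       # selects the current auxiliary register
        self.indx = 0      # step value for post-indexing

    def select(self, n):
        """Load the ARP with a value from 0 to 7."""
        if not 0 <= n <= 7:
            raise ValueError("ARP is a 3-bit pointer (0-7)")
        self.arp = n

    def access(self, memory, step=0):
        """Read the operand addressed by the current AR, then post-modify the AR."""
        address = self.ar[self.arp]
        value = memory.get(address, 0)
        # The update wraps around at 16 bits, like a 16-bit unsigned ALU.
        self.ar[self.arp] = (address + step) & MASK16
        return value


if __name__ == "__main__":
    model = IndirectAddressModel()
    data_memory = {0x0100: 11, 0x0101: 22}
    model.ar[3] = 0x0100
    model.indx = 4
    model.select(3)
    print(model.access(data_memory, step=1))           # 11, AR3 becomes 0x0101
    print(model.access(data_memory, step=model.indx))  # 22, AR3 becomes 0x0105
    print(hex(model.ar[3]))                            # 0x105
```

A step of +1 or -1 corresponds to auto increment or decrement, a step of plus or minus INDX to post-indexing, and a step of 0 leaves the address unchanged. Bit-reversed addressing is not modelled here.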
There are seven types of indirect addressing:
(i) Auto increment
(ii) Auto decrement
(iii) Post indexing by adding the contents of AR0
(iv) Post indexing by subtracting the contents of AR0
(v) Single indirect addressing with no increment
(vi) Single indirect addressing with no decrement
(vii) Bit reversed addressing

11.10.3 Register addressing

The register addressing mode uses operands in CPU registers either explicitly, such as with a direct reference to a specific register, or implicitly, with instructions that intrinsically refer to certain registers. That is, in this addressing mode the operand is in one of two special purpose memory mapped registers in the CPU: the block move address register (BMAR) and the dynamic bit manipulation register (DBMR). In either case, operand reference is simplified because 16-bit values can be used without specifying a full 16-bit operand address or immediate value. For example, the BLDD, BLDP, BLPD, MADD and MADS instructions use the BMAR to address an operand in program memory.

11.10.4 Memory mapped register addressing

Memory mapped register addressing is used to access the CPU and on-chip peripheral registers efficiently. It operates like direct addressing except that the upper 9 bits of the address that is accessed are assumed to be 0s. This allows us to address the memory mapped registers of data page 0 directly without the overhead of changing the DP or an auxiliary register. Only the seven lower bits of the address are specified in the instruction, so the complete code, including opcode and operand, can be represented using a single 16-bit word. The following instructions operate in the memory mapped register addressing mode:

LAMM - Load Accumulator with Memory Mapped Register
LMMR - Load Memory Mapped Register
SAMM - Store Accumulator in Memory Mapped Register
SMMR - Store Memory Mapped Register

11.10.5 Direct addressing mode

[Figure: the 9-bit DP selects one data page; the 7-bit dma specifies the actual location inside each 128-word page]

Fig. 11.6 Direct addressing

Direct addressing allows the CPU to access operands by specifying an offset from a base address that is defined in the data page pointer. The DP (data page pointer) is a 9-bit field contained in the status register ST0. In this mode the address of the operand is obtained by concatenating the 7-bit data memory address (dma) with the 9-bit data page pointer. The resulting 16-bit address is placed on an internal data address bus. The 9-bit data page pointer points to one of 512 possible data memory pages, and the 7-bit address in the instruction points to one of 128 words within that page.

11.10.6 Circular addressing mode

Circular addressing is the most sophisticated 'C5x addressing mode. Many algorithms such as convolution, correlation and FIR filtering can use circular buffers in memory to implement a sliding window which contains the most recent data to be processed. Five dedicated registers are allocated for the implementation of circular addressing. They are

CBSR1 - Circular Buffer 1 Start Register
CBSR2 - Circular Buffer 2 Start Register
CBER1 - Circular Buffer 1 End Register
CBER2 - Circular Buffer 2 End Register
CBCR - Circular Buffer Control Register

The registers CBSR1 and CBSR2 are used to load the starting addresses of the circular buffers, and the registers CBER1 and CBER2 are used to load the end addresses of the circular buffers. The 8-bit CBCR enables and disables circular buffer operation. Additionally, one of the auxiliary registers (ARs) is used as the pointer into the circular buffer.

To define a circular buffer, we first load the start and end addresses into the corresponding buffer registers. Next, a value between the start and end addresses of the circular buffer is loaded into an AR, and the corresponding circular buffer enable bit in the CBCR is set.

11.11 Instruction Set

In this section we will study briefly the instructions used in the TMS320C50.
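Before the instruction summary, two of the addressing modes from Section 11.10 can be illustrated numerically. The Python sketch below forms a direct address from the 9-bit DP and the 7-bit dma, and steps a pointer through a circular buffer. Both functions are hypothetical helpers written for this illustration; the circular model assumes a pointer that advances by one word and returns to the start address after reaching the end address, which simplifies the real hardware behaviour.

```python
def direct_address(dp, dma):
    """Concatenate the 9-bit data page pointer with the 7-bit dma field."""
    if not 0 <= dp < 512:
        raise ValueError("DP is a 9-bit field (0-511)")
    if not 0 <= dma < 128:
        raise ValueError("dma is a 7-bit field (0-127)")
    # DP forms the upper 9 bits and dma the lower 7 bits of the 16-bit address.
    return (dp << 7) | dma


def circular_next(pointer, start, end):
    """Advance a circular-buffer pointer by one word (simplified model)."""
    if not start <= pointer <= end:
        raise ValueError("pointer must lie between the start and end addresses")
    # At the end address the pointer wraps back to the start address.
    return start if pointer == end else pointer + 1


if __name__ == "__main__":
    print(hex(direct_address(4, 0x10)))     # 0x210: word 16 of data page 4
    print(hex(direct_address(0, 0x5F)))     # 0x5f: last memory mapped register on page 0
    print(hex(direct_address(511, 127)))    # 0xffff: last word of the last page

    pointer = 0x0202
    visited = []
    for _ in range(3):
        pointer = circular_next(pointer, start=0x0200, end=0x0203)
        visited.append(hex(pointer))
    print(visited)                          # ['0x203', '0x200', '0x201']
```

Setting DP to 0 reproduces the memory mapped register addressing case, where the upper 9 address bits are assumed to be zero.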
11.11.1 Summary of instructions

Accumulator Memory Reference Instructions

Mnemonic | Description
ABS | Absolute value of ACC; zero carry bit
ADCB | Add ACCB and carry bit to ACC
ADD | Add data memory value, with left shift, to ACC
ADD | Add data memory value, with left shift of 16, to ACC
