Application Note
Enhancing the Computational Performance of the C2000™
Microcontroller Family
Table of Contents
1 Introduction
2 Floating-Point Unit (FPU)
3 Control Law Accelerator (CLA)
4 Trigonometric Math Unit (TMU)
5 Fast Integer Division Unit (FINTDIV)
6 Viterbi, Complex Math, and CRC Unit (VCU)
7 Summary
8 References
Revision History
List of Figures
Figure 1-1. System Block Diagram with Math Enhancements
Figure 4-1. TMU Performance Improvement for Park Transform Example
Figure 6-1. VCU Performance Improvements Compared to Software-Only Implementations
List of Tables
Table 2-1. FPU Performance Improvements
Table 3-1. CLA Performance Improvements
Table 3-2. CLA Performance for FFT
Table 4-1. TMU Supported Instructions Summary
Table 4-2. TMU Performance Improvements
Table 5-1. FINTDIV Performance Improvements
Trademarks
C2000™ is a trademark of Texas Instruments.
All trademarks are the property of their respective owners.
SPRY288C – APRIL 2020 – REVISED DECEMBER 2021
Copyright © 2021 Texas Instruments Incorporated
1 Introduction
Real-time control systems require fast and efficient processing, with latency kept to a minimum to
maintain stability and boost overall performance. In addition, the increasing sophistication of modern
motor systems, power electronics, smart grid technology, robotics, and similar applications requires the central
processor to keep up with numerous tasks simultaneously.
The C2000 family of microcontrollers (MCUs) from Texas Instruments addresses these challenges with an array
of integrated on-chip hardware math enhancements that dramatically increase the performance of the MCU in
many real-time applications. The five key enhancements are:
• Floating-Point Unit (FPU)
• Control Law Accelerator (CLA)
• Trigonometric Math Unit (TMU)
• Fast Integer Division Unit (FINTDIV)
• Viterbi, Complex Math, and CRC Unit (VCU)
At the center of each C2000 MCU lies a fast fixed-point central processing unit (CPU) that on its own provides
excellent 32-bit processing capabilities. The FPU provides seamless integration of floating-point hardware
into the CPU. To augment this further, the CLA provides an independent floating-point CPU operating at the
full speed of the device and it is designed to perform control law computations with minimal latency. This
effectively doubles the raw computing capabilities of the device. The TMU provides hardware support for
common trigonometric math functions, while the FINTDIV enables fast integer division operations. The VCU
adds hardware support for communications, complex math, and CRC calculations. This paper provides an
overview of each of these math enhancements.
3 Control Law Accelerator (CLA)
Another key benefit of the CLA, over hardware-based control law implementations, is flexibility. The CLA is a
fully software programmable solution where developers can freely modify their control system without the time
and high cost required to redesign a hardware-based solution. In addition to these benefits, the CLA can also
perform compute-intensive functions such as FFTs (both complex and real). Table 3-2 lists the cycle counts.
Table 3-2. CLA Performance for FFT
Function       Type       Cycles
Complex FFT    256 pt     27323
Complex FFT    512 pt     64538
Complex FFT    1024 pt    133881
Real FFT       512 pt     37537
Real FFT       1024 pt    85012
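To relate the cycle counts in Table 3-2 to execution time, divide by the clock frequency of the CLA. The following is a minimal sketch of that arithmetic in plain C. The helper name cycles_to_us is hypothetical, and the clock frequency is a caller-supplied assumption; this document does not state a clock rate, so check the device data sheet for the actual value.

```c
#include <stdint.h>

/* Convert a cycle count into microseconds for a given clock frequency.
 * clock_hz is an assumption supplied by the caller, not a value taken
 * from this application note. */
static double cycles_to_us(uint32_t cycles, double clock_hz)
{
    return ((double)cycles / clock_hz) * 1.0e6;
}
```

For example, under an assumed 200 MHz clock, cycles_to_us(133881, 200.0e6) evaluates to roughly 669.4 microseconds for the 1024-point complex FFT.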
The CLA minimizes latency because it has direct access to the control peripherals, such as
the ADC and PWM modules, which also gives it a fast trigger response. The CLA is able to read the ADC result register on the same
cycle that the ADC sample conversion is completed. This “just-in-time” reading of the ADC reduces the sample
to output delay and enables faster system response for higher frequency control loops.
Programming the CLA consists of initialization code and tasks. A task is similar to an interrupt service routine,
and once started it runs to completion. Each task is capable of being triggered by a variety of peripherals
without CPU intervention. This makes the CLA very efficient since it does not use interrupts for hardware
synchronization, nor must the CLA do any context switching. Compared with a traditional interrupt-based
scheme, the CLA approach eliminates jitter and makes the execution time deterministic. The CLA
supports eight independent tasks, each of which is mapped to an event trigger, such as a timer or the
availability of an ADC result. Separate tasks can be used to support multiple control loops or phases at the same
time.
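The task model described above can be pictured with a small host-side simulation. This is only a conceptual sketch in portable C, not CLA code: all names (cla_model_t, cla_model_trigger, cla_model_dispatch, and the two demo tasks) are hypothetical, and the model assumes that the lowest-numbered pending task has the highest priority. Each trigger sets a pending flag, and the dispatcher runs one pending task to completion without any context switching.

```c
#include <stddef.h>
#include <stdint.h>

#define CLA_MODEL_NUM_TASKS 8u   /* eight independent tasks */

typedef void (*cla_model_task_fn)(void);

typedef struct {
    cla_model_task_fn task[CLA_MODEL_NUM_TASKS]; /* index 0 models Task 1 */
    uint8_t pending;                             /* one flag bit per task */
} cla_model_t;

/* Called by a simulated event trigger, such as a timer or an ADC result. */
static void cla_model_trigger(cla_model_t *m, unsigned index)
{
    if (index < CLA_MODEL_NUM_TASKS) {
        m->pending |= (uint8_t)(1u << index);
    }
}

/* Run the highest-priority pending task to completion (assumption:
 * lowest index = highest priority). Returns the index of the task
 * that ran, or -1 if no task was pending. */
static int cla_model_dispatch(cla_model_t *m)
{
    for (unsigned i = 0; i < CLA_MODEL_NUM_TASKS; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if ((m->pending & bit) != 0u) {
            m->pending &= (uint8_t)~bit;   /* clear the flag before running */
            if (m->task[i] != NULL) {
                m->task[i]();              /* runs to completion */
            }
            return (int)i;
        }
    }
    return -1;
}

/* Two placeholder tasks standing in for separate control loops. */
static int demo_current_loop_runs = 0;
static int demo_voltage_loop_runs = 0;
static void demo_current_loop(void) { demo_current_loop_runs++; }
static void demo_voltage_loop(void) { demo_voltage_loop_runs++; }
```

In this model, triggering task index 1 and then task index 0 still results in index 0 running first, which mirrors how separate tasks can serve multiple control loops in a fixed priority order.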
Some C2000 devices feature an enhanced version of the CLA with the option of running the lowest priority
task as a background task. Once triggered, it runs continuously until it is terminated or reset by the CLA or
MCU. The remaining tasks in priority order can interrupt the background task when they are triggered. If needed,
portions of the background task can be made uninterruptible. Typical uses of the background task include
running continuous functions, such as communications and clean-up routines.
4 Trigonometric Math Unit (TMU)
The TMU is an extension of the FPU and enhances the instruction set of the C28x+FPU by efficiently executing
trigonometric and arithmetic operations that are commonly used in control system applications. Similar to the
FPU, the TMU is an IEEE-754 floating-point math unit tightly coupled with the CPU. However, where the
FPU provides general-purpose floating-point math support, the TMU focuses on accelerating several specific
trigonometric math operations that would otherwise be quite cycle intensive. These operations include sine,
cosine, arctangent, divide, and square root. Some C2000 devices include an enhanced version of the TMU
for supporting nonlinear PID applications. Additional instructions have been added for efficient computation of
logarithm and inverse exponent operations which are used in the nonlinear control law. The TMU instructions
include:
Table 4-1. TMU Supported Instructions Summary
Operation                                C Equivalent Operation
Multiply by 2*pi                         a = b * 2pi
Divide by 2*pi                           a = b / 2pi
Divide                                   a = b / c
Square Root                              a = sqrt(b)
Sin Per Unit                             a = sin(b*2pi)
Cos Per Unit                             a = cos(b*2pi)
Arc Tangent Per Unit                     a = atan(b)/2pi
Arc Tangent 2 and Quadrant Operation     Operation to assist in calculating ATANPU2
Logarithm                                a = LOG2(b)
Inverse Exponent                         a = 2^(-|b|)
The TMU uses the same pipeline, memory bus architecture, and FPU registers as the C28x+FPU, thereby
removing any special requirements for interrupt context save or restore.
The C2000 compiler has built-in support that allows automatic generation of the TMU instructions. The user
writes code in C using math.h functions, and the compiler uses the TMU instructions, where applicable,
instead of run-time support library calls. This results in significantly fewer cycles and dramatically increases
the performance of trigonometric operations.
The TMU can have a significant impact on many commonly used real-time control algorithms such as:
• Park and Inverse Park Transforms
• Space Vector Generation
• dq0 and Inverse dq0 Transforms
• FFT Magnitude and Phase Calculations
For example, a Park Transform typically takes anywhere from 80 to more than 100 cycles to execute on the
FPU. With the TMU, a Park Transform takes only 13 cycles, a reduction of roughly 85 percent compared to
execution without the TMU.
In a typical system application, such as digital motor control (AC induction and permanent magnet) and 3-phase
solar applications, about a 1.4 times performance improvement can be achieved using the TMU over just the
FPU.
Table 4-2. TMU Performance Improvements
Application               FPU Cycles (Min/Max)   TMU Cycles (Min/Max)   Improvement (vs FPU)
Motor AC Induction        888/952                593/670                1.42x
Motor Permanent Magnet    783/786                547/592                1.32x
Solar 3-Phase             1351/1358              985/983                1.38x
An existing C28x design can realize an immediate advantage using the TMU without the need to rewrite any
code. Simulation-based generated code can realize the same benefits. Portability is maintained since the same
code can be used on TI MCUs with and without the TMU support.
5 Fast Integer Division Unit (FINTDIV)
The FINTDIV extended instruction set optimally supports fast division operations commonly found in adaptive
control systems for scaling parameters based on a variable. All instructions execute in a single cycle. Three
types of integer division are supported (Truncated, Modulus, and Euclidean) across several operand sizes (16/16, 32/16,
32/32, 64/32, 64/64) in unsigned or signed formats. Truncated format is the traditional division performed in the C
language (where “/” yields the integer quotient and “%” yields the remainder); however, the quotient is non-linear around
zero because it rounds toward zero from both sides. Modulus and Euclidean formats are more appropriate for precise control
applications because the quotient is linear around the zero point, which avoids potential calculation hysteresis. Both the Modulus and
Euclidean divisions are supported by C intrinsics, and the C28x compiler supports all three division formats for
all data types. Since the FINTDIV uses the existing FPU register set to carry out the FINTDIV operations, there
are no special considerations relating to interrupt context save and restore.
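The difference between the three division formats is easiest to see with negative operands. The following portable C sketch is a software illustration only; it does not use the FINTDIV instructions or the compiler intrinsics, and the type and function names are hypothetical. It assumes that "Modulus" corresponds to floored division (the remainder takes the sign of the divisor) and that "Euclidean" means the remainder is never negative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t quot;
    int32_t rem;
} div_result_t;

/* Truncated: the C "/" and "%" operators, which round toward zero.
 * Preconditions: d != 0, and not (n == INT32_MIN && d == -1). */
static div_result_t div_truncated(int32_t n, int32_t d)
{
    div_result_t r;
    assert(d != 0);
    r.quot = n / d;
    r.rem  = n % d;
    return r;
}

/* Floored (assumed meaning of "Modulus"): the quotient rounds toward
 * negative infinity, so the remainder has the sign of the divisor. */
static div_result_t div_floored(int32_t n, int32_t d)
{
    div_result_t r = div_truncated(n, d);
    if (r.rem != 0 && ((r.rem < 0) != (d < 0))) {
        r.quot -= 1;
        r.rem  += d;
    }
    return r;
}

/* Euclidean: the remainder is always in the range 0 .. |d| - 1. */
static div_result_t div_euclidean(int32_t n, int32_t d)
{
    div_result_t r = div_truncated(n, d);
    if (r.rem < 0) {
        if (d > 0) {
            r.quot -= 1;
            r.rem  += d;
        } else {
            r.quot += 1;
            r.rem  -= d;
        }
    }
    return r;
}
```

Dividing -7 by 2 gives quotient -3 and remainder -1 in truncated format, but quotient -4 and remainder 1 in both floored and Euclidean formats. Dividing 7 by -2 separates the latter two: floored gives quotient -4 and remainder -1, while Euclidean gives quotient -3 and remainder 1. In every case n equals quot*d + rem.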
(1) FINTDIV implements 64-bit integer division that is optimized in a fixed number of cycles for deterministic behavior. Without the
FINTDIV acceleration enabled, 64-bit integer division is implemented with generic CPU instructions and the number of cycles can vary
significantly based on the value of the numerator and denominator.
• Complex filters are used to improve data reliability, transmission distance, and power efficiency, and are
also common in various other signal processing applications. The VCU can perform a complex I and Q
multiply with coefficients (four multiplies) in a single cycle, as compared to approximately 10 cycles without
the VCU. In addition, the VCU can read/write the real and imaginary parts of 16-bit complex data to memory
in a single cycle.
• CRC algorithms are used for verifying data integrity over large data blocks, communication packets, or code
sections. The VCU can perform 8-bit, 16-bit, 24-bit, and 32-bit CRCs completely in the background, offloading
the main C28x CPU. For example, the VCU can compute the CRC for a block length of 10 bytes in 10 cycles,
as compared to approximately 250 cycles without the VCU. A CRC result register contains the current CRC
and is updated each time a CRC instruction is executed. This simplifies the CRC calculations and access to
the final CRC value.
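The "four multiplies" in a complex I and Q multiply can be seen in a plain C version of the operation. The following is a software sketch only, with hypothetical type and function names; it does not reflect the VCU instruction set, its data formats, or its rounding and scaling options. Saturation to 32 bits is a design choice made here so that the single overflow corner case (all four operands equal to -32768) stays well defined.

```c
#include <stdint.h>

typedef struct { int16_t re; int16_t im; } cplx16_t;  /* I and Q samples */
typedef struct { int32_t re; int32_t im; } cplx32_t;  /* full-width result */

/* Clamp a 64-bit intermediate value to the 32-bit result range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) { return INT32_MAX; }
    if (v < INT32_MIN) { return INT32_MIN; }
    return (int32_t)v;
}

/* (x.re + j*x.im) * (w.re + j*w.im): four multiplies and two adds. */
static cplx32_t cplx16_mul(cplx16_t x, cplx16_t w)
{
    cplx32_t y;
    y.re = sat32((int64_t)x.re * w.re - (int64_t)x.im * w.im);
    y.im = sat32((int64_t)x.re * w.im + (int64_t)x.im * w.re);
    return y;
}
```

For example, multiplying (1 + 2j) by (3 + 4j) yields (-5 + 10j).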
Devices with the C28x+VCU add an extended set of registers and instructions to the standard C28x architecture,
which are used to support the acceleration of communications-based algorithms. The additional registers are:
nine result registers, two traceback registers, a configuration and status register, and a CRC result register. The
VCU performs fixed-point operations using the same existing instruction set format, pipeline, and memory bus
architecture as C28x.
Programming the VCU is made easy with TI’s C2000Ware software suite. TI provides a complete library of
C-callable assembly functions. These functions are implemented using the VCU instruction set to optimize
efficiency and minimize overhead. TI also provides higher-level functions to support PLC communications
standards such as PRIME and G3.
Some devices utilize a dedicated cyclic redundancy check unit (VCRC) rather than the full-featured VCU for
applications not requiring Viterbi decoding or complex math support. This enhanced VCRC is an extension of
the C28x CPU and it includes registers and instructions to support CRC algorithms. CRC algorithms provide
a straightforward method for verifying data integrity over large data blocks, communication packets, or code
sections. The VCRC can perform 8-bit, 16-bit, 24-bit, and 32-bit CRCs, and it is capable of computing the
polynomial code checksum for a block length of 10 bytes in 10 cycles (a byte of data in a single cycle). For
custom CRC polynomials the execution time increases to three cycles. A CRC result register contains the
current CRC, which is updated whenever a CRC instruction is executed.
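For comparison, a software-only CRC typically needs an inner loop of eight shift-and-XOR steps for every byte, which is where most of the cycles go without hardware support. The following bitwise CRC-32 in portable C is a minimal sketch of such a baseline. The parameters used (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) are those of the widely used Ethernet/zlib CRC-32 and are chosen here only for illustration; this document does not state which polynomials or bit ordering the VCU or VCRC use, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected form). Processes one byte per outer loop
 * iteration and one bit per inner loop iteration. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if ((crc & 1u) != 0u) {
                crc = (crc >> 1) ^ 0xEDB88320u;  /* reflected polynomial */
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;                /* final XOR */
}
```

Running this function over the nine ASCII bytes "123456789" returns 0xCBF43926, the standard check value for this CRC-32 variant.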
7 Summary
Utilizing the high performance C28x CPU along with the advanced hardware math enhancements described in
this paper, the TI C2000 family of MCUs provides the advanced processing power required for today’s complex
real-time control systems. Combining these enhancements with the various control-optimized peripherals, such
as high-speed ADCs and high-resolution PWMs, engineers can minimize latency while increasing system
performance. TI provides a comprehensive set of development tools and software that enable engineers to
quickly design, test, and produce extremely reliable control systems. A wide range of TI C2000 MCUs are
available to solve the most demanding control system requirements.
The C2000 MCU family includes a wide array of devices that have been designed for both high performance and
low-cost real-time control applications. Based on an extremely fast C28x CPU, advanced control peripherals,
and integrated analog functions, the C2000 MCUs can reduce system cost while increasing system reliability.
Combining the CPU with the CLA running concurrently can effectively double the throughput of the device.
Additionally, some family members feature a dual-core microcontroller, and when combining each CPU with its
own CLA, the device has the capability for delivering the equivalent of up to four times the performance of a
single CPU. Conversely, other family members feature a high level of integration of control and analog peripherals,
reducing system complexity and offering greater efficiency for cost-sensitive designs.
The C2000 family of MCUs is ideal for applications requiring advanced real-time signal processing, such as
industrial drives, digital power, renewable energy, smart sensing, white goods appliances, motor control, and electric
and hybrid electric vehicles (EV/HEV).
8 References
For additional information about the C2000 MCU family, see the TI web site at:
• http://www.ti.com/c2000
The availability of the various math units and peripherals on each device can be found in the following document:
• Texas Instruments: C2000 Real-Time Control Peripheral Reference Guide
For detailed information about the CLA, see the device-specific Technical Reference Manual.
The extended instruction sets for the FPU, TMU, FINTDIV, VCRC, and VCU can be found in the following
document:
• Texas Instruments: TMS320C28x Extended Instruction Sets Technical Reference Manual
Details about the FPU, TMU, and FINTDIV intrinsics for providing ease of software development can be found in
the following document:
• Texas Instruments: TMS320C28x Optimizing C/C++ Compiler v20.2.0.LTS User's Guide
Revision History
NOTE: Page numbers for previous revisions may differ from page numbers in the current version.
Changes from Revision B (April 2020) to Revision C (November 2021)
• Updated the numbering format for tables, figures and cross-references throughout the document
• Updates were made in Section 3
IMPORTANT NOTICE AND DISCLAIMER
TI PROVIDES TECHNICAL AND RELIABILITY DATA (INCLUDING DATA SHEETS), DESIGN RESOURCES (INCLUDING REFERENCE
DESIGNS), APPLICATION OR OTHER DESIGN ADVICE, WEB TOOLS, SAFETY INFORMATION, AND OTHER RESOURCES “AS IS”
AND WITH ALL FAULTS, AND DISCLAIMS ALL WARRANTIES, EXPRESS AND IMPLIED, INCLUDING WITHOUT LIMITATION ANY
IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT OF THIRD
PARTY INTELLECTUAL PROPERTY RIGHTS.
These resources are intended for skilled developers designing with TI products. You are solely responsible for (1) selecting the appropriate
TI products for your application, (2) designing, validating and testing your application, and (3) ensuring your application meets applicable
standards, and any other safety, security, regulatory or other requirements.
These resources are subject to change without notice. TI grants you permission to use these resources only for development of an
application that uses the TI products described in the resource. Other reproduction and display of these resources is prohibited. No license
is granted to any other TI intellectual property right or to any third party intellectual property right. TI disclaims responsibility for, and you
will fully indemnify TI and its representatives against, any claims, damages, costs, losses, and liabilities arising out of your use of these
resources.
TI’s products are provided subject to TI’s Terms of Sale or other applicable terms available either on ti.com or provided in conjunction with
such TI products. TI’s provision of these resources does not expand or otherwise alter TI’s applicable warranties or warranty disclaimers for
TI products.
TI objects to and rejects any additional or different terms you may have proposed.
Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2022, Texas Instruments Incorporated