Tensilica Datasheet

Tensilica Software Development Toolkit (SDK)


Quickly develop application code

Features

• Cadence® Tensilica® Xtensa® Xplorer™ Integrated Development Environment (IDE) with full graphical user interface (GUI)
• Mature, optimizing Xtensa C/C++ Compiler (XCC)
• Operator overloading support in C for custom data types
• Pipeline-modeling, cycle-accurate instruction set simulator (ISS) with fast TurboXim
• GNU profiler, linker, debugger, assembler, and utilities
• Multi-processor subsystem simulation, debug, profiling, and memory partitioning
• Vectorization Assistant for locating code loops that need restructuring to enable vectorization
• Project management tools
• Performance and energy analysis tools
• Use Mentor Graphics NucleusPLUS, Express Logic’s ThreadX, Micrium’s uC/OS-II, T-Engines’ µT-Kernel, or the Linux operating systems

Benefits

• Easy-to-use Xtensa Xplorer IDE based on familiar Eclipse platform
• Small, high-performance code from ‘C’ source
  – Compiler offers state-of-the-art inter-procedural and alias analysis
  – Automatic vectorization of operations for Xtensa SIMD processors
  – Automatic Flexible Length Instruction eXtension (FLIX) instruction bundling for multi-issue Xtensa very long instruction word (VLIW) cores
• Detailed pipeline analysis guides optimizations from cycle/pipeline-accurate ISS
• Fast TurboXim simulation for up to 50 million instructions per second
• Vectorization Assistant guides code optimizations for better SIMD performance
• Easily and quickly evaluate multi-processor subsystems
• Familiar GNU-based toolchain

Software Development Tools for Cadence Tensilica DPUs

If you’ve looked at Tensilica’s website or processor product briefs, you know that you can extend Tensilica’s Xtensa dataplane processors (DPUs) by adding instruction sets, execution units, and processor I/O interfaces to match your specific application needs.

By customizing the DPU for a particular application, you can often get significantly lower energy consumption and 10-100X performance increases. This level of performance and efficiency is often essential in the SoC dataplane. By customizing the DPU, you create a core that’s uniquely yours, giving you extra protection in today’s highly competitive marketplace.

[Figure 1 diagram: Source (C/C++, OS, Libraries) feeds a loop of Compile (XCC), Profile (ISS), and Simulate/Debug (ISS, XTMP, XTSC).]

Figure 1: Tensilica’s Eclipse-based Xtensa Xplorer IDE serves as the cockpit for custom processor development.

The Xtensa Processor Developer’s Toolkit is the integrated design environment that delivers powerful tools to your desktop to guide you through the processor customization process. You’ll find that Tensilica has created the most advanced, powerful, and easy-to-use tools for processor customization.
The Processor Developer’s Toolkit is required for any design team that is using Tensilica’s TIE instructions to modify the processor. If you are using an Xtensa processor with no modification or only changes to configuration options, you do not need the Processor Developer’s Toolkit; you’ll only need the Software Developer’s Toolkit.

Designers get a compiler, linker, assembler, and debugger for their particular processor hardware. As the base instruction set architecture (ISA) is always present, third-party tools can still be used even when the core is customized for a particular application.

A Comprehensive System

Now in their 11th generation, Tensilica’s tools for software development are highly refined and provide developers with a complete, comprehensive solution for both system design and software development, as illustrated in Figure 3.

[Figure 2 diagram: Designer-Defined Instructions (optional) and Set/Choose Configuration Options feed the Xtensa Processor Generator. The Processor Generator Outputs fall into three groups. Hardware: EDA scripts, RTL, synthesis, block place and route, verification, chip integration/co-verification, to fab/FPGA. System Modeling/Design: XTSC SystemC system modeling, XTMP C-based system modeling, pin-level cosimulation. Software Tools: Xplorer IDE (GUI to all tools), ISS, fast function simulator (TurboXim), GNU software toolkit (assembler, linker, debugger, profiler), Xtensa C/C++ (XCC) compiler, C software libraries, operating systems. Application source C/C++ is compiled to an executable and profiled using the ISS; the result feeds back to "Choose Different Configuration or Develop New Instructions". The hardware and modeling paths lead to System Development and the software path to Software Development.]

Figure 2. Tensilica’s proven methodology automates the creation of customized processors and matching software tools.

Figure 3. The editor includes many useful functions to speed up code generation and debugging

Project manager

When you start a new software project or modify an ongoing project, the project manager organizes all related project source files and allows you to create new classes, files, or folders. New projects can be managed using the built-in project-management and version-control mechanisms, which eliminate the need to manually maintain makefiles and provide a clean environment for new project builds. The project manager allows you to set all tool options and flags (build properties) for each build target within each individual project. Optionally, you can create unmanaged projects that allow total user control over build target properties.

Multi-processor subsystems

The Xtensa Xplorer IDE provides multi-processor projects that allow the designer to create a subsystem of heterogeneous cores with shared memory. The memory partitioning for each core and the shared memory area are specified in the GUI to make that task simple.

Simulation of the resultant system is launched from within the IDE and allows the software developer to debug, profile, and partition their code very quickly.

Source code editor

The C-code editor allows you to efficiently create and modify your code using rich editing capabilities. Syntax highlighting makes it easier to recognize language features such as keywords, comments, declarations, and strings. Symbol indexing allows fast program navigation including find declaration, find definition, and find type. Other features in the editor that speed up coding include code completion, auto indenting, and quick diff. Block comment/uncomment is useful when debugging or profiling large source files, as is text folding for hiding areas of text that you don’t need to view. Other standard views, such as source outline, make target, and problems, are also available.

Xtensa compiler toolchain

Tensilica’s Xtensa C/C++ compiler is based on the GNU compiler front-end with a highly customized code generation back-end (derived from the Open64 project) targeting the compact 16/24-bit Xtensa ISA. The Xtensa C/C++ compiler also includes support for the TIE language, including intermediate representation and optimization. The Xtensa C/C++ compiler additionally supports Tensilica’s FLIX, allowing from 4-byte to 16-byte VLIW instruction bundles of up to 30 simultaneous instructions limited only by opcode availability.

The Xtensa C/C++ compiler employs sophisticated multi-level optimizations such as function inlining, software pipelining, static single assignment (SSA) optimizations, and other code generation techniques. Together, these optimizations increase code execution speed and reduce code size. Based on industry-standard benchmarks, the Xtensa C/C++ compiler generates the highest code density when compared to compilers for other 32-bit RISC architectures.

The Xtensa C/C++ compiler provides the advanced optimization techniques known as feedback-directed optimization and interprocedural analysis.

Feedback-directed optimization is a two-step process where code is instrumented on the first pass of compilation and run using a representative input data set to produce a file containing profiling information. On the second pass, this profiling information is used to optimize application code to further reduce branch delays, improve inlining, and minimize the impact of register spills. The Xtensa C/C++ compiler will optimize an application’s critical areas for performance while optimizing the remainder of the code for space. The Xtensa C/C++ compiler is also capable of hardware feedback-directed optimization, in which the user’s target hardware platform can run the instrumented code to similarly provide application-specific optimization. Hardware feedback-directed optimization is a much faster method, and the profile comes from the actual target system rather than from simulation in the ISS.
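The two-pass idea can be illustrated with a toy sketch in plain C. It is not how the Xtensa C/C++ compiler implements feedback-directed optimization; the `branch_profile` record and all function names are hypothetical, chosen only to show what an instrumented first pass collects and how a second pass could use it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile record: how often one branch went each way. */
typedef struct {
    unsigned long taken;
    unsigned long not_taken;
} branch_profile;

/* Pass 1 (instrumented build): do the real work and count the branch outcome. */
int clip_instrumented(int x, int limit, branch_profile *p)
{
    assert(p != NULL);
    if (x > limit) {
        p->taken++;
        return limit;
    }
    p->not_taken++;
    return x;
}

/* Run the instrumented code over a representative input data set. */
branch_profile collect_profile(const int *data, size_t n, int limit)
{
    branch_profile p = {0, 0};
    for (size_t i = 0; i < n; i++)
        (void)clip_instrumented(data[i], limit, &p);
    return p;
}

/* Pass 2 decision: treat the clipping path as hot only if it is the common case. */
int clip_path_is_hot(const branch_profile *p)
{
    assert(p != NULL);
    return p->taken > p->not_taken;
}
```

With a data set in which most samples stay under the limit, `clip_path_is_hot` returns 0, which is the kind of fact a compiler could use to lay out the fall-through path for speed and the rare path for size.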

[Figure 4: side-by-side "Original" and "Ported" source listings. Only the text in the red box needs to be changed to convert the source to fixed point.]

Figure 4. Operator overloading makes porting existing code easier
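For contrast, the sketch below shows what the same kind of port looks like in standard C, which has no operator overloading: every `*` and `+` in the kernel has to become a function call. The Q15 type and the `q15_mul`/`q15_add` helpers are hypothetical illustrations, not Tensilica data types or intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q15 fixed-point type: real value = raw / 32768. */
typedef int16_t q15_t;

/* Original floating-point kernel: ordinary operator syntax. */
float scale_add_float(float a, float b, float gain)
{
    return a * gain + b;
}

/* Q15 multiply with rounding and saturation.
 * Assumes arithmetic right shift of negative values
 * (implementation-defined in C, but near-universal). */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q30 intermediate */
    prod = (prod + (1 << 14)) >> 15;          /* round back to Q15 */
    if (prod > INT16_MAX) prod = INT16_MAX;
    if (prod < INT16_MIN) prod = INT16_MIN;
    return (q15_t)prod;
}

/* Q15 add with saturation. */
q15_t q15_add(q15_t a, q15_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (q15_t)sum;
}

/* Ported kernel in standard C: the expression itself must be rewritten. */
q15_t scale_add_q15(q15_t a, q15_t b, q15_t gain)
{
    return q15_add(q15_mul(a, gain), b);
}
```

With overloaded operators on a custom type, the body of the ported kernel could stay `a * gain + b` and only the type declarations would change, which is the effect Figure 4 describes.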


Interprocedural analysis is an optimization method that looks globally across all associated files of an application at link time. Global optimization is a much more powerful method than optimizing locally within an expression or procedure. Interprocedural analysis examines relationships across function calls, and can perform optimizations that cannot be achieved with a local scope. Interprocedural analysis eliminates unneeded computations, improves function inlining, and performs alias analyses that may not be performed by less sophisticated optimization techniques.

The Xtensa C/C++ compiler supports operator overloading on custom data types in the ‘C’ programming language (without the overhead that is often associated with it).

Tensilica is well known for its ability to let designers add custom instructions and data types to improve performance. If an application needs to work on 56-bit data, a designer can define a custom 56-bit data type with a single line of code. The designer can also specify what regular ‘C’ operators, such as ‘+’ and ‘*’, should do when using this data type. The overloading is always done with zero overhead so the resulting binaries are always efficient.

Porting and creating ‘C’ application code that uses custom data types is easier because standard ‘C’ operator syntax can be used. This makes the code easier to read and simpler to port via changes in the ‘C’ header files rather than throughout the source code itself. See Figure 4.

The rest of the software development toolchain is based on standard GNU tools. The compiler front-end remains similar to the preprocessor in the GNU tools, and the flags for the preprocessor remain the same. The assembler and linker also utilize the same flags as the GNU versions of the tools.

Xtensa debugger

The debugger allows you to target either the pipeline-/cycle-accurate ISS or TurboXim when no hardware is available, or external probes to connect with hardware development boards. As shown in Figure 5, the GUI-based debugger allows full system visibility into your project; it controls program execution and provides views to variables, breakpoints, memory, registers, etc. Source and assembly code can be made visible simultaneously while debugging an application, and either code window can be single stepped. The debugger interoperates seamlessly with the other development tools (compiler toolchain, ISS) to allow rapid code development for Xtensa processor systems.

Cores in multi-processor subsystems can be debugged and stepped synchronously or asynchronously with the other cores.

With user-defined data formatting, any data value can be re-formatted to display a more user-friendly representation. This is particularly effective when dealing with non-native ‘C’ types such as fixed point or vector data or when certain bits represent status. This data can be displayed in the Xtensa Xplorer IDE however you want using familiar print formatting. Datatypes that are defined by Tensilica in its DSP engines have default formatting that will show the user-friendly representations automatically. See Figure 6 for an example.

Figure 5. The Xtensa debugger allows full visibility into the system

Figure 6. Data can be reformatted the way you want it

Figure 7. The profiling window allows performance metric analysis while optimizing code “hot spots”

Profiling tools

Code profiling is an extremely important tool for optimizing the performance of your application code. The Xtensa Xplorer IDE enables you to view profiling results generated by Tensilica’s pipeline-accurate ISS (see Figure 7). Additionally, for much faster and more accurate profiling, you can generate profiling data from hardware instantiated in an FPGA or ASIC. You can track performance data such as instruction execution count, subroutine calls, subroutine total cycles, cache performance, etc. While viewing functions in the profiling view, you can also simultaneously view the assembly code in the disassembly view and the source code in the editor. The call graph view enables you to view the entire application hierarchy’s caller and callee functions. For those inner loop optimizations, the graphical pipeline view (Figure 8) shows any pipeline inefficiencies and bubbles that may be occurring.

Figure 8. The pipeline viewer helps you understand instruction stalls and latency issues

Profiling of multi-processor subsystems shows each core side by side for easy load assessment and re-partitioning guidance. See Figure 9.

[Figure 9 chart: cycles (0 to 15,000) for core0 and core1, broken down into ICache Miss Cycles, DCache Miss Cycles, Uncached Instruction Fetch, Uncached Load Cycles, Interlock Cycles, Branch Delay Cycles, and Total Cycles.]

Figure 9. Multi-core profiling

Vectorization Assistant

Vectorization is the process of transforming the flow of your code (from the usual handling of one data item at a time) into a parallel loop that operates on multiple data items at once. The Xtensa compiler is capable of performing this transformation automatically, but you can help it exploit implicit parallelism in your code by eliminating certain patterns of data access that prevent successful vectorization.

Figure 10 shows how the Vectorization Assistant finds and displays loops in your code that could be “vectorized” by the compiler if the source was tweaked. Locating areas in the code that have not been vectorized, but could be, can take a long time looking at profiles, assembler, and pipeline views; then you have the task of doing the optimization to make it vectorize. In a few clicks, the Vectorization Assistant gets you to the loops in your source code that would benefit the most from vectorization.

The list of messages shown is initially sorted by the number of processor cycles used by a given loop, such that the most expensive loops appear first. You can focus the view on a particular file, folder, or project; you can filter out certain classes of messages that are not currently interesting; and you can hide messages that you do not wish to address at the moment.
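As a general illustration of data access patterns that tend to block auto-vectorization, the sketch below pairs two loops with restructured versions. These are well-known patterns for C compilers in general; whether the Xtensa compiler or the Vectorization Assistant reports these exact cases is an assumption, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pattern 1: 'out' and 'in' might overlap, so the compiler must assume
 * a store to out[i] could change a later in[j]. */
void scale_may_alias(int *out, const int *in, size_t n, int k)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Restructured: 'restrict' (C99) promises the arrays do not overlap,
 * so several elements can safely be processed per iteration. */
void scale_no_alias(int *restrict out, const int *restrict in, size_t n, int k)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Pattern 2: accumulating through a pointer forces a load and a store
 * every iteration, and '*total' might alias an element of 'in'. */
void sum_through_pointer(int *total, const int *in, size_t n)
{
    assert(total != NULL && in != NULL);
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += in[i];
}

/* Restructured: a local accumulator is a plain reduction that a
 * vectorizer can split into partial sums. */
int sum_local(const int *restrict in, size_t n)
{
    assert(in != NULL);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += in[i];
    return acc;
}
```

Both versions of each loop compute the same result for non-overlapping arrays; the restructured forms only remove ambiguity that would otherwise force the compiler to be conservative.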

Figure 10. Vectorization Assistant helps find areas that can be improved


SoC Modeling

Many SoC designs today employ multiple processors. As SoC design becomes more complex, new methods to describe, debug, and profile overall system performance need to be employed. Unfortunately, most software development tools vendors do not provide pre-silicon simulation environments for multi-processor SoCs. Tensilica offers two modeling tools: XTensa Modeling Protocol (XTMP) for modeling in C and XTensa SystemC (XTSC) for modeling in SystemC.

Both tools are powerful additions to Tensilica’s software development toolkit. They provide an Application Programming Interface (API) to the ISS, allowing fast and accurate simulation of SoC designs incorporating one or more processor cores. Running up to 10,000 times faster than RTL simulators, the XTMP/XTSC environments are potent tools for software development and SoC design. Both tools give you the ability to rapidly explore SoC partitioning alternatives and hardware/software performance tradeoffs. See Figure 11.

[Figure 11 diagram: Modeling. Use XTMP/XTSC to instantiate and connect models/RTL. Xtensa ISS libraries, application code, and user device models go through "Compile and Link on Host" and then "Run".]

Figure 11. Using the ISS with XTMP or XTSC for modeling

XTMP and XTSC are used for simulating homogeneous or heterogeneous multi-processor design subsystems as well as complex uniprocessor architectures. Use the Xtensa Xplorer IDE’s multi-processor project to instantiate multi-processor subsystems (or do it manually) and optionally connect them to custom peripherals and interconnects. You can create, debug, profile, and verify combined SoC and software architectures early in the design process. As the simulator operates at a higher level than HDL simulations, simulation time is cut drastically. See Figures 12 and 13.

XTMP and XTSC are integrated into the Xtensa Xplorer IDE, which automates the creation and development of multi-processor subsystem simulations. For XTMP, simulations are described in standard C code, which you can modify to allow more complex systems and additional simulator control if required. For XTSC, simulations are described in standard SystemC code. In addition, you have full visibility into all aspects of the simulation through the extensive API. Designers can use a single Xtensa Xplorer IDE to debug all simulated cores for additional visibility. The Xtensa Xplorer IDE manages all of these connections for you, for simplicity and easy viewing of any core.

Modeling with XTMP

[Figure 12 diagram: XTMP system with a Producer Core connected through a FIFO to a Consumer Core; each core has local RAM and ROM and access to System Memory; Device A Model, RTL, and Device B Model are attached.]

    XTMP_core consumer = XTMP_coreNew("consumer", config);
    XTMP_core producer = XTMP_coreNew("producer", config);
    XTMP_queue fifo = XTMP_queueNew("fifo", width, depth);
    XTMP_connectQueue(fifo, producer, "FIFO_OUT", consumer, "FIFO_IN");

Figure 12. XTMP provides a simulation environment using instantiations of multi-processor-capable ISS, memory models, and connectors

Modeling with XTSC

[Figure 13 diagram: XTSC system with a Producer Core connected through a FIFO to a Consumer Core; each core has local RAM and ROM and access to System Memory; Device A (SystemC), RTL through pin-level XTSC, and Device B (SystemC) are attached.]

    xtsc-run
      -set_queue_parm=depth=4
      -create_core=Producer
      -connect_core_queue=Producer,FIFO_OUT,fifo
      -create_core=Consumer
      -connect_queue_core=fifo,FIFO_IN,Consumer

Figure 13. With its pin-level modeling capabilities, XTSC allows co-simulation with Verilog

Modeling of local and system memory

XTMP and XTSC allow memory modeling of both local and system memory. System memory can have programmable latencies specified for different transaction types, allowing an accurate system simulation for analyzing performance tradeoffs. Memory-mapped peripherals may be included in an XTMP/XTSC system simulation, and functions are provided to connect the processor to peripheral devices.
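To make the idea of per-transaction-type latencies concrete, here is a minimal stand-alone sketch in plain C. It does not use the XTMP or XTSC API; the `mem_model` structure, the transaction names, and the latency values are all hypothetical and only show how such latencies feed a performance tradeoff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction types for a system memory model. */
typedef enum { TXN_READ, TXN_WRITE, TXN_BLOCK_READ, TXN_COUNT } txn_type;

/* Hypothetical memory model: one programmable latency (in cycles) per type. */
typedef struct {
    unsigned latency[TXN_COUNT];
} mem_model;

/* Sum the memory latency cycles incurred by a trace of transactions. */
unsigned long total_latency(const mem_model *m, const txn_type *trace, size_t n)
{
    assert(m != NULL && trace != NULL);
    unsigned long cycles = 0;
    for (size_t i = 0; i < n; i++) {
        assert((int)trace[i] >= 0 && (int)trace[i] < TXN_COUNT);
        cycles += m->latency[trace[i]];
    }
    return cycles;
}
```

Running the same trace against two models, for example one with a 4-cycle read and one with an 8-cycle read, gives a quick estimate of how much a slower memory would cost before any hardware exists.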

Multi-threaded environment

An XTMP or XTSC simulation runs in a multi-threaded environment, with each processor running in its own thread. Core threads can be run asynchronously or synchronized through events using the attached debugger. Another option is to run all cores in lock-step, cycle-by-cycle mode. If one core stops on a break, all cores stop until it resumes. XTMP and XTSC have many options for implementing, controlling, and displaying results of system simulations deploying multiple cores, memories, and user-defined devices.

Pin-level SystemC modeling with Verilog

Additionally, Tensilica provides a link between its pipeline-accurate, cycle-accurate ISS and the leading Verilog simulators. Designers can now run pin-level SystemC co-simulations of Tensilica DPUs in their native Verilog simulators with pin-level XTSC, as seen in Figure 13.

Relative performance for different modeling levels

The wide range of choices allows customers to trade off speed versus model accuracy and pick the best type of model for the task at hand. Fast functional simulation gives the equivalent of 20-50 MIPS and is accurate to the CPU clock cycle level, while a full Verilog gate-level model may run only 10-100 cycles/sec but provides the accuracy needed to verify detailed timing. The range of modeling options and their estimated relative performance is summarized in Table 1.

Simulation Speed                  Modeling Tool                          Benefits
20 to 50 MIPS (1)                 Standalone ISS in TurboXim mode        Fast functional simulation for rapid application testing
1.5 to 40 MIPS (1,2)              XTMP or XTSC in TurboXim mode          Fast functional simulation for rapid application testing at the system level
800K to 1,600K cycles/second      Standalone ISS in cycle-accurate mode  Software verification and cycle accuracy
600K to 1,000K cycles/second (2)  XTMP in cycle-accurate mode            Multi-core subsystem modeling and cycle accuracy
350K to 600K cycles/second (2)    XTSC in cycle-accurate mode            Multi-core subsystem modeling with SystemC interfaces and cycle accuracy
1K to 4K cycles/second            Verilog RTL simulation                 Functional verification, pipeline-cycle accuracy and high visibility and accuracy
10 to 100 cycles/second           Verilog gate-level simulation          Timing verification, pipeline-cycle accuracy and high visibility and accuracy

1. TurboXim mode simulation speed is an estimate for relatively long-running application programs (1 billion instructions or more)
2. Simulation speed is an estimate for a single Xtensa core in XTMP or XTSC
3. Running on a typical low-cost dual-core workstation

Table 1. Modeling Performance
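The speed ranges in Table 1 translate directly into wall-clock estimates. The sketch below is a small worked calculation in C, assuming a hypothetical workload of 1 billion cycles; the `sim_speed` and `sim_time` types and the function names are illustrative only.

```c
#include <assert.h>

/* A simulation speed range, in cycles (or instructions) per second. */
typedef struct {
    double low_rate;
    double high_rate;
} sim_speed;

/* Best-case and worst-case wall-clock time, in seconds. */
typedef struct {
    double best_seconds;
    double worst_seconds;
} sim_time;

/* time = workload / rate, evaluated at both ends of the speed range. */
sim_time estimate_sim_time(double workload, sim_speed speed)
{
    assert(speed.low_rate > 0.0 && speed.high_rate >= speed.low_rate);
    sim_time t;
    t.best_seconds = workload / speed.high_rate;
    t.worst_seconds = workload / speed.low_rate;
    return t;
}
```

For 1 billion cycles, the standalone cycle-accurate ISS range (800K to 1,600K cycles/second) works out to roughly 625 to 1,250 seconds, while the gate-level range (10 to 100 cycles/second) works out to 10 million to 100 million seconds, which is why the faster models are used for software development and the slower ones are reserved for timing verification.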

Summary
The Xtensa Xplorer IDE is a complete GUI-based collection of tools
that allows the software developer to create code for systems
based on Xtensa processors. From project implementation to code
generation to analysis, the Xtensa SDK enables you to achieve
fast time-to-market while employing one of the most efficient
32-bit architectures available today. Xtensa processors lower
total system costs and help design teams construct extremely
high-performance system architectures.

Cadence Design Systems enables global electronic design innovation and plays an essential role in the
creation of today’s electronics. Customers use Cadence software, hardware, IP, and expertise to design
and verify today’s mobile, cloud, and connectivity applications. www.cadence.com

© 2014 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, Tensilica, and Xtensa are registered trademarks
and Xplorer is a trademark of Cadence Design Systems, Inc. in the United States and other countries. All other trademarks are the property of
their respective owners. 08/14 2768 SA/DM/PDF
