pg414 Vdu en Us 1.0
Chapter 1: Introduction
Features
IP Facts
Chapter 2: Overview
Navigating Content by Design Process
Core Overview
Applications
Unsupported Features
Licensing and Ordering
Chapter 8: Overview
Software Prerequisites
Section I
Chapter 1
Introduction
The AMD LogiCORE™ IP H.264/H.265 Video Decode Unit (VDU) core is a hard IP block in the AMD
Versal™ AI Edge and AMD Versal™ AI Core series. The VDU core contains multiple decoder core
instances (up to four). Each decoder core instance can decompress a single stream or multiple
H.264/H.265-compliant video streams simultaneously at resolutions of up to 3840×2160 pixels at
60 frames per second (4K UHD at 60 Hz). In a multiple-stream scenario, it can decode H.265-only
streams, H.264-only streams, or a mix of both simultaneously. For the number of decoder core
instances inside the VDU of each device variant, refer to the product tables in the Versal ACAP
AI Edge Series Product Selection Guide (XMP464) and the Versal AI Core Series Product Selection
Guide (XMP452).
Features
The features of each instance of the decoder core in AMD LogiCORE IP H.264/H.265 Video
Decode Unit core are as follows:
IP Facts
AMD LogiCORE™ IP Facts Table
Core Specifics
  Supported Device Family (1): AMD Versal™ Adaptive SoC (AMD Versal™ AI Edge and AMD Versal™ AI Core Series)
  Supported User Interfaces: AXI4-Lite, AXI4
Provided with Core
  Design Files: Encrypted RTL
  Example Design: Not Provided
  Constraints File: Xilinx Design Constraints File (XDC)
  Simulation Model: Not Provided
  Supported S/W Driver: Included in PetaLinux
Tested Design Flows (2)
  Design Entry: AMD Vivado™ Design Suite
  Synthesis: Vivado Synthesis
Support
  Release Notes and Known Issues: Master Answer Record: 000034162
  All Vivado IP Change Logs: Master Vivado IP Change Logs: 72775
  Support web page
Notes:
1. For a complete list of supported devices, see the AMD Vivado™ IP catalog.
2. For the supported versions of third-party tools, see the Vivado Design Suite User Guide: Release Notes, Installation, and
Licensing (UG973).
Chapter 2
Overview
• System and Solution Planning: Identifying the components, performance, I/O, and data
transfer requirements at a system level. Includes application mapping for the solution to PS,
PL, and AI Engine. Topics in this document that apply to this design process include:
• Embedded Software Development: Creating the software platform from the hardware
platform and developing the application code using the embedded CPU. Also covers XRT and
Graph APIs. Topics in this document that apply to this design process include:
• Hardware, IP, and Platform Development: Creating the PL IP blocks for the hardware
platform, creating PL kernels, functional simulation, and evaluating the AMD Vivado™ timing,
resource use, and power closure. Also involves developing the hardware platform for system
integration. Topics in this document that apply to this design process include:
• Board System Design: Designing a PCB through schematics and board layout. Also involves
power, thermal, and signal integrity considerations. Topics in this document that apply to this
design process include:
Core Overview
The AMD LogiCORE™ IP H.264/H.265 Video Decode Unit (VDU) core supports multi-standard
video decoding, including the High-Efficiency Video Coding (HEVC) H.265 and Advanced
Video Coding (AVC) H.264 standards. The unit contains decode (decompress) functions.
The VDU is an integrated block containing decoder interfaces located in the programmable logic
(PL) portion of Versal adaptive SoC (Versal AI Edge series and Versal AI Core series) devices.
Because it is located in the PL, the VDU does not have any direct connections to the
processing system (PS).
VDU operation requires the application processing unit (APU) as a controller to service interrupts
and coordinate data transfer.
The decoder is controlled by the APU through a task list prepared in advance, so the APU
response time is not in the execution critical path. The VDU has no audio support. Audio
decoding can be done in software using the PS or through soft IP in the PL. The following
figure shows the top-level block diagram with one instance of the VDU core.
[Figure: VDU top-level block diagram, one instance. Blocks: DPLL, DEC0, DEC1, APM, MCU, 4:1 AXI crossbar, interrupt controller. Interfaces: M_AXI_DEC0, M_AXI_DEC1, M_AXI_MCU, S_AXI, clock, reset, VDU host interrupt.]
Each VDU decoder instance is controlled by a microcontroller (MCU) subsystem. A 32-bit AXI4-
Lite slave interface is used by the APU to control the MCU. Two 128-bit AXI4 master interfaces
are used to move video data and metadata to and from the system memory. A 32-bit AXI4
master interface is used to fetch the MCU software (instruction cache interface) and load/store
additional MCU data (data cache interface). VDU applications running on the APU use the VDU
Control Software library API to interact with the decoder microcontroller. The microcontroller
firmware is not user modifiable.
The decoder includes control registers, a bridge unit, and a set of internal memories. The bridge
unit manages the request arbitration, burst addresses, and burst lengths for all external memory
accesses required by the decoder.
Applications
The VDU core is an embedded hard IP located in the PL to enable maximum flexibility for a wide
selection of use cases. Whether the application requires single 4K UHD@60 Hz or simultaneous
multiple stream decoding, and with memory bandwidth as a key driver, a system design and
memory topology can be implemented that balances performance, optimization, and integration
for the specific use case.
Unsupported Features
The following features of the standard are not supported in the core:
• H.264 (AVC):
○ Interlace video format
• H.265 (HEVC):
○ Dynamic chroma format/profile/level change within a single stream
Licensing and Ordering
Note: To verify that you need a license, check the License column of the IP Catalog. Included means that a
license is included with the AMD Vivado™ Design Suite; Purchase means that you have to purchase a
license to use the core.
Information about other AMD LogiCORE™ IP modules is available at the Intellectual Property
page. For information about pricing and availability of other AMD LogiCORE IP modules and
tools, contact your local sales representative.
License Checkers
If the IP requires a license key, the key must be verified. The AMD Vivado™ design tools have
several license checkpoints for gating licensed IP through the flow. If the license check succeeds,
the IP can continue generation. Otherwise, generation halts with an error. License checkpoints
are enforced by the following tools:
• Vivado Synthesis
• Vivado Implementation
• write_bitstream (Tcl command)
IMPORTANT! IP license level is ignored at checkpoints. The test confirms a valid license exists. It does not
check IP license level.
Chapter 3
Product Specification
Standards
This core adheres to the following standard(s):
• ISO/IEC 23008-2:2017 Information technology — High efficiency coding and media delivery
in heterogeneous environments — Part 2: High efficiency video coding
Performance
The following sections detail the performance characteristics of the H.264/H.265 Video Decode
Unit.
Maximum Frequencies
The typical clock frequencies for the target devices are described in Versal AI Core Series Data
Sheet: DC and AC Switching Characteristics (DS957) and Versal AI Edge Series Data Sheet: DC and
AC Switching Characteristics (DS958). The maximum achievable clock frequency of the system can
vary. The maximum achievable clock frequency and all resource counts can be affected by other
tool options, additional logic in the device, using a different version of AMD tools and other
factors.
Throughput
The VDU supports decoding up to four simultaneous instances of 4K UHD resolution at 60 Hz.
This throughput can be four streams at 4K UHD or can be divided into smaller streams. Several
combinations can be supported with different resolutions, provided the cumulative throughput
does not exceed four times 4K UHD at 60 Hz.
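The cumulative-throughput rule above can be sketched as a simple pixel-rate budget. This is only an illustration: it assumes that decode load scales with width × height × frame rate, which is a simplification, and it does not model any per-instance limit. The names `pixel_rate` and `fits_throughput` are hypothetical and are not part of any AMD software.

```python
# Sketch of the cumulative throughput check described in the text.
# Assumption: load is proportional to width * height * fps (not stated by the source).

UHD_4K_WIDTH, UHD_4K_HEIGHT, UHD_4K_FPS = 3840, 2160, 60

# Budget quoted in the text: four times 4K UHD at 60 Hz.
VDU_PIXEL_BUDGET = 4 * UHD_4K_WIDTH * UHD_4K_HEIGHT * UHD_4K_FPS


def pixel_rate(width, height, fps):
    """Pixels per second for one stream."""
    return width * height * fps


def fits_throughput(streams, budget=VDU_PIXEL_BUDGET):
    """Return True if the summed pixel rate of all streams stays within the budget.

    streams is an iterable of (width, height, fps) tuples.
    """
    return sum(pixel_rate(w, h, f) for w, h, f in streams) <= budget


if __name__ == "__main__":
    four_uhd = [(3840, 2160, 60)] * 4
    sixteen_fhd = [(1920, 1080, 60)] * 16
    print(fits_throughput(four_uhd))
    print(fits_throughput(sixteen_fhd))
    print(fits_throughput(four_uhd + [(1280, 720, 30)]))
```

Under this assumption, sixteen 1920×1080 streams at 60 Hz carry the same pixel rate as four 4K UHD streams at 60 Hz. Whether a given combination is actually selectable depends on the options offered in the IP GUI.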
Core Interfaces
The Single Instance VDU core has the following interfaces:
The eight 128-bit AXI master interfaces are used for moving video data into and out of external
memory through the DDR memory interfaces.
Resource Use
Four streams of 4K UHD at 60 Hz consume a significant share of the external memory interface
bandwidth and of the Arm® AMBA® AXI4 bus bandwidth through the NoC to DDR.
Port Descriptions
The VDU core top-level signaling interfaces are shown in the following figures.
Register Space
VDU is a hardened block in the programmable logic. The following table summarizes the soft IP
registers. These registers are accessible from the PS through the AXI4-Lite bus.
Note: For multi-stream use case, the registers in the following table represent the blended values that are
input in GUI.
Register                  Address Offset  Width  Type        Reset Value  Definition
Version control           0x0             8      Read only   0x5          3:0 - Minor revision
                                                                          7:4 - Major revision
LogiCORE Lock register    0x4             32     Write only  0x0          Lock code for VDU LogiCORE space. By default the space is locked.
                                                                          Unlocked access returns 0 for register reads and writes are ignored
                                                                          0x7766DF77 - unlock VDU register space
                                                                          0x0 - lock VDU register space
Control register          0x8             4      R/W         0x0          0 - soft reset to VDU0
                                                                          1 - soft reset to VDU1
                                                                          2 - soft reset to VDU2
                                                                          3 - soft reset to VDU3
Secure control            0xC             4      R/W         0x0          0 - Secure/non-secure configuration for VDU0
                                                                          1 - Secure/non-secure configuration for VDU1
                                                                          2 - Secure/non-secure configuration for VDU2
                                                                          3 - Secure/non-secure configuration for VDU3
SMID control register 0   0x10            10     R/W         0x0          9:0 - SMID for VDU0, DEC0
SMID control register 1   0x14            10     R/W         0x0          9:0 - SMID for VDU0, DEC1
SMID control register 2   0x18            10     R/W         0x0          9:0 - SMID for VDU0, MCU
SMID control register 3   0x1C            10     R/W         0x0          9:0 - SMID for VDU1, DEC0
SMID control register 4   0x20            10     R/W         0x0          9:0 - SMID for VDU1, DEC1
SMID control register 5   0x24            10     R/W         0x0          9:0 - SMID for VDU1, MCU
SMID control register 6   0x28            10     R/W         0x0          9:0 - SMID for VDU2, DEC0
SMID control register 7   0x2C            10     R/W         0x0          9:0 - SMID for VDU2, DEC1
SMID control register 8   0x30            10     R/W         0x0          9:0 - SMID for VDU2, MCU
SMID control register 9   0x34            10     R/W         0x0          9:0 - SMID for VDU3, DEC0
SMID control register 10  0x38            10     R/W         0x0          9:0 - SMID for VDU3, DEC1
SMID control register 11  0x4C            10     R/W         0x0          9:0 - SMID for VDU3, MCU
VDU_DECODER_ENABLE        0x41004         32     RO          0x0          1 = Enable
                                                                          0 = Disable
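As a small illustration of the bit fields in the register table, the following sketch decodes the version register and builds a bit mask for the control register. The offsets, the unlock code, and the bit positions are copied from the table. The helper names are hypothetical, no real driver API is implied, and the table does not state the polarity of the soft reset bits, so the mask helper only selects bits.

```python
# Illustrative helpers for the VDU soft IP register table (not an AMD API).
# Constants below are copied from the register table; function names are hypothetical.

VERSION_OFFSET = 0x0
LOCK_OFFSET = 0x4
CONTROL_OFFSET = 0x8
UNLOCK_CODE = 0x7766DF77  # written to the lock register to unlock the register space
LOCK_CODE = 0x0           # written to the lock register to lock it again


def decode_version(value):
    """Split the 8-bit version register into (major, minor)."""
    minor = value & 0xF         # bits 3:0
    major = (value >> 4) & 0xF  # bits 7:4
    return major, minor


def vdu_bit_mask(*vdu_indices):
    """Build a 4-bit mask in which bit n corresponds to VDUn (n = 0..3).

    The same bit layout is used by the control and secure control registers.
    """
    mask = 0
    for n in vdu_indices:
        if not 0 <= n <= 3:
            raise ValueError(f"VDU index out of range: {n}")
        mask |= 1 << n
    return mask


if __name__ == "__main__":
    print(decode_version(0x5))      # reset value from the table
    print(bin(vdu_bit_mask(0, 2)))  # selects VDU0 and VDU2
```

Applied to the documented reset value 0x5, the version register decodes to major revision 0 and minor revision 5.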
Chapter 4
Core Architecture
Decoder Block
The decoder block is designed to process video streams using the H.265 (HEVC) and H.264
(AVC) standards. It supports 8-bit and 10-bit color depth, the 4:0:0, 4:2:0, and 4:2:2 chroma
formats, and performance of up to four streams of 4K UHD at 60 Hz.
The IP hardware has direct access to the system data bus through a high-bandwidth master
interface to transfer video data to and from an external memory.
The IP control software is partitioned into two layers. The VDU Control Software runs on the
APU, while the MCU firmware runs on an MCU that is embedded in the hardware IP. The APU
communicates with the embedded MCU through a slave interface that is also connected to the
system bus. The IP hardware is controlled by the embedded MCU using a register map to set
decoding parameters through an internal peripheral bus.
[Figure: (b) decoder block diagram. Recoverable labels: intra/inter mode selection, picture buffering, intra prediction, motion compensation.]
Features
The following table describes the single instance decoder block features.
Supported Supported
Configurable resolution
Minimum size: 128×128
Picture width and height multiple Minimum size: 80×96
Maximum width: 8184 (limited to 4,096
of eight Maximum width: 8,192 in level 4/4.1 or when WPP is enabled)
Maximum width or height: 8192 Maximum height: 8,192 Maximum height: 8,192
Maximum height: 8,192 picture size
of 33.5 MP
The following table describes the VDU maximum supported bit rates.
Functional Description
The following figure shows the block diagram of the decoder block.
[Figure: decoder block diagram. Recoverable labels: DECODER, two AXI4 master interfaces with AXI4 wrappers, MCU AXI4 master interface, interrupt controller, wrapper, Lite to APB, MCU DP, global registers, IRQ.]
The decoder block includes the H.265/H.264 decompression engine, control registers, and an
interrupt controller block. The decoder block is controlled by an MCU subsystem. A 32-bit AXI4-
Lite slave interface is used by the system CPU to control the MCU in order to configure decoder
parameters, start processing of video frames, and get status and results. Two 128-bit AXI4
master interfaces are used to fetch video input data from, and store video output data to, the
system memory. An AXI4 master interface is used to fetch the MCU software and to perform
load/store operations on additional MCU data.
Clocking
Refer to Clocking and Resets for more information on clocking.
Reset
Refer to Clocking and Resets for more information on resets.
Datapath
The master interface inputs several types of video data from the external memory:
• Bitstream
• Reference frame pixels
• Co-located picture motion vectors
• Headers and residual data
Control Path
The VDU slave interface is accessed once per frame by the APU, which sends a frame-level
command to the IP. This interface, therefore, does not require a fast data path. An interrupt is
generated at the end of each frame. These commands are processed by the embedded MCU,
which generates tile and slice-level commands to the Decoder block hardware.
Requirement            Description
Input Buffer
  Number of Buffers    >= 1
  Contiguous           No
  Alignment            0
  Size                 Any > 0
Output Buffer
  Number of Buffers    • For AVC: AL_AVC_GetMinOutputBuffersNeeded (AL_TStreamSettings tStreamSettings, int iStack).
  Contiguous           Yes
  Alignment            32
  Size                 stride * slice-height * chroma-mode.
Note: It is not possible to reduce the output buffer requirements because the VDU uses multiple internal
decoder engines.
• Video resolution
• Chroma sub-sampling
• Color depth
• Coding standard: H.264 or H.265
The following table shows the worst-case memory footprint required for different buffer sizes.
3840×2160 (MB) 665 582 597 513 524 466 473 414
1920×1080 (MB) 258 214 232 190 208 167 188 148
1280×720 (MB) 137 104 120 87 144 82 101 69
Memory Bandwidth
The decoder memory bandwidth depends on frame rate, resolution, color depth, chroma format,
and decoder profile. The AMD LogiCORE™ IP provides an estimate of decoder bandwidth based
on the video parameters selected in the GUI.
AMD recommends using the fastest DDR4 memory interface possible. Specifically, the 8x8-bit
memory interface is more efficient than the 4x16-bit memory interface because the x8 mode has
four bank groups, whereas the x16 mode has only two, and DDR4 memory allows
simultaneous bank group access.
Memory Format
The decoded picture buffer contains the decoded pixels. It contains two parts: luminance pixels (Luma) followed by chrominance pixels
(Chroma). Luma pixels are stored in pixel raster scan order. Chroma pixels are stored in U/V-interleaved pixel raster scan order; hence,
the Chroma part is half the size of the Luma part when using a 4:2:0 format and the same size as the Luma part when using a 4:2:2
format. The decoded picture buffer must be one contiguous memory region.
Note: Decoder output buffer widths are multiples of 256 bytes, and heights are multiples of 64. For example, the decoder output is
2048×1088 for a 1920×1080 resolution.
Two packing formats are supported in external memory: eight bits per component or 10 bits per component, shown in the following
tables, respectively. The 8-bit format can only be used for an 8-bit component depth and the 10-bit format can only be used for a 10-
bit component depth. The following tables show the raster scan format supported by the decoder block for 8-bit and 10-bit color
depth.
The frame buffer width (pitch) may be larger than the frame width so that there are (pitch - width) ignored values between consecutive
pixel lines.
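The alignment rule in the note and the output buffer size expression (stride * slice-height * chroma-mode) can be worked through numerically. The sketch below is an interpretation rather than the control software's actual calculation. It assumes the 8-bit format with one byte per component, so that the pitch in bytes equals the aligned width in pixels. It also takes the chroma-mode factor from the Luma/Chroma size description in this section: 1.5 for 4:2:0 and 2.0 for 4:2:2, with 1.0 assumed for 4:0:0. All function names are hypothetical.

```python
# Worked example of the output buffer alignment rules (8-bit samples only).
# Assumptions: 1 byte per component; chroma factors derived from the text
# (4:2:0 chroma is half the luma size, 4:2:2 chroma equals the luma size).

WIDTH_ALIGN_BYTES = 256
HEIGHT_ALIGN = 64
CHROMA_FACTOR = {"4:0:0": 1.0, "4:2:0": 1.5, "4:2:2": 2.0}


def align_up(value, multiple):
    """Round value up to the next multiple."""
    return -(-value // multiple) * multiple


def output_buffer_geometry(width, height, chroma="4:2:0"):
    """Return (stride_bytes, slice_height, size_bytes) for an 8-bit frame."""
    stride = align_up(width, WIDTH_ALIGN_BYTES)
    slice_height = align_up(height, HEIGHT_ALIGN)
    size = int(stride * slice_height * CHROMA_FACTOR[chroma])
    return stride, slice_height, size


if __name__ == "__main__":
    stride, slice_height, size = output_buffer_geometry(1920, 1080)
    print(stride, slice_height, size)
    # Ignored bytes between consecutive pixel lines: pitch - width.
    print(stride - 1920)
```

For 1920×1080 this reproduces the 2048×1088 output dimensions quoted in the note, leaving 128 ignored bytes at the end of each line in the 8-bit case.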
VDU
The VDU decoded format is shown in the following table.
Functional Description
The following figure shows the top-level interfaces and detailed architecture of the MCU.
[Figure: MCU top-level interfaces and architecture. Recoverable labels: MCU, MCU SRAM, ILMB, DLMB, IC, DC, DP, MCU AXI4 master interface, MCU AXI4-Lite master (to decoder), VDU AXI4-Lite slave interface, top-level register, IRQ (from decoder).]
The MCU interfaces to peripherals using a 32-bit AXI4-Lite master interface. It also has a local
memory bus and 32-bit AXI4 instruction cache and data cache interfaces.
The MCU block has a 32 KB local memory for internal operations that is shared with the CPU for
boot and mailbox communication. The MCU has a 32 KB instruction cache with a 32-byte cache
line width and a 4 KB data cache with a 16-byte cache line width. The data cache is a
write-through cache.
The following table summarizes the AXI4-Lite slave interface ports of the MCU subsystem.
Control Flow
The MCU is kept in sleep mode after applying the reset until the firmware boot code is
downloaded by the kernel device driver into the internal memory of the MCU. After downloading
the boot code and completing the MCU initialization sequence, the control software
communicates with the MCU using a mailbox mechanism implemented in the internal SRAM of
the MCU. The MCU sends an acknowledgment to the control software and performs the
decoding operation. When the requested operation is complete, the MCU communicates the
status to the control software.
For more details about control software and MCU firmware, refer to Section III: Application
Software Development.
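The control flow described above can be pictured as a small state machine. The following is a toy model for reasoning about the ordering of events only. It is not the firmware, the kernel driver, or the mailbox protocol, and every class, method, and state name in it is hypothetical.

```python
# Toy model of the MCU control flow: sleep after reset, boot code download,
# mailbox command, acknowledgment, decode, status report.
# All names are hypothetical; this is not the actual driver or firmware interface.
from enum import Enum, auto


class McuState(Enum):
    SLEEP = auto()  # after reset, before boot code is downloaded
    READY = auto()  # initialized, waiting for mailbox commands
    BUSY = auto()   # command acknowledged, decoding in progress


class McuModel:
    def __init__(self):
        self.state = McuState.SLEEP
        self.log = []

    def download_boot_code(self):
        """Kernel driver loads the boot code and the MCU initializes."""
        if self.state is not McuState.SLEEP:
            raise RuntimeError("boot code is only downloaded while the MCU sleeps")
        self.state = McuState.READY
        self.log.append("boot")

    def send_command(self, command):
        """Control software posts a command; the MCU acknowledges it."""
        if self.state is not McuState.READY:
            raise RuntimeError("MCU is not ready for a command")
        self.state = McuState.BUSY
        self.log.append(f"ack:{command}")

    def complete(self):
        """MCU finishes the operation and reports status to the control software."""
        if self.state is not McuState.BUSY:
            raise RuntimeError("no operation in progress")
        self.state = McuState.READY
        self.log.append("status:done")
        return "done"


if __name__ == "__main__":
    mcu = McuModel()
    mcu.download_boot_code()
    mcu.send_command("decode_frame")
    mcu.complete()
    print(mcu.log)
```

The model only enforces the ordering stated in the text: no command is accepted before the boot code has been downloaded, and a status is reported only for an acknowledged operation.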
Register             Offset  Type      Reset Value  Description
MCU_RESET            0x9000  mixed(1)  0x00000000   MCU Subsystem Reset
MCU_RESET_MODE       0x9004  mixed(1)  0x00000001   MCU Reset Mode
MCU_STA              0x9008  mixed(1)  0x00000000   MCU Status
MCU_WAKEUP           0x900C  mixed(1)  0x00000000   MCU Wake-up
MCU_ADDR_OFFSET_IC0  0x9010  RW        0x00000000   MCU Instruction Cache Address Offset 0
MCU_ADDR_OFFSET_IC1  0x9014  RW        0x00000000   MCU Instruction Cache Address Offset 1
MCU_ADDR_OFFSET_DC0  0x9018  RW        0x00000000   MCU Data Cache Address Offset 0
MCU_ADDR_OFFSET_DC1  0x901C  RW        0x00000000   MCU Data Cache Address Offset 1
Notes:
1. Mixed registers have read-only, write-only, and read-write bits grouped together.
The APM block can measure the number of read/write bytes and address-based
transactions within a measurement window on the AXI master bus from the decoder blocks. The
APM can additionally measure master-ID-based read and write latency within a measurement
window. The APM supports a cumulative latency value along with the number of outstanding
transfers being considered for latency measurement. The APM can interrupt the
host processor when the status registers are ready to be read.
Functional Description
The following figure shows the VAPM.
[Figure: VAPM within the VDU wrapper. Recoverable labels: VAPM, VDEC cores, MCU, reset (from PL), interrupt, 32-bit AXI4-Lite interface, APB-AXI4-Lite bridge, SLCR, MCU AXI interface, Int + Async bridge.]
The following sections describe the different operating modes of the VAPM.
Start/Stop Mode
In this mode, a 32-bit counter is used to generate a fixed length measurement window. When the
counter reaches the maximum value, it resets to a value specified in the
VDU_SLCR.APMn_TIMER (n = 0, 1, 2, 3) register. The measurement is continued until the 32-bit
counter reaches the value set in the APMn_TIMER register and a capture pulse is generated to
store the measured values in the VDU_SLCR result registers.
• AXI Read and Write Transaction Measurement: Two 32-bit registers count the number of read
and write 128-bit AXI bus cycles transferred in a given timing window. The measured value is
transferred to the VDU_SLCR result register when a capture pulse is generated based on the
start/stop mode or the fixed duration timing window mode. To compute the number of bytes
transferred, the VDU_SLCR result register value must be multiplied by 16.
• AXI Read and Write Byte Count Measurement: Two 32-bit registers are implemented to
count the number of read and write bytes transferred in a given timing window. The register
content has to be multiplied by 16 to know the actual byte count transferred across AXI 128-
bit master interface. The measured value is transferred to the VDU_SLCR result register when
a capture pulse is generated, based on the start/stop mode or fixed duration timing window
mode.
• AXI Transaction Latency Measurement: Read and write latency can be measured based on
AXI master ID. Read latency is defined as AXI read address acknowledged to last read data
cycle. Write latency is defined as AXI write address acknowledged to write response
handshaking between master and slave. A 13-bit counter is implemented to measure the
latency on read and write bus. The timer is used to timestamp an event. The difference in the
timestamp between two events is used to calculate the latency.
Latency can be calculated on transaction ID basis. It is possible to select a single ID or all IDs for
latency calculation. For additional information, see the Versal Adaptive SoC AI Engine Register
Reference (AM015).
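The arithmetic behind the APM measurements can be sketched as follows. The factor of 16 comes from the 128-bit AXI data width (16 bytes per bus cycle), and the latency is the difference between two timestamps taken from the 13-bit counter. The wraparound handling is an assumption: it presumes the counter is free-running and wraps at most once between the two events. The function names are hypothetical.

```python
# Sketch of the APM post-processing arithmetic described in the text.
# Assumption: the 13-bit timestamp counter wraps at most once between two events.

BYTES_PER_AXI_CYCLE = 128 // 8  # 128-bit AXI master interface -> 16 bytes per cycle
LATENCY_COUNTER_BITS = 13


def bytes_transferred(result_register_value):
    """Convert a VDU_SLCR result register count into bytes."""
    return result_register_value * BYTES_PER_AXI_CYCLE


def latency_cycles(start_timestamp, end_timestamp, bits=LATENCY_COUNTER_BITS):
    """Timestamp difference in counter ticks, tolerant of one counter wrap."""
    return (end_timestamp - start_timestamp) % (1 << bits)


if __name__ == "__main__":
    print(bytes_transferred(1000))
    print(latency_cycles(100, 250))
    print(latency_cycles(8190, 5))  # counter wrapped between the two events
```

The modulo operation keeps the difference correct when the end timestamp is numerically smaller than the start timestamp because the counter rolled over.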
APM Registers
Note: The VDU base address is 0xA4040000. APM register addresses are base address + offset
(0xA4040000 + offset).
Chapter 5
The register programming interface of the VDU core connects to PS Master ports (M_AXI_FPD
or M_AXI_LPD). The VDU core clock can be used from PL or through an internal PLL inside the
VDU core.
Interrupts
Each VDU decoder instance uses one interrupt (vdu_host_interrupt*). The LogiCORE IP provides
options to use a single interrupt per instance or a single interrupt for all instances. Clear
the Combine VDU Interrupt checkbox to get single interrupt for all instances. This interrupt must
be connected to either PL-PS-IRQ0[7:0] or PL-PS-IRQ1[7:0]. If there are other interrupts
in the design, this interrupt must be concatenated with the other interrupts and then
connected to the PS.
Note: All AXI clocks are supplied with clocks from external PL sources. These clocks are asynchronous to
core decoder block clock. All primary clocks in VDU are asynchronous to each other.
• Initially, while the PL is in power-up/configuration mode, the VDU core is held in reset.
• After the PL is fully configured, a PL based reset signal can be used to reset the VDU for
initialization and bring-up. Platform management unit (PMU) in the processing system can
drive this reset signal to control the reset state of the VDU.
• During partial reconfiguration (PR), the VDU block is kept under reset, if it is part of the
dynamically reconfigurable module.
Functional Description
Clocking
The following table describes the clock domains in the VDU core.
Domain                      Max Freq (MHz)  Description
Core clock                  800             Processing core, most of the logic and memories
MCU clock                   571             Internal microcontrollers
AXI master port clock       400             m_axi_dec_aclk, pl_vdu_axi_mcu_clk. AXI master port for memory access, 128-bit, typically
                                            connected to PS AFI-FM (HP) port or to a soft memory controller in the PL
AXI4-Lite slave port clock  167             s_axi_lite_aclk, AXI4-Lite slave port (32-bit) for register programming
NPI clock                   300             NPI interface clock
Note: All AXI clocks are supplied with clocks from external PL/PS sources. All primary clocks in VDU are
asynchronous to each other.
The following figure shows the clock generation options inside VDU block.
• pll_ref_clk is sourced externally to the device, typically by a programmable clock integrated circuit.
• Video decoder blocks work under the core_clk domain generated by the DPLL.
• The MCU for the decoder works under the MCU_clk domain generated by the DPLL.
• m_axi_dec_aclk is the AXI clock input from the PL for the 128-bit AXI master interfaces for the
decoder.
[Figure: clock generation options inside the VDU block. Recoverable labels: pll_ref_clk, DPLL with two output dividers, PL, VDU decoder, s_axi_lite_aclk, m_axi_mcu_aclk, m_axi_dec_aclk.]
The following clock frequency requirements must be met while providing clocks from PL:
DPLL Overview
The VDU core uses in-built DPLL for generating the following clocks:
The AMD Vivado™ wrapper for VDU should handle programming of DPLL to get the required
clocks. This programming is static.
fvco = frefclk × M
and
fclkout = fvco / O
where, M corresponds to the integer feedback divide value and O corresponds to the value of
output divide.
IMPORTANT! Select the PLL feedback multiplier value based on the supported VCO frequency range
(fvco).
Refer to the Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957) for more
information on the operating range of fvco.
Select the output divider (O) based on the required core clock or MCU clock frequency.
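The two DPLL relations (fvco = frefclk × M and fclkout = fvco / O) can be exercised with a short search over candidate divider values. This is a sketch only: the Vivado wrapper performs the actual programming, and the VCO range and the legal ranges of M and O used here are placeholders that must be taken from the data sheet. The function names are hypothetical.

```python
# Sketch of the DPLL frequency relations: f_vco = f_ref * M, f_out = f_vco / O.
# The VCO limits and divider ranges passed in are placeholders, not data sheet values.


def dpll_output(f_ref_mhz, m, o):
    """Return (f_vco, f_out) in MHz for feedback multiplier M and output divider O."""
    f_vco = f_ref_mhz * m
    return f_vco, f_vco / o


def find_dividers(f_ref_mhz, f_target_mhz, vco_min_mhz, vco_max_mhz,
                  m_values=range(1, 257), o_values=range(1, 65)):
    """Return the (M, O) pair whose output is closest to the target, or None.

    Only M values that keep f_vco inside [vco_min_mhz, vco_max_mhz] are considered.
    """
    best = None
    for m in m_values:
        f_vco = f_ref_mhz * m
        if not vco_min_mhz <= f_vco <= vco_max_mhz:
            continue
        for o in o_values:
            error = abs(f_vco / o - f_target_mhz)
            if best is None or error < best[0]:
                best = (error, m, o)
    return None if best is None else (best[1], best[2])


if __name__ == "__main__":
    # Hypothetical numbers: 100 MHz reference, 800 MHz core clock target,
    # and an assumed VCO range of 2000 to 4000 MHz.
    m, o = find_dividers(100, 800, 2000, 4000)
    print(m, o, dpll_output(100, m, o))
```

The search filters on the VCO range first, which mirrors the guidance to choose the feedback multiplier from the supported fvco range and then pick the output divider for the required core or MCU clock frequency.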
Reset Sequence
The state of the VDU during PL power up and the initialization sequence for the VDU are as
follows:
1. The PL supply, VDU power supply, and RAM supply are turned on. There is no sequencing
requirement.
2. PMC releases POR pin to the VDU (por_pl_b).
3. POR IP kicks off to monitor VDU, PL, and RAM supplies. Once all powers are detected, the IP
releases POR reset, which is sent to PMC readback supply status.
4. PMC polls for the power status, and deasserts IPOR register after power ramp is complete.
At this step, Power on Reset to VDU is released (through the AND gate).
5. Send VDU enable information through eFUSE.
6. PMC removes isolation gasket controls through PCSR bits.
Note: The VDU clocks are available while the reset is released. The PL should be configured before
releasing the raw reset.
Additional initialization is done by software through programming the VDU core registers after the PL is
configured and core is in a reset release state.
Reset
The VDU hard block can be held under reset under the following conditions:
The VDU reset signal must be asserted for, at least, two clock cycles of the VDU DPLL reference
clock (the slowest clock input to the VDU). The VDU registers can be accessed after the reset
signal is deasserted.
Note:
• If software resets the VDU block in the middle of a frame, use the software to clear the physical
memory allocated for the VDU.
• The reset does not need to be asserted between changes to the configuration during run-time through
the control software.
• The vdu_resetn signal of AMD Versal™ VDU should be driven from Processor System Reset Module
(proc_sys_reset) which is driven by any of the 4 PS reset signals.
Chapter 6
• Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
• Vivado Design Suite User Guide: Designing with IP (UG896)
• Vivado Design Suite User Guide: Getting Started (UG910)
• Vivado Design Suite User Guide: Logic Simulation (UG900)
If you are customizing and generating the core in the Vivado IP integrator, see the Vivado Design
Suite User Guide: Designing IP Subsystems using IP Integrator (UG994) for detailed information. IP
integrator might auto-compute certain configuration values when validating or generating the
design. To check whether the values do change, see the description of the parameter in this
chapter. To view the parameter value, run the validate_bd_design command in the Tcl
console.
You can customize the IP for use in your design by specifying values for the various parameters
associated with the IP core using the following steps:
For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) and the Vivado
Design Suite User Guide: Getting Started (UG910).
Figures in this chapter are illustrations of the Vivado IDE. The layout depicted here might vary
from the current version.
The VDU subsystem is controlled by software at runtime. Configuration options set in the VDU
GUI are used to estimate bandwidth, and calculate the buffer size.
• Bandwidth Summary: Reports the decoder buffer size and the bandwidth for the decoder.
• Maximum Number of Decoder Streams: Each VDU instance supports up to 32 streams. With
four VDU instances, a maximum of 100 streams is supported. Select from one to 32 streams.
This setting determines the memory requirements.
Note: The VDU supports 32 streams, but AMD recommends choosing the closest combination using
the available options in the GUI.
• 1280×720
• 1920×1080
• 3840×2160
• 4:0:0 - monochrome
• 4:2:0
• 4:2:2
2. Click Next on New Project wizard until you reach the Family Selection window.
3. Select a target device for the VDU core.
8. In Versal CIPS, enable pl1_ref_clk, pl2_ref_clk, and pl3_ref_clk, and set their frequencies to
100, 100, and 167 MHz, respectively.
9. Configure Versal CIPS to enable AXI master interfaces, clocking, and PL-PS interrupt signal
per your design requirements. The configuration used in this tutorial is displayed below.
11. Configure the following parameters in GUI of VDU IP. Set number of decoder instances to 4.
22. Create a top-level Vivado wrapper by right-clicking on Block Design and selecting the Create
HDL Wrapper option as shown in the following figure.
Required Constraints
Note: 4K (3840×2160) and lower resolutions are supported in all speed grades. 4K DCI (4096×2160)
requires a -2 or -3 speed grade.
Clock Frequencies
There is no restriction for speed grade. All speed grades support the maximum frequency of
operation.
Clock Management
Clock Placement
Banking
Transceiver Placement
Simulation
Simulation of the H.264/H.265 Video Decode Unit is not supported.
Section II
Chapter 7
Decoder Latency
The following figure shows the decoder latency.
The overall latency of the decoder is the steady-state latency, which equals the sum of the
hardware latency and the output latency. Hardware latency is the sum of the successive
cancellation decoding (SCD) latency, the entropy decoding latency, and the pixel decoding latency.
Initialization latency is the sum of the Coded Picture Buffer (CPB) latency and the Decoder
Initialization (Dec Init) latency.
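The latency relationships in the paragraph above reduce to three sums, which can be written down explicitly. The class below is a bookkeeping sketch with hypothetical field names. The numbers in the usage example are arbitrary placeholders and are not measured values.

```python
# Bookkeeping sketch of the decoder latency components described in the text.
# Field names are hypothetical; example values are placeholders, not measurements.
from dataclasses import dataclass


@dataclass
class DecoderLatency:
    scd_ms: float        # SCD latency
    entropy_ms: float    # entropy decoding latency
    pixel_ms: float      # pixel decoding latency
    output_ms: float     # output latency
    cpb_ms: float        # CPB latency
    dec_init_ms: float   # decoder initialization latency

    @property
    def hardware_ms(self):
        """Hardware latency = SCD + entropy decoding + pixel decoding."""
        return self.scd_ms + self.entropy_ms + self.pixel_ms

    @property
    def steady_state_ms(self):
        """Overall (steady-state) latency = hardware + output."""
        return self.hardware_ms + self.output_ms

    @property
    def initialization_ms(self):
        """Initialization latency = CPB + decoder initialization."""
        return self.cpb_ms + self.dec_init_ms


if __name__ == "__main__":
    latency = DecoderLatency(scd_ms=1.0, entropy_ms=2.0, pixel_ms=3.0,
                             output_ms=4.0, cpb_ms=5.0, dec_init_ms=6.0)
    print(latency.hardware_ms, latency.steady_state_ms, latency.initialization_ms)
```

Keeping initialization latency separate from steady-state latency reflects the text: the former is a start-up cost, while the latter applies to every frame once decoding is running.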
Section III
Chapter 8
Overview
The Video Decode Unit (VDU) software stack has a layered architecture programmable at several
levels of abstraction available to software developers, as shown in the following figure. The
application interfaces from high level to low level are:
• GStreamer
• OpenMAX Integration Layer
• VDU Control Software
GStreamer is a cross-platform, open source multimedia framework. It provides the
infrastructure to integrate multiple multimedia components and create pipelines. The GStreamer
framework is implemented on top of the OpenMAX Integration Layer API. The supported
GStreamer version is 1.20.5.
The OpenMAX Integration Layer API defines a royalty-free, standardized media component
interface to enable developers and platform providers to integrate and communicate with
multimedia codecs implemented in hardware or software.
The VDU Control Software is the lowest level software visible to VDU application developers. All
VDU applications must use the AMD-provided VDU Control Software, directly or indirectly. The
VDU Control Software includes custom kernel modules, a custom user space library, and the
ctrlsw_decoder application. The OpenMAX IL (OMX) layer is integrated on top of the VDU
Control Software.
User applications can use the layer or layers of the VDU software stack that are most appropriate
to their requirements.
Figure: VDU software stack (layers shown: GStreamer, OpenMAX Integration Layer, Driver kernel,
IP Hardware)
Software Prerequisites
All of the software prerequisites for using the VDU are included in AMD PetaLinux, which is
included in the AMD Vitis™ software development platform release.
For the vanilla Linux kernel, refer to xilinx_versal_defconfig to enable or disable an AMD driver
in the Linux kernel. If the design enables or disables the AMD IP, set the corresponding device-tree
node accordingly so that the driver can probe at kernel run time.
The application software using the VDU is written on top of the following libraries and modules,
shown in the following table.
Chapter 9
• These formats signify the memory layout of pixels. They apply at the decoder output side.
• For the decoder, you can specify the format used to write pixels to memory by
specifying the corresponding GStreamer video format using caps at the decoder source pad.
• When a format is not supported between two elements, the caps negotiation fails and
GStreamer returns an error. In that case, you can use the videoconvert element to perform
software conversion from one format to another.
The following table shows the GStreamer and V4L2 related formats that are supported.
Note:
1. The FourCC codes are the last four characters of V4L2 pixel formats (for example, GREY/XY10).
2. Y_Only 8-bit and Y_Only 10-bit are supported only at the control software layer for the decoder.
3. For more information on YUV 444 format support, refer to this article.
Chapter 10
Now, the GStreamer, OMX, and Control Software pipelines can be run on the board.
1. Control Software
• ctrlsw_decoder -avc -i input.avc -noyuv --device /dev/allegroDecodeIP0
2. OMX
• omx_decoder input.avc -avc --device /dev/allegroDecodeIP0 -o /dev/null
3. GStreamer
• gst-launch-1.0 filesrc location=input.avc ! h264parse ! queue ! omxh264dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! fakevideosink
To overcome the DMA memory limitation described above, reserved memory can be used. For
example, suppose the PetaLinux hardware design has two VDU instances (/dev/allegroDecodeIP0
and /dev/allegroDecodeIP1). One instance can use CMA memory. By modifying the design, a
specific reserved memory region in the DDR can be assigned to the second VDU instance.
The following example reserves memory using DDR controllers with a separate 4 GB aligned
base address for allegroDecodeIP1, while allegroDecodeIP0 uses the normal CMA memory region.
With these changes, both VDU instances can decode 4K streams separately.
reserved-memory {
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    ranges;
    buffer@0 {
        no-map;
        reg = <0x8 0x0 0x0 0x80000000>;
    };
    ddr1_1: myddr@50000000000 {
        compatible = "shared-dma-pool";
        no-map;
        reg = <0x500 0x0 0x0 0x80000000>;
    };
};
amba_pl@0 {
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    compatible = "simple-bus";
    ranges;
    vdu:vdu@a4000000 {
        clock-names = "s_axi_lite_aclk", "ref_clk",
                      "m_axi_mcu_aclk", "m_axi_dec_aclk";
        clocks = <0x11 0x3 0x41 0x12 0x12>;
        compatible = "xlnx,vdu-1.0";
        reset-gpios = <0x13 0x0 0x1>;
        xlnx,core_clk = <0x320>;
        xlnx,enable_dpll;
        xlnx,mcu_clk = <0x23b>;
        xlnx,ref_clk = <0x64>;
    };
    al5d@a4020000 {
        al,devicename = "allegroDecodeIP0";
        compatible = "al,al5d";
        interrupt-names = "vdu_host_interrupt0";
        interrupt-parent = <0x5>;
        interrupts = <0x0 0x54 0x4>;
        reg = <0x0 0xa4020000 0x0 0x100000>;
        xlnx,vdu = <&vdu>;
    };
    al5d@a4120000 {
        al,devicename = "allegroDecodeIP1";
        compatible = "al,al5d";
        interrupt-names = "vdu_host_interrupt1";
        interrupt-parent = <0x5>;
b. There are five recipe files for GStreamer that download and compile the code:
• gstreamer1.0_%.bbappend
• gstreamer1.0-omx_%.bbappend
• gstreamer1.0-plugins-bad_%.bbappend
• gstreamer1.0-plugins-base_%.bbappend
• gstreamer1.0-plugins-good_%.bbappend
c. Create a bbappend file for the GStreamer package to which the patches belong so that
the patches are applied and compiled on the latest source code. For example, if the patch
is a fix for gst-omx, follow these steps:
i. Create a gstreamer1.0-omx directory in the recipe-multimedia/gstreamer folder.
cd gstreamer
mkdir gstreamer1.0-omx
Create similar bbappend files and folders for the other GStreamer packages to integrate any
custom patches in the PetaLinux build.
4. For VDU patches, follow these steps:
a. Create a vdu directory in the recipe-multimedia folder.
cd project-spec/meta-user/recipe-multimedia
mkdir vdu
b. There are four recipe files for VDU that download and compile the code:
• kernel-module-vdu_%.bbappend
• vdu-firmware_%.bbappend
• libvdu-xlnx_%.bbappend
• libomxil-xlnx_%.bbappend
c. Create a bbappend file for the VDU code base to which the patches belong so that
the patches are applied and compiled on the latest source code. For example, if the patch
is a fix for the VDU drivers, follow these steps:
i. Create a kernel-module-vdu directory in the recipe-multimedia/vdu folder.
cd vdu
mkdir kernel-module-vdu
Create similar bbappend files and folders for the other VDU components to integrate any
custom patches in the PetaLinux build.
5. Follow PetaLinux build steps to generate updated binaries.
Note: If you are not compiling with PetaLinux, review the recipes for additional files necessary for
setting up GStreamer. For example, you must include /etc/xdg/gstomx.conf in the root file system.
This file tells gst-omx where to find the OMX integration layer library, libOMX.allegro.core.so.1.
Chapter 11
GStreamer Pipelines
Examples of running GStreamer from the PetaLinux command line are as follows. To see the
description of the GStreamer elements and properties used in each of them, use the
gst-inspect-1.0 command.
For example, to get a description of each parameter of the "omxh264dec" element, enter the
following at the command prompt:
gst-inspect-1.0 omxh264dec
H.264 Decoding
Decode an H.264 input file and display it on the monitor connected over HDMI.
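As a hedged illustration of how such a pipeline is assembled, the following Python sketch builds a gst-launch-1.0 command line from elements listed in the element table of this chapter. The helper name and the exact element chain are assumptions, not the guide's reference command. It assumes an elementary stream input (a container such as MP4 would additionally need a demuxer such as qtdemux), and kmssink is left without properties because its settings depend on the platform.

```python
import shlex

# Parser and decoder element names taken from the element table in this chapter.
ELEMENTS = {
    "h264": ("h264parse", "omxh264dec"),
    "h265": ("h265parse", "omxh265dec"),
}


def decode_display_pipeline(path, codec="h264", device=None):
    """Return a gst-launch-1.0 command that decodes a file and renders it with kmssink."""
    parser, decoder = ELEMENTS[codec]
    if device is not None:
        # The "device" property selects the VDU instance, for example /dev/allegroDecodeIP0.
        decoder += f" device={device}"
    stages = [
        f"filesrc location={shlex.quote(path)}",
        parser,
        decoder,
        "queue max-size-bytes=0",
        "kmssink",  # sink properties are platform specific and omitted here
    ]
    return "gst-launch-1.0 " + " ! ".join(stages)


print(decode_display_pipeline("input.avc", device="/dev/allegroDecodeIP0"))
```

Passing codec="h265" selects h265parse and omxh265dec instead. Use gst-inspect-1.0 on the target to confirm which properties each element accepts before running the generated command.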
H.265 Decoding
Decode an H.265 input file and display it on the monitor connected over HDMI.
• 4:2:0 8-bit
• 4:2:2 8-bit
• 4:2:0 10-bit
• 4:2:2 10-bit
The following command decodes an H.264 MP4 file using an increased number of internal
entropy buffers and displays it via HDMI.
Multi-Stream Decoding
• Decoding with a single instance: Decode the H.265 input file using four decoder elements
simultaneously and save the outputs to separate files. This uses a single decoder instance,
that is, /dev/allegroDecodeIP0.
Note: The tee element is used to feed the same input file into four decoder channels. To feed different
inputs, you can run separate gst-launch-1.0 applications, as shown below.
• Decoding with multiple decoder instances: The following example shows how to decode multiple
encoded streams on different decoder instances.
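The single-instance case can be sketched as follows. This Python helper only assembles a gst-launch-1.0 command string in which a tee feeds one parsed H.265 stream to several decoder branches. The helper name, file names, and branch layout are assumptions for illustration, not the guide's reference pipeline.

```python
def tee_decode_pipeline(path, outputs, device="/dev/allegroDecodeIP0"):
    """Return a gst-launch-1.0 command that feeds one H.265 file to len(outputs) decoders."""
    head = f"gst-launch-1.0 filesrc location={path} ! h265parse ! tee name=t"
    branches = [
        # Each branch pulls from the tee, decodes on the same VDU instance,
        # and writes the raw frames to its own file.
        f"t. ! queue ! omxh265dec device={device} ! filesink location={out}"
        for out in outputs
    ]
    return " ".join([head, *branches])


outputs = [f"out_{i}.yuv" for i in range(4)]
print(tee_decode_pipeline("input.hevc", outputs))
```

For the multi-instance case, one option is to generate one command per input and pass a different device to each, for example device="/dev/allegroDecodeIP1" for the second instance, then run the commands as separate gst-launch-1.0 processes.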
Element          Description
filesink         Writes incoming data to a file in the local file system
filesrc          Reads data from a file in the local file system
h264parse        Parses an H.264 encoded stream
h265parse        Parses an H.265 encoded stream
kmssink          Renders video frames directly in a plane of a DRM device
omxh264dec       Decodes OpenMAX H.264 video
omxh265dec       Decodes OpenMAX H.265 video
qtdemux          Demuxes a .mov file into raw or compressed audio and/or video streams
queue            Queues data until one of the limits specified by the "max-size-buffers", "max-size-bytes", or "max-size-time" properties has been reached
rtph264depay     Extracts an H.264 video payload from an RTP packet stream
rtph264pay       Encapsulates an H.264 video in an RTP packet stream
rtph265depay     Extracts an H.265 video payload from an RTP packet stream
rtph265pay       Encapsulates an H.265 video in an RTP packet stream
rtpjitterbuffer  Reorders and removes duplicate RTP packets as they are received from a network source
tee              Splits data to multiple pads
udpsink          Sinks UDP packets to the network
udpsrc           Reads UDP packets from the network
v4l2src          Captures video from v4l2 devices, like webcams and television tuner cards
rawvideoparse    Converts a byte stream into video frames
Chapter 12
Chapter 13
User-defined callbacks are sometimes notified of unusual conditions by receiving NULL for a
pointer that is not normally NULL. In other cases, no notification is provided, and the callback
itself is expected to use one of the accessor functions to retrieve the error status from the
decoder object.
In unusual or unexpected circumstances, some functions may report errors directly to the
console. These are system errors, and the messages are listed in the following table.
An encoded bitstream can be corrupted in various ways, and detecting those errors in a
compressed bitstream is complex because of the coding and parsing dependencies between
syntax elements. Errors are usually not detected at the corrupted bit itself but more likely at
the syntax elements that follow.
For example, suppose an encoded bitstream contains scaling matrices and the "scaling matrices
present" bit is corrupted in the stream. When a decoder reads this bitstream, it first assumes that
no scaling matrices are present in the stream and then parses the actual scaling matrix data as the
next syntax element, which may cause an error. The actual error is the corruption of the scaling
matrices present bit, but the decoder is not able to detect that. Such scenarios are common in
video codecs.
Refer to the VPS/SPS/PPS parsing functions for more details on error handling and reporting:
https://github.com/Xilinx/vdu-ctrl-sw/tree/xlnx_rel_v2023.1/lib_parsing
Error Resilience
Error resilience is handled either at control software level or at hardware level. As errors are
difficult to predict, it is possible that the hardware decoder hangs in an infinite loop. In that case,
a watchdog is used to reset the decoder in a safe way to restart the decoding for the next
frames.
The hardware IP only parses the slice data part of the bitstream. All headers are parsed and
managed by the control software.
The error resilience for the headers is managed by the software and the error resilience for the
slice data is managed by the hardware.
Error Detection
At slice header level, the software can detect different kinds of errors:
• Missing slices
• Inconsistent first LCU address syntax element.
When the software detects an error, a slice conceal command is sent to the hardware IP to
fill the intermediate buffer. The intermediate buffer must always be completely filled to avoid a
decoder timeout.
At slice data level, the hardware can detect different kinds of errors, such as inconsistencies in the
number of LCUs or in the range of various syntax elements. When an error is detected, a
concealment flag is set in the corresponding LCU data in the intermediate buffer, up to the last LCU
of the slice.
Error Concealment
Error concealment is performed in the reconstruction process. When a concealment flag is set in
the intermediate buffer, the reconstruction of the LCU will be done using fixed parameters:
• If there is a reference picture available, the LCU is skipped using this picture as a reference.
• If there is no reference picture, the default intra prediction mode is applied.
When errors are detected by the hardware IP, it conceals the remaining part of the slice. There is
no error code, only a single flag indicating whether the slice has been concealed.
The decoder should not hang even when decoding a corrupted bitstream, but it is difficult to
guarantee that this never happens. In such cases, a watchdog is used to soft reset the
decoder.
Memory Management
Memory operations are indirected through function pointers. The default AL_Allocator
implementation simply wraps malloc, free, and related functions.
Two higher level techniques are used for memory management: reference counted buffers and
buffer pools. A reference counted buffer is created with a zero-reference count. The
AL_Buffer_Ref and AL_Buffer_Unref functions increment and decrement the reference
count, respectively. The AL_Buffer interface separates the management of buffer metadata
from the management of the data memory associated with the buffer. Usage of the reference
count is optional.
The AL_TBufPool implementation manages a buffer pool with a ring buffer. Some ring buffers
have their sizes fixed at compile time. Exceeding the buffer pool size results in undefined
behavior. See AL_Decoder_PutDisplayPicture.
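As a rough mental model of the reference-counting scheme described in this section, the following Python sketch mimics the behavior. It is not the AL_Buffer API: the class, the method names, and the release callback are hypothetical, and what the real library does when the count returns to zero is an assumption here.

```python
class RefCountedBuffer:
    """Toy model of a reference-counted buffer (hypothetical, not the AL_Buffer API)."""

    def __init__(self, size, on_release=None):
        self.data = bytearray(size)  # data memory associated with the buffer
        self.ref_count = 0           # buffers start with a zero reference count
        self._on_release = on_release

    def ref(self):
        # Counterpart of AL_Buffer_Ref: take one reference.
        self.ref_count += 1

    def unref(self):
        # Counterpart of AL_Buffer_Unref: drop one reference.
        if self.ref_count == 0:
            raise RuntimeError("unref called on a buffer with no references")
        self.ref_count -= 1
        if self.ref_count == 0 and self._on_release is not None:
            # Assumption: the owner is told when the last reference is dropped.
            self._on_release(self)


released = []
buf = RefCountedBuffer(4096, on_release=released.append)
buf.ref()    # held by the decoder
buf.ref()    # held by the display path
buf.unref()
buf.unref()  # last reference dropped, release callback runs
print(buf.ref_count, len(released))
```

In this model, a buffer is handed back to its owner only after every holder has released it, which is the usual reason for pairing reference counting with a buffer pool.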
Note: For a complete list of parameters, type the following in the command line:
ctrlsw_decoder --help
Chapter 14
Driver
There are multiple VDU modules. The VDU Init module (xlnx_vdu) is part of the Linux kernel and
handles PL registers such as the VDU gasket and the clocking. The other two kernel drivers
(al5d, allegro) together form the core VDU driver. The decoder driver is called al5d and the
common driver is called allegro.
All VDU driver modules (xlnx_vdu, allegro, al5d) are compiled as runtime kernel modules and are
loaded once the kernel boots. Modules load in the following sequence.
Chapter 15
MCU Firmware
The MCU firmware running on the MCU has the following responsibilities:
• Transforming frame-level commands from VDU Control Software to slice level commands for
the hardware IP core.
• Configuring hardware registers for each command.
Chapter 16
Decoder Stack
The following figure shows the decoder software stack.
Figure: Decoder software stack (blocks shown: Test Application, OMX based application, Decoder API,
Decoder Library, Driver Interface, Decoder Driver, Mailbox Interface, MCU Firmware, Scheduler,
IP Control API, VDU IP)
Application
The application can be either a test pattern generator or an OpenMAX-based application that uses
the VDU decoder.
Decoder Library
The decoder library enables applications to communicate with the MCU firmware through the
decoder driver.
Decoder Driver
The decoder driver passes control information as well as buffer pointers of the video to the MCU
firmware. The decoder driver uses a mailbox communication technique to pass this information
to the MCU firmware.
MCU Firmware
The firmware receives control and buffer information through the mailbox. It takes the
appropriate action and communicates the status back to the decoder driver.
Scheduler
The scheduler, which is part of MCU firmware, programs the hardware IP, handles interrupts and
manages the multi-channel and multi-slice aspects of the decoding.
Chapter 17
Decoder Flow
The following figure shows an example of using the Xilinx VDU Control Software API.
Figure: Decoder flow (steps shown: AL_Decoder_Create, AL_Decoder_PushBuffer,
AL_Decoder_PutDisplayPicture, AL_Decoder_Flush, AL_Decoder_Destroy; decisions shown: STOP
requested, pDisplayedFrame == NULL, end of stream)
Decoder API
The Decoder API is defined in
https://github.com/Xilinx/vdu-ctrl-sw/blob/xlnx_rel_v2023.1/include/lib_decode. The API is
documented with Doxygen and can be accessed by browsing to
vdu-ctrl-sw/Doxygen/doc/html/index.html.
Section IV
Appendices
Chapter 18
Chapter 19
IRQ Balancing
Various multimedia use-cases involving video codecs such as audio/video conferencing, video-
on-demand, playback, and record use-cases also involve multiple other peripherals such as
ethernet, video capture pipeline related IPs including image sensor and image signal processing
engines, DMA engines, and display pipeline related IP like video mixers and HDMI transmitters,
which in turn use unique interrupt lines for communicating with the CPU.
In these scenarios, it becomes important to distribute the interrupt processing load across
multiple CPU cores instead of utilizing the same core for all the peripherals/IP. Distributing the
IRQ across CPU cores optimizes the latency and performance of the running use-case as the IRQ
context switching and ISR handling load gets distributed across multiple CPU cores.
Each peripheral/IP is assigned a unique interrupt number by the Linux kernel. Whenever a
peripheral or IP needs to signal something to the CPU (for example, that it has completed a task
or detected an event), it sends a hardware signal to the CPU. The kernel retrieves the associated
IRQ number and then calls the associated interrupt service routine. The IRQ numbers can be
retrieved using the following command. This command also lists the number of interrupts
processed by each core, the interrupt type, and a comma-delimited list of drivers registered to
receive that interrupt.
$cat /proc/interrupts
The Versal device has two CPU cores available. When running a plain PetaLinux image without any
irqbalance daemon, all IRQ requests are processed by CPU 0 by default. To assign a different CPU
core to process a particular IRQ number, change the IRQ affinity for that interrupt. The IRQ
affinity value defines which CPU cores are allowed to process that particular IRQ. For more
information, see https://www.kernel.org/doc/Documentation/IRQ-affinity.txt.
By default, the IRQ affinity value for each peripheral allows every CPU core to process the
interrupt. On a system with four CPU cores, for example, the value is 0xf, as shown in the
following example using the IRQ number 42.
$cat /proc/irq/42/smp_affinity
output: f
To restrict this IRQ to a CPU core n, you have to set a mask for only the nth bit. For example, if
you want to route to only CPU core 1, then set the mask for the second bit using the value 0x2.
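The mask arithmetic can be sketched in a few lines of Python. The helper names are hypothetical. The only facts relied on are that bit n of the mask corresponds to CPU core n and that the value is written in hexadecimal to /proc/irq/<number>/smp_affinity.

```python
def affinity_mask(cores):
    """Return the smp_affinity hex string that allows exactly the given CPU cores."""
    mask = 0
    for core in cores:
        mask |= 1 << core  # bit n of the mask corresponds to CPU core n
    return format(mask, "x")


def affinity_command(irq, cores):
    """Build the shell command that applies the mask to one IRQ."""
    return f"echo {affinity_mask(cores)} > /proc/irq/{irq}/smp_affinity"


print(affinity_mask([1]))           # CPU core 1 only
print(affinity_mask([0, 1, 2, 3]))  # four cores
print(affinity_command(49, [0]))
```

Writing to smp_affinity requires root privileges, and the kernel rejects masks that name no online CPU core.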
The following section shows how IRQ balancing can be performed before running a multistream
video conferencing use-case that involves multiple peripherals and video IP.
Consider a system with several DMA channels that capture different video streams, each using
a different interrupt line, as depicted by the versal-dma blocks in the following figure.
Figure: Default IRQ routing (blocks shown: CPU 0, CPU 1, VDU (4 decoder), Frame Buffer wr0,
Frame Buffer wr1, HDMI TX IP)
As seen in the previous figure, all interrupt requests from the different peripherals go to CPU 0 by
default.
To distribute the interrupt requests across different CPU cores as shown in the following figure,
follow these steps:
Figure: IRQ routing distributed across CPU 0 and CPU 1
The numbers on the left are the IRQ numbers for the respective peripherals.
2. Assign CPU 0 to VDU IRQ with number 49.
echo 1 > /proc/irq/49/smp_affinity #VDU
By default, the interrupts for the video1 xilinx_framebuffer DMA engine and various other
peripherals are already processed by CPU 0, so there is no need to modify their
smp_affinity. With the previous commands, the IRQs are distributed according to the scheme
in the previous figure. To verify this, run the following command while the use-case is
running and check whether the interrupts for each peripheral go to the intended CPU core.
A similar scheme for distributing interrupts can be followed for other use-cases, depending
on the peripherals being used, the system load, and the intended performance.
$ cat /proc/interrupts
12:   42151036          0          0          0  GICv2 156 Level  zynqmp-dma
13:   31494805   10644207          0          0  GICv2 157 Level  zynqmp-dma
14:   31483922          0   10643127          0  GICv2 158 Level  zynqmp-dma
15:   31518024          0          0   10595920  GICv2 159 Level  versal-dma
49:    1250127      47679          0          0  GICv2 127 Level  a0120000.al5d, a0100000.al5e
52:      18662          0        822          0  GICv2 122 Level  xilinx_framebuffer
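Checking the distribution by eye becomes tedious with many interrupt lines. The following Python sketch extracts the per-CPU counters from text in the /proc/interrupts format. The function name is hypothetical, and the sample rows are copied from the output above with each entry on one line.

```python
SAMPLE = """\
12: 42151036 0 0 0 GICv2 156 Level zynqmp-dma
13: 31494805 10644207 0 0 GICv2 157 Level zynqmp-dma
49: 1250127 47679 0 0 GICv2 127 Level a0120000.al5d, a0100000.al5e
52: 18662 0 822 0 GICv2 122 Level xilinx_framebuffer
"""


def per_cpu_counts(text):
    """Map each IRQ number to its list of per-CPU interrupt counts."""
    counts = {}
    for line in text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # skip the CPU header row and named rows such as IPI lines
        values = []
        for token in rest.split():
            if not token.isdigit():
                break  # first non-numeric token is the interrupt controller name
            values.append(int(token))
        counts[int(irq)] = values
    return counts


for irq, values in per_cpu_counts(SAMPLE).items():
    busiest = max(range(len(values)), key=values.__getitem__)
    print(f"IRQ {irq}: busiest CPU {busiest}, counts {values}")
```

On a running system, the same function can be applied to the live counters by passing the contents of /proc/interrupts, for example per_cpu_counts(open("/proc/interrupts").read()).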
Chapter 20
Debugging
This appendix includes details about resources available on the AMD Support website and
debugging tools.
If the IP requires a license key, the key must be verified. The AMD Vivado™ design tools have
several license checkpoints for gating licensed IP through the flow. If the license check succeeds,
the IP can continue generation. Otherwise, generation halts with an error. License checkpoints
are enforced by the following tools:
• Vivado Synthesis
• Vivado Implementation
• write_bitstream (Tcl command)
IMPORTANT! IP license level is ignored at checkpoints. The test confirms a valid license exists. It does not
check IP license level.
Documentation
This product guide is the main document associated with the core. This guide, along with
documentation related to all products that aid in the design process, can be found on the Support
web page or by using the AMD Adaptive Computing Documentation Navigator. Download the
Documentation Navigator from the Downloads page. For more information about this tool and
the features available, open the online help after installation.
Answer Records
Answer Records include information about commonly encountered problems, helpful information
on how to resolve these problems, and any known issues with an AMD Adaptive Computing
product. Answer Records are created and maintained daily to ensure that users have access to
the most accurate information available.
Answer Records for this core can be located by using the Search Support box on the main
Support web page. To maximize your search results, use keywords such as:
• Product name
• Tool message(s)
• Summary of the issue encountered
A filter search is available after results are returned to further target the results.
Technical Support
AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:
• Implement the solution in devices that are not defined in the documentation.
• Customize the solution beyond that allowed in the product documentation.
• Change any section of the design labeled DO NOT MODIFY.
Debug Tools
There are many tools available to address VDU design issues. It is important to know which tools
are useful for debugging various situations.
The Vivado logic analyzer is used to interact with the logic debug LogiCORE IP cores, including:
See the Vivado Design Suite User Guide: Programming and Debugging (UG908).
Interface Debug
AXI4-Lite Interfaces
To verify that the interface is functional, try reading from a register that does not have all 0s as
its default value. Output s_axi_arready asserts when the read address is valid, and output
s_axi_rvalid asserts when the read data/response is valid. If the interface is unresponsive,
ensure that the following conditions are met:
AXI4-Stream Interfaces
If data is not being transmitted or received, check the following conditions:
Chapter 21
The AMD Adaptive Computing Documentation Portal is an online tool that provides robust
search and navigation for documentation using your web browser. To access the Documentation
Portal, go to https://docs.xilinx.com.
Documentation Navigator
Documentation Navigator (DocNav) is an installed tool that provides access to AMD Adaptive
Computing documents, videos, and support resources, which you can filter and search to find
information. To open DocNav:
• From the AMD Vivado™ IDE, select Help → Documentation and Tutorials.
• On Windows, click the Start button and select Xilinx Design Tools → DocNav.
• At the Linux command prompt, enter docnav.
Design Hubs
AMD Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the
Design Hubs:
Note: For more information on DocNav, see the Documentation Navigator webpage.
Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.
References
These documents provide supplemental material useful with this guide:
Revision History
The following table shows the revision history for this document.
Copyright
© Copyright 2022-2023 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Versal,
Vivado, and combinations thereof are trademarks of Advanced Micro Devices, Inc. AMBA, AMBA
Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of
Arm Limited in the US and/or elsewhere. Other product names used in this publication are for
identification purposes only and may be trademarks of their respective companies.