H.264/H.265 Video Decode Unit Solutions v1.0

LogiCORE IP Product Guide


Vivado Design Suite

PG414 (v1.0) May 16, 2023

AMD Adaptive Computing is creating an environment where
employees, customers, and partners feel welcome and included. To
that end, we’re removing non-inclusive language from our products
and related collateral. We’ve launched an internal initiative to remove
language that could exclude people or reinforce historical biases,
including terms embedded in our software and IPs. You may still find
examples of non-inclusive language in our older products as we work
to make these changes and align with evolving industry standards.
Follow this link for more information.
Table of Contents

Section I: H.264/H.265 Video Decode Unit Solutions v1.0

Chapter 1: Introduction
Features
IP Facts

Chapter 2: Overview
Navigating Content by Design Process
Core Overview
Applications
Unsupported Features
Licensing and Ordering

Chapter 3: Product Specification
Standards
Performance
Core Interfaces
Resource Use
Port Descriptions
Register Space

Chapter 4: Core Architecture
Decoder Block
Microcontroller Unit Overview
AXI Performance Monitor

Chapter 5: Designing with the Core
General Design Guidelines
Interrupts
Clocking and Resets

Chapter 6: Design Flow Steps
Customizing and Generating the Core
Constraining the Core
Simulation
Synthesis and Implementation

Section II: Performance and Debugging

Chapter 7: Decoder Latency

Section III: Application Software Development

Chapter 8: Overview
Software Prerequisites

Chapter 9: Decoder Software Features

Chapter 10: Preparing PetaLinux to Run VDU Applications
DMA Memory Limitation
Integrating the VDU and GStreamer Patches

Chapter 11: GStreamer Pipelines
H.264 Decoding
H.265 Decoding
High Bitrate Bitstream Decoding
Multi-Stream Decoding
Verified GStreamer Elements
Verified Containers Using GStreamer
Verified Streaming Protocols Using GStreamer

Chapter 12: OpenMax Integration Layer
OpenMax Integration Layer Sample Applications

Chapter 13: VDU Control Software
Xilinx VDU Control Software API
VDU Control Software Sample Application

Chapter 14: Driver

Chapter 15: MCU Firmware

Chapter 16: Decoder Stack

Chapter 17: Decoder Flow
Decoder API

Section IV: Appendices

Chapter 18: Codec Parameters for Different Use Cases

Chapter 19: IRQ Balancing

Chapter 20: Debugging
Finding Help with AMD Adaptive Computing Solutions
Debug Tools
Interface Debug

Chapter 21: Additional Resources and Legal Notices
Finding Additional Documentation
Support Resources
References
Revision History
Please Read: Important Legal Notices

Section I: H.264/H.265 Video Decode Unit Solutions v1.0

Chapter 1: Introduction
The AMD LogiCORE™ IP H.264/H.265 Video Decode Unit (VDU) core is a hard IP in the AMD
Versal™ AI Edge and AMD Versal™ AI Core series. The VDU contains up to four decoder core
instances. Each decoder core instance can decompress single or multiple H.264/H.265-compliant
video streams simultaneously at resolutions of up to 3840×2160 pixels at 60 frames per second
(4K UHD at 60 Hz). In a multiple-stream scenario, it can decode H.265-only streams, H.264-only
streams, or a mix of both simultaneously. For the number of decoder core instances in the VDU
for each device variant, refer to the product tables in Versal ACAP AI Edge Series Product
Selection Guide (XMP464) and Versal AI Core Series Product Selection Guide (XMP452).

Features
The features of each instance of the decoder core in AMD LogiCORE IP H.264/H.265 Video
Decode Unit core are as follows:

• Multi-standard decoding support, including:
  ○ ISO MPEG-4 Part 10: Advanced Video Coding (AVC)/ITU H.264
  ○ ISO MPEG-H Part 2: High Efficiency Video Coding (HEVC)/ITU H.265
  ○ HEVC: Main, Main Intra, Main10, Main10 Intra, Main 4:2:2 10, Main 4:2:2 10 Intra up to Level 5.1 High Tier
  ○ AVC: Baseline, Main, High, High10, High 4:2:2, High10 Intra, High 4:2:2 Intra up to Level 5.2
• Simultaneous decoding of up to 32 streams per core (a maximum of 100 streams across all four
  cores), with a maximum aggregated bandwidth of 3840×2160 at 60 Hz
• Progressive support for H.264 and H.265; interlace support for H.265
• Video format support:
  1. YCbCr 4:2:2, YCbCr 4:2:0, and Y-only
  2. 8-bit and 10-bit per color channel

IP Facts
AMD LogiCORE™ IP Facts Table

Core Specifics
  Supported Device Family (1): AMD Versal™ Adaptive SoC (AMD Versal™ AI Edge and AMD Versal™ AI Core Series)
  Supported User Interfaces: AXI4-Lite, AXI4
Provided with Core
  Design Files: Encrypted RTL
  Example Design: Not Provided
  Constraints File: Xilinx Design Constraints File (XDC)
  Simulation Model: Not Provided
  Supported S/W Driver: Included in PetaLinux
Tested Design Flows (2)
  Design Entry: AMD Vivado™ Design Suite
  Synthesis: Vivado Synthesis
Support
  Release Notes and Known Issues: Master Answer Record: 000034162
  All Vivado IP Change Logs: Master Vivado IP Change Logs: 72775
  Support web page
Notes:
1. For a complete list of supported devices, see the AMD Vivado™ IP catalog.
2. For the supported versions of third-party tools, see the Vivado Design Suite User Guide: Release Notes, Installation, and
Licensing (UG973).

Chapter 2: Overview

Navigating Content by Design Process


AMD Adaptive Computing documentation is organized around a set of standard design
processes to help you find relevant content for your current development task. All AMD Versal™
adaptive SoC design process Design Hubs and the Design Flow Assistant materials can be found
on the Xilinx.com website. This document covers the following design processes:

• System and Solution Planning: Identifying the components, performance, I/O, and data
transfer requirements at a system level. Includes application mapping for the solution to PS,
PL, and AI Engine. Topics in this document that apply to this design process include:

• Section I: H.264/H.265 Video Decode Unit Solutions v1.0


○ Design Flow Steps

• Section II: Performance and Debugging


○ Decoder Latency

• Section III: Application Software Development


○ Overview

○ Decoder Software Features

• Section IV: Appendices


○ Interface Debug

• Embedded Software Development: Creating the software platform from the hardware
platform and developing the application code using the embedded CPU. Also covers XRT and
Graph APIs. Topics in this document that apply to this design process include:

• Section II: Performance and Debugging


○ Decoder Latency

• Section III: Application Software Development


○ Overview

○ Decoder Software Features

• Section IV: Appendices


○ Interface Debug

• Hardware, IP, and Platform Development: Creating the PL IP blocks for the hardware
platform, creating PL kernels, functional simulation, and evaluating the AMD Vivado™ timing,
resource use, and power closure. Also involves developing the hardware platform for system
integration. Topics in this document that apply to this design process include:

• Section I: H.264/H.265 Video Decode Unit Solutions v1.0

• Board System Design: Designing a PCB through schematics and board layout. Also involves
power, thermal, and signal integrity considerations. Topics in this document that apply to this
design process include:

• Designing with the Core


• Design Flow Steps

Core Overview
The AMD LogiCORE™ IP H.264/H.265 Video Decode Unit (VDU) core supports multi-standard
video decoding, including support for the High-Efficiency Video Coding (HEVC) and Advanced
Video Coding (AVC) H.264 standards. The unit contains decode (decompress) functions.

The VDU is an integrated block, containing the decoder interfaces, located in the programmable
logic (PL) portion of Versal adaptive SoC (Versal AI Edge series and Versal AI Core series)
devices. Because it resides in the PL, the VDU has no direct connections to the processing
system (PS).

VDU operation requires the application processing unit (APU) as a controller to service interrupts
and coordinate data transfer.

The decoder is controlled by the APU through a task list prepared in advance, and the APU
response time is not in the execution critical path. The VDU has no audio support. Audio
decoding can be done in the software using the PS or through soft IP in the PL. The following
figure shows the top-level block diagram with one instance of VDU core.

Figure 1: Top-Level Block Diagram of Single Instance Decoder Block

[Block diagram of one VDU instance. Labeled elements: DPLL, DEC0 and DEC1 with the M_AXI_DEC0
and M_AXI_DEC1 ports, APM, S_AXI, MCU with M_AXI_MCU, 4:1 AXI crossbar, clock and reset, host
programmable registers, control and status registers, NPI, interrupt controller, and VDU host
interrupt. Figure ID: X26814-062922]

Decoder Block Overview


The Decoder block can decompress HEVC (ISO/IEC 23008-2 High Efficiency Video Coding) and AVC
(ISO/IEC 14496-10 Advanced Video Coding) compliant streams. It provides complete support for
these standards, including 8-bit and 10-bit color depth; Y-only (monochrome), 4:2:0, and 4:2:2
chroma formats; and performance of up to four simultaneous instances of 4K UHD at 60 Hz. It
also contains global registers, an interrupt controller, and a timer.

Each VDU decoder instance is controlled by a microcontroller (MCU) subsystem. A 32-bit AXI4-
Lite slave interface is used by the APU to control the MCU. Two 128-bit AXI4 master interfaces
are used to move video data and metadata to and from the system memory. A 32-bit AXI4
master interface is used to fetch the MCU software (instruction cache interface) and load/store
additional MCU data (data cache interface). VDU applications running on the APU use the VDU
Control Software library API to interact with the decoder microcontroller. The microcontroller
firmware is not user modifiable.

The decoder includes control registers, a bridge unit, and a set of internal memories. The bridge
unit manages the request arbitration, burst addresses, and burst lengths for all external memory
accesses required by the decoder.

Microcontroller Unit Overview


The decoder blocks implement a 32-bit microcontroller unit (MCU) to handle interaction with the
hardware blocks. The MCU receives commands from the APU, parses the command into multiple
slice- or tile-level commands, and executes them on the decoder blocks. After the command is
executed, the MCU communicates the status to the APU and the process is repeated.

Applications
The VDU core is an embedded hard IP located in the PL, which enables maximum flexibility for a
wide selection of use cases. Memory bandwidth is a key driver: whether the application requires
a single 4K UHD stream at 60 Hz or simultaneous multiple-stream decoding, you can implement a
system design and memory topology that balances performance, optimization, and integration for
the specific use case.

Unsupported Features
The following features of the standard are not supported in the core:

• H.264 (AVC):
○ Interlace video format

○ Flexible macroblock ordering (FMO)

○ Arbitrary slice ordering (ASO)

○ Redundant slice (RS)

○ Dynamic chroma format/profile/level change within a single stream

• H.265 (HEVC):
○ Dynamic chroma format/profile/level change within a single stream

Licensing and Ordering


This AMD LogiCORE™ IP module is provided under the terms of the Core License Agreement.
The module is shipped as part of the AMD Vivado™ Design Suite. For full access to all core
functionalities in simulation and in hardware, you must purchase a license for the core. To
generate a full license, visit the product licensing web page. Evaluation licenses and hardware
timeout licenses might be available for this core. Contact your local sales representative for
information about pricing and availability.

Note: To verify that you need a license, check the License column of the IP Catalog. Included means that a
license is included with the AMD Vivado™ Design Suite; Purchase means that you have to purchase a
license to use the core.

Information about other AMD LogiCORE™ IP modules is available at the Intellectual Property
page. For information about pricing and availability of other AMD LogiCORE IP modules and
tools, contact your local sales representative.

License Checkers
If the IP requires a license key, the key must be verified. The AMD Vivado™ design tools have
several license checkpoints for gating licensed IP through the flow. If the license check succeeds,
the IP can continue generation. Otherwise, generation halts with an error. License checkpoints
are enforced by the following tools:

• Vivado Synthesis
• Vivado Implementation
• write_bitstream (Tcl command)

IMPORTANT! IP license level is ignored at checkpoints. The test confirms a valid license exists. It does not
check IP license level.

Chapter 3: Product Specification

Standards
This core adheres to the following standard(s):

• ISO/IEC 14496-10:2014(en) Information technology — Coding of audio-visual objects — Part
10: Advanced Video Coding

• Recommendation ITU-T H.264 | International Standard ISO/IEC 14496-10: Advanced
video coding for generic audio visual services

• Recommendation ITU-T H.265 | International Standard ISO/IEC 23008-2: High efficiency
video coding

• ISO/IEC 23008-2:2017 Information technology — High efficiency coding and media delivery
in heterogeneous environments — Part 2: High efficiency video coding

Performance
The following sections detail the performance characteristics of the H.264/H.265 Video Decode
Unit.

Maximum Frequencies

The typical clock frequencies for the target devices are described in Versal AI Core Series Data
Sheet: DC and AC Switching Characteristics (DS957) and Versal AI Edge Series Data Sheet: DC and
AC Switching Characteristics (DS958). The maximum achievable clock frequency of the system can
vary. The maximum achievable clock frequency and all resource counts can be affected by other
tool options, additional logic in the device, using a different version of AMD tools and other
factors.

Throughput

The VDU supports decoding up to four simultaneous instances of 4K UHD resolution at 60 Hz.
This throughput can be four streams at 4K UHD or can be divided into smaller streams. Several
combinations can be supported with different resolutions, provided the cumulative throughput
does not exceed four times 4K UHD at 60 Hz.
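
The cumulative-throughput rule above can be sketched as a simple budget check. This is a minimal illustration, not part of the VDU software: it assumes that throughput can be approximated as luma samples per second (width × height × frame rate), and the function names are hypothetical. Actual limits also depend on clock frequency, memory bandwidth, and the per-core stream limits.

```python
# Assumption: throughput is approximated as luma samples per second.
# One 4K UHD stream at 60 Hz is the reference unit used in the text.
UHD_4K60_RATE = 3840 * 2160 * 60
MAX_RATE = 4 * UHD_4K60_RATE  # "four times 4K UHD at 60 Hz"


def stream_rate(width, height, fps):
    """Return the approximate sample rate of one stream."""
    return width * height * fps


def fits_throughput_budget(streams):
    """Check a list of (width, height, fps) tuples against the budget."""
    total = sum(stream_rate(w, h, fps) for (w, h, fps) in streams)
    return total <= MAX_RATE


if __name__ == "__main__":
    mix = [(3840, 2160, 60)] * 2 + [(1920, 1080, 60)] * 8
    print(fits_throughput_budget(mix))
```

Under this approximation, one 4K UHD stream at 60 Hz costs the same as four 1920×1080 streams at 60 Hz, so the example mix uses exactly the full budget.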

Core Interfaces
The Single Instance VDU core has the following interfaces:

• Two 128-bit AXI master interfaces to communicate with external memory
• One 32-bit AXI master interface for control communication
• An AXI4-Lite interface for communicating with the application processing unit

With all four instances, the core has eight 128-bit AXI master interfaces (two per instance),
which move video data into and out of external memory through the DDR memory interfaces.

Resource Use
Four streams of 4K UHD at 60 Hz consume a significant amount of the external memory interface
bandwidth and of the Arm® AMBA® AXI4 bus bandwidth through the NoC to DDR.

Table 1: Resource Utilization

Number of Streams | Single Stream | Two Streams | Four Streams
CLB LUT (520704) | 932 | 1163 | 1449
CLB Registers (1041408) | 1598 | 1625 | 1680
BITSLICE_RX_TX (65088) | 322 | 387 | 458

Port Descriptions
The VDU core top-level signaling interfaces are shown in the following figures.

Figure 2: VDU Core Top-Level Signaling Interface for Single Instance

Figure 3: VDU Core Top-Level Signaling Interface for All Instances

VDU Interface Ports


Table 2: VDU Interfaces for Single Instance

Name | Type | Description
M_AXI_DEC0 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC1 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_MCU | Memory mapped AXI4 master interface | 32-bit memory mapped interface for MCU.
S_AXI_LITE | Memory mapped AXI4-Lite slave interface | AXI4-Lite memory mapped interface for external master access.

Table 3: VDU Interfaces for All Instances

Name | Type | Description
M_AXI_DEC0_0 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC0_1 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC1_0 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC1_1 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC2_0 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC2_1 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC3_0 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_DEC3_1 | Memory mapped AXI4 master interface | 128-bit memory mapped interface for Decoder block.
M_AXI_MCU | Memory mapped AXI4 master interface | 32-bit memory mapped interface for MCU.
S_AXI_LITE | Memory mapped AXI4-Lite slave interface | AXI4-Lite memory mapped interface for external master access.

Common Interface Signals


The following tables summarize the signals that are either shared by, or not part of, the
dedicated AXI4 interfaces.

Table 4: VDU Ports

Port Name | Direction | Description
s_axi_lite_aclk | Input | AXI clock input for S_AXI_PL_VDU_LITE
ref_clk | Input | Reference clock input
vdu_resetn | Input | Active-Low reset input from PL
vdu_host_interrupt | Output | Active-High interrupt output from VDU. Can be mapped to PL-PS interrupt pin.
m_axi_dec_aclk | Input | AXI input clock for M_AXI_VDU_DECODER0 and M_AXI_VDU_DECODER1
m_axi_mcu_aclk | Input | Input clock for M_AXI_MCU interface

Table 5: Non-DPLL Clock Ports

Port Name | Direction | Description
s_axi_lite_aclk | Input | AXI clock input for S_AXI_PL_VDU_LITE
vdu_resetn | Input | Active-Low reset input from PL
vdu_host_interrupt | Output | Active-High interrupt output from VDU. Can be mapped to PL-PS interrupt pin.
m_axi_dec_aclk | Input | AXI input clock for M_AXI_VDU_DECODER
m_axi_mcu_aclk | Input | Input clock for M_AXI_MCU interface
core_clk | Input | Input Core clock
mcu_clk | Input | Input MCU clock

Register Space
VDU is a hardened block in the programmable logic. The following table summarizes the soft IP
registers. These registers are accessible from the PS through the AXI4-Lite bus.

Note: For the multi-stream use case, the registers in the following table represent the blended
values entered in the GUI.

Table 6: Soft IP Registers (Video Configuration)

Register | Address Offset | Width | Type | Definition | Reset Value
Version control | 0x0 | 8 | Read only | 3:0 - Minor revision; 7:4 - Major revision | 0x5
LogiCORE Lock register | 0x4 | 32 | Write only | Lock code for VDU LogiCORE space. By default the space is locked. Unlocked access returns 0 for register reads and writes are ignored. 0x7766DF77 - unlock VDU register space; 0x0 - lock VDU register space | 0x0
Control register | 0x8 | 4 | R/W | 0 - soft reset to VDU0; 1 - soft reset to VDU1; 2 - soft reset to VDU2; 3 - soft reset to VDU3 | 0x0
Secure control | 0xC | 4 | R/W | 0 - Secure/non-secure configuration for VDU0; 1 - Secure/non-secure configuration for VDU1; 2 - Secure/non-secure configuration for VDU2; 3 - Secure/non-secure configuration for VDU3 | 0x0
SMID control register 0 | 0x10 | 10 | RW | 9:0 - SMID for VDU0, DEC0 | 0x0
SMID control register 1 | 0x14 | 10 | RW | 9:0 - SMID for VDU0, DEC1 | 0x0
SMID control register 2 | 0x18 | 10 | RW | 9:0 - SMID for VDU0, MCU | 0x0
SMID control register 3 | 0x1C | 10 | RW | 9:0 - SMID for VDU1, DEC0 | 0x0
SMID control register 4 | 0x20 | 10 | RW | 9:0 - SMID for VDU1, DEC1 | 0x0
SMID control register 5 | 0x24 | 10 | RW | 9:0 - SMID for VDU1, MCU | 0x0
SMID control register 6 | 0x28 | 10 | RW | 9:0 - SMID for VDU2, DEC0 | 0x0
SMID control register 7 | 0x2C | 10 | RW | 9:0 - SMID for VDU2, DEC1 | 0x0
SMID control register 8 | 0x30 | 10 | RW | 9:0 - SMID for VDU2, MCU | 0x0
SMID control register 9 | 0x34 | 10 | RW | 9:0 - SMID for VDU3, DEC0 | 0x0
SMID control register 10 | 0x38 | 10 | RW | 9:0 - SMID for VDU3, DEC1 | 0x0
SMID control register 11 | 0x4C | 10 | RW | 9:0 - SMID for VDU3, MCU | 0x0
VDU_DECODER_ENABLE | 0x41004 | 32 | RO | 1 = Enable; 0 = Disable | 0x0
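
As an illustration of the field layouts in Table 6, the following sketch decodes the version register and builds masks for the control and SMID registers. It is only a sketch under stated assumptions: the constant and function names are hypothetical, it performs no hardware access, and it assumes that entries 0 to 3 of the control register are bit positions, one per VDU instance.

```python
# Offsets and codes transcribed from Table 6; all names here are hypothetical.
VERSION_OFFSET = 0x0
LOCK_OFFSET = 0x4
CONTROL_OFFSET = 0x8
UNLOCK_CODE = 0x7766DF77  # written to the lock register to unlock the space
LOCK_CODE = 0x0           # written to the lock register to lock the space


def decode_version(raw):
    """Split the 8-bit version value into (major, minor) revisions."""
    major = (raw >> 4) & 0xF  # bits 7:4
    minor = raw & 0xF         # bits 3:0
    return major, minor


def soft_reset_mask(vdu_index):
    """Return a control-register mask for one VDU instance.

    Assumption: bit n of the 4-bit control register maps to VDUn.
    """
    if vdu_index not in range(4):
        raise ValueError("VDU index must be 0 to 3")
    return 1 << vdu_index


def smid_field(raw):
    """Extract the 10-bit SMID field (bits 9:0) from a register value."""
    return raw & 0x3FF


if __name__ == "__main__":
    print(decode_version(0x5))      # reset value listed in Table 6
    print(hex(soft_reset_mask(2)))
```

The sketch deliberately leaves out the reset polarity and the bus access itself, because the table does not specify them.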

Chapter 4: Core Architecture

Decoder Block
The decoder block is designed to process video streams using the H.265 (HEVC) and H.264
(AVC) standards. It supports 8-bit and 10-bit color depth; 4:0:0, 4:2:0, and 4:2:2 chroma
formats; and performance of up to four streams of 4K UHD at 60 Hz.

The decoder block efficiently performs video decompression.

The IP hardware has a direct access to the system data bus through a high-bandwidth master
interface to transfer video data to and from an external memory.

The IP control software is partitioned into two layers. The VDU Control Software runs on the
APU, while the MCU firmware runs on an MCU that is embedded in the hardware IP. The APU
communicates with the embedded MCU through a slave interface that is also connected to the
system bus. The embedded MCU controls the IP hardware, using a register map to set decoding
parameters through an internal peripheral bus.

The VDU block is shown in the following figure.

Figure 4: Decoder Block

[Block diagram. Labeled stages: bitstream input, entropy decoding, inverse quantization and
inverse transform, intra/inter mode selection, intra prediction, motion compensation, deblocking
filtering, picture buffering, and video output. Figure ID: X26885-070822]

Features
The following table describes the single instance decoder block features.

Table 7: VDU Features

Video Coding Feature                H.264                           H.265

Performance
Profiles                            Baseline (except FMO/ASO/RS)    Main
                                    Constrained Baseline            Main Intra
                                    Main                            Main 10
                                    High                            Main 10 Intra
                                    High 10                         Main 4:2:2 10
                                    High 4:2:2                      Main 4:2:2 10 Intra
Levels                              Up to 5.2 (2)                   Up to 5.1 High Tier (1)
Performance at 800 MHz:             Supported                       Supported
  100 streams at 720x480p at 30 Hz
  50 streams at 1920x1080p at 30 Hz
  25 streams at 1920x1080p at 60 Hz
  12 streams at 3840x2160p at 30 Hz
  6 streams at 3840x2160p at 60 Hz
  24 streams at 7680x4320p at 15 Hz
Configurable resolution:            Supported                       Supported
  Picture width and height          Minimum size: 80×96             Minimum size: 128×128
    multiple of eight               Maximum width: 8,192            Maximum width: 8,184 (limited to 4,096
  Maximum width or height: 8,192    Maximum height: 8,192             in level 4/4.1 or when WPP is enabled)
  Maximum picture size of 33.5 MP                                   Maximum height: 8,192
Configurable frame rate             Supported                       Supported

Coding Tools
Sample bit depth: 8 bpc, 10 bpc     Supported                       Supported
Chroma format: YCbCr 4:2:0,         Supported                       Supported
  YCbCr 4:2:2, Y-only (monochrome)
Progressive format only             Supported                       Supported

Notes:
1. Support of 8K15 uses a subset of Level 6: maximum Luma picture size up to 2^25 samples; other constraints of Level 5.1
apply (e.g., maximum of 200 slices and 11×10 tiles). WPP is not supported for widths above 4,096.
2. Support of 8K15 uses a subset of Level 6: maximum Luma picture size up to 2^25 samples; other constraints of Level 5.2
apply. The maximum slice size is 65,535 macroblocks, so a minimum of two balanced slices must be used above 4K size.
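
The size limits in Table 7 can be checked with a few lines of code. The following is a minimal sketch, not part of the VDU software: the function name is hypothetical, it models only the limits stated for both codecs (dimensions a multiple of eight, at most 8,192 in either direction, at most 33.5 MP, taken here as 2^25 luma samples), and it ignores the per-codec minimum sizes and the narrower H.265 width limits.

```python
MAX_DIMENSION = 8192          # maximum width or height listed in Table 7
MAX_LUMA_SAMPLES = 2 ** 25    # 33,554,432 samples, about 33.5 MP


def picture_size_ok(width: int, height: int) -> bool:
    """Check the size limits common to both codecs (illustrative only)."""
    if width % 8 != 0 or height % 8 != 0:
        return False
    if width > MAX_DIMENSION or height > MAX_DIMENSION:
        return False
    return width * height <= MAX_LUMA_SAMPLES


print(picture_size_ok(7680, 4320))   # 8K frame
print(picture_size_ok(8192, 8192))   # each dimension is legal, the area is not
```

An 8K frame (7680×4320 = 33,177,600 samples) passes, while 8192×8192 is rejected because its area exceeds the picture-size limit even though each dimension is within range.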

The following table describes the VDU maximum supported bit rates.


Table 8: VDU Maximum Supported Bit Rates

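Standard  Level                      Profile               Maximum Bit Rate (Mb/s)
H.264     4.2 (1080p60)              Baseline, Main        50
H.264     4.2 (1080p60)              High                  62.5
H.264     4.2 (1080p60)              High 10               150
H.264     4.2 (1080p60)              High 4:2:2            200
H.264     5.2 (2160p60)              Baseline, Main        240
H.264     5.2 (2160p60)              High                  300
H.264     5.2 (2160p60)              High 10               720
H.264     5.2 (2160p60)              High 4:2:2            960 (CAVLC), 720 (CABAC)
H.265     4.1 (1080p60) High Tier    Main, Main 10         50
H.265     4.1 (1080p60) High Tier    Main 4:2:2 10         84
H.265     4.1 (1080p60) High Tier    Main 4:2:2 10 Intra   167
H.265     5.1 (2160p60) High Tier    Main, Main 10         160
H.265     5.1 (2160p60) High Tier    Main 4:2:2 10         267
H.265     5.1 (2160p60) High Tier    Main 4:2:2 10 Intra   533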

Functional Description
The following figure shows the block diagram of the decoder block.


Figure 5: Detailed Architecture of the Single Instance Decoder Block

[Figure: decoder block architecture. Blocks shown: DECODER with two AXI4 master interfaces
(each behind an AXI4 wrapper), MCU with its AXI4 master interface and DP port, interrupt
controller (IRQ), AXI4-Lite wrapper, AXI4-Lite to APB bridge, global registers.]

The decoder block includes the H.265/H.264 decompression engine, control registers, and an
interrupt controller block. The decoder block is controlled by an MCU subsystem. The system CPU
uses a 32-bit AXI4-Lite slave interface to control the MCU: to configure decoder parameters,
start the processing of video frames, and get status and results. Two 128-bit AXI4 master
interfaces fetch video input data from, and store video output data to, the system memory. A
separate AXI4 master interface fetches the MCU software and performs load/store operations on
additional MCU data.

Interfaces and Ports


Applications that use the decoder must connect all the decoder ports (ports beginning with
m_axi_dec). The following table shows the decoder block AXI4 master interface ports.

Table 9: Decoder Ports

Name Width Direction Description


vdu_pl_dec_araddr0/1 64 Output AXI4 ARADDR signal
vdu_pl_dec_arburst0/1 2 Output AXI4 ARBURST signal
vdu_pl_dec_arid0/1 16 Output AXI4 ARID signal

vdu_pl_dec_arlen0/1 8 Output AXI4 ARLEN signal
pl_vdu_dec_arready0/1 1 Input AXI4 ARREADY signal
vdu_pl_dec_arsize0/1 3 Output AXI4 ARSIZE signal
vdu_pl_dec_arvalid0/1 1 Output AXI4 ARVALID signal
vdu_pl_dec_awaddr0/1 64 Output AXI4 AWADDR signal
vdu_pl_dec_awburst0/1 2 Output AXI4 AWBURST signal
vdu_pl_dec_awid0/1 16 Output AXI4 AWID signal
vdu_pl_dec_awlen0/1 8 Output AXI4 AWLEN signal
pl_vdu_dec_awready0/1 1 Input AXI4 AWREADY signal
vdu_pl_dec_awsize0/1 3 Output AXI4 AWSIZE signal
vdu_pl_dec_awvalid0/1 1 Output AXI4 AWVALID signal
pl_vdu_dec_bresp0/1 2 Input AXI4 BRESP signal
vdu_pl_dec_bready0/1 1 Output AXI4 BREADY signal
pl_vdu_dec_bvalid0/1 1 Input AXI4 BVALID signal
pl_vdu_dec_bid0/1 16 Input AXI4 BID signal
pl_vdu_dec_rdata0/1 128 Input AXI4 RDATA signal
pl_vdu_dec_rid0/1 16 Input AXI4 RID signal
pl_vdu_dec_rlast0/1 1 Input AXI4 RLAST signal
vdu_pl_dec_rready0/1 1 Output AXI4 RREADY signal
pl_vdu_dec_rresp0/1 2 Input AXI4 RRESP signal
pl_vdu_dec_rvalid0/1 1 Input AXI4 RVALID signal
vdu_pl_dec_wdata0/1 128 Output AXI4 WDATA signal
vdu_pl_dec_wlast0/1 1 Output AXI4 WLAST signal
pl_vdu_dec_wready0/1 1 Input AXI4 WREADY signal
vdu_pl_dec_wvalid0/1 1 Output AXI4 WVALID signal
vdu_pl_dec_awprot0/1 3 Output AXI4 AWPROT signal, controlled from System Level Control Register (SLCR)
vdu_pl_dec_arprot0/1 3 Output AXI4 ARPROT signal, controlled from SLCR
vdu_pl_dec_awqos0/1 4 Output AXI4 AWQOS signal, controlled from SLCR
vdu_pl_dec_arqos0/1 4 Output AXI4 ARQOS signal, controlled from SLCR
vdu_pl_dec_awcache0/1 4 Output AXI4 AWCACHE signal, controlled from SLCR
vdu_pl_dec_arcache0/1 4 Output AXI4 ARCACHE signal, controlled from SLCR
vdu_pl_dec_arlock0/1 1 Output AXI4 ARLOCK signal
vdu_pl_dec_arregion0/1 4 Output AXI4 ARREGION signal
vdu_pl_dec_aruser0/1 16 Output AXI4 ARUSER signal
vdu_pl_dec_awlock0/1 1 Output AXI4 AWLOCK signal
vdu_pl_dec_awregion0/1 4 Output AXI4 AWREGION signal
vdu_pl_dec_awuser0/1 16 Output AXI4 AWUSER signal
pl_vdu_dec_buser0/1 16 Input AXI4 BUSER signal

pl_vdu_dec_ruser0/1 16 Input AXI4 RUSER signal
vdu_pl_dec_wid0/1 16 Output AXI4 WID signal
vdu_pl_dec_wstrb0/1 16 Output AXI4 WSTRB signal
vdu_pl_dec_wuser0/1 16 Output AXI4 WUSER signal

Clocking
Refer to Clocking and Resets for more information on clocking.

Reset
Refer to Clocking and Resets for more information on resets.

Datapath
The master interface inputs several types of video data from the external memory:

• Bitstream
• Reference frame pixels
• Co-located picture motion vectors
• Headers and residual data

The master interface outputs:

• Decoded frame pixels
• Headers and residual data
• Decoded frame motion vectors, when the picture is later used as a co-located picture

Control Path
The APU accesses the VDU slave interface once per frame to send a frame-level command to the
IP, so this interface does not require a fast data path. The embedded MCU processes each
frame-level command and generates tile-level and slice-level commands for the Decoder block
hardware. An interrupt is generated at the end of each frame.

Decoder Buffer Requirements


The Decoder input and output requirements are shown in the following table.


Table 10: Decoder Buffer Requirements

Requirement         Description

Input Buffer
Number of Buffers   >= 1
Contiguous          No
Alignment           0
Size                Any > 0

Output Buffer
Number of Buffers   • For AVC: AL_AVC_GetMinOutputBuffersNeeded(AL_TStreamSettings tStreamSettings, int iStack)
                    • For HEVC: AL_HEVC_GetMinOutputBuffersNeeded(AL_TStreamSettings tStreamSettings, int iStack)
                    Note: Dimension, chroma-mode and bit-depth are in-stream settings.
Contiguous          Yes
Alignment           32
Size                stride * slice-height * chroma-mode
                    Note: stride and slice-height should be 64-aligned.

Note: It is not possible to reduce the output buffer requirements because the VDU uses multiple internal
decoder engines.

Memory Footprint Requirements


The Decoder block memory footprint depends on the decoding parameters. The buffer size
depends on the following:

• Video resolution
• Chroma sub-sampling
• Color depth
• Coding standard: H.264 or H.265

The number of buffers depends on the coding standard.

The following table shows the worst-case memory footprint required for different buffer sizes.


Table 11: Per Instance Decoder Block Memory Footprint

                             720p                           1080p                          2160p
Examples with Two B-Frames   Buffers  Per Buffer  Total     Buffers  Per Buffer  Total     Buffers  Per Buffer  Total
Input Bitstream Buffer       2        1.9 MB      3.8 MB    2        4.0 MB      8.0 MB    2        16.0 MB     32.0 MB
Circular Bitstream Buffer    1        9.5 MB      9.5 MB    1        20.1 MB     20.1 MB   1        80.0 MB     80.0 MB
Reference Frames             23       2.5 MB      56.6 MB   23       5.3 MB      122.2 MB  12       21.1 MB     253.1 MB
Reconstructed Frame          1        2.5 MB      2.5 MB    1        5.3 MB      5.3 MB    1        21.1 MB     21.1 MB
Intermediate Buffers         5        4.9 MB      24.4 MB   5        11.1 MB     55.4 MB   5        44.0 MB     220.0 MB
Motion Vector Buffer         23       0.2 MB      5.1 MB    23       0.5 MB      11.5 MB   12       2.0 MB      23.7 MB
Slice Parameters             5        6.1 MB      30.7 MB   5        6.1 MB      30.7 MB   5        6.1 MB      30.7 MB
Other Buffers                1        3.9 MB      3.9 MB    1        3.9 MB      3.9 MB    1        3.9 MB      3.9 MB
Total                                             137 MB                         258 MB                         665 MB

Contiguous Memory Access Size Requirements


The following table contains theoretical Contiguous Memory Access (CMA) buffer requirements
for the VDU based on resolution and format. The sizes below correspond to one instance of the
decoder. Multiply these by the number of streams for multistream use cases. Other elements
such as kmssink typically increase the CMA requirements by an additional 10% to 15%.

Table 12: VDU Decoder Instance CMA Requirements

Chroma format     4:2:2    4:2:2    4:2:2   4:2:2   4:2:0    4:2:0    4:2:0   4:2:0
Bit depth         10-bit   10-bit   8-bit   8-bit   10-bit   10-bit   8-bit   8-bit
Resolution        AVC      HEVC     AVC     HEVC    AVC      HEVC     AVC     HEVC

3840×2160 (MB)    665      582      597     513     524      466      473     414
1920×1080 (MB)    258      214      232     190     208      167      188     148
1280×720 (MB)     137      104      120     87      144      82       101     69
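
As a rough illustration of the sizing rule described above (per-instance requirement, multiplied by the number of streams, plus 10% to 15% for other elements), the following sketch computes a CMA budget. The function name and the integer rounding are assumptions made for this example; the per-stream value is taken from Table 12.

```python
def cma_budget_mb(per_stream_mb: int, streams: int, overhead_percent: int = 15) -> int:
    """Estimate a CMA reservation in MB, rounded up to a whole megabyte.

    per_stream_mb    -- per-instance value from Table 12
    streams          -- number of concurrent decode streams
    overhead_percent -- extra margin for other elements (10 to 15 per the text)
    """
    scaled = per_stream_mb * streams * (100 + overhead_percent)
    return -(-scaled // 100)  # integer ceiling division avoids float rounding


# Two 3840x2160 4:2:2 10-bit AVC streams (665 MB each in Table 12).
print(cma_budget_mb(665, 2, overhead_percent=15))
print(cma_budget_mb(665, 2, overhead_percent=10))
```

The result is only a starting point for a theoretical estimate; the actual requirement depends on the complete pipeline.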

Memory Bandwidth
The decoder memory bandwidth depends on frame rate, resolution, color depth, chroma format,
and Decoder profile. The AMD LogiCORE™ IP provides an estimate of decoder bandwidth based
on the video parameters selected in the GUI.

AMD recommends using the fastest DDR4 memory interface possible. Specifically, the 8x8-bit
memory interface is more efficient than the 4x16-bit memory interface: DDR4 memory allows
simultaneous bank group access, and the x8 mode has four bank groups whereas the x16 mode has
only two.


Memory Format
The decoded picture buffer contains the decoded pixels. It contains two parts: luminance pixels (Luma) followed by chrominance pixels
(Chroma). Luma pixels are stored in pixel raster scan order. Chroma pixels are stored in U/V-interleaved pixel raster scan order; hence,
the Chroma part is half the size of the Luma part when using a 4:2:0 format and the same size as the Luma part when using a 4:2:2
format. The decoded picture buffer must be one contiguous memory region.

Note: The decoder output buffer width is a multiple of 256 bytes and the height is a multiple of 64 lines. For example, the decoder
output is 2048*1088 for a 1920*1080 resolution.
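
The alignment rule in the note and the Luma/Chroma split described above can be sketched as follows. This is an illustration for 8-bit content only (one byte per sample); the function names are hypothetical, and the sketch assumes that the Chroma part uses the same pitch and aligned height as the Luma part.

```python
def align_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple."""
    return (value + multiple - 1) // multiple * multiple


def output_buffer_layout_8bit(width: int, height: int, chroma: str):
    """Return (pitch, lines, luma_bytes, chroma_bytes) for an 8-bit picture."""
    pitch = align_up(width, 256)   # width is a multiple of 256 bytes
    lines = align_up(height, 64)   # height is a multiple of 64 lines
    luma_bytes = pitch * lines
    # Chroma is half the Luma size for 4:2:0 and the same size for 4:2:2.
    chroma_bytes = {"4:2:0": luma_bytes // 2, "4:2:2": luma_bytes}[chroma]
    return pitch, lines, luma_bytes, chroma_bytes


print(output_buffer_layout_8bit(1920, 1080, "4:2:0"))
```

For 1920*1080 this reproduces the 2048*1088 dimensions given in the note.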

Two packing formats are supported in external memory: eight bits per component or 10 bits per component, shown in the following
tables, respectively. The 8-bit format can only be used for an 8-bit component depth and the 10-bit format can only be used for a 10-
bit component depth. The following tables show the raster scan format supported by the decoder block for 8-bit and 10-bit color
depth.

8 bits per component:

Bits     255:248     ...   31:24      23:16      15:8       7:0
Luma     Y(x+31,y)   ...   Y(x+3,y)   Y(x+2,y)   Y(x+1,y)   Y(x,y)
...(all Luma in pixel raster scan order)...
Bits     255:248     247:240     ...   31:24      23:16      15:8     7:0
Chroma   V(x+15,y)   U(x+15,y)   ...   V(x+1,y)   U(x+1,y)   V(x,y)   U(x,y)
...(all interleaved Chroma in pixel raster scan order)...

10 bits per component:

Bits     255:254   253:244     ...   31:30   29:20      19:10      9:0
Luma     0 0       Y(x+23,y)   ...   0 0     Y(x+2,y)   Y(x+1,y)   Y(x,y)
...(all Luma in pixel raster scan order)...
Bits     255:254   253:244     ...   63:62   61:52      51:42      41:32      31:30   29:20      19:10    9:0
Chroma   0 0       V(x+11,y)   ...   0 0     V(x+2,y)   U(x+2,y)   V(x+1,y)   0 0     U(x+1,y)   V(x,y)   U(x,y)
...(all interleaved Chroma in pixel raster scan order)...

The frame buffer width (pitch) may be larger than the frame width so that there are (pitch - width) ignored values between consecutive
pixel lines.
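
The 10-bit layout stores three 10-bit components in each 32-bit word, with the two most significant bits set to zero. The following sketch packs and unpacks samples in that arrangement. The function names are hypothetical, and padding an incomplete final group with zeros is an assumption of this example, not documented VDU behavior.

```python
def pack_10bit(samples):
    """Pack 10-bit samples, three per 32-bit word, lowest sample in bits 9:0."""
    words = []
    for i in range(0, len(samples), 3):
        group = list(samples[i:i + 3])
        group += [0] * (3 - len(group))      # pad a short final group (assumption)
        word = 0
        for slot, sample in enumerate(group):
            if not 0 <= sample <= 0x3FF:
                raise ValueError("sample does not fit in 10 bits")
            word |= sample << (10 * slot)    # bits 9:0, 19:10, 29:20
        words.append(word)                   # bits 31:30 remain zero
    return words


def unpack_10bit(words, count):
    """Recover the first count samples from packed 32-bit words."""
    samples = []
    for word in words:
        for slot in range(3):
            samples.append((word >> (10 * slot)) & 0x3FF)
    return samples[:count]


packed = pack_10bit([1, 2, 3, 1023])
print([hex(w) for w in packed])
print(unpack_10bit(packed, 4))
```

Eight such words make up one 256-bit group, which is why the Luma row covers 24 samples (Y(x,y) to Y(x+23,y)) and the Chroma row covers 12 U/V pairs.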


VDU
The VDU decoded format is shown in the following table.

Table 13: VDU Table

Input for VDU (Encoded Data)   VDU Decoded Format

AVC or HEVC 420, 8-bit         NV12
AVC or HEVC 422, 8-bit         NV16
AVC or HEVC 420, 10-bit        NV12_10LE32
AVC or HEVC 422, 10-bit        NV16_10LE32

Note: Native output of VDU is always in semi-planar format.
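
Application code that allocates or negotiates output buffers often needs this mapping as a lookup. The following is a small sketch of Table 13 as a dictionary; the function name is hypothetical and the format strings are copied from the table.

```python
# Decoded output format by (chroma format, bit depth), copied from Table 13.
VDU_OUTPUT_FORMAT = {
    ("4:2:0", 8): "NV12",
    ("4:2:2", 8): "NV16",
    ("4:2:0", 10): "NV12_10LE32",
    ("4:2:2", 10): "NV16_10LE32",
}


def vdu_output_format(chroma: str, bit_depth: int) -> str:
    """Return the semi-planar output format for an AVC or HEVC stream."""
    try:
        return VDU_OUTPUT_FORMAT[(chroma, bit_depth)]
    except KeyError:
        raise ValueError(f"no Table 13 entry for {chroma} at {bit_depth}-bit") from None


print(vdu_output_format("4:2:2", 10))
```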

Decoder Block Register Overview


The following table lists the decoder block registers. For additional information, see the Versal
Adaptive SoC AI Engine Register Reference (AM015).

Table 14: Per Instance Decoder Registers

Register Name Offset Type Reset Value Description

MCU_RESET 0x9000 Mixed(1) 0x00000000 MCU Subsystem Reset
MCU_RESET_MODE 0x9004 Mixed(1) 0x00000001 MCU Reset Mode
MCU_STA 0x9008 Mixed(1) 0x00000000 MCU Status
MCU_WAKEUP 0x900C Mixed(1) 0x00000000 MCU Wake-up
MCU_ADDR_OFFSET_IC0 0x9010 RW 0x00000000 MCU Instruction Cache Address Offset 0
MCU_ADDR_OFFSET_IC1 0x9014 RW 0x00000000 MCU Instruction Cache Address Offset 1
MCU_ADDR_OFFSET_DC0 0x9018 RW 0x00000000 MCU Data Cache Address Offset 0
MCU_ADDR_OFFSET_DC1 0x901C RW 0x00000000 MCU Data Cache Address Offset 1
ITC_MCU_IRQ 0x9100 Mixed(1) 0x00000000 MCU Interrupt Trigger
ITC_CPU_IRQ_MSK 0x9104 RW 0x00000000 CPU Interrupt Mask
ITC_CPU_IRQ_CLR 0x9108 Mixed(1) 0x00000000 CPU Interrupt Clear
ITC_CPU_IRQ_STA 0x910C Mixed(1) 0x00000000 CPU Interrupt Status
AXI_BW 0x9204 RW 0x00000000 AXI Bandwidth Measurement Window
AXI_ADDR_OFFSET_IP 0x9208 RW 0x00000000 Video Data Address Offset
AXI_RBW0 0x9210 RO 0x00000000 AXI Read Bandwidth Status 0
AXI_RBW1 0x9214 RO 0x00000000 AXI Read Bandwidth Status 1
AXI_WBW0 0x9218 RO 0x00000000 AXI Write Bandwidth Status 0
AXI_WBW1 0x921C RO 0x00000000 AXI Write Bandwidth Status 1
AXI_RBL0 0x9220 RW 0x00000000 AXI Read Bandwidth Limiter 0
AXI_RBL1 0x9224 RW 0x00000000 AXI Read Bandwidth Limiter 1
Notes:
1. Mixed registers have read only, write only, and read write bits grouped together.
2. The VDU output streams are stored in the DDR memory and cannot be routed directly to the decoder. Hence, it is
required to use the DDR memory with the VDU.

Microcontroller Unit Overview


The VDU core includes one Microcontroller Unit (MCU) subsystem that runs the MCU firmware
and controls the decoder block. The MCU has a 32-bit Reduced Instruction Set Computer (RISC)
architecture capable of executing pipelined transactions. It has internal instruction and data
caches and an AXI master interface to the external memory.

Functional Description
The following figure shows the top-level interfaces and detailed architecture of the MCU.


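Figure 6: MCU (Top-level)

[Figure: MCU top-level diagram. Blocks shown: MCU with ILMB and DLMB local memory buses, MCU
SRAM, instruction cache (IC) and data cache (DC) on the MCU AXI4 master interface, DP port
(MCU AXI4-Lite master) to the decoder, IRQ from the decoder, top-level register, VDU AXI4-Lite
slave interface.]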

The MCU interfaces to peripherals using a 32-bit AXI4-Lite master interface. It also has a local
memory bus and 32-bit AXI4 instruction cache and data cache interfaces.

The MCU block has a 32 KB local memory for internal operations that is shared with the CPU for
boot and mailbox communication. The MCU has a 32 KB instruction cache with 32-byte cache
line width. It has a 4 KB data cache with 16-byte cache line width. The data cache has a write-
through cache implementation.

Interfaces and Ports


The following table shows the AXI4 instruction and data cache interface ports of the MCU.

Table 15: MCU Ports

Port Size (bits) Direction Description


vdu_pl_mcu_m_axi_araddr 64 Output AXI4 read address
vdu_pl_mcu_m_axi_arburst 2 Output AXI4 read burst type
vdu_pl_mcu_m_axi_arcache 4 Output AXI4 ARCACHE value
vdu_pl_mcu_m_axi_arid 18 Output AXI4 read master ID
vdu_pl_mcu_m_axi_arlen 8 Output AXI4 read burst length
vdu_pl_mcu_m_axi_arlock 1 Output AXI4 ARLOCK signal
vdu_pl_mcu_m_axi_arprot 3 Output AXI4 ARPROT signal
vdu_pl_mcu_m_axi_arqos 4 Output AXI4 ARQOS signal
pl_vdu_mcu_m_axi_arready 1 Input AXI4 ARREADY signal
vdu_pl_mcu_m_axi_arsize 3 Output AXI4 ARSIZE signal
vdu_pl_mcu_m_axi_arvalid 1 Output AXI4 ARVALID signal

vdu_pl_mcu_m_axi_awaddr 64 Output AXI4 AWADDR signal
vdu_pl_mcu_m_axi_awburst 2 Output AXI4 AWBURST signal
vdu_pl_mcu_m_axi_awcache 4 Output AXI4 AWCACHE signal
vdu_pl_mcu_m_axi_awid 18 Output AXI4 AWID signal
vdu_pl_mcu_m_axi_awlen 8 Output AXI4 AWLEN signal
vdu_pl_mcu_m_axi_awlock 1 Output AXI4 AWLOCK signal
vdu_pl_mcu_m_axi_awprot 3 Output AXI4 AWPROT signal
vdu_pl_mcu_m_axi_awqos 4 Output AXI4 AWQOS signal
pl_vdu_mcu_m_axi_awready 1 Input AXI4 AWREADY signal
vdu_pl_mcu_m_axi_awsize 3 Output AXI4 AWSIZE signal
vdu_pl_mcu_m_axi_awvalid 1 Output AXI4 AWVALID signal
pl_vdu_mcu_m_axi_bid 18 Input AXI4 BID signal
vdu_pl_mcu_m_axi_bready 1 Output AXI4 BREADY signal
pl_vdu_mcu_m_axi_bresp 2 Input AXI4 BRESP signal
pl_vdu_mcu_m_axi_bvalid 1 Input AXI4 BVALID signal
pl_vdu_mcu_m_axi_rdata 32 Input AXI4 RDATA signal
pl_vdu_mcu_m_axi_rid 18 Input AXI4 RID signal
pl_vdu_mcu_m_axi_rlast 1 Input AXI4 RLAST signal
vdu_pl_mcu_m_axi_rready 1 Output AXI4 RREADY signal
pl_vdu_mcu_m_axi_rresp 2 Input AXI4 RRESP signal
pl_vdu_mcu_m_axi_rvalid 1 Input AXI4 RVALID signal
vdu_pl_mcu_m_axi_wdata 32 Output AXI4 WDATA signal
vdu_pl_mcu_m_axi_wlast 1 Output AXI4 WLAST signal
pl_vdu_mcu_m_axi_wready 1 Input AXI4 WREADY signal
vdu_pl_mcu_m_axi_wstrb 4 Output AXI4 WSTRB signal
vdu_pl_mcu_m_axi_wvalid 1 Output AXI4 WVALID signal
vdu_pl_mcu_m_axi_arregion 4 Output AXI4 ARREGION signal
vdu_pl_mcu_m_axi_aruser 16 Output AXI4 ARUSER signal
vdu_pl_mcu_m_axi_awregion 4 Output AXI4 AWREGION signal
vdu_pl_mcu_m_axi_awuser 16 Output AXI4 AWUSER signal
pl_vdu_mcu_m_axi_buser 16 Input AXI4 BUSER signal
pl_vdu_mcu_m_axi_ruser 16 Input AXI4 RUSER signal
vdu_pl_mcu_m_axi_wuser 16 Output AXI4 WUSER signal

The following table summarizes the AXI4-Lite slave interface ports of the MCU subsystem.


Table 16: AXI4-Lite Slave Ports

Port Width Direction Description


pl_vdu_awaddr_axi_lite_apb 23 Input AXI4 AWADDR signal
pl_vdu_awprot_axi_lite_apb 3 Input AXI4 AWPROT signal
pl_vdu_awvalid_axi_lite_apb 1 Input AXI4 AWVALID signal
vdu_pl_awready_axi_lite_apb 1 Output AXI4 AWREADY signal
pl_vdu_wdata_axi_lite_apb 32 Input AXI4 WDATA signal
pl_vdu_wstrb_axi_lite_apb 4 Input AXI4 WSTRB signal
pl_vdu_wvalid_axi_lite_apb 1 Input AXI4 WVALID signal
vdu_pl_wready_axi_lite_apb 1 Output AXI4 WREADY signal
vdu_pl_bresp_axi_lite_apb 2 Output AXI4 BRESP signal
vdu_pl_bvalid_axi_lite_apb 1 Output AXI4 BVALID signal
pl_vdu_bready_axi_lite_apb 1 Input AXI4 BREADY signal
pl_vdu_araddr_axi_lite_apb 23 Input AXI4 ARADDR signal
pl_vdu_arprot_axi_lite_apb 3 Input AXI4 ARPROT signal
pl_vdu_arvalid_axi_lite_apb 1 Input AXI4 ARVALID signal
vdu_pl_arready_axi_lite_apb 1 Output AXI4 ARREADY signal
vdu_pl_rdata_axi_lite_apb 32 Output AXI4 RDATA signal
vdu_pl_rresp_axi_lite_apb 2 Output AXI4 RRESP signal
vdu_pl_rvalid_axi_lite_apb 1 Output AXI4 RVALID signal
pl_vdu_rready_axi_lite_apb 1 Input AXI4 RREADY signal

Control Flow
The MCU is kept in sleep mode after applying the reset until the firmware boot code is
downloaded by the kernel device driver into the internal memory of the MCU. After downloading
the boot code and completing the MCU initialization sequence, the control software
communicates with the MCU using a mailbox mechanism implemented in the internal SRAM of
the MCU. The MCU sends an acknowledgment to the control software and performs the
decoding operation. When the requested operation is complete, the MCU communicates the
status to the control software.

For more details about control software and MCU firmware, refer to Section III: Application
Software Development.

MCU Register Overview


The following table lists the MCU registers. For additional information, see the Versal Adaptive
SoC AI Engine Register Reference (AM015).

Note: The address of each MCU register is the base address of the respective VDU plus the register
offset.

1. VDU0 base address is 0xA4020000
2. VDU1 base address is 0xA4120000
3. VDU2 base address is 0xA4220000
4. VDU3 base address is 0xA4320000

Table 17: MCU Registers

Register Offset Type Reset Value Description

MCU_RESET 0x9000 Mixed(1) 0x00000000 MCU Subsystem Reset
MCU_RESET_MODE 0x9004 Mixed(1) 0x00000001 MCU Reset Mode
MCU_STA 0x9008 Mixed(1) 0x00000000 MCU Status
MCU_WAKEUP 0x900C Mixed(1) 0x00000000 MCU Wake-up
MCU_ADDR_OFFSET_IC0 0x9010 RW 0x00000000 MCU Instruction Cache Address Offset 0
MCU_ADDR_OFFSET_IC1 0x9014 RW 0x00000000 MCU Instruction Cache Address Offset 1
MCU_ADDR_OFFSET_DC0 0x9018 RW 0x00000000 MCU Data Cache Address Offset 0
MCU_ADDR_OFFSET_DC1 0x901C RW 0x00000000 MCU Data Cache Address Offset 1
Notes:
1. Mixed registers have read only, write only, and read write bits grouped together.
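
The base-plus-offset rule for MCU registers can be written out directly. The following sketch only computes addresses and does not access hardware; the base addresses and offsets are copied from the note and Table 17, while the function and constant names are hypothetical.

```python
# Base address of each VDU instance, from the note above Table 17.
VDU_BASE = (0xA4020000, 0xA4120000, 0xA4220000, 0xA4320000)

# Register offsets from Table 17.
MCU_REG_OFFSET = {
    "MCU_RESET": 0x9000,
    "MCU_RESET_MODE": 0x9004,
    "MCU_STA": 0x9008,
    "MCU_WAKEUP": 0x900C,
    "MCU_ADDR_OFFSET_IC0": 0x9010,
    "MCU_ADDR_OFFSET_IC1": 0x9014,
    "MCU_ADDR_OFFSET_DC0": 0x9018,
    "MCU_ADDR_OFFSET_DC1": 0x901C,
}


def mcu_reg_addr(vdu_index: int, register: str) -> int:
    """Return the address of an MCU register for VDU0 to VDU3."""
    if not 0 <= vdu_index < len(VDU_BASE):
        raise ValueError("vdu_index must be 0, 1, 2, or 3")
    return VDU_BASE[vdu_index] + MCU_REG_OFFSET[register]


print(hex(mcu_reg_addr(1, "MCU_STA")))
```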

AXI Performance Monitor


Overview
The AXI Performance Monitor (APM) is implemented inside the embedded Video Decode Unit
(VDU). The VDU AXI Performance Monitor (VAPM) allows access to system-level behavior in a
non-invasive way and without burdening the design with additional soft IP.

The APM block can measure the number of read/write bytes and address-based transactions
within a measurement window on the AXI master bus from the Decoder blocks. The APM can
additionally measure read and write latency, based on master ID, within a measurement
window. The APM reports a cumulative latency value along with the number of outstanding
transfers considered for the latency measurement. The APM can interrupt the host processor
when the status registers are ready to be read.


Functional Description
The following figure shows the VAPM.

Figure 7: VDU AXI Performance Monitor

[Figure: VDU wrapper diagram. Blocks shown: VDEC cores with two 128-bit AXI interfaces, each
through a VAPM; MCU with a 32-bit AXI interface (MCU AXI interface and async bridge); clock
from DPLL; reset (from PL); interrupt; SLCR; APB to AXI4-Lite bridge; 32-bit AXI4-Lite
interface.]

The following sections describe the different operating modes of the VAPM.

Operating Timing Window Generation


The VAPM generates measurement parameters based on two user-selected operating modes.

Start/Stop Mode

In this mode, the measurement window is determined by the start/stop bit
VDU_SLCR.APMn_TRG[start_stop] (n = 0, 1, 2, 3). A measurement is triggered when this bit is
set from 0 to 1, and it is stopped when the bit is reset from 1 to 0.

Fixed Duration Timing Window

In this mode, a 32-bit counter is used to generate a fixed length measurement window. When the
counter reaches the maximum value, it resets to a value specified in the
VDU_SLCR.APMn_TIMER (n = 0, 1, 2, 3) register. The measurement is continued until the 32-bit
counter reaches the value set in the APMn_TIMER register and a capture pulse is generated to
store the measured values in the VDU_SLCR result registers.


The VAPM is capable of doing the following measurements:

• AXI Read and Write Transaction Measurement: Two 32-bit registers count the number of read
and write 128-bit AXI bus cycles transferred in a given timing window. The measured value is
transferred to the VDU_SLCR result register when a capture pulse is generated based on the
start/stop mode or the fixed duration timing window mode. To compute the number of bytes
transferred, the value in the VDU_SLCR result register must be multiplied by 16.

• AXI Read and Write Byte Count Measurement: Two 32-bit registers are implemented to
count the number of read and write bytes transferred in a given timing window. The register
content has to be multiplied by 16 to know the actual byte count transferred across AXI 128-
bit master interface. The measured value is transferred to the VDU_SLCR result register when
a capture pulse is generated, based on the start/stop mode or fixed duration timing window
mode.

• AXI Transaction Latency Measurement: Read and write latency can be measured based on
AXI master ID. Read latency is defined as AXI read address acknowledged to last read data
cycle. Write latency is defined as AXI write address acknowledged to write response
handshaking between master and slave. A 13-bit counter is implemented to measure the
latency on read and write bus. The timer is used to timestamp an event. The difference in the
timestamp between two events is used to calculate the latency.

Latency can be calculated on transaction ID basis. It is possible to select a single ID or all IDs for
latency calculation. For additional information, see the Versal Adaptive SoC AI Engine Register
Reference (AM015).
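
The arithmetic behind these measurements can be sketched as follows. This is an illustration under stated assumptions, not the VDU driver: the function names are hypothetical, the window length in cycles and the clock frequency are inputs chosen by the caller, and handling a wrapped 13-bit timestamp with modular subtraction is an assumption about how two timestamps would be compared.

```python
BYTES_PER_BEAT = 16      # one 128-bit AXI bus cycle carries 16 bytes
TIMER_BITS = 13          # width of the latency timestamp counter


def bytes_transferred(result_count: int) -> int:
    """Convert a captured result register value to bytes."""
    return result_count * BYTES_PER_BEAT


def bandwidth_bytes_per_s(result_count: int, window_cycles: int, clock_hz: int) -> float:
    """Average bandwidth over a measurement window of window_cycles clock cycles."""
    return bytes_transferred(result_count) * clock_hz / window_cycles


def latency_cycles(start_timestamp: int, end_timestamp: int) -> int:
    """Difference of two 13-bit timestamps, tolerating one counter wrap."""
    return (end_timestamp - start_timestamp) % (1 << TIMER_BITS)


print(bytes_transferred(1000))
print(bandwidth_bytes_per_s(1_000_000, 800_000, 800_000_000))
print(latency_cycles(8190, 5))
```

Because the counter is only 13 bits wide, this sketch cannot distinguish latencies that differ by a multiple of 8,192 cycles.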

APM Registers
Note: The VDU base address is 0xA4040000. The address of each APM register is the base address plus
the register offset (0xA4040000 + offset).

Table 18: APM Registers

Register Offset Width Type Reset Value Description


APM_InputT_GBL_CNTL 0x0090 32 Mixed 0x00000001 This register controls APM timing window completion interrupt.
APM0_CFG 0x0100 32 Mixed 0x00000002 APM0_CFG
APM0_TIMER 0x0104 32 RW 0x00000000 APM0_TIMER
APM0_TRG 0x0108 32 Mixed 0x00000000 APM0_TRG
APM0_RESULT0 0x010C 32 RO 0x00000000 APM0_RESULT0
APM0_RESULT1 0x0110 32 RO 0x00000000 APM0_RESULT1
APM0_RESULT2 0x0114 32 RO 0x00000000 APM0_RESULT2
APM0_RESULT3 0x0118 32 RO 0x00000000 APM0_RESULT3
APM0_RESULT4 0x011C 32 Mixed 0x00000000 APM0_RESULT4
APM0_RESULT5 0x0120 32 Mixed 0x00000000 APM0_RESULT5
APM0_RESULT6 0x0124 32 Mixed 0x00000000 APM0_RESULT6

APM0_RESULT7 0x0128 32 Mixed 0x00000000 APM0_RESULT7
APM0_RESULT8 0x012C 32 Mixed 0x00000000 APM0_RESULT8
APM0_RESULT9 0x0130 32 Mixed 0x00000000 APM0_RESULT9
APM0_RESULT10 0x0134 32 Mixed 0x00000000 APM0_RESULT10
APM0_RESULT11 0x0138 32 Mixed 0x00000000 APM0_RESULT11
APM0_RESULT12 0x013C 32 Mixed 0x00000000 APM0_RESULT12
APM0_RESULT13 0x0140 32 Mixed 0x00000000 APM0_RESULT13
APM0_RESULT14 0x0144 32 Mixed 0x00000000 APM0_RESULT14
APM0_RESULT15 0x0148 32 Mixed 0x00000000 APM0_RESULT15
APM0_RESULT16 0x014C 32 Mixed 0x00000000 APM0_RESULT16
APM0_RESULT17 0x0150 32 Mixed 0x00000000 APM0_RESULT17
APM0_RESULT18 0x0154 32 Mixed 0x00000000 APM0_RESULT18
APM0_RESULT19 0x0158 32 Mixed 0x00000000 APM0_RESULT19
APM0_RESULT20 0x015C 32 Mixed 0x1FFF0000 APM0_RESULT20
APM0_RESULT21 0x0160 32 Mixed 0x1FFF0000 APM0_RESULT21
APM0_RESULT22 0x0164 32 Mixed 0x1FFF0000 APM0_RESULT22
APM0_RESULT23 0x0168 32 Mixed 0x1FFF0000 APM0_RESULT23
APM0_RESULT24 0x016C 32 Mixed 0x00000000 APM0_RESULT24
APM1_CFG 0x0200 32 Mixed 0x00000002 APM1_CFG
APM1_TIMER 0x0204 32 RW 0x00000000 APM1_TIMER
APM1_TRG 0x0208 32 Mixed 0x00000000 APM1_TRG
APM1_RESULT0 0x020C 32 RO 0x00000000 APM1_RESULT0
APM1_RESULT1 0x0210 32 RO 0x00000000 APM1_RESULT1
APM1_RESULT2 0x0214 32 RO 0x00000000 APM1_RESULT2
APM1_RESULT3 0x0218 32 Mixed 0x00000000 APM1_RESULT3
APM1_RESULT4 0x021C 32 Mixed 0x00000000 APM1_RESULT4
APM1_RESULT5 0x0220 32 Mixed 0x00000000 APM1_RESULT5
APM1_RESULT6 0x0224 32 Mixed 0x00000000 APM1_RESULT6
APM1_RESULT7 0x0228 32 Mixed 0x00000000 APM1_RESULT7
APM1_RESULT8 0x022C 32 Mixed 0x00000000 APM1_RESULT8
APM1_RESULT9 0x0230 32 Mixed 0x00000000 APM1_RESULT9
APM1_RESULT10 0x0234 32 Mixed 0x00000000 APM1_RESULT10
APM1_RESULT11 0x0238 32 Mixed 0x00000000 APM1_RESULT11
APM1_RESULT12 0x023C 32 Mixed 0x00000000 APM1_RESULT12
APM1_RESULT13 0x0240 32 Mixed 0x00000000 APM1_RESULT13
APM1_RESULT14 0x0244 32 Mixed 0x00000000 APM1_RESULT14
APM1_RESULT15 0x0248 32 Mixed 0x00000000 APM1_RESULT15
APM1_RESULT16 0x024C 32 Mixed 0x00000000 APM1_RESULT16
APM1_RESULT17 0x0250 32 Mixed 0x00000000 APM1_RESULT17

APM1_RESULT18 0x0254 32 Mixed 0x00000000 APM1_RESULT18
APM1_RESULT19 0x0258 32 Mixed 0x00000000 APM1_RESULT19
APM1_RESULT20 0x025C 32 Mixed 0x1FFF0000 APM1_RESULT20
APM1_RESULT21 0x0260 32 Mixed 0x1FFF0000 APM1_RESULT21
APM1_RESULT22 0x0264 32 Mixed 0x1FFF0000 APM1_RESULT22
APM1_RESULT23 0x0268 32 Mixed 0x1FFF0000 APM1_RESULT23
APM1_RESULT24 0x026C 32 Mixed 0x00000000 APM1_RESULT24
APM2_CFG 0x0300 32 Mixed 0x00000002 APM2_CFG
APM2_TIMER 0x0304 32 RW 0x00000000 APM2_TIMER
APM2_TRG 0x0308 32 Mixed 0x00000000 APM2_TRG
APM2_RESULT0 0x030C 32 RO 0x00000000 APM2_RESULT0
APM2_RESULT1 0x0310 32 RO 0x00000000 APM2_RESULT1
APM2_RESULT2 0x0314 32 RO 0x00000000 APM2_RESULT2
APM2_RESULT3 0x0318 32 RO 0x00000000 APM2_RESULT3
APM2_RESULT4 0x031C 32 Mixed 0x00000000 APM2_RESULT4
APM2_RESULT5 0x0320 32 Mixed 0x00000000 APM2_RESULT5
APM2_RESULT6 0x0324 32 Mixed 0x00000000 APM2_RESULT6
APM2_RESULT7 0x0328 32 Mixed 0x00000000 APM2_RESULT7
APM2_RESULT8 0x032C 32 Mixed 0x00000000 APM2_RESULT8
APM2_RESULT9 0x0330 32 Mixed 0x00000000 APM2_RESULT9
APM2_RESULT10 0x0334 32 Mixed 0x00000000 APM2_RESULT10
APM2_RESULT11 0x0338 32 Mixed 0x00000000 APM2_RESULT11
APM2_RESULT12 0x033C 32 Mixed 0x00000000 APM2_RESULT12
APM2_RESULT13 0x0340 32 Mixed 0x00000000 APM2_RESULT13
APM2_RESULT14 0x0344 32 Mixed 0x00000000 APM2_RESULT14
APM2_RESULT15 0x0348 32 Mixed 0x00000000 APM2_RESULT15
APM2_RESULT16 0x034C 32 Mixed 0x00000000 APM2_RESULT16
APM2_RESULT17 0x0350 32 Mixed 0x00000000 APM2_RESULT17
APM2_RESULT18 0x0354 32 Mixed 0x00000000 APM2_RESULT18
APM2_RESULT19 0x0358 32 Mixed 0x00000000 APM2_RESULT19
APM2_RESULT20 0x035C 32 Mixed 0x1FFF0000 APM2_RESULT20
APM2_RESULT21 0x0360 32 Mixed 0x1FFF0000 APM2_RESULT21
APM2_RESULT22 0x0364 32 Mixed 0x1FFF0000 APM2_RESULT22
APM2_RESULT23 0x0368 32 Mixed 0x1FFF0000 APM2_RESULT23
APM2_RESULT24 0x036C 32 Mixed 0x00000000 APM2_RESULT24
APM3_CFG 0x0400 32 Mixed 0x00000002 APM3_CFG
APM3_TIMER 0x0404 32 RW 0x00000000 APM3_TIMER
APM3_TRG 0x0408 32 Mixed 0x00000000 APM3_TRG
APM3_RESULT0 0x040C 32 RO 0x00000000 APM3_RESULT0

APM3_RESULT1 0x0410 32 RO 0x00000000 APM3_RESULT1
APM3_RESULT2 0x0414 32 RO 0x00000000 APM3_RESULT2
APM3_RESULT3 0x0418 32 RO 0x00000000 APM3_RESULT3
APM3_RESULT4 0x041C 32 Mixed 0x00000000 APM3_RESULT4
APM3_RESULT5 0x0420 32 Mixed 0x00000000 APM3_RESULT5
APM3_RESULT6 0x0424 32 Mixed 0x00000000 APM3_RESULT6
APM3_RESULT7 0x0428 32 Mixed 0x00000000 APM3_RESULT7
APM3_RESULT8 0x042C 32 Mixed 0x00000000 APM3_RESULT8
APM3_RESULT9 0x0430 32 Mixed 0x00000000 APM3_RESULT9
APM3_RESULT10 0x0434 32 Mixed 0x00000000 APM3_RESULT10
APM3_RESULT11 0x0438 32 Mixed 0x00000000 APM3_RESULT11
APM3_RESULT12 0x043C 32 Mixed 0x00000000 APM3_RESULT12
APM3_RESULT13 0x0440 32 Mixed 0x00000000 APM3_RESULT13
APM3_RESULT14 0x0444 32 Mixed 0x00000000 APM3_RESULT14
APM3_RESULT15 0x0448 32 Mixed 0x00000000 APM3_RESULT15
APM3_RESULT16 0x044C 32 Mixed 0x00000000 APM3_RESULT16
APM3_RESULT17 0x0450 32 Mixed 0x00000000 APM3_RESULT17
APM3_RESULT18 0x0454 32 Mixed 0x00000000 APM3_RESULT18
APM3_RESULT19 0x0458 32 Mixed 0x00000000 APM3_RESULT19
APM3_RESULT20 0x045C 32 Mixed 0x1FFF0000 APM3_RESULT20
APM3_RESULT21 0x0460 32 Mixed 0x1FFF0000 APM3_RESULT21
APM3_RESULT22 0x0464 32 Mixed 0x1FFF0000 APM3_RESULT22
APM3_RESULT23 0x0468 32 Mixed 0x1FFF0000 APM3_RESULT23
APM3_RESULT24 0x046C 32 Mixed 0x00000000 APM3_RESULT24
Notes:
1. Mixed registers have read only, write only, and read write bits grouped together.


Chapter 5

Designing with the Core


This section includes guidelines and additional information to facilitate designing with the core.

General Design Guidelines


The Video Decode Unit (VDU) core is an embedded hard IP in AMD Versal™ Adaptive SoC devices
(Versal AI Edge and Versal AI Core Series). All interfaces are connected through AXI interconnect
blocks in the PL. The VDU core is AXI4 compliant on its AXI master interfaces. It can be connected
to the slave ports of the Versal NoC hard block, which is connected to the DDR memory controller.
There are no direct (hardwired) connections from the VDU to the processing system (PS).

The register programming interface of the VDU core connects to the PS master ports (M_AXI_FPD
or M_AXI_LPD). The VDU core clock can be supplied from the PL or generated by an internal PLL
inside the VDU core.

Interrupts
Each VDU decoder instance uses one interrupt (vdu_host_interrupt*). The LogiCORE IP provides
options to use a single interrupt per instance or a single interrupt for all instances. Clear the
Combine VDU Interrupt checkbox to get a single interrupt for all instances. This interrupt must
be connected to either PL-PS-IRQ0[7:0] or PL-PS-IRQ1[7:0]. If there are other interrupts
in the design, concatenate this interrupt with the other interrupts and then connect the result
to the PS.

Clocking and Resets


The Video Decode Unit (VDU) core supports one clocking topology, the internal phase locked
loop (PLL). An internal DPLL drives the high frequency core (800 MHz) and MCU (571 MHz)
clocks based on an input reference clock from the programmable logic (PL) or PS. The internal
PLL generates a clock for the decoder block.


Note: All AXI clocks are supplied from external PL sources. These clocks are asynchronous to the
core decoder block clock. All primary clocks in the VDU are asynchronous to each other.

The VDU core is reset under the following conditions:

• Initially, while the PL is in power-up/configuration mode, the VDU core is held in reset.
• After the PL is fully configured, a PL based reset signal can be used to reset the VDU for
initialization and bring-up. Platform management unit (PMU) in the processing system can
drive this reset signal to control the reset state of the VDU.
• During partial reconfiguration (PR), the VDU block is kept under reset, if it is part of the
dynamically reconfigurable module.

Functional Description
Clocking
The following table describes the clock domains in the VDU core.

Table 19: VDU Clock Domains

Domain                      Max Freq (MHz)  Description
Core clock                  800             Processing core, most of the logic and memories
MCU clock                   571             Internal micro controllers
AXI Master Port clock       400             m_axi_dec_aclk, pl_vdu_axi_mcu_clk
                                            AXI master port for memory access, 128-bit, typically
                                            connected to PS AFI-FM (HP) port or to a soft memory
                                            controller in the PL
AXI4-Lite slave port clock  167             s_axi_lite_aclk, AXI4-Lite slave port (32-bit) for
                                            register programming
NPI Clock                   300             NPI interface clock

Note: All AXI clocks are supplied with clocks from external PL/PS sources. All primary clocks in VDU are
asynchronous to each other.

The following figure shows the clock generation options inside VDU block.

Note: The following blocks work on a single clock domain:

• pll_ref_clk is sourced externally to the device, typically by a programmable clock integrated circuit.
• Video decoder blocks work under the core_clk domain generated by the DPLL.
• The MCU for the decoder works under the mcu_clk domain generated by the DPLL.
• m_axi_dec_aclk is the AXI clock input from the PL for the 128-bit AXI master interfaces for the
decoder.
• s_axi_lite_aclk is the AXI4-Lite clock from the PL and PS.
• m_axi_mcu_aclk is the MCU AXI master clock from the PL.

Figure 8: Clock Generation Options

[Block diagram. Labels: pll_ref_clk, DPLL, Div (two dividers), PL, VDU Decoder, s_axi_lite_aclk,
m_axi_mcu_aclk, m_axi_dec_aclk.]

The following clock frequency requirements must be met while providing clocks from PL:

• The AXI clock for decoder interface is limited to 400 MHz.


• The following ratio requirements need to be met:
○ s_axi_lite_aclk ≤ 2 × m_axi_dec_aclk
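The limits above can be checked with a few lines of script before clock settings are committed to a design. The following Python sketch is only an illustration and is not part of the AMD tools: the function name and the dictionary layout are hypothetical, and the numeric limits are copied from Table 19 and from the ratio requirement in this section.

```python
# Maximum PL-supplied clock frequencies in MHz, copied from Table 19.
MAX_MHZ = {
    "m_axi_dec_aclk": 400.0,
    "s_axi_lite_aclk": 167.0,
}


def check_pl_clocks(clocks_mhz):
    """Return a list of violated rules for the given PL clock frequencies.

    clocks_mhz maps a clock name to its frequency in MHz. The function name
    and the data layout are illustrative only (hypothetical helper).
    """
    problems = []
    for name, limit in MAX_MHZ.items():
        if clocks_mhz[name] > limit:
            problems.append(f"{name} exceeds {limit} MHz")
    # Ratio rule from this section: s_axi_lite_aclk <= 2 x m_axi_dec_aclk.
    if clocks_mhz["s_axi_lite_aclk"] > 2 * clocks_mhz["m_axi_dec_aclk"]:
        problems.append("s_axi_lite_aclk is more than twice m_axi_dec_aclk")
    return problems


if __name__ == "__main__":
    # Frequencies used in the block design walkthrough in Chapter 6.
    print(check_pl_clocks({"m_axi_dec_aclk": 400.0, "s_axi_lite_aclk": 167.0}))
```

An empty list means that none of the rules encoded in the sketch is violated; it does not replace timing closure in the Vivado tools.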

Refer to Microcontroller Unit Overview for more information on the MCU.

The core_clk is generated based on the VDU DPLL.

The mcu_clk is generated based on the VDU DPLL.

DPLL Overview
The VDU core uses a built-in DPLL to generate the following clocks:

• VDU decoder core clock (up to 800 MHz)
• VDU MCU clock (up to 571 MHz)

The AMD Vivado™ wrapper for the VDU should handle programming of the DPLL to get the required
clocks. This programming is static.


Generation of Primary Clock


The PLL has a Voltage Controlled Oscillator (VCO) block which generates an output clock based
on the input reference clock. The output clock from VCO is generated based on a frequency
multiplier value. The output clock of the VCO is divided by an output divider to generate the final
clock.

VCO Frequency and MF Value


The VCO operating frequency can be determined by using the following relationship:

fvco = frefclk × M

and

fclkout = fvco / O

where M corresponds to the integer feedback divide value and O corresponds to the output
divide value.

Note: The PLL does not support fractional divider values.

IMPORTANT! Select the PLL feedback multiplier value based on the supported VCO frequency range
(fvco).

Refer to the Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957) for more
information on the operating range of fvco.

Select the output divider (O) based on the required core clock or MCU clock frequency.
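The two relationships above can be explored numerically. The following Python sketch searches integer M and O values for a requested output frequency. It is an illustration under stated assumptions, not the algorithm used by the Vivado wrapper: the function name, the search limits max_m and max_o, and the policy of never exceeding the requested frequency are all hypothetical, and the VCO range must be taken from DS957 (the range used in the usage lines is a placeholder only).

```python
def pll_settings(f_ref_mhz, f_target_mhz, vco_min_mhz, vco_max_mhz,
                 max_m=255, max_o=64):
    """Find integer M and O so that f_ref * M / O is as close as possible to,
    without exceeding, f_target, with the VCO inside the given range.

    max_m and max_o are hypothetical search limits, not documented values.
    """
    best = None
    for m in range(1, max_m + 1):
        f_vco = f_ref_mhz * m                      # fvco = frefclk x M
        if not vco_min_mhz <= f_vco <= vco_max_mhz:
            continue
        for o in range(1, max_o + 1):
            f_out = f_vco / o                      # fclkout = fvco / O
            if f_out > f_target_mhz:
                continue                           # assumed policy: stay at or below target
            if best is None or f_out > best["f_out"]:
                best = {"M": m, "O": o, "f_vco": f_vco, "f_out": f_out}
    if best is None:
        raise ValueError("no integer M/O pair fits the given VCO range")
    return best


if __name__ == "__main__":
    # Placeholder VCO range; read the real limits from DS957.
    VCO_MIN_MHZ, VCO_MAX_MHZ = 2000.0, 4000.0
    print(pll_settings(100.0, 800.0, VCO_MIN_MHZ, VCO_MAX_MHZ))
    print(pll_settings(100.0, 571.0, VCO_MIN_MHZ, VCO_MAX_MHZ))
```

Because the PLL does not support fractional divider values, a target such as 571 MHz may only be approximated from a given reference clock; the sketch reports the closest frequency that does not exceed the target.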

Reset Sequence
The state of the VDU during PL power up and the initialization sequence for the VDU are as
follows:

1. The PL supply, VDU power supply, and RAM supply are turned on. There is no sequencing
requirement.
2. PMC releases the POR pin to the VDU (por_pl_b).
3. The POR IP starts monitoring the VDU, PL, and RAM supplies. Once all supplies are detected, the IP
releases the POR reset, which is sent to the PMC readback supply status.
4. PMC polls for the power status and deasserts the IPOR register after the power ramp is complete.
At this step, the power-on reset to the VDU is released (through the AND gate).
5. Send VDU enable information through eFUSE.
6. PMC removes isolation gasket controls through PCSR bits.
7. Once the clocks are stable, deassert INITSTATE.
8. Ensure that the reset to the VDU is removed only after the clocks are stable.

Note: The VDU clocks are available while the reset is released. The PL should be configured before
releasing the raw reset.

Additional initialization is done by software through programming the VDU core registers after the PL is
configured and core is in a reset release state.

Reset
The VDU hard block can be held under reset under the following conditions:

• When external reset input vdu_resetn signal is asserted.


• During PL configuration.
• When the VDU to PL isolation is not removed.

The VDU reset signal must be asserted for at least two clock cycles of the VDU DPLL reference
clock (the slowest clock input to the VDU). The VDU registers can be accessed after the reset
signal is deasserted.
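As a small worked example of the two-cycle rule, the following Python sketch converts a reference clock frequency into a minimum reset assertion time. The function name is hypothetical, and the 100 MHz value is the ref_clk frequency used in the block design walkthrough in Chapter 6, not a requirement.

```python
def min_reset_ns(ref_clk_mhz, cycles=2):
    """Minimum reset assertion time in nanoseconds for a given reference clock."""
    period_ns = 1000.0 / ref_clk_mhz   # one clock period in nanoseconds
    return cycles * period_ns


if __name__ == "__main__":
    print(min_reset_ns(100.0))  # two periods of a 100 MHz reference clock
```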

Note:

• If software resets the VDU block in the middle of a frame, use the software to clear the physical
memory allocated for the VDU.
• The reset does not need to be asserted between changes to the configuration during run-time through
the control software.
• The vdu_resetn signal of AMD Versal™ VDU should be driven from Processor System Reset Module
(proc_sys_reset) which is driven by any of the 4 PS reset signals.


Chapter 6

Design Flow Steps


This section describes customizing and generating the core, constraining the core, and the
simulation, synthesis, and implementation steps that are specific to this IP core. More detailed
information about the standard AMD Vivado™ design flows and the IP integrator can be found in
the following Vivado Design Suite user guides:

• Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
• Vivado Design Suite User Guide: Designing with IP (UG896)
• Vivado Design Suite User Guide: Getting Started (UG910)
• Vivado Design Suite User Guide: Logic Simulation (UG900)

Customizing and Generating the Core


This section includes information about using AMD tools to customize and generate the core in
the AMD Vivado™ Design Suite.

If you are customizing and generating the core in the Vivado IP integrator, see the Vivado Design
Suite User Guide: Designing IP Subsystems using IP Integrator (UG994) for detailed information. IP
integrator might auto-compute certain configuration values when validating or generating the
design. To check whether the values do change, see the description of the parameter in this
chapter. To view the parameter value, run the validate_bd_design command in the Tcl
console.

You can customize the IP for use in your design by specifying values for the various parameters
associated with the IP core using the following steps:

1. Select the IP from the IP catalog.


2. Double-click the selected IP or select the Customize IP command from the toolbar or right-
click menu.

For details, see the Vivado Design Suite User Guide: Designing with IP (UG896) and the Vivado
Design Suite User Guide: Getting Started (UG910).

Figures in this chapter are illustrations of the Vivado IDE. The layout depicted here might vary
from the current version.


Basic Configuration Tab


The Basic Configuration tab shown in the following figure lets you select video parameters used
to calculate the total dynamic power used by decoder blocks.

Figure 9: VDU Basic Configuration Tab

The VDU subsystem is controlled by software at runtime. Configuration options set in the VDU
GUI are used to estimate bandwidth, and calculate the buffer size.

The parameters on the Basic Configuration tab are as follows:

• Component Name: Component name is set automatically by IP integrator

• Bandwidth Summary: Reports the decoder buffer size. Reports bandwidth for the decoder.

• Number of Decoder Instances: Select 1 to 4

• Maximum Number of Decoder Streams: Select 1 to 32 streams. Each VDU instance supports up to
32 streams; with four VDU instances, a maximum of 100 streams is supported. This setting
determines the memory requirements.

Note: The VDU supports 32 streams, but AMD recommends choosing the closest combination using
the available options in the GUI.

• Coding Standard: Select AVC or HEVC.


• Coding Type: Select the GOP structure to use for decoding:

• Intra frame only: I-frame only


• Intra and inter frame: I-frame, B-frame, and P-frames

• Resolution: Select one of the following resolutions:

• 1280×720
• 1920×1080
• 3840×2160

• Frames Per Second: Select 15, 30, 45, or 60 fps.

• Color Format: Select one of the following color formats:

• 4:0:0 - monochrome
• 4:2:0
• 4:2:2

• Color Depth: Select 8 or 10 bits per channel.
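To get a feel for how the resolution, frame rate, color format, and color depth settings drive memory and bandwidth figures, the following Python sketch computes the size of one raw decoded frame and the corresponding output write rate. It is not the formula used by the VDU GUI: it ignores reference-frame reads, bitstream traffic, line padding, and the packing of 10-bit samples in memory, and the function names are hypothetical.

```python
# Samples per pixel expressed in halves so that the arithmetic stays in integers:
# 4:0:0 carries luma only, 4:2:0 adds half a chroma sample per pixel,
# and 4:2:2 adds one chroma sample per pixel.
HALF_SAMPLES_PER_PIXEL = {"4:0:0": 2, "4:2:0": 3, "4:2:2": 4}


def raw_frame_bytes(width, height, color_format, bit_depth):
    """Size in bytes of one unpadded decoded frame (rounded up)."""
    bits = width * height * HALF_SAMPLES_PER_PIXEL[color_format] * bit_depth // 2
    return (bits + 7) // 8


def raw_output_bytes_per_second(width, height, color_format, bit_depth, fps):
    """Bytes written per second for decoded frames only."""
    return raw_frame_bytes(width, height, color_format, bit_depth) * fps


if __name__ == "__main__":
    print(raw_frame_bytes(1920, 1080, "4:2:0", 8))
    print(raw_output_bytes_per_second(3840, 2160, "4:2:2", 10, 60))
```

The figures reported on the Bandwidth Summary line of the GUI remain the reference; this sketch only shows the direction in which each parameter moves the estimate.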

Interfacing the Core with Versal Devices


To integrate the VDU core into an IP integrator (IPI) block design, follow these steps:

1. Launch the AMD Vivado™ IDE and create a new project.

2. Click Next on New Project wizard until you reach the Family Selection window.
3. Select a target device for the VDU core.


4. Click Create Block Design.


5. Click Add IP and type VDU. The following IP appears.

6. Add VDU to the block design.


7. Add AMD Versal™ Control, Interfaces, and Processing System (CIPS) IP to the block design as
shown.


8. In Versal CIPS, enable pl1_ref_clk, pl2_ref_clk, and pl3_ref_clk, and set their frequencies to
100, 100, and 167 MHz, respectively.

9. Configure Versal CIPS to enable AXI master interfaces, clocking, and PL-PS interrupt signal
per your design requirements. The configuration used in this tutorial is displayed below.


10. Enable PL to PS interrupts IRQ0 [0-3] ports.


11. Configure the following parameters in the VDU IP GUI: set the number of decoder instances to 4.

12. Manually make the connections as follows.


• Connect ‘vdu_host_interrupt0’ of VDU to ‘pl_ps_irq0’ of Versal CIPS.
• Connect ‘vdu_host_interrupt1’ of VDU to ‘pl_ps_irq1’ of Versal CIPS.
• Connect ‘vdu_host_interrupt2’ of VDU to ‘pl_ps_irq2’ of Versal CIPS.
• Connect ‘vdu_host_interrupt3’ of VDU to ‘pl_ps_irq3’ of Versal CIPS.
13. Add a SmartConnect IP to the block design. In the GUI, set the number of master interfaces to 2,
the number of slave interfaces to 1, and the number of clock inputs to 2.

14. Instantiate a Processor System Reset. Connect ‘ext_reset_in’ of proc_sys_reset_0 to
‘pl0_resetn’ of Versal CIPS.
15. Make the following connections manually:
• Connect ‘M_AXI_FPD’ interface of Versal CIPS to ‘S00_AXI’ interface of smartconnect_0.


• Connect ‘pl3_ref_clk’ pin of Versal CIPS to ‘aclk’ pin of smartconnect_0.


• Connect ‘pl3_ref_clk’ pin of Versal CIPS to ‘s_axi_lite_aclk’ pin of VDU.
• Connect ‘pl1_ref_clk’ pin of Versal CIPS to ‘ref_clk’ pin of VDU.
• Connect ‘M00_AXI’ interface of smartconnect_0 to ‘S_AXI_LITE’ interface of VDU.
• Connect ‘pl3_ref_clk’ pin of Versal CIPS to ‘slowest_sync_clk’ pin of
proc_sys_reset_0.
• Connect ‘peripheral_aresetn’ pin of proc_sys_reset_0 to ‘vdu_resetn’ pin of VDU.
• Connect ‘interconnect_aresetn’ pin of proc_sys_reset_0 to ‘aresetn’ pin of
smartconnect_0.
The address and description of PL_AXI_FPD can be found in Versal Adaptive SoC Register
Reference (AM012). The address is 0xFD360000. The register is used to configure QoS and
the FIFO. It is part of the AFIFM Module. The AFIFM Module documentation provides
relative addresses and values for fields defining traffic priority and maximum number of read
or write commands.
16. Instantiate a second AXI SmartConnect IP (smartconnect_1). In the GUI, change the number
of master interfaces to 1 and the number of slave interfaces to 10. Make the following connections:
• Connect ‘M_AXI_DEC0_0’ interface of VDU to ‘S00_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC0_1’ interface of VDU to ‘S01_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC1_0’ interface of VDU to ‘S02_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC1_1’ interface of VDU to ‘S03_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC2_0’ interface of VDU to ‘S04_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC2_1’ interface of VDU to ‘S05_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC3_0’ interface of VDU to ‘S06_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_DEC3_1’ interface of VDU to ‘S07_AXI’ interface of smartconnect_1
• Connect ‘M_AXI_MCU’ interface of VDU to ‘S08_AXI’ interface of smartconnect_1
• Connect ‘M01_AXI’ interface of smartconnect_0 to ‘S09_AXI’ interface of
smartconnect_1
17. From the Add IP section, instantiate a Clocking Wizard IP. In the GUI, enable reset under the
Optional Ports tab and set it to Active Low.
• In output clocks tab, set ‘clk_out1’ frequency to 400MHz.
• Connect ‘clk_in1’ pin of clock_wizard_0 to ‘pl2_ref_clk’ pin of Versal CIPS.
• Connect ‘clk_out1’ pin of clock_wizard_0 to ‘m_axi_mcu_aclk’ and
‘m_axi_dec_aclk’ pins of VDU.
• Connect ‘clk_out1’ pin of clock_wizard_0 to ‘aclk1’ pin of smartconnect_0.


• Connect ‘resetn’ pin of clock_wizard_0 to ‘pl0_resetn’ of Versal CIPS.


• Connect ‘aclk’ pin of smartconnect_1 to ‘clk_out1’ pin of clock_wizard_0.
18. Add AXI BRAM Controller and Embedded memory generator from add IP section.
• In GUI set number of slave interfaces to 1, number of BRAM interfaces to 1.
• Connect BRAM_PORTA of BRAM Controller to BRAM_PORTA of Embedded memory
generator.
• Connect ‘s_axi_aclk’ of axi_bram_ctrl_0 to ‘clk_out1’ of clock_wizard_0.
19. Manually connect the reset pins of each IP as specified:
• Instantiate a new processor system reset (proc_sys_reset_1)
• Connect ‘clk_out1’ of clock_wizard_0 to ‘slowest_sync_clk’ of proc_sys_reset_1.
• Connect ‘ext_reset_in’ of proc_sys_reset_1 to ‘pl0_resetn’ of Versal CIPS.
• Connect ‘aresetn’ of smartconnect_1 to ‘interconnect_aresetn’ of
proc_sys_reset_1.
• Connect ‘peripheral_aresetn’ of proc_sys_reset_1 to ‘s_axi_aresetn’ of
axi_bram_ctrl_0.
20. In the Address Editor tab, expand Data address segment and auto assign the addresses. The
following table shows an example address editor.

21. Click Validate Block Design to validate the connections.


22. Create a top-level Vivado wrapper by right-clicking on Block Design and selecting the Create
HDL Wrapper option as shown in the following figure.

23. Add the constraints file to the project.


24. Add the constraints file (.xdc) from the board support package if available. If no constraints
file is available, several settings must be changed from their default values to enable error
free bitstream generation.
25. As discussed in the Clocking section, all primary clocks in the VDU are asynchronous to each
other. Add the following clocking constraint:
• set_clock_groups -name async_axi_lite_clk_mcu_clk -asynchronous -group clk_pl_3 -group
{clk_pl_2 clkout1_primitive}
26. Click the Run Synthesis, Run Implementation, or the Generate Bitstream option.
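The smartconnect_1 connections listed in step 16 follow a regular pattern: each decoder instance contributes two AXI master interfaces, followed by the MCU interface and the M01_AXI interface of smartconnect_0. The following Python sketch regenerates that connection list, which can help when cross-checking a block design. The function name is hypothetical, and extending the pattern to fewer than four decoder instances is an assumption that this guide does not confirm.

```python
def smartconnect_1_ports(num_decoders=4):
    """Return (source interface, smartconnect_1 slave port) pairs as in step 16."""
    pairs = []
    for dec in range(num_decoders):
        for port in range(2):
            # Decoder n uses slave ports 2n and 2n+1.
            pairs.append((f"M_AXI_DEC{dec}_{port}", f"S{2 * dec + port:02d}_AXI"))
    pairs.append(("M_AXI_MCU", f"S{2 * num_decoders:02d}_AXI"))
    pairs.append(("smartconnect_0/M01_AXI", f"S{2 * num_decoders + 1:02d}_AXI"))
    return pairs


if __name__ == "__main__":
    for source, slave in smartconnect_1_ports():
        print(f"{source} -> smartconnect_1/{slave}")
```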

Constraining the Core


The necessary XDC constraints are delivered with the core generation in the AMD Vivado™
Design Suite.


Required Constraints

This section is not applicable for this IP core.

Device, Package, and Speed Grade Selections

This section is not applicable for this IP core.

Note: 4K (3840x2160) and below is supported in all speed grades and 4K DCI (4096x2160) requires -2 or
-3 speed grade.

Clock Frequencies

There is no restriction for speed grade. All speed grades support the maximum frequency of
operation.

Clock Management

This section is not applicable for this IP core.

Clock Placement

This section is not applicable for this IP core.

Banking

This section is not applicable for this IP core.

Transceiver Placement

This section is not applicable for this IP core.

I/O Standard and Placement

This section is not applicable for this IP core.

Simulation
Simulation of the H.264/H.265 Video Decode Unit is not supported.


Synthesis and Implementation


For details about synthesis and implementation, see the Vivado Design Suite User Guide: Designing
with IP (UG896).


Section II

Performance and Debugging


Chapter 7

Decoder Latency
The following figure shows the decoder latency.

Figure 10: Decoder Latency

[Timeline diagram. Stages: CPB delay, Dec Init, SCD, Entropy decoding, Pixel decoding, Decoded
Picture buffer. Spans: CPB latency, Dec init latency, HW latency, O/P (reordering) latency.]

The overall latency of the decoder is the steady state latency, equal to the sum of the hardware
latency and the output latency. Hardware latency is the sum of the successive cancellation
decoding (SCD) latency, the entropy decoding latency, and the pixel decoding latency.
Initialization latency is the sum of the Coded Picture Buffer (CPB) latency and the Decoder
Initialization (Dec Init) latency.
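The relationships in the paragraph above can be written down directly. The following Python sketch is a bookkeeping aid only; the class name, the field names, and the unit are hypothetical, and the numbers in the usage lines are placeholders rather than measured VDU latencies.

```python
from dataclasses import dataclass


@dataclass
class DecoderLatency:
    """Latency components in a single, consistent unit (for example, milliseconds)."""
    cpb: float
    dec_init: float
    scd: float
    entropy: float
    pixel: float
    output: float

    @property
    def hardware(self):
        # Hardware latency = SCD + entropy decoding + pixel decoding.
        return self.scd + self.entropy + self.pixel

    @property
    def steady_state(self):
        # Overall (steady state) latency = hardware latency + output latency.
        return self.hardware + self.output

    @property
    def initialization(self):
        # Initialization latency = CPB latency + decoder initialization latency.
        return self.cpb + self.dec_init


if __name__ == "__main__":
    # Placeholder values, not measurements.
    lat = DecoderLatency(cpb=10, dec_init=5, scd=1, entropy=2, pixel=3, output=4)
    print(lat.hardware, lat.steady_state, lat.initialization)
```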


Section III

Application Software Development


Chapter 8

Overview
The Video Decode Unit (VDU) software stack has a layered architecture programmable at several
levels of abstraction available to software developers, as shown in the following figure. The
application interfaces from high level to low level are:

• GStreamer
• OpenMAX Integration Layer
• VDU Control Software

GStreamer is a cross-platform, open-source multimedia framework. GStreamer provides the
infrastructure to integrate multiple multimedia components and create pipelines. The GStreamer
framework is implemented on the OpenMAX Integration Layer API. The supported GStreamer version
is 1.20.5.

The OpenMAX Integration Layer API defines a royalty free standardized media component
interface to enable developers and platform providers to integrate and communicate with
multimedia codecs implemented in hardware or software.

The VDU Control Software is the lowest level software visible to VDU application developers. All
VDU applications must use an AMD provided VDU Control Software, directly or indirectly. The
VDU Control Software includes custom kernel modules, custom user space library, and the
ctrlsw_decoder application. The OpenMAX IL (OMX) layer is integrated on top of the VDU
Control Software.

User applications can use the layer or layers of the VDU software stack that are most appropriate
to their requirements.


Figure 11: VDU Software Stack

[Layer diagram, top to bottom: GStreamer; OpenMAX Integration Layer; VDU Control Software (user
space); Driver (kernel); IP (hardware).]

Software Prerequisites
All of the software prerequisites for using the VDU are included in AMD PetaLinux, which is included
in the AMD Vitis™ software development platform release.

For the AMD Linux kernel, refer to xilinx_versal_defconfig at
linux-xlnx/arch/arm64/configs/xilinx_versal_defconfig, where all the AMD driver configuration
options are enabled.

For the vanilla Linux kernel, refer to xilinx_versal_defconfig to enable or disable an AMD driver
in the Linux kernel. If the design enables or disables the AMD IP, set the corresponding device-tree
node accordingly so that the driver can probe at run time.

The application software using the VDU is written on top of the following libraries and modules,
shown in the following table.

Table 20: Application Software

Software                            Version                      Source
GStreamer core library and plugins  1.20.5                       https://github.com/Xilinx/gstreamer/tree/xlnx-rebase-v1.20.5
OpenMAX Integration Layer API       1.1.2                        https://github.com/Xilinx/vdu-omx-il/tree/xlnx_rel_v2023.1
VDU Control Software                Library version: 0.31.0      https://github.com/Xilinx/vdu-ctrl-sw/tree/xlnx_rel_v2023.1
                                    Application version: 1.0.68
VDU firmware                        1.0.0                        https://github.com/Xilinx/vdu-firmware/tree/xlnx_rel_v2023.1
VDU kernel modules                                               https://github.com/Xilinx/vdu-modules/tree/xlnx_rel_v2023.1
VDU recipe files                                                 https://github.com/Xilinx/meta-xilinx/tree/rel-v2023.1/meta-xilinx-core/recipes-multimedia/vdu
GStreamer recipe files                                           Core recipes: https://github.com/Xilinx/poky/tree/rel-v2023.1/meta/recipes-multimedia/gstreamer
                                                                 bbappend files: https://github.com/Xilinx/meta-petalinux/tree/rel-v2023.1/recipes-multimedia


Chapter 9

Decoder Software Features


Table 21: Decoder Features Per Instance

Video Coding Parameter  H.265 (HEVC)                                H.264 (AVC)
Profiles                Main                                        Baseline (Except FMO/ASO)
                        Main Intra                                  Main
                        Main10                                      High
                        Main10 Intra                                High10
                        Main 4:2:2 10                               High 4:2:2
                        Main 4:2:2 10 Intra                         High10 Intra
                                                                    High 4:2:2 Intra
Levels                  Up to 5.1 High Tier                         Up to 5.2
Resolutions             4096x2160p60 with specific device (-2, -3)  4096x2160p60 with specific device (-2, -3)
                        Up to 3840x2160p60                          Up to 3840x2160p60
Chroma format           4:0:0, 4:2:0, 4:2:2                         4:0:0, 4:2:0, 4:2:2
Bit Depth               8-bit, 10-bit                               8-bit, 10-bit

GStreamer Decoding Parameters

Table 22: GStreamer Decoding Parameters

Parameter         GStreamer property        Description
Entropy Buffers   internal-entropy-buffers  Specifies decoder internal entropy buffers, used to smooth out
                                            entropy decoding performance. Specify an integer value between
                                            2 and 16. Increasing buffering-count increases decoder memory
                                            footprint.
                                            Default value: 5.
                                            Set this value to 10 for higher bit-rate use cases, for example,
                                            use cases where the bitrate is more than 100 Mb/s.
Latency           latency-mode              Specifies decoder latency mode.
                                            (0): false - Normal mode
                                            (1): true - If alignment is AU, reduced-latency mode; if
                                            alignment is NAL, low-latency mode
                                            Default value: 0.
Split-input mode  split-input               When enabled, the decoder has a 1-to-1 mapping between input
                                            and output buffers. When disabled, the decoder copies all the
                                            input buffers to an internal circular buffer and processes them.
                                            Default: FALSE
QoS               qos                       Drop late frames based on QoS events.
                                            Default: FALSE
Device            device                    Select the appropriate VDU device.
                                            device=/dev/allegroDecodeIP0 or 1, 2, 3
                                            Default value: /dev/allegroDecodeIP0
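The properties in Table 22 are set on the decoder element in a GStreamer pipeline. The following Python sketch assembles a gst-launch-1.0 command line modeled on the sample H.264 pipeline in Chapter 10 and validates the entropy buffer range from the table. The function name is hypothetical, the sketch only builds a string and does not run GStreamer, and it assumes that the omxh264dec element accepts these properties in the key=value form shown.

```python
def build_decode_pipeline(location, device="/dev/allegroDecodeIP0",
                          internal_entropy_buffers=5, split_input=False):
    """Return a gst-launch-1.0 command line for an H.264 decode test."""
    if not 2 <= internal_entropy_buffers <= 16:
        raise ValueError("internal-entropy-buffers must be between 2 and 16")
    decoder = (
        f"omxh264dec device={device} "
        f"internal-entropy-buffers={internal_entropy_buffers} "
        f"split-input={'true' if split_input else 'false'}"
    )
    return (
        f"gst-launch-1.0 filesrc location={location} ! h264parse ! queue ! "
        f"{decoder} ! queue max-size-bytes=0 ! fakevideosink"
    )


if __name__ == "__main__":
    # Table 22 suggests 10 entropy buffers for streams above 100 Mb/s.
    print(build_decode_pipeline("input.avc", internal_entropy_buffers=10))
```

The latency-mode and qos properties are left out of the sketch; they can be appended to the decoder element in the same way.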

Gstreamer and V4L2 Formats

The GStreamer and V4L2 formats are described in this section.

• These formats signify the memory layout of pixels. They apply at the decoder output side.
• For the decoder, you can specify the format used to write pixels to memory by
specifying the corresponding GStreamer video format using caps at the decoder source pad.
• When the format is not supported between two elements, caps negotiation fails and
GStreamer returns an error. In that case, you can use the videoconvert element to perform
software conversion from one format to another.

The following table shows the GStreamer and V4L2 related formats that are supported.

Table 23: Gstreamer and V4L2 Formats

Pixel Format V4L2 GStreamer


YUV 400 8-bit V4L2_PIX_FMT_GREY GST_VIDEO_FORMAT_GRAY8
YUV 400 10-bit V4L2_PIX_FMT_XY10 GST_VIDEO_FORMAT_GRAY10_LE32
YUV 420 8-bit V4L2_PIX_FMT_NV12 GST_VIDEO_FORMAT_NV12
YUV 420 10-bit V4L2_PIX_FMT_XV15 GST_VIDEO_FORMAT_NV12_10LE32
YUV 422 8-bit V4L2_PIX_FMT_NV16 GST_VIDEO_FORMAT_NV16
YUV 422 10-bit V4L2_PIX_FMT_XV20 GST_VIDEO_FORMAT_NV16_10LE32

Note:

1. The FourCC codes are the last four characters of V4L2 pixel formats (for example, GREY/XY10).

2. Y_Only 8-bit and Y_Only 10-bit are supported only at the control software layer for the decoder.

3. For more information on YUV 444 format support, refer to this article.
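Table 23 and the first note can be captured in a small lookup. The following Python sketch maps a chroma format and bit depth to the V4L2 and GStreamer format names from the table and derives the FourCC code as the last four characters of the V4L2 name. The dictionary and function names are hypothetical; the format names are copied from Table 23.

```python
# (chroma format, bit depth) -> (V4L2 pixel format, GStreamer video format), from Table 23.
FORMATS = {
    ("4:0:0", 8): ("V4L2_PIX_FMT_GREY", "GST_VIDEO_FORMAT_GRAY8"),
    ("4:0:0", 10): ("V4L2_PIX_FMT_XY10", "GST_VIDEO_FORMAT_GRAY10_LE32"),
    ("4:2:0", 8): ("V4L2_PIX_FMT_NV12", "GST_VIDEO_FORMAT_NV12"),
    ("4:2:0", 10): ("V4L2_PIX_FMT_XV15", "GST_VIDEO_FORMAT_NV12_10LE32"),
    ("4:2:2", 8): ("V4L2_PIX_FMT_NV16", "GST_VIDEO_FORMAT_NV16"),
    ("4:2:2", 10): ("V4L2_PIX_FMT_XV20", "GST_VIDEO_FORMAT_NV16_10LE32"),
}


def fourcc(v4l2_name):
    """FourCC code: the last four characters of the V4L2 pixel format name."""
    return v4l2_name[-4:]


if __name__ == "__main__":
    v4l2_name, gst_name = FORMATS[("4:2:0", 10)]
    print(v4l2_name, gst_name, fourcc(v4l2_name))
```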


Chapter 10

Preparing PetaLinux to Run VDU Applications
1. source <path/to/petalinux-installer>/tool/petalinux-v2023.1-final/settings.sh
2. petalinux-create -t project -s <path/to/petalinux-installer>/bsp/release/xilinx-vek280-es1-v2023.1-final.bsp
3. cd xilinx-vek280-es1-v2023.1-final
4. petalinux-build
5. Boot the board using a PetaLinux pre-built image.
6. Log in with username petalinux and password root.

Now, the GStreamer, OMX, and Control Software pipelines can be run on the board.

Below are the sample commands to decode a stream:

1. Control Software
• ctrlsw_decoder -avc -i input.avc -noyuv --device /dev/allegroDecodeIP0
2. OMX
• omx_decoder input.avc -avc --device /dev/allegroDecodeIP0 -o /dev/null
3. Gstreamer
• gst-launch-1.0 filesrc location=input.avc ! h264parse ! queue ! omxh264dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! fakevideosink

DMA Memory Limitation


The decoder uses DMA memory for decoding streams. By default, the CMA memory region is
used for DMA buffer allocation, but its size varies between BSPs and is limited. As a result, when
decoding multiple high-resolution streams with multiple decoders, the system may run out of CMA
memory.


To overcome this DMA memory limitation, reserved memory can be used. For example, suppose the
PetaLinux hardware design has two VDU instances (/dev/allegroDecodeIP0 and
/dev/allegroDecodeIP1). One instance can use CMA memory. By modifying the design, a specific
reserved memory region in the DDR can be assigned to the second VDU instance.

The following example reserves memory using DDR controllers with a separate 4 GB aligned
base address for allegroDecodeIP1, while allegroDecodeIP0 uses the normal CMA memory region.
With these changes, both VDU instances can decode 4K streams separately.

reserved-memory {
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    ranges;
    buffer@0 {
        no-map;
        reg = <0x8 0x0 0x0 0x80000000>;
    };

    ddr1_1: myddr@50000000000 {
        compatible = "shared-dma-pool";
        no-map;
        reg = <0x500 0x0 0x0 0x80000000>;
    };
};

amba_pl@0 {
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    compatible = "simple-bus";
    ranges;

    vdu: vdu@a4000000 {
        clock-names = "s_axi_lite_aclk", "ref_clk",
                      "m_axi_mcu_aclk", "m_axi_dec_aclk";
        clocks = <0x11 0x3 0x41 0x12 0x12>;
        compatible = "xlnx,vdu-1.0";
        reset-gpios = <0x13 0x0 0x1>;
        xlnx,core_clk = <0x320>;
        xlnx,enable_dpll;
        xlnx,mcu_clk = <0x23b>;
        xlnx,ref_clk = <0x64>;
    };

    al5d@a4020000 {
        al,devicename = "allegroDecodeIP0";
        compatible = "al,al5d";
        interrupt-names = "vdu_host_interrupt0";
        interrupt-parent = <0x5>;
        interrupts = <0x0 0x54 0x4>;
        reg = <0x0 0xa4020000 0x0 0x100000>;
        xlnx,vdu = <&vdu>;
    };

    al5d@a4120000 {
        al,devicename = "allegroDecodeIP1";
        compatible = "al,al5d";
        interrupt-names = "vdu_host_interrupt1";
        interrupt-parent = <0x5>;
        interrupts = <0x0 0x55 0x4>;
        reg = <0x0 0xa4120000 0x0 0x100000>;
        memory-region = <&ddr1_1>;
        xlnx,vdu = <&vdu>;
    };
};
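
Because #address-cells and #size-cells are both 2, each reg entry holds a 64-bit address followed by a 64-bit size, each written as two 32-bit cells (high, then low). As a small worked example, the sketch below recombines the cells of the ddr1_1 entry to confirm the base address and the region size. The function name cells_to_u64 is illustrative, and the snippet assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# cells_to_u64 HIGH LOW: combine two 32-bit device tree cells into one 64-bit hex value.
cells_to_u64() {
    printf '0x%x\n' $(( ($1 << 32) | $2 ))
}

# reg = <0x500 0x0 0x0 0x80000000> from the ddr1_1 node
cells_to_u64 0x500 0x0          # base address: 0x50000000000
cells_to_u64 0x0 0x80000000     # size in bytes: 0x80000000 (2 GiB)
```

The base address matches the unit address in the node name (myddr@50000000000) and is a multiple of 4 GB.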

Integrating the VDU and GStreamer Patches


1. Extract the PetaLinux BSP.
2. Create a recipe-multimedia folder in the project-spec/meta-user folder.
   cd project-spec/meta-user
   mkdir recipe-multimedia

3. For GStreamer patches, follow these steps:
   a. Create a gstreamer directory in the recipe-multimedia folder.
      cd recipe-multimedia
      mkdir gstreamer

   b. There are five recipe files for GStreamer that download and compile the code:
      • gstreamer1.0_%.bbappend
      • gstreamer1.0-omx_%.bbappend
      • gstreamer1.0-plugins-bad_%.bbappend
      • gstreamer1.0-plugins-base_%.bbappend
      • gstreamer1.0-plugins-good_%.bbappend
   c. Depending on which GStreamer package a patch belongs to, create a bbappend file for that
      package so that the patch is applied to and compiled with the latest source code. For
      example, if the patch is for gst-omx, follow these steps:
      i. Create a gstreamer1.0-omx directory in the recipe-multimedia/gstreamer folder.
         cd gstreamer
         mkdir gstreamer1.0-omx

      ii. Copy the gst-omx patches to the gstreamer1.0-omx directory.
         cp test1.patch recipe-multimedia/gstreamer/gstreamer1.0-omx
         cp test2.patch recipe-multimedia/gstreamer/gstreamer1.0-omx

      iii. Create a gstreamer1.0-omx_%.bbappend file in the recipe-multimedia/gstreamer folder.
         vi gstreamer1.0-omx_%.bbappend


      iv. Append the following lines to the gstreamer1.0-omx_%.bbappend file.
         FILESEXTRAPATHS:prepend := "${THISDIR}/gstreamer1.0-omx:"
         SRC_URI:append = " \
             file://test1.patch \
             file://test2.patch \
             "

   Create similar bbappend files and folders for the other GStreamer packages to integrate any
   custom patches into the PetaLinux build.
4. For VDU patches, follow these steps:
   a. Create a vdu directory in the recipe-multimedia folder.
      cd project-spec/meta-user/recipe-multimedia
      mkdir vdu

   b. There are four recipe files for the VDU that download and compile the code:
      • kernel-module-vdu_%.bbappend
      • vdu-firmware_%.bbappend
      • libvdu-xlnx_%.bbappend
      • libomxil-xlnx_%.bbappend
   c. Depending on which VDU code base a patch belongs to, create a bbappend file for that code
      base so that the patch is applied to and compiled with the latest source code. For example,
      if the patch is for the VDU drivers, follow these steps:
      i. Create a kernel-module-vdu directory in the recipe-multimedia/vdu folder.
         cd vdu
         mkdir kernel-module-vdu

      ii. Copy the VDU driver patches to the kernel-module-vdu directory.
         cp test1.patch recipe-multimedia/vdu/kernel-module-vdu
         cp test2.patch recipe-multimedia/vdu/kernel-module-vdu

      iii. Create a kernel-module-vdu_%.bbappend file in the recipe-multimedia/vdu folder.
         vi kernel-module-vdu_%.bbappend

      iv. Append the following lines to the kernel-module-vdu_%.bbappend file.
         FILESEXTRAPATHS:prepend := "${THISDIR}/kernel-module-vdu:"
         SRC_URI:append = " \
             file://test1.patch \
             file://test2.patch \
             "

   Create similar bbappend files and folders for the other VDU components to integrate any
   custom patches into the PetaLinux build.
5. Follow PetaLinux build steps to generate updated binaries.
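
The per-package steps above repeat the same pattern, so they can be scripted. The following sketch is one possible helper and is not part of PetaLinux: the function name write_bbappend and its arguments are hypothetical, it assumes the recipe-multimedia layout described above, and it emits the colon override syntax (FILESEXTRAPATHS:prepend, SRC_URI:append) expected by recent Yocto releases.

```shell
#!/bin/sh
# write_bbappend LAYER_DIR GROUP RECIPE PATCH...
# Copies the patches into LAYER_DIR/recipe-multimedia/GROUP/RECIPE and writes
# LAYER_DIR/recipe-multimedia/GROUP/RECIPE_%.bbappend listing them.
write_bbappend() {
    layer_dir=$1 group=$2 recipe=$3
    shift 3
    recipe_dir="$layer_dir/recipe-multimedia/$group"
    mkdir -p "$recipe_dir/$recipe" || return 1
    cp "$@" "$recipe_dir/$recipe/" || return 1
    {
        # ${THISDIR} is written literally; BitBake expands it at parse time.
        printf 'FILESEXTRAPATHS:prepend := "${THISDIR}/%s:"\n' "$recipe"
        printf 'SRC_URI:append = " \\\n'
        for patch in "$@"; do
            printf '    file://%s \\\n' "$(basename "$patch")"
        done
        printf '    "\n'
    } > "$recipe_dir/${recipe}_%.bbappend"
}

# Example (paths are placeholders):
#   write_bbappend project-spec/meta-user vdu kernel-module-vdu test1.patch test2.patch
```

Run the helper once per package that has patches, then rebuild with the usual PetaLinux build steps.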


Note: If you are not compiling with PetaLinux, review the recipes for additional files necessary for
setting up GStreamer. For example, you must include the /etc/xdg/gstomx.conf file in the root file
system. This file tells gst-omx where to find the OMX integration layer library,
libOMX.allegro.core.so.1.


Chapter 11

GStreamer Pipelines
The following examples run GStreamer from the PetaLinux command line. To see a description of
the GStreamer elements and properties used in each example, use the gst-inspect-1.0 command.

For example, to get a description of each parameter of the omxh264dec element, enter the
following at the command prompt:

gst-inspect-1.0 omxh264dec

H.264 Decoding
Decode an H.264 input file and display it on a monitor connected over HDMI.

gst-launch-1.0 filesrc location="input-file.mp4" ! qtdemux name=demux \
  demux.video_0 ! h264parse ! omxh264dec device="/dev/allegroDecodeIP0" ! \
  queue max-size-bytes=0 ! kmssink bus-id="<hdmi-bus-id>" fullscreen-overlay=1

H.265 Decoding
Decode an H.265 input file and display it on a monitor connected over HDMI.

gst-launch-1.0 filesrc location="input-file.mp4" ! qtdemux name=demux \
  demux.video_0 ! h265parse ! omxh265dec device="/dev/allegroDecodeIP0" ! \
  queue max-size-bytes=0 ! kmssink bus-id="<hdmi-bus-id>" fullscreen-overlay=1

Note: The input-file.mp4 file can use any of the following formats:

• 4:2:0 8-bit
• 4:2:2 8-bit
• 4:2:0 10-bit
• 4:2:2 10-bit
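
The H.264 and H.265 pipelines differ only in the parser and decoder elements. As a sketch, a small wrapper can pick both from a codec name. The function decode_pipeline and its argument order are illustrative; the element names, the device property, and the kmssink options are the ones used in the commands above.

```shell
#!/bin/sh
# decode_pipeline CODEC FILE DEVICE BUS_ID
# Prints a gst-launch-1.0 command for CODEC (h264 or h265); returns 1 for anything else.
decode_pipeline() {
    case $1 in
        h264|h265) ;;
        *) echo "unsupported codec: $1" >&2; return 1 ;;
    esac
    printf 'gst-launch-1.0 filesrc location="%s" ! qtdemux name=demux demux.video_0 ! %sparse ! omx%sdec device="%s" ! queue max-size-bytes=0 ! kmssink bus-id="%s" fullscreen-overlay=1\n' \
        "$2" "$1" "$1" "$3" "$4"
}

# Print the command for inspection; replace <hdmi-bus-id> before running it on the board.
decode_pipeline h265 input-file.mp4 /dev/allegroDecodeIP0 "<hdmi-bus-id>"
```

Printing the command instead of executing it makes it easy to review the pipeline before it touches the hardware.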


High Bitrate Bitstream Decoding


To reduce frame decoding time for bitstreams greater than 100 Mb/s at 4Kp30, use the following
options:

• Increase the number of internal decoder buffers (internal-entropy-buffers parameter) to 9 or 10.
• Add a queue at the decoder input side.

The following command decodes an H.264 MP4 file using an increased number of internal
entropy buffers and displays it via HDMI.

gst-launch-1.0 filesrc location="input-file.mp4" ! qtdemux name=demux \
  demux.video_0 ! h264parse ! queue max-size-bytes=0 ! \
  omxh264dec internal-entropy-buffers=10 ! queue max-size-bytes=0 ! \
  kmssink bus-id="<hdmi-bus-id>" fullscreen-overlay=1

Multi-Stream Decoding
• Decoding with a single instance: Decode the H.265 input file using four decoder elements
simultaneously and save the outputs to separate files. This example uses a single decoder
instance, /dev/allegroDecodeIP0.

gst-launch-1.0 filesrc location=input_1920x1080.mp4 ! qtdemux ! h265parse ! tee name=t \
  t. ! queue ! omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
    filesink location="output_0_1920x1080.yuv" \
  t. ! queue ! omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
    filesink location="output_1_1920x1080.yuv" \
  t. ! queue ! omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
    filesink location="output_2_1920x1080.yuv" \
  t. ! queue ! omxh265dec ! queue max-size-bytes=0 ! \
    filesink location="output_3_1920x1080.yuv"

Note: The tee element feeds the same input file into four decoder channels. To feed different
inputs, run separate gst-launch-1.0 applications as shown below.

gst-launch-1.0 filesrc location=input_0_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
  filesink location="output_0_1920x1080.yuv" &
gst-launch-1.0 filesrc location=input_1_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
  filesink location="output_1_1920x1080.yuv" &
gst-launch-1.0 filesrc location=input_2_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
  filesink location="output_2_1920x1080.yuv" &
gst-launch-1.0 filesrc location=input_3_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
  filesink location="output_3_1920x1080.yuv" &


• Decoding with multiple decoder instances: The following example decodes multiple encoded
streams on different decoder instances.

gst-launch-1.0 filesrc location=input_0_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP0" ! queue max-size-bytes=0 ! \
  filesink location="output_0_1920x1080.yuv" &
gst-launch-1.0 filesrc location=input_1_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP1" ! queue max-size-bytes=0 ! \
  filesink location="output_1_1920x1080.yuv" &
gst-launch-1.0 filesrc location=input_2_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP2" ! queue max-size-bytes=0 ! \
  filesink location="output_2_1920x1080.yuv" &
gst-launch-1.0 filesrc location=input_3_1920x1080.mp4 ! qtdemux ! h265parse ! \
  omxh265dec device="/dev/allegroDecodeIP3" ! queue max-size-bytes=0 ! \
  filesink location="output_3_1920x1080.yuv" &
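
When the number of streams grows, the per-instance commands can be generated in a loop. The sketch below only prints the commands (a dry run). It assumes the device nodes are named /dev/allegroDecodeIP<n> and the files follow the naming used in the examples above; the helper name multi_decode_cmds is hypothetical.

```shell
#!/bin/sh
# multi_decode_cmds COUNT: print one background gst-launch-1.0 command per decoder
# instance, numbered from 0 to COUNT-1.
multi_decode_cmds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf 'gst-launch-1.0 filesrc location=input_%s_1920x1080.mp4 ! qtdemux ! h265parse ! omxh265dec device="/dev/allegroDecodeIP%s" ! queue max-size-bytes=0 ! filesink location="output_%s_1920x1080.yuv" &\n' \
            "$i" "$i" "$i"
        i=$((i + 1))
    done
}

multi_decode_cmds 4
```

The count should not exceed the number of VDU instances present in the hardware design.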

Verified GStreamer Elements


Table 24: Verified GStreamer Elements

Element           Description
filesink          Writes incoming data to a file in the local file system
filesrc           Reads data from a file in the local file system
h264parse         Parses an H.264 encoded stream
h265parse         Parses an H.265 encoded stream
kmssink           Renders video frames directly in a plane of a DRM device
omxh264dec        Decodes OpenMAX H.264 video
omxh265dec        Decodes OpenMAX H.265 video
qtdemux           Demuxes a .mov file into raw or compressed audio and/or video streams
queue             Queues data until one of the limits specified by the "max-size-buffers", "max-size-bytes", or "max-size-time" properties has been reached
rtph264depay      Extracts an H.264 video payload from an RTP packet stream
rtph264pay        Encapsulates an H.264 video in an RTP packet stream
rtph265depay      Extracts an H.265 video payload from an RTP packet stream
rtph265pay        Encapsulates an H.265 video in an RTP packet stream
rtpjitterbuffer   Reorders and removes duplicate RTP packets as they are received from a network source
tee               Splits data to multiple pads
udpsink           Sinks UDP packets to the network
udpsrc            Reads UDP packets from the network
v4l2src           Captures video from v4l2 devices, like webcams and television tuner cards
rawvideoparse     Converts a byte stream into video frames


Verified Containers Using GStreamer


Table 25: Verified Containers Using GStreamer

Format      AVC (Supported: Yes/No)    HEVC (Supported: Yes/No)
MP4         Yes                        Yes
MPEG2-TS    Yes                        Yes
MKV         Yes                        Yes
AVI         Yes                        No
MPEG2-PS    Yes                        No
FLV         Yes                        No
3GP         Yes                        No

Verified Streaming Protocols Using GStreamer


Table 26: Verified Streaming Protocols Using GStreamer

Protocol    AVC (Supported: Yes/No)    HEVC (Supported: Yes/No)    Format
RTP         Yes                        Yes                         TS
UDP         Yes                        Yes                         TS
TCP         Yes                        Yes                         TS
HTTP        Yes                        Yes                         M3U8, TS, MP4


Chapter 12

OpenMAX Integration Layer

OpenMAX Integration Layer Sample Applications
Two sample applications built using the OpenMAX Integration Layer are available. The source code
for the OpenMAX sample application omx_decoder is at
https://github.com/Xilinx/vdu-omx-il/tree/xlnx-rel-v2023.1/exe_omx/decoder

• H.265 Decoding File to File: The omx_decoder --help command shows all the options.

omx_decoder input-file.h265 --device /dev/allegroDecodeIP0 -hevc -o out.yuv


Chapter 13

VDU Control Software


The VDU Control Software operates at the frame or slice level. Its responsibilities are:

• Parsing NAL units for the decoder.
• Composing and queuing commands for each frame to the MCU firmware.
• Retrieving the status of each frame.
• Concatenating the video bitstream generated by hardware and software.

Xilinx VDU Control Software API


The Xilinx VDU Control Software API consists of a user-space decoder library that user-space
applications link against to control the VDU hardware.

Error Checking and Reporting


There are several error handling mechanisms in the Xilinx VDU Control Software API. The most
common mechanism is for functions to return a status value, such as a boolean or a pointer that
is NULL in the failing case.

The decoder objects store an error code to be accessed with AL_Decoder_GetFrameError.

User-defined callbacks are sometimes notified of unusual conditions by receiving NULL for a
pointer that is not normally NULL. In other cases, no notification is provided and the callback
itself is expected to use one of the accessor functions to retrieve the error status from a
decoder object.

In unusual or unexpected circumstances, some functions may report errors directly to the
console. These are system errors, and the console output contains the messages listed in the
following table.

Table 27: Decoder Error Messages

Error Type                                  Description
AL_SUCCESS                                  The operation succeeded without encountering any error
AL_ERROR                                    Unknown error
AL_ERR_NO_MEMORY                            No memory left, so the resource could not be allocated. This can be DMA memory, MCU-specific memory if available, or simply a virtual memory shortage.
AL_ERR_STREAM_OVERFLOW                      The generated stream could not fit inside the allocated stream buffer
AL_ERR_TOO_MANY_SLICES                      If SliceSize mode is supported, the constraint could not be respected because too many slices were required to respect it
AL_ERR_CHAN_CREATION_NO_CHANNEL_AVAILABLE   The scheduler cannot handle more channels (fixed limit of AL_SCHEDULER_MAX_CHANNEL)
AL_ERR_CHAN_CREATION_RESOURCE_UNAVAILABLE   The processing power of the available cores is insufficient to handle this channel
AL_ERR_CHAN_CREATION_NOT_ENOUGH_CORES       Could not spread the load over enough cores (a special case of ERROR_RESOURCE_UNAVAILABLE), or the load cannot be spread that much (each core has a requirement on the minimal number of resources it can handle)
AL_ERR_REQUEST_MALFORMED                    Some parameters in the request have an invalid value
AL_ERR_CMD_NOT_ALLOWED                      The dynamic command is not allowed in some configurations
AL_ERR_INVALID_CMD_VALUE                    The value associated with the command is invalid (in the current configuration)

An encoded bitstream can be corrupted in various ways, and detecting those errors in a
compressed bitstream is complex because of the syntax element coding and parsing
dependencies. Errors are usually not detected at the corrupted bit itself but more likely at the
syntax elements that follow it.

For example, suppose an encoded bitstream contains scaling matrices and the "scaling matrices
present" bit is corrupted in the stream. When a decoder reads this bitstream, it assumes that no
scaling matrices are present and parses the actual scaling matrix data as the next syntax
element, which may cause an error. The real error is the corrupted scaling matrix bit, but the
decoder cannot detect that. Such scenarios are common in video codecs.

For more details on error handling and reporting, refer to the VPS/SPS/PPS parsing functions:
https://github.com/Xilinx/vdu-ctrl-sw/tree/xlnx_rel_v2023.1/lib_parsing

See lib_parsing/AvcParser.c and lib_parsing/HevcParser.c and check the calls to the COMPLY macro.

In addition, review the AL_AVC_ParseSliceHeader and AL_HEVC_ParseSliceHeader functions in
lib_parsing/SliceHdrParsing.c and check the return false paths.


Error Resilience
Error resilience is handled either at the control software level or at the hardware level.
Because errors are difficult to predict, the hardware decoder can hang in an infinite loop. In
that case, a watchdog resets the decoder safely so that decoding restarts with the next frames.

The hardware IP only parses the slice data part of the bitstream. All headers are parsed and
managed by the control software.

The error resilience for the headers is managed by the software and the error resilience for the
slice data is managed by the hardware.

Error Detection

At slice header level, the software can detect different kinds of errors:

• Missing slices
• Inconsistent first LCU address syntax element.

When the software detects an error, a slice conceal command is sent to the hardware IP to fill
the intermediate buffer. The intermediate buffer must always be completely filled to avoid a
decoder timeout.

At slice data level, the hardware can detect different kinds of errors, like inconsistencies in the
number of LCUs or in the range of various syntax elements. When an error is detected, a
concealment flag is set in the corresponding LCU data in the intermediate buffer up to the last
LCU of the slice.

Error Concealment

Error concealment is performed in the reconstruction process. When a concealment flag is set in
the intermediate buffer, the reconstruction of the LCU will be done using fixed parameters:

• If there is a reference picture available, the LCU is skipped using this picture as a reference.
• If there is no reference picture, the default intra prediction mode is applied.

When errors are detected by the hardware IP, it conceals the remaining part of the slice; there is
no error code, only a single flag indicating if the slice has been concealed or not.

Decoder Hang Detection

The decoder should not hang even when decoding a corrupted bitstream, but it is difficult to
guarantee that a hang never happens. In such cases, a watchdog is used to soft reset the
decoder.


Memory Management
Memory operations are routed through function pointers. The default AL_Allocator
implementation simply wraps malloc, free, and related functions.

Two higher level techniques are used for memory management: reference counted buffers and
buffer pools. A reference counted buffer is created with a zero-reference count. The
AL_Buffer_Ref and AL_Buffer_Unref functions increment and decrement the reference
count, respectively. The AL_Buffer interface separates the management of buffer metadata
from the management of the data memory associated with the buffer. Usage of the reference
count is optional.

The AL_TBufPool implementation manages a buffer pool with a ring buffer. Some ring buffers
have their sizes fixed at compile time. Exceeding the buffer pool size results in undefined
behavior. See AL_Decoder_PutDisplayPicture.

API and Structure Definitions


All ctrlsw structures and APIs are documented with Doxygen. The documentation can be generated
with the following commands:

• git clone https://github.com/Xilinx/vdu-ctrl-sw.git
• cd vdu-ctrl-sw/Doxygen
• ./doxygen.sh

The Doxygen documentation can be accessed by opening vdu-ctrl-sw/Doxygen/doc/html/index.html
in a browser.

VDU Control Software Sample Application


The ctrlsw_decoder is a complete sample application that decodes video. This application is
intended as a learning aid for the Xilinx VDU Control Software API and for troubleshooting. The
source code for the ctrlsw_decoder application is at https://github.com/Xilinx/vdu-ctrl-sw.
Example VDU Control Software commands are given below.

• H.264 Decoding File to File

ctrlsw_decoder --device /dev/allegroDecodeIP0 -avc -in input-avc-file.h264 -out output.yuv

• H.265 Decoding File to File

ctrlsw_decoder --device /dev/allegroDecodeIP0 -hevc -in input-hevc-file.h265 -out output.yuv


Note: For a complete list of parameters, type the following on the command line:

ctrlsw_decoder --help


Chapter 14

Driver
There are multiple VDU modules. The VDU Init driver (xlnx_vdu) is part of the Linux kernel and
handles PL registers such as the VDU gasket and the clocking. The other two kernel drivers
(al5d and allegro) together form the core VDU driver. The decoder driver is called al5d and the
common driver is called allegro.

The allegro driver has the following responsibilities:

• Loading the MCU firmware


• Initiating the MCU boot sequence
• Writing mailbox messages into memory shared between APU and MCU
• Providing notification of new mailbox messages.

The VDU Init driver source code is at
https://github.com/Xilinx/linux-xlnx/blob/xlnx_rebase_v4.19/drivers/soc/xilinx/

The VDU modules (allegro, al5d) source code is at https://github.com/Xilinx/vdu-modules

All VDU driver modules (xlnx_vdu, allegro, al5d) are compiled as runtime kernel modules and are
loaded when the kernel boots. The modules load in the following sequence.

1. The VDU Init driver (xlnx_vdu) is loaded.
2. The VDU Init driver loads the Allegro modules.
3. The Allegro modules are loaded in the following order:
   a. allegro
   b. al5d
You can use the lsmod command to verify that the VDU modules were loaded properly.
To load the modules manually, use the modprobe <drivername> command and load the drivers in
the sequence listed above.
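
A quick way to confirm that the load sequence succeeded is to look for all three modules in the lsmod output. The following sketch reads lsmod-style text on stdin so that it can be tried offline. The function name vdu_modules_missing is illustrative; the module names are the ones listed above.

```shell
#!/bin/sh
# vdu_modules_missing: read lsmod output on stdin and print each VDU module that is
# not listed. Returns 0 only when xlnx_vdu, allegro, and al5d are all present.
vdu_modules_missing() {
    awk '
        { seen[$1] = 1 }    # first column of lsmod output is the module name
        END {
            n = split("xlnx_vdu allegro al5d", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print want[i]; missing = 1 }
            exit (missing ? 1 : 0)
        }'
}

# On the board:
#   lsmod | vdu_modules_missing || echo "load the modules listed above with modprobe"
```

The modules are reported in load order, so the first name printed is the first one to load with modprobe.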


Chapter 15

MCU Firmware
The MCU firmware running on the MCU has the following responsibilities:

• Transforming frame-level commands from VDU Control Software to slice level commands for
the hardware IP core.
• Configuring hardware registers for each command.


Chapter 16

Decoder Stack
The following figure shows the decoder software stack.

Figure 12: Decoder Overview

[Figure: layered decoder stack, from top to bottom: Application (Test Application or OMX-based
application), Decoder API, Decoder Library, Driver Interface, Decoder Driver, Mailbox Interface,
MCU Firmware (including the Scheduler), IP Control API, VDU IP.]

Application

The application can be either a test pattern generator or an OpenMAX-based application that uses
the VDU decoder.


Decoder Library

The decoder library enables applications to communicate with the MCU firmware through the
decoder driver.

Decoder Driver

The decoder driver passes control information as well as buffer pointers of the video to the MCU
firmware. The decoder driver uses a mailbox communication technique to pass this information
to the MCU firmware.

MCU Firmware

The firmware receives control and buffer information through the mailbox. It takes the
appropriate action and communicates the status back to the decoder driver.

Scheduler

The scheduler, which is part of MCU firmware, programs the hardware IP, handles interrupts and
manages the multi-channel and multi-slice aspects of the decoding.


Chapter 17

Decoder Flow
The following figure shows an example of using the Xilinx VDU Control Software API.

Figure 13: Decoder API Workflow Example

[Flowchart. Boxes: AL_Decoder_Create; allocate stream buffers using
AL_Buffer_Create_And_Allocate or AL_Buffer_Create; allocate decoded buffers using
AL_Buffer_Create_And_Allocate or AL_Buffer_Create and push them to the decoder using
AL_Decoder_PutDisplayPicture; Fill one Stream Buffer; AL_Decoder_PushBuffer; AL_Decoder_Flush;
Wait End of Decoding; Output Frame Buffer; AL_Decoder_PutDisplayPicture; Signal End of Decoding;
AL_Decoder_Destroy; Free Stream and Decode Buffers. Decisions: is STOP Requested; Is End of
Stream; pDisplayedFrame == NULL.]


Decoder API
The Decoder API is defined in
https://github.com/Xilinx/vdu-ctrl-sw/blob/xlnx_rel_v2023.1/include/lib_decode
The API is documented with Doxygen, and the documentation can be accessed by opening
vdu-ctrl-sw/Doxygen/doc/html/index.html in a browser.

The example application ctrlsw_decoder demonstrates how to use the API.


Section IV

Appendices


Chapter 18

Codec Parameters for Different Use Cases
The parameters for different use cases are given in the following table:

Table 28: Codec Parameters

Use Case                        Quality   Latency Mode                   Decoder Settings
Playback                        High      Normal                         internal-entropy-buffers=5 (9-10 if target-bitrate >= 100 Mb/s), low-latency=0
Stream In, Video Conferencing   Low       Reduced Latency, Low Latency   low-latency=1
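
As an illustration of how the playback row can drive pipeline options, the sketch below selects internal-entropy-buffers from the stream bitrate. The function name and the choice of 10 (rather than 9) for high bitrates are assumptions; the 100 Mb/s threshold and the default of 5 come from the table.

```shell
#!/bin/sh
# playback_decoder_opts BITRATE_MBPS: print decoder properties for the playback use case.
playback_decoder_opts() {
    if [ "$1" -ge 100 ]; then
        buffers=10   # the table allows 9-10 at >= 100 Mb/s; 10 is chosen here
    else
        buffers=5
    fi
    printf 'internal-entropy-buffers=%s low-latency=0\n' "$buffers"
}

playback_decoder_opts 120
```

The printed properties can be appended to the omxh264dec or omxh265dec element in a gst-launch-1.0 command.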


Chapter 19

IRQ Balancing
Multimedia use cases that involve video codecs, such as audio/video conferencing, video on
demand, playback, and recording, also involve multiple other peripherals. Examples include
Ethernet, video capture pipeline IP such as image sensors and image signal processing engines,
DMA engines, and display pipeline IP such as video mixers and HDMI transmitters. Each of these
uses a unique interrupt line to communicate with the CPU.

In these scenarios, it becomes important to distribute the interrupt processing load across
multiple CPU cores instead of utilizing the same core for all the peripherals/IP. Distributing the
IRQ across CPU cores optimizes the latency and performance of the running use-case as the IRQ
context switching and ISR handling load gets distributed across multiple CPU cores.

Each peripheral/IP is assigned a unique interrupt number by the Linux kernel. Whenever a
peripheral or IP needs to signal something to the CPU (for example, that it has completed a task
or detected an event), it sends a hardware signal to the CPU. The kernel retrieves the associated
IRQ number and then calls the associated interrupt service routine. The IRQ numbers can be
retrieved using the following command. This command also lists the number of interrupts
processed by each core, the interrupt type, and a comma-delimited list of drivers registered to
receive that interrupt.

$ cat /proc/interrupts

The Versal device has two CPU cores available. When running a plain PetaLinux image without an
irqbalance daemon, all IRQ requests are processed by CPU 0 by default. To assign a different CPU
core to process a particular IRQ number, change the IRQ affinity for that interrupt. The IRQ
affinity value is a bitmask that defines which CPU cores are allowed to process that particular
IRQ. For more information, see https://www.kernel.org/doc/Documentation/IRQ-affinity.txt.

By default, the IRQ affinity value for each peripheral is set to 0x3, which means that both CPU
cores are allowed to process the interrupt, as shown in the following example using IRQ number 42.

$ cat /proc/irq/42/smp_affinity
output: 3

To restrict this IRQ to CPU core n, set a mask with only bit n set. For example, to route the IRQ
only to CPU core 1, set the second bit using the value 0x2.

echo 2 > /proc/irq/42/smp_affinity
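
The mask is a hexadecimal bitmap in which bit n stands for CPU core n, so the value for a single core is 1 shifted left by n. A minimal sketch of that calculation follows; the helper name cpu_mask is illustrative.

```shell
#!/bin/sh
# cpu_mask CPU...: print the hexadecimal smp_affinity mask that allows the given cores.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpu_mask 1      # core 1 only: 2
cpu_mask 0 1    # cores 0 and 1: 3
# On the board (IRQ 42 as in the example above):
#   cpu_mask 1 > /proc/irq/42/smp_affinity
```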


The following section shows how IRQ balancing can be performed before running a multistream
video conferencing use case that involves multiple peripherals and video IP.

Consider a design with several DMA channels that capture different video streams. Each channel
uses a different interrupt line, as depicted by the versal-dma blocks in the following figure.

Figure 14: Default IRQ Assignment to CPU Core 0

[Figure: CPU 0 and CPU 1, with the following interrupt sources: VDU (4 decoder), Frame Buffer wr0,
Frame Buffer wr1, HDMI TX IP, HDMI RX, Video Mixer IP, versal-dma1, versal-dma2, versal-dma3,
versal-dma4.]

As seen in the previous figure, all interrupt requests from the different peripherals go to CPU 0
by default.

To distribute the interrupt requests across different CPU cores as shown in the following figure,
follow these steps:


Figure 15: Distributed Interrupt Layout

[Figure: CPU 0 and CPU 1, with the following interrupt sources distributed between them:
VDU (4 decoder), Frame buffer wr0, Frame buffer wr1, HDMI Tx, HDMI Rx, Video Mixer, Versal DMA1,
Versal DMA2, Versal DMA3, Versal DMA4.]

1. Find the IRQ numbers for each of the above peripherals.

root@vek280:~/# cat /proc/interrupts | grep al5
 49:    1250127      47679   GICv2 127 Level   a0120000.al5d
root@vek280:~/# cat /proc/interrupts | grep xilinx_frame
 52:      18662          0   GICv2 122 Level   xilinx_framebuffer
 53:      19170          0   interrupt-controller@a0055000 3 Level -level xilinx_framebuffer
 54:      18825          0   interrupt-controller@a0055000 0 Level -level xilinx_framebuffer
 55:      18463          0   interrupt-controller@a0055000 1 Level -level xilinx_framebuffer
 57:          0          0   GICv2 121 Level   xilinx_framebuffer
root@vek280:~/# cat /proc/interrupts | grep xilinx-hdmi
 56:     544834          0   GICv2 123 Level   xilinx-hdmi-rx
 58:      86730          0   GICv2 125 Level   xilinx-hdmitxss
root@vek280:~/# cat /proc/interrupts | grep mixer
 59:      86752          0   GICv2 128 Level   xlnx-mixer
root@vek280:~/# cat /proc/interrupts | grep versal-dma
 12:   42151036          0   GICv2 156 Level   versal-dma
 13:   31494805   10644207   GICv2 157 Level   versal-dma
 14:   31483922          0   GICv2 158 Level   versal-dma
 15:   31518024          0   GICv2 159 Level   versal-dma

NOTE: There are multiple versal-dma interrupt lines. To find out which ones are in use, first run
the use case and then check which interrupt lines are being utilized.

The numbers on the left are the IRQ numbers for the respective peripherals.
2. Assign the VDU IRQ (number 49) to CPU 0. The value written to smp_affinity is a
hexadecimal CPU bitmask, so 1 selects CPU 0 and 2 selects CPU 1.

echo 1 > /proc/irq/49/smp_affinity #VDU

3. Assign the HDMI Rx and frame buffer write IRQs to CPU 0.

echo 1 > /proc/irq/52/smp_affinity #Frame buffer
echo 1 > /proc/irq/56/smp_affinity #Primary HDMI Rx

4. Assign the HDMI Tx and Video Mixer IRQs to CPU 1.

echo 2 > /proc/irq/58/smp_affinity #Tx
echo 2 > /proc/irq/59/smp_affinity #Mixer

By default, the interrupts for the video1 xilinx_framebuffer DMA engine and various other
peripherals are already processed by CPU 0, so there is no need to modify their smp_affinity.
The previous commands distribute the IRQs according to the scheme in the previous figure. To
confirm this, run the following command while the use case is running and check that the
interrupts for each peripheral go to the intended CPU core. A similar scheme can be followed
for other use cases, depending on the peripherals being used, the system load, and the
intended performance.

$ cat /proc/interrupts

5. Assign a unique CPU to each versal-dma channel if possible.

echo 1 > /proc/irq/12/smp_affinity #versal-dma1
echo 1 > /proc/irq/13/smp_affinity #versal-dma2
echo 2 > /proc/irq/14/smp_affinity #versal-dma3
echo 2 > /proc/irq/15/smp_affinity #versal-dma4

By default, the interrupts for other peripherals are processed by CPU 0, so there is no need
to modify their smp_affinity. The preceding commands distribute the IRQs according to the
scheme described above, which can be confirmed by running the following command while the
use case is running:

cat /proc/interrupts
 12:   42151036          0          0          0     GICv2 156 Level     zynqmp-dma
 13:   31494805   10644207          0          0     GICv2 157 Level     zynqmp-dma
 14:   31483922          0   10643127          0     GICv2 158 Level     zynqmp-dma
 15:   31518024          0          0   10595920     GICv2 159 Level     versal-dma
 49:    1250127      47679          0          0     GICv2 127 Level     a0120000.al5d, a0100000.al5e
 52:      18662          0        822          0     GICv2 122 Level     xilinx_framebuffer
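
The steps in this chapter can be sketched as a small set of shell helpers. This is a minimal
sketch, not part of any AMD tool: the function names cpu_mask, irq_numbers, plan_affinity, and
irq_counts are hypothetical and introduced here for illustration only. It assumes the
/proc/interrupts layout shown in this chapter (a header row of CPU names followed by one row
per IRQ) and a system with at most 32 CPUs, so that a single hexadecimal word is enough for
the smp_affinity mask. The plan_affinity helper only prints the echo commands so that they can
be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helpers for planning and checking IRQ affinity.

# cpu_mask N: print the hexadecimal smp_affinity mask that selects only CPU N.
# Bit N of the mask stands for CPU N (assumes N is between 0 and 31).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# irq_numbers NAME [FILE]: print the IRQ number of every numbered row that
# contains NAME. FILE defaults to /proc/interrupts.
irq_numbers() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+:$/ && index($0, name) { sub(/:$/, "", $1); print $1 }
    ' "${2:-/proc/interrupts}"
}

# plan_affinity NAME CPU [FILE]: print one smp_affinity command per IRQ that
# matches NAME. Nothing is written to /proc by this function.
plan_affinity() {
    mask=$(cpu_mask "$2")
    for irq in $(irq_numbers "$1" "${3:-/proc/interrupts}"); do
        printf 'echo %s > /proc/irq/%s/smp_affinity\n' "$mask" "$irq"
    done
}

# irq_counts IRQ [FILE]: print the per-CPU counts for one IRQ number. The
# header row (CPU0 CPU1 ...) tells how many count columns each row has.
irq_counts() {
    awk -v irq="$1:" '
        NR == 1 { ncpu = NF; next }
        $1 == irq {
            for (i = 2; i <= ncpu + 1; i++)
                printf "CPU%d=%s%s", i - 2, $i, (i <= ncpu ? " " : "\n")
        }
    ' "${2:-/proc/interrupts}"
}
```

With the listing from step 1, plan_affinity xlnx-mixer 1 prints
echo 2 > /proc/irq/59/smp_affinity, which matches the command in step 4. Running irq_counts
with an IRQ number before and after the use case shows which CPU column is increasing for
that IRQ.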

Chapter 20

Debugging
This appendix includes details about resources available on the AMD Support website and
debugging tools.

If the IP requires a license key, the key must be verified. The AMD Vivado™ design tools have
several license checkpoints for gating licensed IP through the flow. If the license check succeeds,
the IP can continue generation. Otherwise, generation halts with an error. License checkpoints
are enforced by the following tools:

• Vivado Synthesis
• Vivado Implementation
• write_bitstream (Tcl command)

IMPORTANT! IP license level is ignored at checkpoints. The test confirms a valid license exists. It does not
check IP license level.

Finding Help with AMD Adaptive Computing Solutions
To help in the design and debug process when using the core, the Support web page contains key
resources such as product documentation, release notes, answer records, information about
known issues, and links for obtaining further product support. The Community Forums are also
available where members can learn, participate, share, and ask questions about AMD Adaptive
Computing solutions.

Documentation
This product guide is the main document associated with the core. This guide, along with
documentation related to all products that aid in the design process, can be found on the Support
web page or by using the AMD Adaptive Computing Documentation Navigator. Download the
Documentation Navigator from the Downloads page. For more information about this tool and
the features available, open the online help after installation.

Answer Records
Answer Records include information about commonly encountered problems, helpful information
on how to resolve these problems, and any known issues with an AMD Adaptive Computing
product. Answer Records are created and maintained daily to ensure that users have access to
the most accurate information available.

Answer Records for this core can be located by using the Search Support box on the main
Support web page. To maximize your search results, use keywords such as:

• Product name
• Tool message(s)
• Summary of the issue encountered

A filter search is available after results are returned to further target the results.

Master Answer Record for the Core
AR 000034162.

Technical Support
AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:

• Implement the solution in devices that are not defined in the documentation.
• Customize the solution beyond that allowed in the product documentation.
• Change any section of the design labeled DO NOT MODIFY.

To ask questions, navigate to the Community Forums.

Debug Tools
There are many tools available to address VDU design issues. It is important to know which tools
are useful for debugging various situations.

Vivado Design Suite Debug Feature


The AMD Vivado™ Design Suite debug feature inserts logic analyzer and virtual I/O cores
directly into your design. The debug feature also allows you to set trigger conditions to capture
application and integrated block port signals in hardware. Captured signals can then be analyzed.
This feature in the Vivado IDE is used for logic debugging and validation of a design running in
AMD devices.

The Vivado logic analyzer is used to interact with the logic debug LogiCORE IP cores, including:

• ILA 2.0 (and later versions)
• VIO 2.0 (and later versions)

See the Vivado Design Suite User Guide: Programming and Debugging (UG908).

Interface Debug
AXI4-Lite Interfaces
To verify that the interface is functional, try reading from a register that does not have all 0s as
its default value. Output s_axi_arready asserts when the read address is valid, and output
s_axi_rvalid asserts when the read data/response is valid. If the interface is unresponsive,
ensure that the following conditions are met:

• The s_axi_aclk and aclk inputs are connected and toggling.
• The interface is not being held in reset, and s_axi_areset is an active-Low reset.
• The interface is enabled, and s_axi_aclken is active-High (if used).
• The main core clocks are toggling and the enables are asserted.

AXI4-Stream Interfaces
If data is not being transmitted or received, check the following conditions:

• If transmit <interface_name>_tready is stuck Low following the
<interface_name>_tvalid input being asserted, the core cannot send data.
• If the receive <interface_name>_tvalid is stuck Low, the core is not receiving data.
• Check that the aclk inputs are connected and toggling.
• Check that the AXI4-Stream waveforms are being followed.
• Check the core configuration.

Chapter 21

Additional Resources and Legal Notices

Finding Additional Documentation


Documentation Portal

The AMD Adaptive Computing Documentation Portal is an online tool that provides robust
search and navigation for documentation using your web browser. To access the Documentation
Portal, go to https://docs.xilinx.com.

Documentation Navigator

Documentation Navigator (DocNav) is an installed tool that provides access to AMD Adaptive
Computing documents, videos, and support resources, which you can filter and search to find
information. To open DocNav:

• From the AMD Vivado™ IDE, select Help → Documentation and Tutorials.
• On Windows, click the Start button and select Xilinx Design Tools → DocNav.
• At the Linux command prompt, enter docnav.

Design Hubs

AMD Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the
Design Hubs:

• In DocNav, click the Design Hubs View tab.
• Go to the Design Hubs webpage.

Note: For more information on DocNav, see the Documentation Navigator webpage.

Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.

References
These documents provide supplemental material useful with this guide:

1. Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957)
2. Versal Adaptive SoC AI Engine Register Reference (AM015)
3. Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
4. Vivado Design Suite User Guide: Designing with IP (UG896)
5. Vivado Design Suite User Guide: Getting Started (UG910)
6. Vivado Design Suite User Guide: Logic Simulation (UG900)
7. Versal Adaptive SoC Register Reference (AM012)
8. Versal AI Edge Series Data Sheet: DC and AC Switching Characteristics (DS958)
9. Versal ACAP AI Edge Series Product Selection Guide (XMP464)
10. Versal AI Core Series Product Selection Guide (XMP452)

Revision History
The following table shows the revision history for this document.

Section                                               Revision Summary
05/16/2023 Version 1.0
Section III: Application Software Development         Added and updated topics in this section.
07/08/2022 Version 1.0
Initial Release                                       N/A

Please Read: Important Legal Notices


The information presented in this document is for informational purposes only and may contain
technical inaccuracies, omissions, and typographical errors. The information contained herein is
subject to change and may be rendered inaccurate for many reasons, including but not limited to
product and roadmap changes, component and motherboard version changes, new model and/or
product releases, product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities
that cannot be completely prevented or mitigated. AMD assumes no obligation to update or
otherwise correct or revise this information. However, AMD reserves the right to revise this
information and to make changes from time to time to the content hereof without obligation of
AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS
IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE
CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES,
ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY
DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR
FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY
PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL
DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF
AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AUTOMOTIVE APPLICATIONS DISCLAIMER

AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT
WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS
THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A
SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262
AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING
OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST
SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION
WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO
APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT
LIABILITY.

Copyright

© Copyright 2022-2023 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Versal,
Vivado, and combinations thereof are trademarks of Advanced Micro Devices, Inc. AMBA, AMBA
Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of
Arm Limited in the US and/or elsewhere. Other product names used in this publication are for
identification purposes only and may be trademarks of their respective companies.
