VPI data transformations

agonzalez10 · May 26, 2025, 12:15pm

I have image processing code written in OpenCV with CUDA. I want to optimize it for my Jetson Orin NX module by using the specialized NVIDIA libraries.

I am trying to implement some algorithms in VPI such as the Optical Flow.

According to the benchmarks in the official documentation, the VPI algorithm is faster than the OpenCV implementation.

The problem is that my original code uses OpenCV, so to call any VPI function I need to convert the data from cv::cuda::GpuMat to VPIImage. That conversion is time-consuming and ends up worsening the performance of my application.

This is a general problem when I try to implement any VPI function, but particularly, with the Optical Flow there are more complications. These complications arise since the algorithm needs the data in block linear and cv::cuda::GpuMat is in pitch linear format.

Is there a way to avoid these transformations? If not, which is the best way to transform the data?

AastaLLL · May 27, 2025, 5:01am

Hi,

Dense optical flow uses OFA so the data format is expected to be block linear.
Usually, the format conversion can be done with VIC (pyramid version).

CHECK_STATUS(vpiSubmitConvertImageFormatPyramid(stream, VPI_BACKEND_VIC, prevPyrTmp, prevPyrBL, NULL));

To wrap the pre-allocated CUDA buffer, you can try vpiImageCreateWrapper, which supports VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR.

Thanks.

agonzalez10 · May 30, 2025, 7:32am

Hi,
Thanks for your response.

I tried wrapping the pre-allocated CUDA buffer with vpiImageCreateWrapper as suggested. However, when I try to convert the format of the wrapped image (vpiSubmitConvertImageFormat), the status check returns the error: VPI_ERROR_INTERNAL: (NvError_NotSupported).

When I use cv::Mat instead of cv::cuda::GpuMat and vpiImageCreateWrapperOpenCVMat this problem does not appear.

Also, you suggested using the pyramid version. To do so, I followed this logic:

1. Wrap the input matrices as VPIImage (I think wrapping a matrix as a VPIPyramid is not supported, but I’m not sure).
2. Convert the VPIImage to a VPIPyramid with vpiSubmitGaussianPyramidGenerator (after this conversion the data is still in pitch linear format, and for the optical flow I need block linear format).
3. Convert the pyramid to block linear format with vpiSubmitConvertImageFormatPyramid.

Since more conversion steps are needed, the execution time increases by around 30 ms per frame. Considering that the optical flow algorithm itself should take no longer than 10 ms, the VPI implementation worsens performance because of the required data transformations. How am I supposed to improve the performance on my Orin NX if the library meant to do so needs this kind of data transformation?

Do you have any suggestions on how to use VPI algorithms in OpenCV code without worsening the performance on the Jetson?

Thanks in advance.

AastaLLL · June 2, 2025, 9:11am

Hi,

You can find the benchmark score in the link below:

For a 1920x1080 input, you will need to increase grid size >= 4 to meet your requirement.
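
As a rough illustration of why a larger grid size reduces the work, the sketch below counts output motion vectors. It assumes the dense optical flow output holds one vector per gridSize x gridSize block of the input, rounded up; the helper names `flowDim` and `flowVectorCount` are hypothetical and not part of VPI.

```cpp
#include <cstdint>

// Hypothetical helper (not a VPI function): size of one dimension of the
// motion-vector output, ASSUMING one vector per gridSize x gridSize block,
// with partial blocks at the edge rounded up.
constexpr int32_t flowDim(int32_t inputDim, int32_t gridSize)
{
    return (inputDim + gridSize - 1) / gridSize;
}

// Hypothetical helper: total number of motion vectors for a given input size.
constexpr int64_t flowVectorCount(int32_t width, int32_t height, int32_t gridSize)
{
    return static_cast<int64_t>(flowDim(width, gridSize)) * flowDim(height, gridSize);
}
```

Under that assumption, a 1920x1080 input with grid size 4 yields a 480x270 flow field, 16 times fewer vectors than with grid size 1. This only affects the optical flow stage itself, not the cost of any format conversion before it.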

Thanks.

agonzalez10 · June 9, 2025, 9:37am

Hi,
I increased the grid size. However, my main performance issue is the required data transformations, that is, transforming GpuMat to Mat to create a VPIImage and then generating the VPIPyramid.

Is there a way to implement VPI algorithms in an OpenCV program without worsening performance due to the data transformations? The data transformations are needed since VPI has special data types.

patrick2 · June 12, 2025, 10:15am

Where do you get the cv::cuda::GpuMat data from?

You can try something like this to wrap the GpuMat to a VPIImage:

    cv::cuda::GpuMat d_mat;
    ....
    // fill d_mat somehow, RGBA8 is used here
    ....
    // wrap to VPIImage without copying; `image` is assumed to be a VPIImage
    // declared elsewhere and set to nullptr until it is first wrapped
    VPIImageData imgData;
    memset(&imgData, 0, sizeof(imgData));
    imgData.bufferType = VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR;
    // the format must be a VPIImageFormat, not an OpenCV type such as CV_8UC4
    imgData.buffer.pitch.format = VPI_IMAGE_FORMAT_RGBA8;
    imgData.buffer.pitch.numPlanes = 1;
    imgData.buffer.pitch.planes[0].width = d_mat.cols;
    imgData.buffer.pitch.planes[0].height = d_mat.rows;
    // GpuMat::step is the row pitch in bytes, including any padding
    imgData.buffer.pitch.planes[0].pitchBytes = d_mat.step;
    imgData.buffer.pitch.planes[0].data = d_mat.data;

    if (image != nullptr)
    {
        // update existing wrapper. will throw if image was not created as wrapped before
        CHECK_STATUS(vpiImageSetWrapper(image, &imgData));
    }
    else
    {
        // create new image wrapper
        CHECK_STATUS(vpiImageCreateWrapper(&imgData, nullptr, 0, &image));
    }
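
The wrapper above takes the row pitch from d_mat.step rather than computing it from the width, because rows in a pitch linear buffer may be padded. A minimal, dependency-free sketch of that addressing rule follows; `pixelOffset` and `rowPadding` are hypothetical helpers for illustration, not VPI or OpenCV functions, and the 8192-byte pitch in the note below is an assumed example value.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of pixel (x, y) from the start of a
// pitch linear plane. Rows are pitchBytes apart, which may exceed
// width * bytesPerPixel when the allocator pads each row.
constexpr std::size_t pixelOffset(std::size_t pitchBytes, std::size_t bytesPerPixel,
                                  std::size_t x, std::size_t y)
{
    return y * pitchBytes + x * bytesPerPixel;
}

// Hypothetical helper: unused padding bytes at the end of each row.
constexpr std::size_t rowPadding(std::size_t pitchBytes, std::size_t width,
                                 std::size_t bytesPerPixel)
{
    return pitchBytes - width * bytesPerPixel;
}
```

For example, a 1920-pixel-wide RGBA8 row carries 7680 bytes of pixel data. If the allocation used a pitch of 8192 bytes, each row would end with 512 bytes of padding, and a wrapper that assumed a 7680-byte pitch would read every row after the first from the wrong address.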

system · July 2, 2025, 1:41am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
