VPI data transformations

I have image processing code written in OpenCV with CUDA, and I want to optimize it for my Jetson Orin NX module by using NVIDIA's specialized libraries.

I am trying to implement some algorithms in VPI, such as Optical Flow.

According to the benchmarks in the official documentation, the VPI algorithm is faster than the OpenCV implementation.

The problem is that, since my original code uses OpenCV, calling any VPI function requires converting the data from cv::cuda::GpuMat to VPIImage. That conversion is time consuming and worsens the performance of my application.

This is a general problem whenever I try to use a VPI function, but Optical Flow adds a further complication: the algorithm needs the data in block-linear format, while cv::cuda::GpuMat is pitch linear.
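For reference, "pitch linear" means that each row is stored contiguously and consecutive rows start a fixed number of bytes apart (the pitch, which cv::cuda::GpuMat exposes as step). A minimal C++ sketch of that addressing follows. The helper name pitchLinearOffset is hypothetical and is not part of OpenCV or VPI; the block-linear layout is hardware specific and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an OpenCV or VPI function).
// Returns the byte offset of pixel (x, y) inside a pitch-linear plane.
// pitchBytes is the distance in bytes between the starts of two consecutive
// rows; it can be larger than width * bytesPerPixel because of row padding.
inline std::size_t pitchLinearOffset(std::int32_t x, std::int32_t y,
                                     std::size_t pitchBytes,
                                     std::size_t bytesPerPixel)
{
    assert(x >= 0 && y >= 0);
    // The addressed pixel must fit inside one row.
    assert((static_cast<std::size_t>(x) + 1) * bytesPerPixel <= pitchBytes);
    return static_cast<std::size_t>(y) * pitchBytes +
           static_cast<std::size_t>(x) * bytesPerPixel;
}
```

With a GpuMat named d_mat, the pitch argument would be d_mat.step and the pixel size d_mat.elemSize().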

Is there a way to avoid these transformations? If not, what is the best way to transform the data?

Hi,

Dense optical flow uses OFA so the data format is expected to be block linear.
Usually, the format conversion can be done with VIC (pyramid version).

CHECK_STATUS(vpiSubmitConvertImageFormatPyramid(stream, VPI_BACKEND_VIC, prevPyrTmp, prevPyrBL, NULL));

To wrap the pre-allocated CUDA buffer, you can try vpiImageCreateWrapper which supports VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR:

Thanks.

Hi,
Thanks for your response.

I tried wrapping the pre-allocated CUDA buffer with vpiImageCreateWrapper as suggested. However, when I try to convert the format of the wrapped image (vpiSubmitConvertImageFormat), the status check returns the error: VPI_ERROR_INTERNAL: (NvError_NotSupported).

When I use cv::Mat instead of cv::cuda::GpuMat and vpiImageCreateWrapperOpenCVMat this problem does not appear.

Also, you suggested using the pyramid version. To do so, I followed this logic:

  1. Wrap the input matrices as VPIImage (I think wrapping a matrix to a VPIPyramid is not supported, but I’m not sure)
  2. Convert the VPIImage to a VPIPyramid with vpiSubmitGaussianPyramidGenerator (after this conversion the data is still in pitch-linear format, and the optical flow needs block-linear format)
  3. Convert the pyramid to block linear format with vpiSubmitConvertImageFormatPyramid

Since more conversion algorithms are needed, the execution time increases by around 30ms per frame. Considering that the optical flow algorithm should not take longer than 10ms, the VPI implementation worsens the performance because of the required data transformations. How am I supposed to improve the performance on my Orin NX if the library meant to do so needs this type of data transformation?
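To see which of the steps above accounts for most of the extra time, each stage can be timed on its own. The following is a minimal sketch in standard C++ only; timeStageMs is a hypothetical helper, not a VPI or OpenCV function. Since VPI submit calls are asynchronous, the callable passed in would have to end with vpiStreamSync on the stream it uses, otherwise only the submission cost is measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `stage` `iterations` times and returns the
// average wall-clock time per run in milliseconds.
inline double timeStageMs(const std::function<void()> &stage, int iterations = 1)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        stage(); // must block until the work is done (e.g. sync the stream)
    }
    const auto stop = Clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Timing the wrap, the pyramid generation, the format conversion and the optical flow separately shows whether the 30ms comes from one stage or is spread across all of them.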

Do you have any suggestions on how to use VPI algorithms in OpenCV code without worsening the performance on the Jetson?

Thanks in advance.

Hi,

You can find the benchmark score in the link below:

For a 1920x1080 input, you will need to increase the grid size to >= 4 to meet your requirement.
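As a rough illustration of why a larger grid size reduces the work, the sketch below computes the size of the motion-vector output under the assumption that one vector is produced per gridSize x gridSize block, with the dimensions rounded up. Both that assumption and the names FlowSize and motionVectorSize are illustrative and should be checked against the VPI documentation for the version in use.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type and helper, not part of VPI.
struct FlowSize
{
    std::int32_t width;
    std::int32_t height;
};

// Assumes one motion vector per gridSize x gridSize block of input pixels,
// rounding up when the input size is not a multiple of gridSize.
inline FlowSize motionVectorSize(std::int32_t inputWidth,
                                 std::int32_t inputHeight,
                                 std::int32_t gridSize)
{
    assert(inputWidth > 0 && inputHeight > 0 && gridSize > 0);
    return {(inputWidth + gridSize - 1) / gridSize,
            (inputHeight + gridSize - 1) / gridSize};
}
```

Under that assumption, a 1920x1080 input with grid size 4 gives a 480x270 vector image, 16 times fewer vectors than with grid size 1.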

Thanks.

Hi,
I increased the grid size; however, my main performance issue is the required data transformations, that is, transforming GpuMat to Mat to create a VPIImage and then generating the VPIPyramid.

Is there a way to implement VPI algorithms in an OpenCV program without worsening performance due to the data transformations? The data transformations are needed since VPI has special data types.

Where do you get the cv::cuda::GpuMat data from?

You can try something like this to wrap the GpuMat to a VPIImage:

    cv::cuda::GpuMat d_mat;
    ....
    // fill d_mat somehow, RGBA8 is used here
    ....
    // wrap to VPIImage
    VPIImageData imgData;
    memset(&imgData, 0, sizeof(imgData));
    imgData.bufferType = VPI_IMAGE_BUFFER_CUDA_PITCH_LINEAR;
    // the format must be a VPIImageFormat, not an OpenCV type such as CV_8UC4
    imgData.buffer.pitch.format = VPI_IMAGE_FORMAT_RGBA8;
    imgData.buffer.pitch.numPlanes = 1;
    imgData.buffer.pitch.planes[0].width = d_mat.cols;
    imgData.buffer.pitch.planes[0].height = d_mat.rows;
    imgData.buffer.pitch.planes[0].pitchBytes = d_mat.step;
    imgData.buffer.pitch.planes[0].data = d_mat.data;

    if (image != nullptr)
    {
         // update existing wrapper. will throw if image was not created as wrapped before
         CHECK_STATUS(vpiImageSetWrapper(image, &imgData));
    }
    else
    {
        // create new Image wrapper
        CHECK_STATUS(vpiImageCreateWrapper(&imgData, nullptr, 0, &image));
    }
