How to use DLA correctly?

I am using YOLOv7-Tiny on the Orin NX for inference and have tested two approaches:

  1. Using only the GPU for inference, which achieves 238 FPS.
  2. Using both the GPU and the two DLAs for inference: the GPU reaches 141 FPS and each DLA achieves 42 FPS, for a total of 225 FPS. However, in this case some layers are not supported by the DLAs, so those computations fall back to the GPU, which slows down its inference speed.

How can I get better performance from the GPU and DLAs together than from the GPU alone?
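
For reference, here is roughly how I build the engines (a sketch of my setup; the ONNX filename and the exact trtexec flags are illustrative and may differ):

# GPU-only engine
$ trtexec --onnx=yolov7-tiny_384x640.onnx --fp16 --saveEngine=yolov7_gpu.engine
# One engine per DLA core; --allowGPUFallback lets unsupported layers run on the GPU
$ trtexec --onnx=yolov7-tiny_384x640.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=yolov7_dla0.engine
$ trtexec --onnx=yolov7-tiny_384x640.onnx --fp16 --useDLACore=1 --allowGPUFallback --saveEngine=yolov7_dla1.engine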

Dear @bob19,
When both the GPU and DLA are used, we need to minimize intermediate data transfers between them to boost throughput. Ideally, when you have multiple models, we recommend running each DL model entirely on the GPU or entirely on a DLA to avoid intermediate data transfers.
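
As a quick check, you can build an engine without GPU fallback; the build then fails at the first layer that cannot run on the DLA and names it (a sketch; model.onnx is a placeholder for your model):

$ trtexec --onnx=model.onnx --fp16 --useDLACore=0
# Without --allowGPUFallback, the build aborts on the first DLA-unsupported layer,
# which tells you which layers need to be replaced.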
Could you share the model ONNX file?

Ok, I have uploaded a test model.

yolov7-tiny_384x640.zip (20.2 MB)

Dear @bob19,
Per the trtexec profiling information, I notice that many layers are offloaded to the DLA, but they are split into 3 separate DLA subgraphs.
To avoid intermediate data transfers and push more layers onto the DLA, you may need to replace unsupported layers with DLA-supported equivalents. Please see GitHub - NVIDIA/Deep-Learning-Accelerator-SW: NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications, if it helps.
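
You can see which layers land on the DLA and where the graph splits by reading the build log (a sketch; model.onnx is a placeholder for your model):

$ trtexec --onnx=model.onnx --fp16 --useDLACore=0 --allowGPUFallback
# The build log prints "Layers Running on DLA" and "Layers Running on GPU"
# sections; each {ForeignNode[...]} entry is one DLA subgraph.

Each GPU-resident layer sitting between two DLA subgraphs is a candidate for replacement.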
Also, please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
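
You can verify that the settings took effect with:

$ sudo nvpmodel -q           # prints the current power mode (mode 0 is typically MAXN)
$ sudo jetson_clocks --show  # prints the current clock configuration

nvpmodel -m 0 selects the maximum-performance power mode, and jetson_clocks pins the clocks at their maximum so benchmark numbers are stable.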

Is there a model zoo where I can check which models are suitable for DLA inference?

You can find DLA-compatible models at GitHub - NVIDIA/Deep-Learning-Accelerator-SW: NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.