I am using YOLOv7-Tiny on the Orin NX for inference and have tested two approaches:
Using only the GPU for inference, which achieves 238 FPS.
Using both the GPU and the two DLAs for inference, where the GPU reaches 141 FPS and each DLA achieves 42 FPS, for a total of 225 FPS. However, in this case some layers are not supported by the DLAs, so those computations fall back to the GPU, which slows down its inference speed.
How can I achieve better performance when using both the GPU and DLAs together than when using only the GPU?
Dear @bob19,
When both the GPU and DLA are used, we need to minimize intermediate data transfers to boost throughput. Ideally, when you have multiple models, we recommend running each DL model entirely on the GPU or on a DLA to avoid intermediate data transfers.
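As a sketch of that suggestion, you could build one engine per compute unit with trtexec, so each model instance runs entirely on its assigned device (the model filename and output paths here are placeholders, not from your setup; `--allowGPUFallback` is only needed if some layers are unsupported on the DLA, and each fallback adds a DLA-to-GPU transfer):

```shell
# Engine pinned to the GPU only — no DLA involvement, no fallback transfers
trtexec --onnx=yolov7-tiny.onnx --fp16 --saveEngine=yolov7_gpu.engine

# One engine per DLA core; --allowGPUFallback lets unsupported layers
# run on the GPU, at the cost of intermediate data transfers
trtexec --onnx=yolov7-tiny.onnx --fp16 --useDLACore=0 --allowGPUFallback \
        --saveEngine=yolov7_dla0.engine
trtexec --onnx=yolov7-tiny.onnx --fp16 --useDLACore=1 --allowGPUFallback \
        --saveEngine=yolov7_dla1.engine
```

The trtexec log reports which layers run on the DLA versus the GPU; minimizing the GPU-fallback layers (for example by replacing unsupported ops in the ONNX graph) is what keeps the combined GPU+DLA throughput from degrading.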
Could you share the model ONNX file?