Why is the queue time negative in Nsight Systems?

Hi everyone,
I used the profiler to measure the cudaLaunchKernel queue latency, and the picture above shows that the queue latency is negative. Why? I can't understand this. Can anyone explain it? Thanks!

Moved to “Nsight Systems”.

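On the CPU side, the kernel launch call (cudaLaunchKernel) runs and submits work to the GPU. Because the GPU is heavily underutilized, that work does not wait in a queue. The kernel actually starts on the GPU before the CPU side of the launch call has finished its cleanup and returned, so the kernel's start timestamp falls before the end of the launch call and the reported queue time comes out negative.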
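To make the arithmetic concrete, here is a minimal sketch. It assumes the queue time is computed as the kernel's GPU start time minus the end time of the launch API call on the CPU, which is my reading of what the tool reports. The `LaunchRecord` type, the `queue_time_ns` helper, and all timestamps are made up for illustration and are not taken from any real trace or from the Nsight Systems API.

```python
from dataclasses import dataclass


@dataclass
class LaunchRecord:
    """Hypothetical timestamps (in nanoseconds) for one kernel launch."""
    api_start_ns: int     # CPU: launch call begins
    api_end_ns: int       # CPU: launch call returns
    kernel_start_ns: int  # GPU: kernel begins executing


def queue_time_ns(rec: LaunchRecord) -> int:
    # Assumed definition: time from the end of the launch call
    # to the start of the kernel on the GPU.
    return rec.kernel_start_ns - rec.api_end_ns


# Busy GPU: the kernel waits behind earlier work, so it starts after the call returns.
busy = LaunchRecord(api_start_ns=1000, api_end_ns=6000, kernel_start_ns=9000)

# Idle GPU: the kernel starts while the launch call is still finishing on the CPU.
idle = LaunchRecord(api_start_ns=1000, api_end_ns=6000, kernel_start_ns=4500)

print(queue_time_ns(busy))  # 3000
print(queue_time_ns(idle))  # -1500
```

In the idle case the kernel begins 1500 ns before the launch call returns, so under this definition the difference is negative even though nothing is wrong with the measurement.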