nvidia-smi GPU T.Limit / GPU Shutdown T.Limit Temp

Hello,

The output of nvidia-smi has changed from

Example: Ref 1

 Temperature
         GPU Current Temp                  : 30 C
         GPU Shutdown Temp                 : 95 C
         GPU Slowdown Temp                 : 92 C
         GPU Max Operating Temp            : 88 C
         GPU Target Temperature            : 83 C

to

 Temperature
        GPU Current Temp                  : 39 C
        GPU T.Limit Temp                  : 50 C
        GPU Shutdown T.Limit Temp         : -7 C
        GPU Slowdown T.Limit Temp         : -2 C
        GPU Max Operating T.Limit Temp    : 0 C
        GPU Target Temperature            : 90 C
        Memory Current Temp               : N/A
        Memory Max Operating T.Limit Temp : N/A

I’ve found some other references, but none of them answers this question: Ref 2, Ref 3

Not sure how to interpret the T.Limit values as they don’t make sense to me.

Old, but in case anyone else is tearing their hair out like I was, I think I found the answer.

nvidia-smi documentation is here: https://docs.nvidia.com/deploy/nvidia-smi/index.html

Below is a summary of my understanding of the “Temperature” part of the docs. I’m not an expert so I may be slightly off.

  • GPU T.Limit Temp is “how many degrees are left before the GPU Target Temperature”. It counts down as the current temperature rises.
  • GPU Current Temp + GPU T.Limit Temp should equal GPU Target Temperature (plus or minus some rounding). At GPU T.Limit Temp = 0, you are at GPU Target Temperature.
  • The other “T.Limit Temp” values are thresholds - if GPU T.Limit Temp goes below that value, the system will take action to manage temperatures.
  • “Max Operating T.Limit” is the threshold for SOFTWARE throttling. “Slowdown T.Limit” is the threshold for HARDWARE throttling. “Shutdown T.Limit” will actually shut down the GPU to avoid damaging it.
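As a rough illustration of the threshold logic in the list above, here is a minimal Python sketch. It only encodes my reading of the docs, not how the driver actually behaves; the function name, the state labels, and the default thresholds (taken from the output in the question) are all hypothetical.

```python
def thermal_state(t_limit, max_operating=0, slowdown=-2, shutdown=-7):
    """Classify a GPU T.Limit Temp reading against the T.Limit thresholds.

    Assumption: an action triggers once the margin drops strictly below
    the corresponding threshold, as described in the list above.
    Defaults are the values from the nvidia-smi output in the question.
    """
    # Check the most severe condition first, since the thresholds are nested.
    if t_limit < shutdown:
        return "shutdown"
    if t_limit < slowdown:
        return "hardware slowdown"
    if t_limit < max_operating:
        return "software throttle"
    return "normal"


# The reading from the question: 50 degrees of margin left.
print(thermal_state(50))
```

With these defaults, a reading of -1 would fall into the software-throttle range and -3 into the hardware-slowdown range.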

So in your case:

  • GPU Target Temperature is 90 C. 39 + 50 = 89; the 1-degree difference is probably rounding.
  • The Shutdown threshold is 90 + 7 = 97 C. At this temperature, GPU T.Limit Temp would read -7. If it reached -8 (or current temp 98 C), the core would shut down.
  • The HARDWARE Slowdown threshold is 90 + 2 = 92 C. At this temperature GPU T.Limit Temp would read -2. If it reached -3 (or current temp 93 C), the HARDWARE would throttle itself to reduce temperatures.
  • The SOFTWARE Slowdown threshold is 90 + 0 = 90 C. At this temperature GPU T.Limit Temp would read 0. If it reached -1 (or current temp 91 C), the SOFTWARE would throttle the GPU to reduce temperatures.
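The arithmetic above can be reproduced with a few lines of Python. This is only a sketch under the same assumption as the rest of this post (that the T.Limit values are offsets relative to GPU Target Temperature); the function and dictionary key names are made up for illustration.

```python
def absolute_thresholds(target_c, t_limits):
    """Convert T.Limit thresholds (margins in degrees C) to absolute temperatures.

    Assumption: a margin of 0 corresponds to the GPU Target Temperature,
    so a negative margin lies above the target.
    """
    return {name: target_c - margin for name, margin in t_limits.items()}


# Values from the nvidia-smi output in the question.
t_limits = {"shutdown": -7, "slowdown": -2, "max_operating": 0}
thresholds = absolute_thresholds(90, t_limits)

for name, temp_c in thresholds.items():
    print(f"{name}: {temp_c} C")

# Sanity check from the first bullet: current temp + current margin.
print(39 + 50)  # 89, one degree off the 90 C target
```

This gives 97 C, 92 C, and 90 C for the shutdown, slowdown, and max operating thresholds respectively, matching the figures above.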