Problem
TensorBoard memory logging currently comes from the last PP rank (where the loss is available). For understanding peak memory, however, those numbers are not intuitive, so the logging needs improvement.
Minimal repro
Expected behavior
Log memory usage from the first PP rank.
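A rough sketch of the requested behavior, not the recipe's actual code: keep taking the loss from the last PP rank, but take the memory number from the first PP rank. `StageReport`, `select_logged_metrics`, and the metric tag names are hypothetical, and the sketch assumes the per-stage peak values have already been collected on the logging rank (for example via `torch.cuda.max_memory_allocated()` on each stage followed by `torch.distributed.all_gather_object`). The numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StageReport:
    """Hypothetical per-stage summary gathered from each PP rank."""
    pp_rank: int
    peak_mem_gib: float
    loss: Optional[float] = None  # only the last PP rank has a loss


def select_logged_metrics(reports: List[StageReport]) -> Dict[str, Optional[float]]:
    """Pick which stage each logged scalar comes from."""
    by_rank = sorted(reports, key=lambda r: r.pp_rank)
    first, last = by_rank[0], by_rank[-1]
    return {
        # Loss is only available on the last pipeline stage.
        "loss": last.loss,
        # Memory is reported from the first pipeline stage, as requested.
        "memory/peak_gib_first_pp_rank": first.peak_mem_gib,
        # Optional extra: the maximum across all stages.
        "memory/peak_gib_max_over_pp": max(r.peak_mem_gib for r in by_rank),
    }


# Illustrative values only.
reports = [
    StageReport(pp_rank=0, peak_mem_gib=61.5),
    StageReport(pp_rank=1, peak_mem_gib=48.0),
    StageReport(pp_rank=2, peak_mem_gib=40.25, loss=2.1),
]
metrics = select_logged_metrics(reports)
```

Logging the maximum across stages alongside the first-rank value is an assumption on my side; it would cover configurations where the first stage is not the one with the highest peak.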
Affected area
area:recipe
Regression?
No
Environment
No response
Logs