DLProf with PyTorch's DistributedDataParallel

brasilino · September 28, 2022, 3:49pm

Hello,

I’m starting to profile PyTorch DistributedDataParallel models with DLProf, and I’ve noticed that it takes a very long time to generate the SQLite database, which is huge: 22GB. I’m using a ResNet18 model with the CIFAR10 dataset and only 3 epochs.

I wonder whether it is good practice to profile only one rank, say rank 0, and assume that the other ranks behave the same. Is that assumption accurate?
If not, is there any best-practices documentation on using DLProf with DistributedDataParallel (DDP) models?

Thanks!

tgerdes · September 28, 2022, 4:08pm

Please note that DLProf was sunset many months ago and is no longer supported. Please use Nsight Systems or the native PyTorch profiler.
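
For the single-rank question above, here is a minimal sketch of how the native PyTorch profiler could be limited to one rank and to a few training steps, which should also keep the trace far smaller than a full 3-epoch capture. It assumes the job is launched with torchrun (which exports the RANK environment variable). The helper names `should_profile` and `make_profiler` and the `./profiler_logs` directory are made up for illustration. Whether one rank is representative depends on the workload: DDP ranks run the same model on different data shards, but rank 0 often does extra work such as logging or checkpointing, so treat this as a starting point rather than a rule.

```python
import contextlib
import os


def should_profile(profile_rank: int = 0) -> bool:
    """Return True only on the rank chosen for profiling.

    Assumes torchrun has exported RANK; falls back to 0 for single-process runs.
    """
    return int(os.environ.get("RANK", "0")) == profile_rank


def make_profiler(log_dir: str = "./profiler_logs", profile_rank: int = 0):
    """Return a torch.profiler context on the chosen rank, a no-op context elsewhere.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    if not should_profile(profile_rank):
        # Entering a nullcontext yields None, so other ranks pay no profiling cost.
        return contextlib.nullcontext()

    # Imported lazily so that only the profiled rank loads the profiler.
    import torch
    from torch.profiler import (
        ProfilerActivity,
        profile,
        schedule,
        tensorboard_trace_handler,
    )

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)

    return profile(
        activities=activities,
        # Skip 1 step, warm up for 1, record 3, then stop recording.
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        on_trace_ready=tensorboard_trace_handler(log_dir),
    )


# Usage inside the training loop (train_step and loader are placeholders):
#
# with make_profiler() as prof:
#     for batch in loader:
#         train_step(batch)
#         if prof is not None:
#             prof.step()  # advances the wait/warmup/active schedule
```

Comparing a short trace from rank 0 with one from another rank (by passing a different `profile_rank`) is one way to check whether the ranks really behave alike before relying on a single one.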

brasilino · September 28, 2022, 5:34pm

Too bad… I’ve used Nsight Systems and found it a little convoluted to get insights from it. DLProf could at least summarize a lot of information. I also found it more useful than the PyTorch profiler.

Thanks for your reply!
