CacheDataset gets stuck when initializing cache content #3962
Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem? Please describe.
Feedback from Andriy and Daguang when using NVIDIA Clara MMARs:
One peculiar issue: if we mount the dataset from NGC, training gets stuck about 90% of the time (with both train.sh and train_multi_gpu.sh).
I believe it has something to do with SmartCache (since this is the only MMAR that uses it) and ngc-dataset interoperability.
It says “Loading Data: 0%” and just gets stuck; sometimes it gets to “Loading Data: 6%” and then gets stuck.
Is it something we should be worried about?
PS: If I copy the data to a local DGX folder first, training runs fine.
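For anyone hitting the same hang, the workaround in the PS can be scripted. The following is only a minimal sketch of that workaround, not a fix for the underlying issue: it copies the mounted dataset to local storage before any caching starts. The helper name `stage_dataset_locally` and the example paths are hypothetical and do not come from MONAI or Clara.

```python
import shutil
from pathlib import Path


def stage_dataset_locally(mounted_dir, local_dir):
    """Copy everything under mounted_dir into local_dir and return the copied file paths.

    Hypothetical helper: it only automates the "copy to a local folder first"
    workaround, so that cache initialization reads from local disk instead of
    the mounted dataset.
    """
    src = Path(mounted_dir)
    dst = Path(local_dir)
    # dirs_exist_ok=True (Python 3.8+) lets the call be repeated on an existing folder
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # Return only regular files, in a stable order
    return sorted(p for p in dst.rglob("*") if p.is_file())


# Example call (both paths are placeholders, adjust them to your setup):
# local_files = stage_dataset_locally("/mount/ngc_dataset", "/raid/local_copy")
```

The returned paths can then be used to build the data list for the training config, so that the cache is filled from the local copy rather than from the mount.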