Is your feature request related to a problem? Please describe.
When introducing GPU transforms and the associated ThreadDataLoader to users, I have several times received feedback that the num_workers arg is confusing: users assume it sets the number of threads in ThreadDataLoader, but it actually sets the number of multi-processing workers of the underlying PyTorch DataLoader.
It would be nice to clarify this arg and its use cases in the documentation.
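
To make the distinction concrete, here is a minimal stdlib-only sketch. It is not MONAI's implementation; thread_buffered and tagged_source are hypothetical names used only for illustration. It assumes the thread part is a single background thread that fills a buffer in the current process, which is a separate concern from how many worker processes the wrapped loader uses.

```python
import os
import threading
from queue import Queue

_DONE = object()  # sentinel marking the end of the source iterator


def thread_buffered(src, buffer_size=1):
    """Yield items from `src`, consuming it in ONE background thread (hypothetical helper)."""
    q = Queue(maxsize=buffer_size)

    def fill():
        # Runs in the background thread: pull items and hand them to the main thread.
        for item in src:
            q.put(item)
        q.put(_DONE)

    threading.Thread(target=fill, daemon=True).start()
    while (item := q.get()) is not _DONE:
        yield item


def tagged_source(n):
    # Stand-in for the wrapped loader: tags each item with the process and
    # thread that produced it.
    for i in range(n):
        yield i, os.getpid(), threading.get_ident()


records = list(thread_buffered(tagged_source(4)))
values = [r[0] for r in records]
producer_pids = {r[1] for r in records}
producer_threads = {r[2] for r in records}
```

In this sketch every item is produced by one extra thread inside the current process, regardless of any worker-count setting. A num_workers-style option would instead control how many separate processes the wrapped loader starts, which is the point users seem to miss.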