MONAI dataloader does not exhibit random behaviour when using np.random #2099
Description
Describe the bug
There is a known issue with PyTorch's dataloader with >0 workers and using np.random to generate random numbers, described here. This issue does not occur if you use torch.randint to generate the random numbers.
I had thought MONAI's dataloader fixes this issue by setting a default worker_init_fn, but it seems like this isn't the case - the correct behaviour is observed when using torch.randint but not with np.random.
I understand the simple solution is to always use torch's rng, but I suspect lots of users won't know about this and will be caught out by this issue.
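For reference, the usual workaround for the underlying PyTorch issue is to reseed NumPy inside each worker from that worker's torch seed, which PyTorch guarantees is distinct per worker. A minimal sketch (the function name here is illustrative, not MONAI's):

```python
import numpy as np
import torch


def seed_numpy_per_worker(worker_id):
    # torch gives each dataloader worker a distinct initial seed, so
    # deriving the NumPy seed from it gives every worker an independent
    # NumPy RNG stream instead of a copy of the parent's state.
    np.random.seed(torch.initial_seed() % 2**32)


# passed via: DataLoader(dataset, num_workers=4, worker_init_fn=seed_numpy_per_worker)
```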
To Reproduce
Steps to reproduce the behavior:
Run this code, switching between torch/np in __getitem__:

```python
import numpy as np
import torch

from monai.data import Dataset, DataLoader
# from torch.utils.data import DataLoader


class RandomDataset(Dataset):
    def __getitem__(self, index):
        # return np.random.randint(0, 1000, (1,))
        return torch.randint(0, 1000, (1,))

    def __len__(self):
        return 16


dataset = RandomDataset([])
dataloader = DataLoader(dataset, batch_size=2, num_workers=4)

for epoch in range(2):
    for i, batch in enumerate(dataloader):
        print(epoch, i, batch.data.numpy().flatten().tolist())
```

Expected behavior
Data should be random between batches, and between epochs. This is seen using torch:
0 0 [613, 264]
0 1 [642, 526]
0 2 [265, 338]
0 3 [988, 574]
0 4 [138, 602]
0 5 [577, 24]
0 6 [172, 986]
0 7 [902, 680]
1 0 [610, 146]
1 1 [898, 486]
1 2 [178, 408]
1 3 [679, 366]
1 4 [302, 361]
1 5 [698, 83]
1 6 [65, 102]
1 7 [615, 643]

but not when using numpy:
0 0 [819, 130]
0 1 [819, 130]
0 2 [819, 130]
0 3 [819, 130]
0 4 [24, 498]
0 5 [24, 498]
0 6 [24, 498]
0 7 [24, 498]
1 0 [819, 130]
1 1 [819, 130]
1 2 [819, 130]
1 3 [819, 130]
1 4 [24, 498]
1 5 [24, 498]
1 6 [24, 498]
1 7 [24, 498]

Screenshots
If applicable, add screenshots to help explain your problem.
Environment
Ensuring you use the relevant python executable, please paste the output of:
python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.5.1+1.g36d855c6.dirty
Numpy version: 1.19.5
Pytorch version: 1.8.1+cu102
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 36d855c690f03a44554f7017417cf7fdb12b8477
Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: 3.2.1
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
Pillow version: 8.2.0
Tensorboard version: 2.5.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.9.1+cu102
ITK version: NOT INSTALLED or UNKNOWN VERSION.
tqdm version: 4.60.0
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
`psutil` required for `print_system_info`
================================
Printing GPU config...
================================
Num GPUs: 3
Has CUDA: True
CUDA version: 10.2
cuDNN enabled: True
cuDNN version: 7605
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70']
GPU 0 Name: TITAN RTX
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 72
GPU 0 Total memory (GB): 23.7
GPU 0 CUDA capability (maj.min): 7.5
GPU 1 Name: Quadro RTX 8000
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 72
GPU 1 Total memory (GB): 47.5
GPU 1 CUDA capability (maj.min): 7.5
GPU 2 Name: GeForce GT 1030
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 3
GPU 2 Total memory (GB): 2.0
GPU 2 CUDA capability (maj.min): 6.1
Additional context
Already discussed with @ericspod, who I believe has also discussed this with @rijobro - thought it would be useful to have the discussion here.