🐛 Describe the bug
When using a DataLoader with num_workers > 1, batches are constructed in parallel to speed up data loading. I would expect the DataLoader to return the first available data (like a FIFO queue) so that it runs as fast as possible. However, it seems that the worker processes take turns returning data, which slows down data loading quite significantly.
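To illustrate the first-available ordering I have in mind, here is a minimal sketch using only the standard library (the thread/queue setup is purely illustrative and not how DataLoader is implemented): one fast and one slow producer feed a shared queue, and the consumer takes whichever item is ready first, so the fast producer is never blocked by the slow one.

```python
import queue
import threading
import time


def worker(name: str, delay: float, out_q: queue.Queue) -> None:
    """Simulate a worker that needs `delay` seconds per item."""
    for i in range(3):
        time.sleep(delay)
        out_q.put((name, i))


def run_fifo() -> list:
    """Consume six items in first-available (FIFO) order from a shared queue."""
    q = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=("fast", 0.01, q)),
        threading.Thread(target=worker, args=("slow", 0.5, q)),
    ]
    for t in threads:
        t.start()
    order = [q.get()[0] for _ in range(6)]  # blocks only when nothing is ready
    for t in threads:
        t.join()
    return order


print(run_fifo())  # the "fast" items come out before the "slow" ones
```

With strict round-robin consumption, by contrast, every other step would wait on the slow worker, which matches the behavior shown in the repro below.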
Below is a minimal code example:
```python
import math
import time

import torch


class MyIterableDataset(torch.utils.data.IterableDataset):
    def __init__(self, start, end):
        super().__init__()
        assert end > start, "this example code only works with end > start"
        self.start = start
        self.end = end

    def give_data(self, start, end):
        for i in range(start, end):
            if i > 10:
                time.sleep(2)  # simulate slow samples
            yield i

    def __iter__(self):
        worker_info = torch.utils.data.get_worker_info()
        if worker_info is None:  # single-process data loading, return the full iterator
            iter_start = self.start
            iter_end = self.end
        else:  # in a worker process: split the workload
            per_worker = int(math.ceil((self.end - self.start) / float(worker_info.num_workers)))
            worker_id = worker_info.id
            iter_start = self.start + worker_id * per_worker
            iter_end = min(iter_start + per_worker, self.end)
        return self.give_data(iter_start, iter_end)


if __name__ == "__main__":
    ds = MyIterableDataset(start=0, end=20)
    # Multi-process loading with two worker processes
    for item in torch.utils.data.DataLoader(ds, num_workers=2, batch_size=2):
        print(item)
```

The result of this script is:
```
tensor([0, 1])    # Loaded fast
tensor([10, 11])  # Loaded slowly
tensor([2, 3])    # Loaded fast
tensor([12, 13])  # Loaded slowly
tensor([4, 5])    # Loaded fast
tensor([14, 15])  # Loaded slowly
tensor([6, 7])    # Loaded fast
tensor([16, 17])  # Loaded slowly
tensor([8, 9])    # Loaded fast
tensor([18, 19])  # Loaded slowly
```

However, I would expect something like the following instead:
```
tensor([0, 1])    # Loaded fast
tensor([2, 3])    # Loaded fast
tensor([4, 5])    # Loaded fast
tensor([6, 7])    # Loaded fast
tensor([10, 11])  # Loaded slowly
tensor([8, 9])    # Loaded fast
tensor([12, 13])  # Loaded slowly
tensor([14, 15])  # Loaded slowly
tensor([16, 17])  # Loaded slowly
tensor([18, 19])  # Loaded slowly
```

Versions
```
Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 13.3.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: version 3.24.1
Libc version: N/A
Python version: 3.11.0 (main, Nov 30 2022, 13:48:51) [Clang 13.1.6 (clang-1316.0.21.2.5)] (64-bit runtime)
Python platform: macOS-13.3.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.3.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.0
[pip3] pytorch-lightning==2.0.3
[pip3] torch==2.0.1
[pip3] torchmetrics==0.11.4
[conda] Could not collect
```