Dataloader crashes if num_worker>0 #25302
Closed
Labels
- module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
- module: dataloader - Related to torch.utils.data.DataLoader and Sampler
- module: multiprocessing - Related to torch.multiprocessing
- triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
If I define my DataLoader as follows:

```python
X_train = torch.tensor(X_train).to(device)
y_train = torch.tensor(y_train).to(device)
train = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(
    train, batch_size=16, shuffle=True, pin_memory=True, num_workers=4)
```
My training loop runs well.
If, however, I implement my own dataset class:

```python
class ExampleDataset(torch.utils.data.Dataset):
    def __init__(self, root):
        self.root = root
        # some code

    def __getitem__(self, idx):
        # some code
        return x, label

    def __len__(self):
        return len(self.labels)

dataset = ExampleDataset(root)
dataset_train = torch.utils.data.Subset(dataset, indices[:-20])
data_loader = DataLoader(
    dataset_train, batch_size=16, shuffle=True, pin_memory=True, num_workers=4)
```
the code crashes! I tried different choices:
- pin_memory=False
- non_blocking=True/False when fetching the data
- CUDA 10.0 with PyTorch 1.1.x or 1.2.0, and Python 3.6.9 or 3.7
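For reference, here is a self-contained sketch of the custom-Dataset setup above. The data is synthetic (the original `__init__`/`__getitem__` bodies are elided in the report), and `num_workers=0` is used here so the sketch runs anywhere; the crash in this issue appears only with `num_workers > 0`:

```python
import torch
from torch.utils.data import Dataset, DataLoader, Subset

class ExampleDataset(Dataset):
    def __init__(self, n=100):
        # Synthetic stand-in for whatever would be loaded from `root`.
        self.data = torch.randn(n, 8)
        self.labels = torch.randint(0, 2, (n,))

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

    def __len__(self):
        return len(self.labels)

dataset = ExampleDataset()
indices = list(range(len(dataset)))
dataset_train = Subset(dataset, indices[:-20])
# num_workers=0 keeps this sketch single-process and portable.
data_loader = DataLoader(dataset_train, batch_size=16, shuffle=True, num_workers=0)
batch_x, batch_y = next(iter(data_loader))
```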
Here's the error that pops up:
```
BrokenPipeError                           Traceback (most recent call last)
<ipython-input-10-344640e27da1> in <module>
----> 1 final_model, hist = train_model(model, dataloaders_dict, criterion, optimizer)
<ipython-input-9-fdf91f815fa7> in train_model(model, dataloaders, criterion, optimizer, num_epochs)
     23         # Iterate over data.
     24         end = time.time()
---> 25         for i, (inputs, labels) in enumerate(dataloaders[phase]):
     26             inputs = inputs.to(device, non_blocking=True)
     27             labels = labels.to(device, non_blocking=True)
~\Anaconda3\envs\py_gpu\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    276             return _SingleProcessDataLoaderIter(self)
    277         else:
--> 278             return _MultiProcessingDataLoaderIter(self)
    279
    280     @property
~\Anaconda3\envs\py_gpu\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    680             #     before it starts, and __del__ tries to join but will get:
    681             #     AssertionError: can only join a started process.
--> 682             w.start()
    683             self.index_queues.append(index_queue)
    684             self.workers.append(w)
~\Anaconda3\envs\py_gpu\lib\multiprocessing\process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect
~\Anaconda3\envs\py_gpu\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224
    225 class DefaultContext(BaseContext):
~\Anaconda3\envs\py_gpu\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323
    324     class SpawnContext(BaseContext):
~\Anaconda3\envs\py_gpu\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     87             try:
     88                 reduction.dump(prep_data, to_child)
---> 89                 reduction.dump(process_obj, to_child)
     90             finally:
     91                 set_spawning_popen(None)
~\Anaconda3\envs\py_gpu\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61
     62 #
BrokenPipeError: [Errno 32] Broken pipe
```
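Note the `popen_spawn_win32` frames in the traceback: on Windows, DataLoader workers are started with the `spawn` method, which re-imports the main script in each worker process. If the script is not protected by an `if __name__ == "__main__":` guard, the re-import re-executes the top-level code and worker startup fails with exactly this `BrokenPipeError`. A hedged sketch of the guard pattern (the dataset and loop body here are placeholders, not the reporter's actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Placeholder dataset; the real one would be the custom ExampleDataset.
    train = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))
    loader = DataLoader(train, batch_size=16, shuffle=True, num_workers=2)
    n_batches = 0
    for inputs, labels in loader:
        n_batches += 1  # training step would go here
    return n_batches

if __name__ == "__main__":
    # The guard keeps spawned workers from re-running this block on import.
    main()
```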
Thank you
cc @ssnl