Dataloader crashes if num_worker>0 #25302

@ily-R

Description

If I define my dataloader as follows:

X_train = torch.tensor(X_train).to(device)
y_train = torch.tensor(y_train).to(device)
train = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(
    train, batch_size=16, shuffle=True, pin_memory=True, num_workers=4)

My training loop runs well.
If, however, I implement my own dataset class:

class ExampleDataset(torch.utils.data.Dataset):
    def __init__(self, root):
        self.root = root
        # some code

    def __getitem__(self, idx):
        # some code
        return x, label

    def __len__(self):
        return len(self.labels)

dataset = ExampleDataset(root)
dataset_train = torch.utils.data.Subset(dataset, indices[:-20])
data_loader = torch.utils.data.DataLoader(
    dataset_train, batch_size=16, shuffle=True, pin_memory=True, num_workers=4)
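For reference, a self-contained, picklable version of such a map-style dataset (with synthetic in-memory tensors standing in for the elided `# some code`; the real fields of `ExampleDataset` are unknown) might look like this. Being defined at module top level and holding only picklable state is what lets worker processes deserialize it:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset

class ToyDataset(Dataset):
    """Map-style dataset with synthetic in-memory data (a stand-in
    for the elided ExampleDataset body)."""
    def __init__(self, n_samples=100):
        self.data = torch.randn(n_samples, 3)
        self.labels = torch.randint(0, 2, (n_samples,))

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

    def __len__(self):
        return len(self.labels)

dataset = ToyDataset()
# Hold out the last 20 samples, mirroring Subset(dataset, indices[:-20]).
dataset_train = Subset(dataset, range(len(dataset) - 20))
data_loader = DataLoader(dataset_train, batch_size=16, shuffle=True)
```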

the code crashes! I tried several variations:

  • pin_memory=False
  • non_blocking=True/False when fetching the data
  • CUDA 10.0 with PyTorch 1.1.# or 1.2.0, and Python 3.6.9 or 3.7

Here is the error that pops up:

BrokenPipeError                           Traceback (most recent call last)
<ipython-input-10-344640e27da1> in <module>
----> 1 final_model, hist = train_model(model, dataloaders_dict, criterion, optimizer)

<ipython-input-9-fdf91f815fa7> in train_model(model, dataloaders, criterion, optimizer, num_epochs)
     23             # Iterate over data.
     24             end = time.time()
---> 25             for i, (inputs, labels) in enumerate(dataloaders[phase]):
     26                 inputs = inputs.to(device, non_blocking=True)
     27                 labels = labels.to(device , non_blocking=True)

~\Anaconda3\envs\py_gpu\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    276             return _SingleProcessDataLoaderIter(self)
    277         else:
--> 278             return _MultiProcessingDataLoaderIter(self)
    279 
    280     @property

~\Anaconda3\envs\py_gpu\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    680             #     before it starts, and __del__ tries to join but will get:
    681             #     AssertionError: can only join a started process.
--> 682             w.start()
    683             self.index_queues.append(index_queue)
    684             self.workers.append(w)

~\Anaconda3\envs\py_gpu\lib\multiprocessing\process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\py_gpu\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

~\Anaconda3\envs\py_gpu\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

~\Anaconda3\envs\py_gpu\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     87             try:
     88                 reduction.dump(prep_data, to_child)
---> 89                 reduction.dump(process_obj, to_child)
     90             finally:
     91                 set_spawning_popen(None)

~\Anaconda3\envs\py_gpu\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

BrokenPipeError: [Errno 32] Broken pipe
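(The paths in the traceback show this runs on Windows, where multiprocessing uses spawn rather than fork: each DataLoader worker re-imports the main module, and any top-level code, including the call that starts training, runs again in the child, which commonly ends in exactly this BrokenPipeError. The usual fix is to define the dataset class at module top level so it can be pickled, and to put the entry point behind a main guard. A minimal sketch with toy data and hypothetical function names, not the reporter's script:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(num_workers=2):
    # Toy in-memory dataset; a custom Dataset class would need to be
    # defined at module top level so worker processes can unpickle it.
    X = torch.randn(64, 3)
    y = torch.randint(0, 2, (64,))
    return DataLoader(TensorDataset(X, y), batch_size=16,
                      shuffle=True, num_workers=num_workers)

def train_one_epoch(loader):
    n_batches = 0
    for inputs, labels in loader:
        n_batches += 1  # the real forward/backward step goes here
    return n_batches

if __name__ == "__main__":
    # The guard keeps spawned workers from re-running the training
    # loop when they re-import this module on Windows.
    train_one_epoch(make_loader())
```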

Thank you

cc @ssnl

Metadata

Assignees

No one assigned

Labels

  • module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
  • module: dataloader - Related to torch.utils.data.DataLoader and Sampler
  • module: multiprocessing - Related to torch.multiprocessing
  • triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
