DataLoader crashes if num_workers > 0 on macOS with Python 3.8 #46648

@PiotrJander

🐛 Bug

When using a DataLoader with num_workers > 0, the process crashes with a stack trace that contains:

...
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
...
RuntimeError: DataLoader worker (pid 88201) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
...

The full stack trace is here: https://gist.github.com/PiotrJander/e82ca1d94ae85e746f33a18d7ca2d492

The error does not occur when num_workers is set to 0.

The error occurs only on macOS (tested on 10.15.7) with Python 3.8 (tested with Python 3.8.5). I could not reproduce it on Linux, or on macOS with Python 3.6 or 3.7.
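A possible explanation for this platform pattern: starting with Python 3.8, the default multiprocessing start method on macOS is "spawn" rather than "fork", and with "spawn" each child process imports the main module again. The sketch below uses only the standard library to report which start method is active; the helper names `needs_main_guard` and `describe_environment` are hypothetical and exist only for illustration.

```python
import multiprocessing
import platform
import sys


def needs_main_guard(start_method: str) -> bool:
    """Return True if the start method makes children import the main module again.

    With "spawn" and "forkserver", code that starts processes has to sit under
    `if __name__ == "__main__":`. With "fork", the child is a copy of the parent
    and the main module is not imported a second time.
    """
    return start_method in ("spawn", "forkserver")


def describe_environment() -> str:
    """Summarise the platform, Python version, and default start method."""
    method = multiprocessing.get_start_method()
    version = ".".join(str(part) for part in sys.version_info[:3])
    return (
        f"{platform.system()} / Python {version}: "
        f"start method = {method}, main guard needed = {needs_main_guard(method)}"
    )


if __name__ == "__main__":
    print(describe_environment())
```

If this reports "spawn" on the failing machine and "fork" on the working ones, that would be consistent with the error message quoted above.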

To Reproduce

Steps to reproduce the behavior:

Run the following script, which is taken verbatim from the official documentation: https://gist.github.com/PiotrJander/be093261db295b2571e3dde952a61ff4

Run it on macOS with Python 3.8 in an environment containing:

pytorch-lightning==1.0.0
torch==1.6.0
torchvision==0.7.0

Expected behavior

Environment

PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 10.15.7
GCC version: Could not collect
CMake version: version 3.16.2

Python version: 3.8
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] pytorch-lightning==1.0.0
[pip3] torch==1.6.0
[pip3] torchvision==0.7.0
[conda] Could not collect

Additional notes

This seems to be related to #5301 and #25302 but those issues were closed without providing a solution.
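One workaround worth trying is the idiom that the error message itself names: start the DataLoader workers only under `if __name__ == '__main__':`. The following is a minimal sketch, not the script from the gist; it assumes the crash comes from worker processes re-importing a main module that starts workers at import time. `SquaresDataset` and `count_samples` are hypothetical names introduced for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i * i)."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, index: int):
        x = torch.tensor(float(index))
        return x, x * x


def count_samples(num_workers: int) -> int:
    """Iterate a DataLoader once and return the number of samples seen."""
    loader = DataLoader(SquaresDataset(8), batch_size=4, num_workers=num_workers)
    return sum(inputs.shape[0] for inputs, _ in loader)


if __name__ == "__main__":
    # Worker processes are started only from here, so a child process that
    # imports this module again does not try to start workers of its own.
    print(count_samples(num_workers=2))
```

The dataset class and helper stay at module level so that worker processes can import them; only the call that actually creates the workers moves under the guard.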

cc @ssnl @VitalyFedyunin @ejguan

Labels

module: dataloader (Related to torch.utils.data.DataLoader and Sampler)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)