This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Gluon DataLoader cannot release the processes in the pool #13521

@YutingZhang

Description

https://github.com/apache/incubator-mxnet/blob/f2dcd7c7b8676b55d912997fc3f9c62c55915307/python/mxnet/gluon/data/dataloader.py#L532-L533

Logically, when a DataLoader is garbage-collected, its _worker_pool should be collected as well, and the pool's terminate() method should be called immediately. However, this does not happen.

Each time I delete a DataLoader, its worker processes are left dangling.
I suspect this is a bug in Python's multiprocessing.Pool. Either way, I think we can patch it by explicitly calling _worker_pool.terminate().
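A minimal sketch of the proposed patch, using only the standard library. The class below is a hypothetical stand-in for DataLoader (it is not MXNet's code): it owns a multiprocessing.Pool under the attribute name _worker_pool and terminates it explicitly, both from a close() method and from __del__. The optional context parameter is an assumption added so the caller can pick the start method.

```python
import multiprocessing


class PoolOwner:
    """Hypothetical stand-in for an object, such as a DataLoader, that owns a worker pool."""

    def __init__(self, num_workers=2, context=None):
        # Assumption: the caller may pass a multiprocessing context (e.g. "fork");
        # otherwise the platform's default start method is used.
        ctx = context if context is not None else multiprocessing.get_context()
        self._worker_pool = ctx.Pool(num_workers)

    def close(self):
        # Explicitly terminate the workers instead of relying on the pool's own
        # finalizer, then wait for them to exit. Safe to call more than once.
        pool = getattr(self, "_worker_pool", None)
        if pool is not None:
            pool.terminate()
            pool.join()
            self._worker_pool = None

    def __del__(self):
        # Best-effort cleanup when the owner is garbage-collected.
        self.close()
```

Calling close() explicitly is the more reliable path: __del__ runs only once the last reference is dropped, which happens immediately under CPython's reference counting but may be deferred elsewhere.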

Minimal code to reproduce the error:

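```python
import mxnet as mx
import numpy as np

A = np.random.rand(999, 2000)
# DataLoader backed by two worker processes.
D = mx.gluon.data.DataLoader(A, batch_size=8, num_workers=2)
the_iter = iter(D)
next(the_iter)  # fetch one batch so the workers are actually used
# Drop every reference; the worker processes should exit here, but they stay alive.
del the_iter
del D
```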
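To check the symptom without a process monitor, a small helper like the hypothetical dangling_workers() below can be called after `del D`. It is a sketch that assumes the DataLoader workers are multiprocessing children of the current interpreter, which is how multiprocessing.Pool starts them.

```python
import multiprocessing


def dangling_workers():
    """Hypothetical helper: list child processes of this interpreter that are still alive."""
    # active_children() also reaps children that have already exited,
    # so only live processes are returned.
    return [p for p in multiprocessing.active_children() if p.is_alive()]
```

If the bug is present, printing dangling_workers() right after `del D` in the reproduction is expected to still list the worker processes; once the pool has been terminated, none of them should appear.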

I recorded a video demo for this bug: https://drive.google.com/open?id=1q4CmU_F1vAtxoZ_KUmrIEfVRk3RsQfv8

Environment: today's MXNet build from pip, Python 3.6, on a p3 instance.
