This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Low CPU usage of MXNet in subprocesses #13593

@YutingZhang

MXNet shows low CPU usage when running CPU operations in multi-process scenarios. Specifically, when MXNet computation runs in a subprocess, MXNet can use only 1 or 2 CPUs to do its job. The behavior differs across MXNet variants (see below) and across machines ...

This issue is critical because it significantly slows down the multiprocess object-detection data loading in gluoncv, making Faster-RCNN training in gluoncv unusable.

This is tested on the 20181207 version, and other versions (e.g., 1.3.1) show similar problems.

Code to reproduce the issue

Filename: mxnet_cpu_test.py

import argparse
import sys
from concurrent import futures
import time
import numpy as np
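mx = None  # bound to the mxnet module in the main process unless --late-import is given


def run(need_import):
    # Declare the global before any binding of `mx`; Python 3.6+ rejects
    # a `global` statement that follows an assignment to the same name.
    global mx
    if need_import:
        import mxnet as mx  # late import: this process loads MXNet itself
    A = mx.nd.random.uniform(low=0, high=1, shape=(5000, 5000))
    while True:
        A = mx.nd.dot(A, A)  # repeated large matrix product keeps the CPU busy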

def parse_args():
    parser = argparse.ArgumentParser("benchmark mxnet cpu")
    parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, default=0)
    parser.add_argument('--late-import', action='store_true')
    return parser.parse_args()

def main(args):

    if args.num_workers == 0:
        print("Main process")
        try:
            run(need_import=args.late_import)
        except KeyboardInterrupt:
            pass
    else:
        print("Subprocesses")
        ex = futures.ProcessPoolExecutor(args.num_workers)

        for _ in range(args.num_workers):
            ex.submit(run, need_import=args.late_import)
        while True:
            try:
                time.sleep(10000)
            except KeyboardInterrupt:
                ex.shutdown(wait=False)
                break
    print("Stopped")


if __name__ == "__main__":
    args = parse_args()
    if not args.late_import:
        import mxnet as mx
    main(args)
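
The "CPUs per subprocess" figures in the experiments below were read from a system monitor. As a rough cross-check, the same quantity can be estimated from inside a process by dividing the CPU time consumed by the wall-clock time elapsed. The sketch below is not part of the original script: `effective_cpus` and `spin` are hypothetical helper names, and `spin` is only a single-threaded stand-in workload.

```python
import time


def effective_cpus(work, *args):
    """Run work(*args) and return CPU-seconds consumed per wall-clock second."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this whole process
    work(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    # Guard against a zero-length interval for trivially short workloads.
    return cpu_used / max(wall_used, 1e-9)


def spin(seconds):
    """Single-threaded busy loop used as a stand-in workload (hypothetical)."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


if __name__ == "__main__":
    # A single-threaded loop should report a value close to 1.
    print("effective CPUs: %.2f" % effective_cpus(spin, 0.5))
```

To apply this to the script above, `run` would need a bounded number of `mx.nd.dot` iterations instead of `while True`, followed by `A.wait_to_read()` so that MXNet's asynchronous engine finishes before the clocks are read. A value near 2 in a subprocess would match the screenshots.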

Detailed experiments:

  • Run in the main process:
    python3 mxnet_cpu_test.py --num-workers=0
    Working fine for all mxnet variants (GPU or CPU-only).

  • Run in two subprocesses
    -- mxnet-cu90 on p3.16x:
    python3 mxnet_cpu_test.py --num-workers=2
    It uses only 2 CPUs per subprocess.
    -- mxnet-mkl on p3.16x:
    python3 mxnet_cpu_test.py --num-workers=2
    Same here. It uses only 2 CPUs per subprocess.
    -- mxnet-mkl on the CPU-only machine c5.18x:
    python3 mxnet_cpu_test.py --num-workers=2
    Even worse. It uses only 1.5 CPUs per subprocess.
    -- However, for vanilla CPU-version mxnet on c5.18x:
    python3 mxnet_cpu_test.py --num-workers=2
    It works better. At least it uses 5 CPUs per subprocess.
    -- Oddly, the same vanilla CPU-version mxnet on the GPU machine p3.16x:
    python3 mxnet_cpu_test.py --num-workers=2
    It works worse, i.e., 2 CPUs per subprocess.

  • This problem seems related to how MXNet manages threads in each subprocess. If I do not import mxnet in the main process and instead import it in each subprocess:
    python3 mxnet_cpu_test.py --num-workers=2 --late-import
    Then everything works fine.
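
The late-import workaround can be isolated into a small pattern: the parent process never imports the heavy library, and each worker imports it on first use. This is a minimal sketch of that pattern, not a fix for the underlying issue. `late_import_worker` and `run_workers` are hypothetical names, and the module name is a parameter so the pattern can be tried with any importable module.

```python
import importlib
import os
import sys
from concurrent import futures


def late_import_worker(module_name="mxnet"):
    """Import the library inside the worker and report where that happened."""
    # With the fork start method a child inherits the parent's sys.modules,
    # so True here means the library was already loaded before the fork.
    inherited = module_name in sys.modules
    importlib.import_module(module_name)
    return os.getpid(), inherited


def run_workers(num_workers, module_name="mxnet"):
    """Submit one late-importing task per worker and collect the reports."""
    with futures.ProcessPoolExecutor(num_workers) as ex:
        jobs = [ex.submit(late_import_worker, module_name)
                for _ in range(num_workers)]
        return [job.result() for job in jobs]


if __name__ == "__main__":
    # Assumes mxnet is installed; the parent must not import it beforehand.
    for pid, inherited in run_workers(2):
        print("worker pid=%d, inherited from parent: %s" % (pid, inherited))
```

If any worker reports that the module was inherited, something in the parent imported it before the pool was created, which is the configuration that showed low CPU usage in the experiments above.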
