This repository was archived by the owner on Nov 17, 2023. It is now read-only.

MXNet 2.x significantly slower than 1.x in Sockeye #20636

@fhieber

Description

We observe a significant reduction in Sockeye inference speed with a recent build of MXNet 2.x (master branch). Compared to 1.x versions of MXNet, GPU translation with MXNet 2.x is roughly 1.5x to 2.8x slower in the runs below, with the largest gap at batch size 1.

For MXNet 2.x, we migrated Sockeye to the Gluon 2.0 interface and adopted the new NumPy namespaces. Otherwise, the code is equivalent to Sockeye's master branch, and both branches use the same level of hybridization (static_alloc=True). The pull request/branch can be found here: awslabs/sockeye#953.

The runs below use half precision. The first set ran on a p3.2xlarge instance and the second on a g4 instance. The translation outputs are equal.

p3.2xlarge instance

batch size 64

mxnet-cu112 2.0.0b20211001:

[INFO:__main__] Processed 3003 lines. Total time: 37.2888, sec/sent: 0.0124, sent/sec: 80.5336

mxnet-cu112 1.7:

[INFO:__main__] Processed 3003 lines. Total time: 20.2805, sec/sent: 0.0068, sent/sec: 148.0735

batch size 1

mxnet-cu112 2.0.0b20211001:

[INFO:__main__] Processed 3003 lines. Total time: 858.3818, sec/sent: 0.2858, sent/sec: 3.4984

mxnet-cu112 1.7:

[INFO:__main__] Processed 3003 lines. Total time: 302.0189, sec/sent: 0.1006, sent/sec: 9.9431

g4 instance

mx18/out.1.bpe.log:[2021-10-04:20:02:32:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 316.4692, sec/sent: 0.1054, sent/sec: 9.4891
mx18/out.64.bpe.log:[2021-10-04:20:03:10:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 31.8175, sec/sent: 0.0106, sent/sec: 94.3819
mx20/out.1.bpe.log:[2021-10-04:20:17:32:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 714.5509, sec/sent: 0.2379, sent/sec: 4.2026
mx20/out.64.bpe.log:[2021-10-04:20:18:26:INFO:__main__:read_and_translate] Processed 3003 lines. Total time: 46.4607, sec/sent: 0.0155, sent/sec: 64.6352
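The slowdown factors can be read directly off the "Total time" values in the summary lines above. The following is a minimal sketch, not part of Sockeye: the helper names (`parse_summary`, `slowdown`) and the `RUNS` table are hypothetical and only illustrate the arithmetic. For the g4 runs it assumes that the `mx18` and `mx20` log directories hold the 1.x and 2.x results, and that `out.1` and `out.64` correspond to batch sizes 1 and 64.

```python
import re

# Matches the summary line that Sockeye's translate CLI logs at the end of a run.
SUMMARY = re.compile(
    r"Processed (\d+) lines\. Total time: ([\d.]+), "
    r"sec/sent: ([\d.]+), sent/sec: ([\d.]+)"
)


def parse_summary(line):
    """Extract line count, total time (s) and throughput from a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return {
        "lines": int(match.group(1)),
        "total_time": float(match.group(2)),
        "sent_per_sec": float(match.group(4)),
    }


def slowdown(baseline_line, candidate_line):
    """Ratio of candidate to baseline total time (values above 1 mean slower)."""
    base = parse_summary(baseline_line)
    cand = parse_summary(candidate_line)
    return cand["total_time"] / base["total_time"]


# (instance, batch size, 1.x summary line, 2.x summary line), copied from the logs above.
RUNS = [
    ("p3.2xlarge", 64,
     "Processed 3003 lines. Total time: 20.2805, sec/sent: 0.0068, sent/sec: 148.0735",
     "Processed 3003 lines. Total time: 37.2888, sec/sent: 0.0124, sent/sec: 80.5336"),
    ("p3.2xlarge", 1,
     "Processed 3003 lines. Total time: 302.0189, sec/sent: 0.1006, sent/sec: 9.9431",
     "Processed 3003 lines. Total time: 858.3818, sec/sent: 0.2858, sent/sec: 3.4984"),
    ("g4", 64,  # assumed: mx18/out.64 vs mx20/out.64
     "Processed 3003 lines. Total time: 31.8175, sec/sent: 0.0106, sent/sec: 94.3819",
     "Processed 3003 lines. Total time: 46.4607, sec/sent: 0.0155, sent/sec: 64.6352"),
    ("g4", 1,  # assumed: mx18/out.1 vs mx20/out.1
     "Processed 3003 lines. Total time: 316.4692, sec/sent: 0.1054, sent/sec: 9.4891",
     "Processed 3003 lines. Total time: 714.5509, sec/sent: 0.2379, sent/sec: 4.2026"),
]

if __name__ == "__main__":
    for instance, batch_size, baseline, candidate in RUNS:
        factor = slowdown(baseline, candidate)
        print(f"{instance:>10} batch {batch_size:>2}: {factor:.2f}x slower")
```

On these numbers the gap is widest at batch size 1 on both instance types. That would be consistent with a larger fixed cost per forward call in 2.x, but the timings alone do not establish the cause.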

To Reproduce

  • Download the Sockeye sample model
  • Run translate.sh with the master branch of Sockeye
  • Run translate.sh with the mx2 branch of Sockeye

Steps to reproduce


  1. wget https://github.com/awslabs/sockeye/releases/download/2.3.22/wmt14_en_de.tgz
  2. tar -xvf wmt14_en_de.tgz
  3. git clone https://github.com/awslabs/sockeye.git
  4. pip install -r sockeye/requirements/requirements.gpu-cu112.txt
  5. mv sockeye/sockeye wmt_14_en_de
  6. cd wmt_14_en_de
  7. bash translate.sh [translate with master branch]
  8. git checkout mx2
  9. (Install nightly build of mx2: pip uninstall mxnet-cu112 ; pip install --pre -f https://dist.mxnet.io/python 'mxnet-cu112')
  10. bash translate.sh [translate with mx2 branch]
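Because the steps above swap the MXNet package between two runs, it can help to confirm which build is active before each call to translate.sh. The following is a small sketch using only the Python standard library. The function names are hypothetical, and the plain `mxnet` distribution name is an assumption added next to the `mxnet-cu112` package used in this issue.

```python
from importlib import metadata

# Distribution names to probe. "mxnet-cu112" is the package installed in the
# steps above; plain "mxnet" is an assumed fallback for other installs.
CANDIDATES = ("mxnet-cu112", "mxnet")


def installed_mxnet(candidates=CANDIDATES):
    """Return (distribution name, version) for the first installed candidate, or None."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def is_mx2(version):
    """True if the version string belongs to the 2.x series, e.g. '2.0.0b20211001'."""
    return version.split(".")[0] == "2"


if __name__ == "__main__":
    found = installed_mxnet()
    if found is None:
        print("No MXNet distribution found in this environment.")
    else:
        name, version = found
        series = "2.x" if is_mx2(version) else "1.x or other"
        print(f"{name} {version} ({series})")
```

Running this once before step 7 and again before step 10 gives a quick check that each benchmark used the intended MXNet series.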

Environment

  • CUDA 11.2 (conda install -c conda-forge nccl cudnn cudatoolkit==11.2)
  • MXNet 1.8.post0 or MXNet 1.7 vs MXNet 2.x (2.0.0b20211001)
