This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[MXNet] - [BERT] #14864

@araitats

dmlc/gluon-nlp#690

Description

There is a problem training a custom BERT model with later pre-release (nightly) builds of MXNet 1.5.0 (observed with mxnet-cu90).
mlm_loss stops improving at around 7.3 and nsp_acc stops improving at around 54.
mxnet-cu90 builds older than 1.5.0b20190425 do not have this issue;
builds from 1.5.0b20190426 onward do.
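To check whether an environment falls inside the reported regression window, a small helper can compare the build date embedded in the nightly version string. This is a minimal sketch, assuming nightly versions follow the `1.5.0bYYYYMMDD` pattern seen above; `classify_build` is a hypothetical helper, not part of MXNet. The 1.5.0b20190425 build itself is reported as "unknown" because this report does not say which side of the regression it falls on.

```python
import re

# Assumption: nightly builds are named "1.5.0bYYYYMMDD", as in this report.
# This is not an official MXNet API.
_NIGHTLY = re.compile(r"^1\.5\.0b(\d{8})$")

LAST_UNCHECKED_BUILD = 20190425  # builds strictly older than this were fine
FIRST_AFFECTED_BUILD = 20190426  # first build reported to show the issue


def classify_build(version):
    """Return 'good', 'affected', or 'unknown' for an mxnet-cu90 version string."""
    match = _NIGHTLY.match(version)
    if match is None:
        # Release builds and other version formats are not covered by the report.
        return "unknown"
    build_date = int(match.group(1))
    if build_date >= FIRST_AFFECTED_BUILD:
        return "affected"
    if build_date < LAST_UNCHECKED_BUILD:
        return "good"
    # 1.5.0b20190425 itself was not characterized in the report.
    return "unknown"
```

In a live environment, the installed version can be passed in as `classify_build(mxnet.__version__)`.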

Environment info (Required)

Amazon SageMaker Notebook (ml.p3.16xlarge)
CUDA version: 9.0

Package used (Python/R/Scala/Julia):
Python 3.6
