This repository was archived by the owner on Nov 17, 2023. It is now read-only.

incorrect grad of gluon.nn.BatchNorm when scale=False #16297

@shesung

Description


When using gluon.nn.BatchNorm(scale=False) on GPU, the computed gradient for beta is not correct: it appears to be accumulated across iterations instead of being overwritten.

With scale=True, or when running on CPU, the gradient is correct.

This problem may make a network hard to converge during training.

Environment info (Required)

CentOS Linux release 7.2.1511 (Core)
GTX 1080Ti
Driver Version: 384.69
CUDA Version 9.0.176

installed with pip:
numpy 1.17.2
mxnet-cu90 1.5.0

Code

In this example, the gradient of beta should be [1, 1, 1] at every iteration.

import mxnet as mx
from mxnet import gluon, autograd

ctx = mx.gpu()
x = mx.nd.ones((1,3,1,1), ctx=ctx)

net = gluon.nn.BatchNorm(scale=False, epsilon=2e-5, momentum=0.0)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(params=net.collect_params(),
                        optimizer='sgd',
                        optimizer_params={'learning_rate': 0.01, 'wd': 0.0005, 'momentum': 0.9})
net.hybridize()

for i in range(10):
    with autograd.record():
        out = net(x)
    out.backward()
    trainer.step(x.shape[0])
    for name, param in net.collect_params().items():
        if 'beta' in name:
            print(name, param.grad(ctx).asnumpy())

output:

batchnorm0_beta [1. 1. 1.]
batchnorm0_beta [2. 2. 2.]
batchnorm0_beta [3. 3. 3.]
batchnorm0_beta [4. 4. 4.]
batchnorm0_beta [5. 5. 5.]
batchnorm0_beta [6. 6. 6.]
batchnorm0_beta [7. 7. 7.]
batchnorm0_beta [8. 8. 8.]
batchnorm0_beta [9. 9. 9.]
batchnorm0_beta [10. 10. 10.]
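For reference, the expected value follows from the BatchNorm definition: for y = gamma * (x - mean) / sqrt(var + eps) + beta, dL/dbeta is the sum of the upstream gradient over the batch and spatial axes. Since out.backward() uses a ones head-gradient and the input has shape (1, 3, 1, 1), each channel's beta gradient is 1 on every iteration. A minimal pure-Python sketch of that arithmetic (no mxnet required):

```python
# dL/dbeta = sum of the upstream gradient over batch (N) and spatial (H, W) axes.
# With a ones head-gradient, that sum is N * H * W for each of the C channels.
N, C, H, W = 1, 3, 1, 1   # shape of x in the repro above
head_grad = 1.0            # ones head-gradient, as used by out.backward()
grad_beta = [head_grad * N * H * W for _ in range(C)]
print(grad_beta)  # [1.0, 1.0, 1.0]
```

The output above instead grows by one each iteration, which matches gradients being added (grad_req='add' semantics) rather than written.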
