This repository was archived by the owner on Nov 17, 2023. It is now read-only.

GPU tests are unstable #12453

@lebeg

Description

Multiple CI jobs failed in the GPU tests with CUDA illegal memory access errors:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10921/23/pipeline/

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1550/pipeline/

Message

Check failed: (err) == (cudaSuccess) Name: mxnet_generic_kernel ErrStr:an illegal memory access was encountered

Log with context

test_operator_gpu.test_countsketch ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=104987558 to reproduce.
ERROR
test_operator_gpu.test_sparse_nd_basic ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2134146737 to reproduce.
ERROR
test_operator_gpu.test_exc_multiple_waits ... ok
test_operator_gpu.test_lstm_bidirectional ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=200476953 to reproduce.
ERROR
test_operator_gpu.test_sparse_nd_setitem ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2082345391 to reproduce.
ERROR
test_operator_gpu.test_exc_post_fail ... ok
test_operator_gpu.test_gru_sym ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1532640391 to reproduce.
ERROR
test_operator_gpu.test_exc_mutable_var_fail ... ok
test_operator_gpu.test_sparse_nd_slice ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1828661033 to reproduce.
ERROR
test_operator_gpu.test_ndarray_elementwise ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1460065938 to reproduce.
ERROR
test_operator_gpu.test_gru_bidirectional ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=16762643 to reproduce.
ERROR
test_operator_gpu.test_ndarray_elementwisesum ... [06:59:47] src/operator/tensor/./.././../common/../operator/mxnet_op.h:622: Check failed: (err) == (cudaSuccess) Name: mxnet_generic_kernel ErrStr:an illegal memory access was encountered
/work/runtime_functions.sh: line 639:     8 Aborted                 (core dumped) nosetests-2.7 $NOSE_COVERAGE_ARGUMENTS --with-xunit --xunit-file nosetests_gpu.xml --verbose tests/python/gpu
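The log above prints a `MXNET_TEST_SEED` value for each failing test. As a minimal sketch of how those seeds could be turned into per-test reproduction commands, the helper below scans a log excerpt and pairs each seeded test with the status line that follows it. The function name `repro_commands`, the path `tests/python/gpu/<module>.py`, and the `file.py:test_name` selector passed to `nosetests-2.7` are assumptions based on the command shown in the log, not something the issue itself states.

```python
import re

# Matches nose verbose lines that announce a seed, e.g.
# "test_operator_gpu.test_countsketch ... [INFO] ... MXNET_TEST_SEED=104987558 to reproduce."
SEED_LINE = re.compile(
    r"^(?P<module>\w+)\.(?P<test>\w+) \.\.\. .*MXNET_TEST_SEED=(?P<seed>\d+)"
)


def repro_commands(log_text, test_dir="tests/python/gpu", runner="nosetests-2.7"):
    """Build one shell command per failing test that printed a seed.

    Hypothetical helper: test_dir and runner are assumed defaults taken
    from the command visible in the CI log.
    """
    lines = log_text.splitlines()
    commands = []
    for i, line in enumerate(lines):
        match = SEED_LINE.match(line)
        if match is None:
            continue
        # The test status (ERROR/FAIL/ok) is printed on the following line.
        status = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if status not in ("ERROR", "FAIL"):
            continue
        commands.append(
            "MXNET_TEST_SEED={seed} {runner} --verbose "
            "{test_dir}/{module}.py:{test}".format(
                runner=runner, test_dir=test_dir, **match.groupdict()
            )
        )
    return commands


# Excerpt copied from the log in this issue.
SAMPLE_LOG = """\
test_operator_gpu.test_countsketch ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=104987558 to reproduce.
ERROR
test_operator_gpu.test_sparse_nd_basic ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2134146737 to reproduce.
ERROR
test_operator_gpu.test_exc_multiple_waits ... ok
"""

if __name__ == "__main__":
    for command in repro_commands(SAMPLE_LOG):
        print(command)
```

All of the errors above come from a single test process, so a test that fails in the full run may still pass when run alone; a clean isolated run does not by itself rule that test out.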
