Tests failing on A100 due to tolerance #2357

@madil90

Describe the bug
14 tests fail on an A100 GPU because their results fall outside the configured numerical tolerances. The failing tests and their reported errors are listed below.

To Reproduce
On an A100 GPU:

  • Setup and install MONAI
  • Run tests

**Details**
Failing tests and relative error on A100 are:

  • test_rand_3d_elastic_3 (tests.test_rand_elastic_3d.TestRand3DElastic) Mismatched elements: 1 / 8 (12.5%) Max absolute difference: 0.00179242 Max relative difference: 0.00010684
  • test_value_cuda_0 (tests.test_lltm.TestLLTM) AssertionError: With rtol=0.0001 and atol=0.0001, found 2 element(s) (out of 8) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.00022590160369873047 (0.616874098777771 vs. 0.6171000003814697), which occurred at index (2, 1).
  • test_shape_5 (tests.test_local_normalized_cross_correlation_loss.TestLocalNormalizedCrossCorrelationLoss) Not equal to tolerance rtol=1e-05, atol=0 Mismatched elements: 1 / 1 (100%) Max absolute difference: 0. Max relative difference: 0. x: array(-0.918834, dtype=float32) y: array(-0.918672)
  • test_shape_0 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss) Not equal to tolerance rtol=0.0001, atol=0 Mismatched elements: 1 / 1 (100%) Max absolute difference: 0. Max relative difference: 0. x: array(-1.098231, dtype=float32) y: array(-1.098602)
  • test_shape_1 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss) AssertionError: Not equal to tolerance rtol=0.0001, atol=0 Mismatched elements: 1 / 1 (100%) Max absolute difference: 0. Max relative difference: 0. x: array(-1.083798, dtype=float32) y: array(-1.083999)
  • test_shape_2 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss) AssertionError: Not equal to tolerance rtol=0.0001, atol=0 Mismatched elements: 1 / 1 (100%) Max absolute difference: 0. Max relative difference: 0. x: array(-1.083799, dtype=float32) y: array(-1.083999)
  • test_shape_3 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss) AssertionError: Not equal to tolerance rtol=0.0001, atol=0 Mismatched elements: 1 / 1 (100%) Max absolute difference: 0. Max relative difference: 0. x: array(-1.083798, dtype=float32) y: array(-1.083999)
  • test_affine_transform_2d (tests.test_affine_transform.TestAffineTransform) AssertionError: Not equal to tolerance rtol=1e-07, atol=0.0001 Mismatched elements: 12 / 12 (100%) Max absolute difference: 0.002 Max relative difference: 3071. x: array([[[[0.002197, 0.499532, 0.999491, 1.49934], [3.866943, 1.365621, 1.86558 , 2.36523], [7.7323 , 3.037472, 2.731669, 3.231812]]]], dtype=float32) y: array([[[[0.000001, 0.5 , 1. , 1.5 ], [3.866026, 1.366025, 1.866025, 2.36625], [7.732052, 3.035899, 2.732051, 3.232051]]]])
  • test_affine_transform_3d (tests.test_affine_transform.TestAffineTransform) AssertionError: Not equal to tolerance rtol=1e-07, atol=0.0001 Mismatched elements: 44 / 48 (91.7%) Max absolute difference: 0.002 Max relative difference: 31533.831 x: array([[[[[ 0.001892, 0.5017 ], [ 2.367615, 1.36743], [ 4.733337, .402832],... y: array([[[[[ 0. , 0.5 ], [ 2.366025, 1.36605], [ 4.732051, 2.401924],...
  • test_to_norm_affine_0 (tests.test_affine_transform.TestToNormAffine) AssertionError: Not equal to tolerance rtol=1e-07, atol=0.0001 Mismatched elements: 2 / 9 (22.2%) Max absolute difference: 0. Max relative difference: 0.001 x: array([[[ 1.333008, 0. , 0.33008], [ 0. , 0.399902, -0.60098], [ 0. , 0. , 1. ]]], dtype=float32) y: array([[[ 1.333333, 0. , 0.33333], [ 0. , 0.4 , -0.6 ], [ 0. , 0. , 1. ]]])
  • test_to_norm_affine_1 (tests.test_affine_transform.TestToNormAffine) AssertionError: Not equal to tolerance rtol=1e-07, atol=0.0001 Mismatched elements: 2 / 9 (22.2%) Max absolute difference: 0. Max relative difference: 0. x: array([[[ 1.25 , 0. , 0.25 ], [ 0. , 0.499878, -0.50244], [ 0. , 0. , 1. ]]], dtype=float32) y: array([[[ 1.25, 0. , .25], [ 0. , 0.5 , -.5 ], [ 0. , 0. , 1. ]]])
  • test_to_norm_affine_2 (tests.test_affine_transform.TestToNormAffine)
    AssertionError:
    Not equal to tolerance rtol=1e-07, atol=0.0001
    Mismatched elements: 2 / 16 (12.5%)
    Max absolute difference: 0.
    Max relative difference: 0.001
    x: array([[[ 2. , 0. , 0. , 1. ],
    [ 0. , 1.333008, 0. , 0.33008],
    [ 0. , 0. , 0.399902, -0.60098],
    [ 0. , 0. , 0. , 1. ]]], dtype=float32)
    y: array([[[ 2. , 0. , 0. , 1. ],
    [ 0. , 1.333333, 0. , 0.33333],
    [ 0. , 0. , 0.4 , -0.6 ],
    [ 0. , 0. , 0. , 1. ]]])

  • test_to_norm_affine_3 (tests.test_affine_transform.TestToNormAffine)
    AssertionError:
    Not equal to tolerance rtol=1e-07, atol=0.0001
    Mismatched elements: 2 / 16 (12.5%)
    Max absolute difference: 0.
    Max relative difference: 0.
    x: array([[[ 1.5 , 0. , 0. , 0.5 ],
    [ 0. , 1.25 , 0. , 0.25 ],
    [ 0. , 0. , 0.499878, -0.50244],
    [ 0. , 0. , 0. , 1. ]]], dtype=float32)
    y: array([[[ 1.5 , 0. , 0. , .5 ],
    [ 0. , 1.25, 0. , .25],
    [ 0. , 0. , 0.5 , -.5 ],
    [ 0. , 0. , 0. , 1. ]]])

  • test_rand_2d_elastic_4 (tests.test_rand_elastic_2d.TestRand2DElastic)
    AssertionError:
    Not equal to tolerance rtol=0.0001, atol=0.0001
    Mismatched elements: 2 / 12 (16.7%)
    Max absolute difference: 0.001
    Max relative difference: 0.
    x: array([[[ 1.357813, 1.92286],
    [ 5.626798, 6.43219]],
    ...
    y: array([[[ 1.358411, 1.92131],
    [ 5.626623, 6.642721]],
    ...
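
Several of the messages above print the maximum difference as a truncated `0.`, so the size of the mismatch is not visible. The sketch below recomputes it for the four TestGlobalMutualInformationLoss failures from the printed x (actual) and y (expected) values, assuming the relative difference is defined as |x - y| / |y|. The names `reported` and `differences` are illustrative only and are not part of MONAI. The logged values are rounded to six decimals, so the results are approximate.

```python
# (actual, expected) pairs copied from the failure messages above.
reported = {
    "test_shape_0": (-1.098231, -1.098602),
    "test_shape_1": (-1.083798, -1.083999),
    "test_shape_2": (-1.083799, -1.083999),
    "test_shape_3": (-1.083798, -1.083999),
}

RTOL = 1e-4  # tolerance reported for these four tests (atol=0)


def differences(actual, expected):
    """Return (absolute difference, relative difference w.r.t. expected)."""
    abs_diff = abs(actual - expected)
    rel_diff = abs_diff / abs(expected)
    return abs_diff, rel_diff


for name, (x, y) in reported.items():
    abs_diff, rel_diff = differences(x, y)
    # With atol=0 the check reduces to rel_diff <= RTOL.
    status = "within" if rel_diff <= RTOL else "exceeds"
    print(f"{name}: abs={abs_diff:.2e} rel={rel_diff:.2e} ({status} rtol={RTOL})")
```

Under these assumptions each of the four relative differences comes out at a few times 1e-4, which is above the configured rtol of 1e-4.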

Expected behavior
All tests should pass


Proposed Solution
Increase the relative and absolute tolerances of the affected tests.
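
As a rough illustration of this proposal, the sketch below replays the test_shape_5 comparison with `numpy.testing.assert_allclose`, first at the reported tolerance (rtol=1e-05, atol=0) and then at a looser one. The value `rtol=1e-3` is an assumption chosen only to show the effect. It is not a recommended setting, and the variable names are hypothetical.

```python
import numpy as np

# Values copied from the test_shape_5 failure message.
actual = np.float32(-0.918834)
expected = -0.918672

# Reported tolerance: passes only if |actual - expected| <= atol + rtol * |expected|.
try:
    np.testing.assert_allclose(actual, expected, rtol=1e-5, atol=0)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Looser, purely illustrative tolerance (assumption, not a tuned value).
np.testing.assert_allclose(actual, expected, rtol=1e-3, atol=0)

print("passes at rtol=1e-5:", strict_passed)
print("passes at rtol=1e-3: True")
```

Whatever value is chosen in practice should be as tight as the A100 results allow, so that the tests still catch genuine regressions.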

Labels

CI/CD, bug
