Tests failing on A100 due to tolerance #2357
Description
Describe the bug
Fourteen tests fail on an A100 GPU because the computed values fall outside the tolerances the tests allow. The failing tests and their reported errors are listed below.
To Reproduce
On an A100 GPU:
- Set up and install MONAI
- Run the tests
**Details**
The failing tests and their reported errors on A100 are:
- `test_rand_3d_elastic_3 (tests.test_rand_elastic_3d.TestRand3DElastic)
Mismatched elements: 1 / 8 (12.5%)
Max absolute difference: 0.00179242
Max relative difference: 0.00010684`
- `test_value_cuda_0 (tests.test_lltm.TestLLTM)
AssertionError: With rtol=0.0001 and atol=0.0001, found 2 element(s) (out of 8) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.00022590160369873047 (0.616874098777771 vs. 0.6171000003814697), which occurred at index (2, 1).`
- `test_shape_5 (tests.test_local_normalized_cross_correlation_loss.TestLocalNormalizedCrossCorrelationLoss)
Not equal to tolerance rtol=1e-05, atol=0
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.
Max relative difference: 0.
x: array(-0.918834, dtype=float32)
y: array(-0.918672)`
- `test_shape_0 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss)
Not equal to tolerance rtol=0.0001, atol=0
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.
Max relative difference: 0.
x: array(-1.098231, dtype=float32)
y: array(-1.098602)`
- `test_shape_1 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss)
AssertionError:
Not equal to tolerance rtol=0.0001, atol=0
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.
Max relative difference: 0.
x: array(-1.083798, dtype=float32)
y: array(-1.083999)`
- `test_shape_2 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss)
AssertionError:
Not equal to tolerance rtol=0.0001, atol=0
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.
Max relative difference: 0.
x: array(-1.083799, dtype=float32)
y: array(-1.083999)`
- `test_shape_3 (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss)
AssertionError:
Not equal to tolerance rtol=0.0001, atol=0
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.
Max relative difference: 0.
x: array(-1.083798, dtype=float32)
y: array(-1.083999)`
- `test_affine_transform_2d (tests.test_affine_transform.TestAffineTransform)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.0001
Mismatched elements: 12 / 12 (100%)
Max absolute difference: 0.002
Max relative difference: 3071.
x: array([[[[0.002197, 0.499532, 0.999491, 1.49934],
[3.866943, 1.365621, 1.86558 , 2.36523],
[7.7323 , 3.037472, 2.731669, 3.231812]]]], dtype=float32)
y: array([[[[0.000001, 0.5 , 1. , 1.5 ],
[3.866026, 1.366025, 1.866025, 2.36625],
[7.732052, 3.035899, 2.732051, 3.232051]]]])`
- `test_affine_transform_3d (tests.test_affine_transform.TestAffineTransform)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.0001
Mismatched elements: 44 / 48 (91.7%)
Max absolute difference: 0.002
Max relative difference: 31533.831
x: array([[[[[ 0.001892, 0.5017 ],
[ 2.367615, 1.36743],
[ 4.733337, .402832],...
y: array([[[[[ 0. , 0.5 ],
[ 2.366025, 1.36605],
[ 4.732051, 2.401924],...`
- `test_to_norm_affine_0 (tests.test_affine_transform.TestToNormAffine)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.0001
Mismatched elements: 2 / 9 (22.2%)
Max absolute difference: 0.
Max relative difference: 0.001
x: array([[[ 1.333008, 0. , 0.33008],
[ 0. , 0.399902, -0.60098],
[ 0. , 0. , 1. ]]], dtype=float32)
y: array([[[ 1.333333, 0. , 0.33333],
[ 0. , 0.4 , -0.6 ],
[ 0. , 0. , 1. ]]])`
- `test_to_norm_affine_1 (tests.test_affine_transform.TestToNormAffine)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.0001
Mismatched elements: 2 / 9 (22.2%)
Max absolute difference: 0.
Max relative difference: 0.
x: array([[[ 1.25 , 0. , 0.25 ],
[ 0. , 0.499878, -0.50244],
[ 0. , 0. , 1. ]]], dtype=float32)
y: array([[[ 1.25, 0. , .25],
[ 0. , 0.5 , -.5 ],
[ 0. , 0. , 1. ]]])`
- `test_to_norm_affine_2 (tests.test_affine_transform.TestToNormAffine)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.0001
Mismatched elements: 2 / 16 (12.5%)
Max absolute difference: 0.
Max relative difference: 0.001
x: array([[[ 2. , 0. , 0. , 1. ],
[ 0. , 1.333008, 0. , 0.33008],
[ 0. , 0. , 0.399902, -0.60098],
[ 0. , 0. , 0. , 1. ]]], dtype=float32)
y: array([[[ 2. , 0. , 0. , 1. ],
[ 0. , 1.333333, 0. , 0.33333],
[ 0. , 0. , 0.4 , -0.6 ],
[ 0. , 0. , 0. , 1. ]]])`
- `test_to_norm_affine_3 (tests.test_affine_transform.TestToNormAffine)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0.0001
Mismatched elements: 2 / 16 (12.5%)
Max absolute difference: 0.
Max relative difference: 0.
x: array([[[ 1.5 , 0. , 0. , 0.5 ],
[ 0. , 1.25 , 0. , 0.25 ],
[ 0. , 0. , 0.499878, -0.50244],
[ 0. , 0. , 0. , 1. ]]], dtype=float32)
y: array([[[ 1.5 , 0. , 0. , .5 ],
[ 0. , 1.25, 0. , .25],
[ 0. , 0. , 0.5 , -.5 ],
[ 0. , 0. , 0. , 1. ]]])`
- `test_rand_2d_elastic_4 (tests.test_rand_elastic_2d.TestRand2DElastic)
AssertionError:
Not equal to tolerance rtol=0.0001, atol=0.0001
Mismatched elements: 2 / 12 (16.7%)
Max absolute difference: 0.001
Max relative difference: 0.
x: array([[[ 1.357813, 1.92286],
[ 5.626798, 6.43219]],
...
y: array([[[ 1.358411, 1.92131],
[ 5.626623, 6.642721]],
...`
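
Several entries above print `Max absolute difference: 0.`, which looks truncated in this report, since the printed `x` and `y` values clearly differ. As a rough aid, the differences for the scalar loss failures can be recomputed from the printed values. This is only a sketch: the helper name `differences` is made up for illustration, and the inputs are the six-digit printed values, so the results are approximate.

```python
# Scalar failures copied from the report above: (test, actual x, expected y).
REPORTED = [
    ("test_shape_5", -0.918834, -0.918672),
    ("test_shape_0", -1.098231, -1.098602),
    ("test_shape_1", -1.083798, -1.083999),
    ("test_shape_2", -1.083799, -1.083999),
    ("test_shape_3", -1.083798, -1.083999),
]


def differences(actual, desired):
    """Return (absolute, relative) difference; relative is measured against `desired`."""
    abs_diff = abs(actual - desired)
    return abs_diff, abs_diff / abs(desired)


for name, x, y in REPORTED:
    abs_diff, rel_diff = differences(x, y)
    print(f"{name}: abs={abs_diff:.6f} rel={rel_diff:.6f}")
```

Recomputed this way, the relative differences for these five tests come out between the tests' current `rtol` and 1e-3.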
Expected behavior
All tests should pass on A100.
Proposed Solution
Increase the relative and absolute tolerances of the failing tests.
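
To get a feel for how much loosening would be needed, here is a dependency-free sketch of the check `|actual - desired| <= atol + rtol * |desired|`, which is the criterion documented for `numpy.testing.assert_allclose`. The helper names and the candidate value `CANDIDATE_RTOL = 1e-3` are assumptions for illustration only, not values proposed in this issue, and only the scalar failures are covered.

```python
def required_rtol(actual, desired, atol):
    """Smallest rtol satisfying |actual - desired| <= atol + rtol * |desired|."""
    excess = abs(actual - desired) - atol
    return max(0.0, excess / abs(desired))


def within_tolerance(actual, desired, rtol, atol):
    """Scalar version of the allclose-style tolerance check."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# (test, actual, expected, current rtol, current atol), copied from the report.
CASES = [
    ("test_value_cuda_0", 0.616874098777771, 0.6171000003814697, 1e-4, 1e-4),
    ("test_shape_5", -0.918834, -0.918672, 1e-5, 0.0),
    ("test_shape_0", -1.098231, -1.098602, 1e-4, 0.0),
]

CANDIDATE_RTOL = 1e-3  # hypothetical loosened value, not taken from the issue

for name, x, y, rtol, atol in CASES:
    needed = required_rtol(x, y, atol)
    ok_now = within_tolerance(x, y, rtol, atol)
    ok_candidate = within_tolerance(x, y, CANDIDATE_RTOL, atol)
    print(f"{name}: needs rtol >= {needed:.2e}, "
          f"passes now: {ok_now}, passes at candidate: {ok_candidate}")
```

The array-valued failures (the affine and elastic tests) have expected entries at or near zero, where a relative tolerance contributes almost nothing, so those would likely need a larger `atol` rather than a larger `rtol`.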