Random CI errors may be related to out-of-memory issues #4330

@Nic-Ma

Description

Is your feature request related to a problem? Please describe.
https://github.com/Project-MONAI/MONAI/runs/6567755462?check_suite_focus=true

======================================================================
ERROR: test_verify_0____w_MONAI_MONAI_tests_testing_data_metadata_json (tests.test_bundle_verify_net.TestVerifyNetwork)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/__w/MONAI/MONAI/tests/test_bundle_verify_net.py", line 43, in test_verify
    subprocess.check_call(cmd, env=test_env)
  File "/opt/conda/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['coverage', 'run', '-m', 'monai.bundle', 'verify_net_in_out', 'network_def', '--meta_file', '/__w/MONAI/MONAI/tests/testing_data/metadata.json', '--config_file', '/__w/MONAI/MONAI/tests/testing_data/inference.json', '-n', '2', '--any', '32', '--args_file', '/tmp/tmp7y1u_zw9/def_args.json', '--_meta_#network_data_format#inputs#image#spatial_shape', "[32,'*','4**p*n']"]' returned non-zero exit status 1.
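The first failure only reports `CalledProcessError` with exit status 1. The child's own error output is not attached to the test failure, so it is hard to tell whether this is also an out-of-memory error. One possible improvement is to capture the child's output and put it in the failure message. Below is a minimal sketch; `check_call_with_output` is a hypothetical helper, not an existing MONAI utility, built only on the standard `subprocess.run` API (Python 3.7+).

```python
import subprocess
import sys


def check_call_with_output(cmd, env=None):
    """Hypothetical helper: run `cmd`; on failure, raise with the child's
    stdout/stderr in the message so the root cause (e.g. a CUDA OOM)
    appears directly in the failing test's report."""
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    if result.returncode != 0:
        # CalledProcessError's message omits stderr, so build our own message.
        raise RuntimeError(
            f"Command {cmd!r} returned non-zero exit status {result.returncode}.\n"
            f"--- stdout ---\n{result.stdout}\n--- stderr ---\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Usage example: a child process that fails with a message on stderr.
    try:
        check_call_with_output([sys.executable, "-c", "import sys; sys.exit('boom')"])
    except RuntimeError as err:
        print(err)
```

In `test_verify`, this would replace `subprocess.check_call(cmd, env=test_env)` with `check_call_with_output(cmd, env=test_env)`.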

======================================================================
ERROR: test_bspline (tests.test_global_mutual_information_loss.TestGlobalMutualInformationLoss)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/MONAI/MONAI/tests/test_global_mutual_information_loss.py", line 106, in test_bspline
    result = loss_fn(a2, a1).detach().cpu().numpy()
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/__w/MONAI/MONAI/monai/losses/image_dissimilarity.py", line 319, in forward
    wa, pa, wb, pb = self.parzen_windowing(pred, target)  # (batch, num_sample, num_bin), (batch, 1, num_bin)
  File "/__w/MONAI/MONAI/monai/losses/image_dissimilarity.py", line 233, in parzen_windowing
    pred_weight, pred_probability = self.parzen_windowing_b_spline(pred, order=3)
  File "/__w/MONAI/MONAI/monai/losses/image_dissimilarity.py", line 283, in parzen_windowing_b_spline
    weight + (4 - 6 * sample_bin_matrix**2 + 3 * sample_bin_matrix**3) * (sample_bin_matrix < 1) / 6
RuntimeError: CUDA out of memory. Tried to allocate 736.00 MiB (GPU 0; 14.76 GiB total capacity; 5.73 GiB already allocated; 94.00 MiB free; 5.75 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

tests finished, printing completed times >10.0s in ascending order...

test_read_patches_cucim_0 (tests.test_masked_inference_wsi_dataset.TestMaskedInferenceWSIDataset) (10.2s)
test_script_0 (tests.test_senet.TestSENET) (10.4s)
test_invert (tests.test_invertd.TestInvertd) (10.6s)
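The numbers in this message suggest where the memory goes. PyTorch has reserved 5.75 GiB of the 14.76 GiB card but only 94 MiB is free, so most of the remaining memory is probably held by other processes on the same GPU (for example, other test jobs running in parallel). Reserved (5.75 GiB) is also close to allocated (5.73 GiB), so fragmentation, which the `max_split_size_mb` hint targets, is unlikely to be the main cause. The failing line builds elementwise expressions over `sample_bin_matrix`, which the comment on the `parzen_windowing` call gives as `(batch, num_sample, num_bin)`. In eager mode each intermediate (`**2`, `**3`, the mask, the products) is a new tensor of that size. The sketch below estimates the size of one such float32 tensor. The 23-bin example is an assumption used only to show the arithmetic. With 23 bins, 736 MiB corresponds to `batch * num_sample = 2**23` elements.

```python
def float32_tensor_mib(shape):
    """Size in MiB of one dense float32 tensor (4 bytes per element) of `shape`."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * 4 / 2**20


# Assumed example: (batch, num_sample, num_bin) with batch * num_sample = 2**23
# and 23 bins; 2**23 * 23 * 4 bytes = 23 * 32 MiB.
print(float32_tensor_mib((1, 2**23, 23)))  # 736.0
```

Since the cost grows linearly with `batch * num_sample * num_bin`, shrinking the test images (or the number of bins) in `test_bspline` would shrink every intermediate proportionally.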

It would be nice to enhance the test logic so that these tests do not fail randomly when the CI GPU runs out of memory.
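One option is to report a CUDA out-of-memory error as a skipped test instead of an error. The sketch below uses a hypothetical decorator `skip_on_cuda_oom` (not an existing MONAI test utility). It relies only on `unittest.SkipTest` and, if PyTorch is installed, `torch.cuda.empty_cache()`. The trade-off is that a skip can hide a real memory regression, so reducing input sizes in GPU-heavy tests may be preferable where possible.

```python
import functools
import gc
import unittest


def _empty_cuda_cache():
    """Release PyTorch's cached CUDA blocks if torch and a GPU are available."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def skip_on_cuda_oom(test_fn):
    """Hypothetical decorator: turn a CUDA out-of-memory error into a skipped test."""

    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        try:
            return test_fn(*args, **kwargs)
        except RuntimeError as e:  # recent PyTorch's OutOfMemoryError subclasses RuntimeError
            if "CUDA out of memory" not in str(e):
                raise  # unrelated errors still fail the test
            message = str(e)
        # Outside the except block the exception and its traceback (which keeps the
        # failed test's tensors alive) are released, so the cache can really be freed
        # for the tests that run next.
        gc.collect()
        _empty_cuda_cache()
        raise unittest.SkipTest(f"skipped due to CUDA OOM: {message}")

    return wrapper
```

Because `wrapper` forwards `*args`, it can decorate `unittest.TestCase` methods directly, e.g. by placing `@skip_on_cuda_oom` above `test_bspline`.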

Metadata

Labels: enhancement (New feature or request)
