Fix cuda defined in train_params bug #6370

Merged
wyli merged 5 commits into Project-MONAI:dev from heyufan1995:fix-cuda
Apr 15, 2023

@heyufan1995
Member

Fixes # .
If a user defines CUDA_VISIBLE_DEVICES in train_params, BundleAlgo puts it into the training command (cmd), which causes an error.
This change pops the key out of the parameters before the cmd is built and emits a warning.
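
A minimal sketch of the idea, not the actual MONAI code: `build_train_cmd` and its arguments are hypothetical names used only for illustration, and the `--key=value` override format is assumed from the launch log in this thread.

```python
import warnings
from typing import Any, Dict


def build_train_cmd(script: str, train_params: Dict[str, Any]) -> str:
    """Build a training command string from parameter overrides.

    Hypothetical helper: it illustrates popping CUDA_VISIBLE_DEVICES
    before the overrides are rendered, not BundleAlgo's real logic.
    """
    # Work on a copy so the caller's dictionary is left untouched.
    params = dict(train_params)

    if "CUDA_VISIBLE_DEVICES" in params:
        # The key is an environment setting, not a config override,
        # so drop it and tell the user instead of passing it along.
        params.pop("CUDA_VISIBLE_DEVICES")
        warnings.warn(
            "CUDA_VISIBLE_DEVICES found in train_params; "
            "it is ignored when building the training command."
        )

    # Render the remaining entries as --key=value overrides.
    overrides = " ".join(f"--{key}={value}" for key, value in params.items())
    return f"python {script} run {overrides}".strip()
```

With this sketch, `build_train_cmd("train.py", {"CUDA_VISIBLE_DEVICES": "0", "training#num_epochs": 2})` warns once and returns a command that contains only the `--training#num_epochs=2` override.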

Description

A few sentences describing the changes proposed in this pull request.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: heyufan1995 <[email protected]>
@mingxin-zheng mingxin-zheng requested a review from wyli April 15, 2023 03:58
@wyli
Contributor

wyli commented Apr 15, 2023

/integration-test

@wyli
Contributor

wyli commented Apr 15, 2023

/build

@wyli
Contributor

wyli commented Apr 15, 2023

this still doesn't work, with the error:

2023-04-15 08:10:35,481 - INFO - Launching: OMP_NUM_THREADS=1 python /tmp/tmp5ibdlrdn/workdir/dints_0/scripts/train.py run --config_file='/tmp/tmp5ibdlrdn/workdir/dints_0/configs/network_search.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/transforms_validate.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/network.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/transforms_train.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/hyper_parameters.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/hyper_parameters_search.yaml','/tmp/tmp5ibdlrdn/workdir/dints_0/configs/transforms_infer.yaml' --training#num_images_per_batch=2 --training#num_epochs=2 --training#num_epochs_per_validation=1
algo_templates.tar.gz: 72.0kB [00:00, 153kB/s]
Traceback (most recent call last):
  File "/tmp/tmp5ibdlrdn/workdir/dints_0/scripts/train.py", line 36, in <module>
    from apex.contrib.clip_grad import clip_grad_norm_
  File "/opt/conda/lib/python3.8/site-packages/apex/__init__.py", line 10, in <module>
    from . import amp
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/__init__.py", line 1, in <module>
    from .amp import init, half_function, float_function, promote_function,\
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/amp.py", line 5, in <module>
    from .frontend import *
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/frontend.py", line 2, in <module>
    from ._initialize import _initialize
  File "/opt/conda/lib/python3.8/site-packages/apex/amp/_initialize.py", line 2, in <module>
    from torch._six import string_classes
ModuleNotFoundError: No module named 'torch._six'

https://github.com/Project-MONAI/MONAI/actions/runs/4706795068/jobs/8348263719

@wyli
Contributor

wyli commented Apr 15, 2023

/build

@wyli wyli merged commit b356fec into Project-MONAI:dev Apr 15, 2023