Conversation
…nd of the script
|
As for the failing tests, let's decorate … What are the problems with … |
…st_multinode_distrib_cpu
|
for |
OK, it is a precision issue, let's add a tol option as done for XLA. |
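To illustrate the tolerance idea, here is a minimal sketch using only the standard library. The helper name `assert_metric_close` is hypothetical and is not the test helper changed in this PR; only the default `tol=1e-6` is taken from the commit list.

```python
import math


def assert_metric_close(computed: float, expected: float, tol: float = 1e-6) -> None:
    """Hypothetical helper: fail only if the two values differ by more than `tol`."""
    # abs_tol makes this an absolute comparison; rel_tol=0.0 disables the relative check.
    if not math.isclose(computed, expected, rel_tol=0.0, abs_tol=tol):
        raise AssertionError(f"{computed} != {expected} within tol={tol}")


# Float rounding differences far below the tolerance are accepted.
assert_metric_close(0.1 + 0.2, 0.3)
```

An exact `==` comparison would reject `0.1 + 0.2` against `0.3`, which is the kind of rounding difference a tolerance is meant to absorb when results are reduced across processes.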
|
@fco-dv have you tackled the issue concerning |
|
@sdesrozis not yet, my guess is that the process group is not destroyed at the end of |
|
@fco-dv what is the issue about? |
|
@vfdev-5 when running with gpu enabled, |
Maybe we can try to do the same as here: https://github.com/pytorch/ignite/blob/master/tests/ignite/conftest.py#L104 ? |
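For readers unfamiliar with the cleanup being discussed, the following is a minimal sketch of pairing every process group initialization with a teardown. It is not the code from `conftest.py` nor the fix applied in this PR; the helper name `scoped_process_group`, the `gloo` backend, and the file-based `init_method` are assumptions chosen so the example runs in a single process.

```python
import tempfile
from contextlib import contextmanager

import torch.distributed as dist


@contextmanager
def scoped_process_group(backend: str = "gloo", rank: int = 0, world_size: int = 1):
    """Hypothetical helper: create the default process group and always destroy it."""
    with tempfile.TemporaryDirectory() as tmp:
        # File-based rendezvous (assumption): avoids picking a free TCP port.
        dist.init_process_group(
            backend,
            init_method=f"file://{tmp}/shared_init",
            rank=rank,
            world_size=world_size,
        )
        try:
            yield
        finally:
            # Without this call, a second init_process_group in the same
            # process raises the "initialize the default process group twice" error.
            dist.destroy_process_group()
```

Used as `with scoped_process_group(): ...`, the group is gone once the block exits, so a later test in the same process can initialize a new one.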
|
Managed to fix gpu tests with : |
|
Seems OK now for the default conf: 2 | 4 | 0, and with GPU: 2 | 1 | 1. @vfdev-5, for the CI integration, would you like me to create another PR or continue on this one? Thanks! |
|
@fco-dv Thanks! Let's merge it like that, and for Circle CI, I'll enable it on PRs and let's integrate it in another PR. |
* fix run_multinode_tests_in_docker.sh : run tests with docker python version
* add missing modules
* build an image with test env and add 'nnodes' 'nproc_per_node' 'gpu' as parameters
* pytorch#1615 : change nproc_per_node default to 4
* pytorch#1615 : fix for gpu enabled tests / container rm step at the end of the script
* add xfail decorator for tests/ignite/engine/test_deterministic.py::test_multinode_distrib_cpu
* fix script gpu_options
* add default tol=1e-6 for _test_distrib_compute_on_criterion
* fix for "RuntimeError: trying to initialize the default process group twice!"
* tolerance for test_multinode_distrib_cpu case only
* fix assert None error
* autopep8 fix
Co-authored-by: vfdev <[email protected]>
Co-authored-by: Sylvain Desroziers <[email protected]>
Co-authored-by: fco-dv <[email protected]>
* Recall/Precision metrics for ddp : average == false and multilabel == true
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5… (#1612)
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5.0 to v0.4.3
* Update ignite/contrib/metrics/regression/canberra_metric.py
Co-authored-by: vfdev <[email protected]>
* Update ignite/contrib/metrics/regression/manhattan_distance.py
Co-authored-by: vfdev <[email protected]>
* Update ignite/contrib/metrics/regression/r2_score.py
Co-authored-by: vfdev <[email protected]>
* Update ignite/handlers/checkpoint.py
Co-authored-by: vfdev <[email protected]>
* address PR comments
Co-authored-by: vfdev <[email protected]>
* added TimeLimit handler with its test and doc (#1611)
* added TimeLimit handler with its test and doc
* fixed documentation
* fixed docstring and formatting
* flake8 fix trailing whitespace :)
* modified class logger , default value and tests
* changed rounding to nearest integer
* tests refactored , docs modified
* fixed default value , removed global logger
* fixing formatting
* Added versionadded
* added test for engine termination
Co-authored-by: vfdev <[email protected]>
* Update handlers to use setup_logger (#1617)
* Fixes #1614 - Updated handlers EarlyStopping and TerminateOnNan - Replaced `logging.getLogger` with `setup_logger` in the mentioned handlers
* Updated `TimeLimit` handler. Replaced use of `logger.getLogger` with `setup_logger` from `ignite.utils`
Co-authored-by: Pradyumna Rahul K <[email protected]>
Co-authored-by: Sylvain Desroziers <[email protected]>
* Managing Deprecation using decorators (#1585)
* Starter code for managing deprecation
* Make functions deprecated using the `@deprecated` decorator
* Add arguments to the @deprecated decorator to customize it for each function
* Improve `@deprecated` decorator and add tests
* Replaced the `raise` keyword with added `warnings`
* Added tests several possibilities of the decorator usage
* Removing the test deprecation to check tests
* Add static typing, fix mypy errors
* Make `@deprecated` to raise Exceptions or Warning
* The `@deprecated` decorator will now always emit warning unless explicitly asked to raise an Exception
* Fix mypy errors
* Fix mypy errors (hopefully)
* Fix the test `test_deprecated_setup_any_logging`
* Change the test to work with the `@deprecated` decorator
* Change to snake_case, handle mypy ignores
* Improve Type Annotations
* Update common.py
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5… (#1612)
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5.0 to v0.4.3
* Update ignite/contrib/metrics/regression/canberra_metric.py
Co-authored-by: vfdev <[email protected]>
* Update ignite/contrib/metrics/regression/manhattan_distance.py
Co-authored-by: vfdev <[email protected]>
* Update ignite/contrib/metrics/regression/r2_score.py
Co-authored-by: vfdev <[email protected]>
* Update ignite/handlers/checkpoint.py
Co-authored-by: vfdev <[email protected]>
* address PR comments
Co-authored-by: vfdev <[email protected]>
* `version` -> version
Co-authored-by: vfdev <[email protected]>
Co-authored-by: François COKELAER <[email protected]>
Co-authored-by: Sylvain Desroziers <[email protected]>
* Create documentation.md
* Distributed tests on Windows should be skipped until fixed. (#1620)
* modified CONTRIBUTING.md
* bash instead of sh
* Added Checkpoint.get_default_score_fn (#1621)
* Added Checkpoint.get_default_score_fn to simplify best_model_handler creation
* Added score_sign argument
* Updated docs
* Update about.rst
* Update pre-commit hooks and CONTRIBUTING.md (#1622)
* Change pre-commit config and CONTRIBUTING.md - Update hook versions - Remove seed-isort-config - Add black profile to isort
* Fix files based on new pre-commit config
* Add meaningful exclusions to prettier - Also update actions workflow files to match local pre-commit
* added requirements.txt and updated readme.md (#1624)
* added requirements.txt and updated readme.md
* Update examples/contrib/cifar10/README.md
Co-authored-by: vfdev <[email protected]>
* Update examples/contrib/cifar10/requirements.txt
Co-authored-by: vfdev <[email protected]>
Co-authored-by: vfdev <[email protected]>
* Replace relative paths with raw.githubusercontent (#1629)
* Updated cifar10 example (#1632)
* Updates for cifar10 example
* Updates for cifar10 example
* More updates
* Updated code
* Fixed code-formatting
* Fixed failling CI and typos for cifar10 examples (#1633)
* Updates for cifar10 example
* Updates for cifar10 example
* More updates
* Updated code
* Fixed code-formatting
* Fixed typo and failing CI
* Fixed hvd spawn fail and better synced qat code
* Removed temporary hack to install pth 1.7.1 (#1638) - updated default pth image for gpu tests - updated TORCH_CUDA_ARCH_LIST - fixed /merge -> /head in trigger ci pipeline
* [docker] Pillow -> Pillow-SIMD (#1509) (#1639)
* [docker] Pillow -> Pillow-SIMD (#1509)
* [docker] Pillow -> Pillow-SIMD
* replace pillow with pillow-simd in base docker files
* chore(docker): apt-get autoremove after pillow-simd installation
* apt-get install at once, autoremove g++
* install g++ in pillow installation layer
Co-authored-by: Sylvain Desroziers <[email protected]>
* Fix g++ install issue
Co-authored-by: Jeff Yang <[email protected]>
Co-authored-by: Sylvain Desroziers <[email protected]>
* Fix multinode tests script (#1631)
* fix run_multinode_tests_in_docker.sh : run tests with docker python version
* add missing modules
* build an image with test env and add 'nnodes' 'nproc_per_node' 'gpu' as parameters
* #1615 : change nproc_per_node default to 4
* #1615 : fix for gpu enabled tests / container rm step at the end of the script
* add xfail decorator for tests/ignite/engine/test_deterministic.py::test_multinode_distrib_cpu
* fix script gpu_options
* add default tol=1e-6 for _test_distrib_compute_on_criterion
* fix for "RuntimeError: trying to initialize the default process group twice!"
* tolerance for test_multinode_distrib_cpu case only
* fix assert None error
* autopep8 fix
Co-authored-by: vfdev <[email protected]>
Co-authored-by: Sylvain Desroziers <[email protected]>
Co-authored-by: fco-dv <[email protected]>
* remove warning for average=False and is_multilabel=True
* update docstring and {precision, recall} tests according to test_multilabel_input_NCHW
Co-authored-by: vfdev <[email protected]>
Co-authored-by: Ahmed Omar <[email protected]>
Co-authored-by: Pradyumna Rahul <[email protected]>
Co-authored-by: Pradyumna Rahul K <[email protected]>
Co-authored-by: Sylvain Desroziers <[email protected]>
Co-authored-by: Devanshu Shah <[email protected]>
Co-authored-by: Debojyoti Chakraborty <[email protected]>
Co-authored-by: Jeff Yang <[email protected]>
Co-authored-by: fco-dv <[email protected]>
Fixes #1627
Description: Try to fix run_multinode_tests_in_docker.sh
nnodes | nproc_per_node | gpu
docker rm steps at the end
Check list: