Issue 1123 - Improve usage of contrib common methods with other save handlers by vfdev-5 · Pull Request #1171 · pytorch/ignite

vfdev-5 · 2020-07-01T15:03:59Z

Description:

added save_handler arg to setup_common_training_handlers
added method delegated_save_best_models_by_val_score
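
To make the intent of the two changes concrete, here is a minimal, library-free sketch of the underlying idea: the "keep the best models by validation score" logic delegates all storage to a pluggable save handler. Every name in it (`InMemorySaver`, `BestByScore`, the filename pattern) is a hypothetical stand-in for illustration, not ignite's API.

```python
from typing import Any, Dict, List, Tuple


class InMemorySaver:
    """Hypothetical save handler: keeps checkpoints in a dict instead of on disk."""

    def __init__(self) -> None:
        self.storage: Dict[str, Any] = {}

    def __call__(self, checkpoint: Any, filename: str) -> None:
        self.storage[filename] = checkpoint

    def remove(self, filename: str) -> None:
        self.storage.pop(filename, None)


class BestByScore:
    """Keeps the n_saved highest-scoring checkpoints; all I/O goes through save_handler."""

    def __init__(self, save_handler: Any, n_saved: int = 3) -> None:
        self.save_handler = save_handler
        self.n_saved = n_saved
        self._kept: List[Tuple[float, str]] = []  # sorted ascending by score

    def __call__(self, checkpoint: Any, score: float, step: int) -> None:
        if len(self._kept) >= self.n_saved and score <= self._kept[0][0]:
            return  # not better than the worst checkpoint currently kept
        filename = f"best_model_{step}_score={score:.4f}.pt"
        self.save_handler(checkpoint, filename)
        self._kept.append((score, filename))
        self._kept.sort()
        if len(self._kept) > self.n_saved:
            _, worst = self._kept.pop(0)
            self.save_handler.remove(worst)


# Usage: simulate four validation runs and keep the two best checkpoints.
saver = InMemorySaver()
keep_best = BestByScore(saver, n_saved=2)
for step, score in enumerate([0.5, 0.7, 0.6, 0.9]):
    keep_best({"weights": step}, score, step)
kept_files = sorted(saver.storage)
```

Swapping `InMemorySaver` for any object with the same two methods changes where checkpoints go without touching the selection logic, which is the kind of decoupling the new `save_handler` argument is meant to allow.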

Check list:

New tests are added (if a new feature is added)
New doc strings: description and/or example code are in RST format
Documentation is updated (if required)

Personally, I'm not really a fan of the name delegated_save_best_models_by_val_score; if you have better ideas ...

- added save_handler arg to setup_common_training_handlers - added method delegated_save_best_models_by_val_score

sdesrozis

LGTM !

vfdev-5 · 2020-07-01T21:03:29Z

@sdesrozis @erip any better name for delegated_save_best_models_by_val_score?

sdesrozis · 2020-07-01T21:11:34Z

Is it possible to do this the same way as setup_common_training_handlers, using mutual exclusion?

No better idea atm

vfdev-5 · 2020-07-01T21:13:50Z

> Is it possible to do this the same way as setup_common_training_handlers, using mutual exclusion?
>
> No better idea atm

No, because output_path is the first arg and not the last arg or a kwarg
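
A rough sketch of the distinction being discussed, with hypothetical stand-in functions rather than the real signatures: when both saving targets are optional keyword arguments, a mutual-exclusion check is straightforward, but a function whose first required positional argument is `output_path` cannot make that argument optional without disturbing existing callers.

```python
from typing import Any, Callable, Optional


def setup_handlers_sketch(
    trainer: Any,
    to_save: Optional[dict] = None,
    output_path: Optional[str] = None,
    save_handler: Optional[Callable] = None,
) -> str:
    """Hypothetical stand-in: both saving targets are optional keywords."""
    if to_save is not None:
        if output_path is None and save_handler is None:
            raise ValueError("to_save requires either output_path or save_handler")
        if output_path is not None and save_handler is not None:
            raise ValueError("output_path and save_handler are mutually exclusive")
    if output_path is not None:
        return "disk"
    return "custom" if save_handler is not None else "none"


def save_best_sketch(output_path: str, evaluator: Any, model: Any, metric_name: str) -> str:
    """Hypothetical stand-in: output_path is required and comes first, so it
    cannot simply be swapped for an optional save_handler keyword."""
    return f"best {metric_name} checkpoints -> {output_path}"
```

Under these assumptions, a separate function that takes the save handler as its first argument avoids changing the positional contract of the existing one.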

vfdev-5 · 2020-07-02T08:38:36Z

@sdesrozis how about:

any_save_best_models_by_val_score ?
save_handler_best_models_by_val_score ?
store_save_best_models_by_val_score (as we also have save_best_model_by_val_score, having both may be cumbersome)
gen_save_best_models_by_val_score ?

instead of delegated_save_best_models_by_val_score


sdesrozis · 2020-07-02T10:27:05Z

save_handler_best_models_by_val_score sounds good to me because it’s more explicit.

But I’m not a good audience for that: I never use such helpers. They encapsulate little code, so I prefer to write it myself.

vfdev-5 · 2020-07-02T10:33:12Z

@sdesrozis I renamed the method to gen_save_best_models_by_val_score, which looks a bit better.

Generally, save_best_model_by_val_score saves you 5-10 lines of code compared with doing the same thing explicitly...
Do you think we should provide explicit code for saving best models in the reproducible examples?

sdesrozis · 2020-07-02T10:57:02Z

It’s purely a matter of taste...

erip

LGTM!

@y0ast

* Issue 1123 - Improve usage of contrib common methods with other save handlers (pytorch#1171)
* Added delegated_save_best_models_by_val_score
* Fixes pytorch#1123 - added save_handler arg to setup_common_training_handlers - added method delegated_save_best_models_by_val_score
* Renamed delegated_save_best_models_by_val_score to gen_save_best_models_by_val_score
Improve Canberra metric for DDP (pytorch#1314) * refactor canberra metric for ddp * improve canberra for ddp * autopep8 fix * use tensor for accumulation * detach output * remove useless item() * add missing move to device * refactor detach() and move * refactor to remove useless view_as and to() * do not expose reinit__is_reduced ad sync_all_reduce Co-authored-by: Desroziers <[email protected]> Co-authored-by: AutoPEP8 <> * Improve ManhattanDistance metric for DDP (pytorch#1320) * fix manhattan distance and improve for ddp * replace article by sklearn documentation * Update ignite/contrib/metrics/regression/manhattan_distance.py Co-authored-by: vfdev <[email protected]> Co-authored-by: Desroziers <[email protected]> Co-authored-by: vfdev <[email protected]> * Update README.md * Update about.rst * Update Circle CI docker image to pytorch 1.6.0 (pytorch#1325) * Update Circle CI docker image to pytorch 1.6. Closes pytorch#1225 * Update Circle CI docker image to pytorch 1.6. Closes pytorch#1225 (pytorch#1322) * Revert "Update Circle CI docker image to pytorch 1.6. Closes pytorch#1225 (pytorch#1322)" (pytorch#1323) This reverts commit 22ecac6. * Update Circle CI docker image to pytorch 1.6.0 Closes pytorch#1225 Co-authored-by: vfdev <[email protected]> * Update CONTRIBUTING.md * Add new logo (pytorch#1324) * Update Circle CI docker image to pytorch 1.6. Closes pytorch#1225 (pytorch#1322) * Revert "Update Circle CI docker image to pytorch 1.6. Closes pytorch#1225 (pytorch#1322)" (pytorch#1323) This reverts commit 22ecac6. 
* add logos * remove past logo from readme * add logo guidelines * Update README.md Changed size to 512 * Updated docs logo Co-authored-by: Juan Miguel Boyero Corral <[email protected]> Co-authored-by: vfdev <[email protected]> * Fixed CI on GPUs with pth 1.6.0 (pytorch#1326) * Fixed CI on GPUs with pth 1.6.0 - updated tests/run_gpu_tests.sh file - updated nccl version to 2.7 for Horovod build * Fixed hvd failing tests * Updated about us (pytorch#1327) - Added CITATION file * Improve R2Score metric for DDP (pytorch#1318) * imrpove r2 for ddp * autopep8 fix * _num_examples type is scalar * autopep8 fix Co-authored-by: Desroziers <[email protected]> Co-authored-by: AutoPEP8 <> Co-authored-by: vfdev <[email protected]> * Fix canberra docstring : reference already in namespace (pytorch#1330) Co-authored-by: Desroziers <[email protected]> Co-authored-by: vfdev <[email protected]> * Improve State and Engine docs pytorch#1259 (pytorch#1333) - add State.restart() method - add note in Engine.run() docstring / improve error message - unit test for State.restart() * pytorch#1336 missing link in doc fix (pytorch#1337) * Make SSIM accumulate on specified device (pytorch#1328) * make ssim accumulate on specified device * keep output on original device until accumulation * implement more efficient kernel creation Co-authored-by: vfdev <[email protected]> * Update documentation for terminate Events (pytorch#1338) * Update documentation for terminate Events (pytorch#1332) * Converted raw table in docstring to list table * Update README.md Co-authored-by: Anmol Joshi <[email protected]> Co-authored-by: Sylvain Desroziers <[email protected]> Co-authored-by: Marijan Smetko <[email protected]> Co-authored-by: Desroziers <[email protected]> Co-authored-by: Lavanya Shukla <[email protected]> Co-authored-by: Akihiro Matsukawa <[email protected]> Co-authored-by: Jake Henning <[email protected]> Co-authored-by: Elijah Rippeth <[email protected]> Co-authored-by: Wang Ran (汪然) <[email 
protected]> Co-authored-by: Joseph Spisak <[email protected]> Co-authored-by: Ryan Wong <[email protected]> Co-authored-by: Joel Hanson <[email protected]> Co-authored-by: Wansoo Kim <[email protected]> Co-authored-by: Jeff Yang <[email protected]> Co-authored-by: Zhiliang <[email protected]> Co-authored-by: Zhiliang@siemens <[email protected]> Co-authored-by: François COKELAER <[email protected]> Co-authored-by: Kilian Pfeiffer <[email protected]> Co-authored-by: Tawishi <[email protected]> Co-authored-by: Michael Hollingworth <[email protected]> Co-authored-by: Nicholas Vadivelu <[email protected]> Co-authored-by: n2cholas <[email protected]> Co-authored-by: Benjamin Lo <[email protected]> Co-authored-by: Nidhi Zare <[email protected]> Co-authored-by: Keisuke Kamahori <[email protected]> Co-authored-by: Théo Dumont <[email protected]> Co-authored-by: kenjihiraoka <[email protected]> Co-authored-by: Juan Miguel Boyero Corral <[email protected]> Co-authored-by: Isabela Presedo-Floyd <[email protected]> Co-authored-by: Sumit Roy <[email protected]> Co-authored-by: Shashank Gupta <[email protected]>

vfdev-5 added 2 commits July 1, 2020 16:00

Added delegated_save_best_models_by_val_score

09e80b6

Fixes pytorch#1123

2bda650

- added save_handler arg to setup_common_training_handlers
- added method delegated_save_best_models_by_val_score
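The idea behind these two changes, keeping the "best models by validation score" logic but delegating the actual write to any save handler, can be sketched roughly as follows. This is a standalone illustration, not ignite's implementation: `InMemorySaver`, `BestModelsKeeper`, the `remove` method, and the filename pattern are all hypothetical names chosen for the example.

```python
from typing import Any, Dict, List, Tuple


class InMemorySaver:
    """Hypothetical save handler: stores checkpoints in a dict instead of on disk."""

    def __init__(self) -> None:
        self.stored: Dict[str, Dict[str, Any]] = {}

    def __call__(self, checkpoint: Dict[str, Any], filename: str) -> None:
        self.stored[filename] = checkpoint

    def remove(self, filename: str) -> None:
        self.stored.pop(filename, None)


class BestModelsKeeper:
    """Hypothetical helper: keeps the n_saved highest-scoring checkpoints.

    It never touches storage itself; every write and delete goes through
    the save handler passed in, so any backend can be plugged in.
    """

    def __init__(self, save_handler: Any, n_saved: int = 3, tag: str = "val") -> None:
        self.save_handler = save_handler
        self.n_saved = n_saved
        self.tag = tag
        self._saved: List[Tuple[float, str]] = []  # kept sorted, lowest score first

    def __call__(self, checkpoint: Dict[str, Any], score: float, step: int) -> bool:
        # Skip if we are full and the new score does not beat the current worst.
        if len(self._saved) >= self.n_saved and score <= self._saved[0][0]:
            return False
        filename = f"best_model_{step}_{self.tag}_score={score:.4f}.pt"
        self.save_handler(checkpoint, filename)
        self._saved.append((score, filename))
        self._saved.sort()
        # Evict the lowest-scoring checkpoint once we exceed the limit.
        if len(self._saved) > self.n_saved:
            _, worst = self._saved.pop(0)
            self.save_handler.remove(worst)
        return True


saver = InMemorySaver()
keeper = BestModelsKeeper(saver, n_saved=2)
for step, score in enumerate([0.5, 0.7, 0.6, 0.9]):
    keeper({"step": step}, score, step)
print(sorted(saver.stored))  # filenames of the two best checkpoints
```

Swapping `InMemorySaver` for a handler that writes to disk or to a remote store would leave `BestModelsKeeper` unchanged, which is the point of passing the save handler in rather than an output path.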

vfdev-5 requested review from erip and sdesrozis July 1, 2020 15:04

sdesrozis approved these changes Jul 1, 2020

Merge branch 'master' into issue-1123

9ce8bbe

Merge branch 'master' into issue-1123

1cfb1e2

Renamed delegated_save_best_models_by_val_score to gen_save_best_models_by_val_score

436c386

erip approved these changes Jul 2, 2020

vfdev-5 merged commit ab546ab into pytorch:master Jul 2, 2020

vfdev-5 deleted the issue-1123 branch December 17, 2020 21:56

vfdev-5 merged 5 commits into pytorch:master from vfdev-5:issue-1123

3 participants
