StatsHandler Has Wrong Numbering #3197

@bhashemian

Describe the bug

StatsHandler uses 0-based numbering for iterations but counts up to N instead of N-1. Below is an example of the output. Training goes from 0/N to N/N, which is N+1 values, so the iteration numbering of the current epoch bleeds into the next epoch.

2021-10-26 22:25:02,068 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 1/4, Iter: 36/38 -- train_loss: 0.6423 
2021-10-26 22:25:02,527 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 1/4, Iter: 37/38 -- train_loss: 0.5574 
2021-10-26 22:25:02,531 - ignite.engine.engine.SupervisedTrainer - INFO - Current learning rate: 0.0008535533905932737
2021-10-26 22:25:02,531 - ignite.engine.engine.SupervisedEvaluator - INFO - Engine run resuming from iteration 0, epoch 0 until 1 epochs
2021-10-26 22:25:30,474 - ignite.engine.engine.SupervisedEvaluator - INFO - Got new best metric of val_acc: 0.6845185185185185
2021-10-26 22:25:30,474 - ignite.engine.engine.SupervisedEvaluator - INFO - Epoch[1] Metrics -- val_acc: 0.6845 
2021-10-26 22:25:30,474 - ignite.engine.engine.SupervisedEvaluator - INFO - Key metric: val_acc best value: 0.6845185185185185 at epoch: 1
2021-10-26 22:25:30,555 - ignite.engine.engine.SupervisedEvaluator - INFO - Epoch[1] Complete. Time taken: 00:00:27
2021-10-26 22:25:30,555 - ignite.engine.engine.SupervisedEvaluator - INFO - Engine run complete. Time taken: 00:00:28
2021-10-26 22:25:30,741 - ignite.engine.engine.SupervisedTrainer - INFO - Saved checkpoint at epoch: 1
2021-10-26 22:25:30,741 - ignite.engine.engine.SupervisedTrainer - INFO - Key metric: None best value: -1 at epoch: -1
2021-10-26 22:25:30,742 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch[1] Complete. Time taken: 00:03:53
2021-10-26 22:26:06,201 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 2/4, Iter: 38/38 -- train_loss: 0.5390 
2021-10-26 22:26:07,537 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 2/4, Iter: 0/38 -- train_loss: 0.6080 
2021-10-26 22:26:08,645 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 2/4, Iter: 1/38 -- train_loss: 0.6061 

and it gets worse over the following epochs:

INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 2/4, Iter: 35/38 -- train_loss: 0.6096 
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 2/4, Iter: 36/38 -- train_loss: 0.5019 
INFO:ignite.engine.engine.SupervisedTrainer:Current learning rate: 0.0005
INFO:ignite.engine.engine.SupervisedEvaluator:Engine run resuming from iteration 0, epoch 1 until 2 epochs
INFO:ignite.engine.engine.SupervisedEvaluator:Got new best metric of val_acc: 0.7331851851851852
INFO:ignite.engine.engine.SupervisedEvaluator:Epoch[2] Metrics -- val_acc: 0.7332 
INFO:ignite.engine.engine.SupervisedEvaluator:Key metric: val_acc best value: 0.7331851851851852 at epoch: 2
INFO:ignite.engine.engine.SupervisedEvaluator:Epoch[2] Complete. Time taken: 00:00:55
INFO:ignite.engine.engine.SupervisedEvaluator:Engine run complete. Time taken: 00:00:56
INFO:ignite.engine.engine.SupervisedTrainer:Saved checkpoint at epoch: 2
INFO:ignite.engine.engine.SupervisedTrainer:Key metric: None best value: -1 at epoch: -1
INFO:ignite.engine.engine.SupervisedTrainer:Epoch[2] Complete. Time taken: 00:04:20
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 37/38 -- train_loss: 0.4985 
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 38/38 -- train_loss: 0.5456 
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 0/38 -- train_loss: 0.5515 
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 1/38 -- train_loss: 0.5685 
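
To make the drift concrete, here is a minimal sketch of the arithmetic. It is not StatsHandler's actual code: `logged_label` is a hypothetical model that merely reproduces the numbers in the logs above (a 0-based counter that wraps after N+1 values), and `per_epoch_iter` is one possible 1-based per-epoch numbering. Both assume a global, 1-based iteration count `global_iter` and an epoch length of N = 38.

```python
def logged_label(global_iter: int, epoch_length: int) -> int:
    """Hypothetical model of the logged value: 0-based, wrapping after N+1 values."""
    return (global_iter - 1) % (epoch_length + 1)


def per_epoch_iter(global_iter: int, epoch_length: int) -> int:
    """One possible per-epoch numbering: 1-based, always in the range 1..N."""
    return (global_iter - 1) % epoch_length + 1


def epoch_of(global_iter: int, epoch_length: int) -> int:
    """1-based epoch that a global 1-based iteration belongs to."""
    return (global_iter - 1) // epoch_length + 1


N = 38  # epoch length taken from the "x/38" in the logs
# Global iterations around the two epoch boundaries shown in the logs.
for g in (38, 39, 40, 76, 77, 78, 79):
    print(
        f"global={g} epoch={epoch_of(g, N)} "
        f"logged={logged_label(g, N)}/{N} per_epoch={per_epoch_iter(g, N)}/{N}"
    )
```

Under this model the label cycles with period N+1 while an epoch has only N iterations, so each new epoch starts one step later in the cycle than the previous one. This matches epoch 2 starting at 38/38 and epoch 3 starting at 37/38.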

Labels: bug
