StatsHandler Has Wrong Numbering #3197
Closed
Labels
bug (Something isn't working)
Description
Describe the bug
StatsHandler uses 0-based numbering for iterations but counts up to N (instead of N-1). An example of the resulting output is shown below. Within an epoch the displayed iteration runs from 0/N to N/N, which is N+1 values, so the iteration numbering of the current epoch bleeds into the next epoch.
2021-10-26 22:25:02,068 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 1/4, Iter: 36/38 -- train_loss: 0.6423
2021-10-26 22:25:02,527 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 1/4, Iter: 37/38 -- train_loss: 0.5574
2021-10-26 22:25:02,531 - ignite.engine.engine.SupervisedTrainer - INFO - Current learning rate: 0.0008535533905932737
2021-10-26 22:25:02,531 - ignite.engine.engine.SupervisedEvaluator - INFO - Engine run resuming from iteration 0, epoch 0 until 1 epochs
2021-10-26 22:25:30,474 - ignite.engine.engine.SupervisedEvaluator - INFO - Got new best metric of val_acc: 0.6845185185185185
2021-10-26 22:25:30,474 - ignite.engine.engine.SupervisedEvaluator - INFO - Epoch[1] Metrics -- val_acc: 0.6845
2021-10-26 22:25:30,474 - ignite.engine.engine.SupervisedEvaluator - INFO - Key metric: val_acc best value: 0.6845185185185185 at epoch: 1
2021-10-26 22:25:30,555 - ignite.engine.engine.SupervisedEvaluator - INFO - Epoch[1] Complete. Time taken: 00:00:27
2021-10-26 22:25:30,555 - ignite.engine.engine.SupervisedEvaluator - INFO - Engine run complete. Time taken: 00:00:28
2021-10-26 22:25:30,741 - ignite.engine.engine.SupervisedTrainer - INFO - Saved checkpoint at epoch: 1
2021-10-26 22:25:30,741 - ignite.engine.engine.SupervisedTrainer - INFO - Key metric: None best value: -1 at epoch: -1
2021-10-26 22:25:30,742 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch[1] Complete. Time taken: 00:03:53
2021-10-26 22:26:06,201 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 2/4, Iter: 38/38 -- train_loss: 0.5390
2021-10-26 22:26:07,537 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 2/4, Iter: 0/38 -- train_loss: 0.6080
2021-10-26 22:26:08,645 - ignite.engine.engine.SupervisedTrainer - INFO - Epoch: 2/4, Iter: 1/38 -- train_loss: 0.6061
and it gets worse over subsequent epochs:
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 2/4, Iter: 35/38 -- train_loss: 0.6096
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 2/4, Iter: 36/38 -- train_loss: 0.5019
INFO:ignite.engine.engine.SupervisedTrainer:Current learning rate: 0.0005
INFO:ignite.engine.engine.SupervisedEvaluator:Engine run resuming from iteration 0, epoch 1 until 2 epochs
INFO:ignite.engine.engine.SupervisedEvaluator:Got new best metric of val_acc: 0.7331851851851852
INFO:ignite.engine.engine.SupervisedEvaluator:Epoch[2] Metrics -- val_acc: 0.7332
INFO:ignite.engine.engine.SupervisedEvaluator:Key metric: val_acc best value: 0.7331851851851852 at epoch: 2
INFO:ignite.engine.engine.SupervisedEvaluator:Epoch[2] Complete. Time taken: 00:00:55
INFO:ignite.engine.engine.SupervisedEvaluator:Engine run complete. Time taken: 00:00:56
INFO:ignite.engine.engine.SupervisedTrainer:Saved checkpoint at epoch: 2
INFO:ignite.engine.engine.SupervisedTrainer:Key metric: None best value: -1 at epoch: -1
INFO:ignite.engine.engine.SupervisedTrainer:Epoch[2] Complete. Time taken: 00:04:20
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 37/38 -- train_loss: 0.4985
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 38/38 -- train_loss: 0.5456
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 0/38 -- train_loss: 0.5515
INFO:ignite.engine.engine.SupervisedTrainer:Epoch: 3/4, Iter: 1/38 -- train_loss: 0.5685
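For illustration, here is a minimal sketch of the kind of off-by-one that would produce the pattern in the logs above. It is not taken from the StatsHandler source: the function names are hypothetical, and the "buggy" formula is only an assumption that happens to reproduce the logged values (37/38 at the end of epoch 1, then 38/38, 0/38, 1/38 at the start of epoch 2). It assumes the engine keeps a global, 1-based iteration counter that is not reset between epochs.

```python
# Hypothetical helpers; not the actual StatsHandler implementation.

def wrapped_iter_buggy(global_iter: int, epoch_length: int) -> int:
    """Assumed faulty mapping: 0-based, but wraps over epoch_length + 1 values."""
    return (global_iter - 1) % (epoch_length + 1)


def wrapped_iter_one_based(global_iter: int, epoch_length: int) -> int:
    """One possible correction: 1-based display that wraps every epoch_length steps."""
    return (global_iter - 1) % epoch_length + 1


if __name__ == "__main__":
    epoch_length = 38
    # Global iterations around the epoch 1 -> 2 and epoch 2 -> 3 boundaries.
    for global_iter in (37, 38, 39, 40, 76, 77, 78, 79):
        epoch = (global_iter - 1) // epoch_length + 1
        buggy = wrapped_iter_buggy(global_iter, epoch_length)
        fixed = wrapped_iter_one_based(global_iter, epoch_length)
        print(f"Epoch {epoch}: buggy {buggy}/{epoch_length}, one-based {fixed}/{epoch_length}")
```

With the one-based mapping, each epoch would display 1/38 through 38/38, and the drift seen in epochs 2 and 3 would not accumulate.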