
Conversation

@BalajiAI
Contributor

@BalajiAI commented Sep 9, 2021

🐛 Bug

'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
That method tries to update the object's attribute 'T_cur', which has not been defined yet, so it raises an AttributeError (the relevant step() logic is sketched below).
This only happens when the last_epoch argument is given as 0 or greater while initializing the object.

[Screenshot: Bug_in_CosineAnnealingWarmRestarts]
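
For reference, the step() logic that touches T_cur looks roughly like this (an abridged paraphrase of torch/optim/lr_scheduler.py around the 1.9 release, not the verbatim source):

        def step(self, epoch=None):
            if epoch is None and self.last_epoch < 0:
                epoch = 0
            if epoch is None:
                epoch = self.last_epoch + 1
                # During __init__ with last_epoch >= 0 we land here, and
                # self.T_cur has not been assigned yet -> AttributeError.
                self.T_cur = self.T_cur + 1
                if self.T_cur >= self.T_i:
                    self.T_cur = self.T_cur - self.T_i
                    self.T_i = self.T_i * self.T_mult
            # ... explicit-epoch branch and LR update omitted ...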

To Reproduce

Steps to reproduce the behavior:

  1. Pass last_epoch=0 to CosineAnnealingWarmRestarts, OR
  2. Pass a positive integer for last_epoch (a minimal script is shown below).
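
A minimal script that should reproduce this on torch 1.9 (the Linear model and SGD settings here are arbitrary placeholders; note that with last_epoch >= 0, _LRScheduler also expects 'initial_lr' to already be present in each param group, as a real resumed run would have):

        import torch
        from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

        model = torch.nn.Linear(2, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        # Simulate a resumed run: with last_epoch >= 0, _LRScheduler
        # requires 'initial_lr' in every param group.
        for group in optimizer.param_groups:
            group.setdefault('initial_lr', group['lr'])

        # AttributeError: 'CosineAnnealingWarmRestarts' object has no
        # attribute 'T_cur'
        scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, last_epoch=0)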

Expected behavior

I expected the 'CosineAnnealingWarmRestarts' object simply to be initialized, without raising an error.

Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10 [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

Additional context

We can solve this bug by moving the assignment to self.T_cur above the super(CosineAnnealingWarmRestarts, self).__init__() line, so that self.T_cur is defined on the object before step() runs (a sketch follows).
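
A sketch of the resulting constructor ordering (note it assigns from the last_epoch argument rather than self.last_epoch, since the latter is only set by the parent constructor; see the discussion below):

        class CosineAnnealingWarmRestarts(_LRScheduler):
            def __init__(self, optimizer, T_0, T_mult=1, eta_min=0,
                         last_epoch=-1, verbose=False):
                # ... validate and store T_0, T_i, T_mult, eta_min ...
                # Must be set before super().__init__, which calls self.step():
                self.T_cur = last_epoch
                super(CosineAnnealingWarmRestarts, self).__init__(
                    optimizer, last_epoch, verbose)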

Possibly related: #65342

@facebook-github-bot
Contributor

facebook-github-bot commented Sep 9, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit cf38722 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu) (1/1)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-09T17:27:41.8776622Z   IN_WHEEL_TEST: 1
2021-09-09T17:27:41.8777214Z   CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
2021-09-09T17:27:41.8778093Z   ALPINE_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine
2021-09-09T17:27:41.8778789Z   PR_LABELS: []
2021-09-09T17:27:41.8780416Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:3fb4365799993abcfc83e51d42c137e89cb2459a
2021-09-09T17:27:41.8782161Z   JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
2021-09-09T17:27:41.8782898Z   TEST_CONFIG: default
2021-09-09T17:27:41.8783340Z   SHARD_NUMBER: 2
2021-09-09T17:27:41.8783742Z   NUM_TEST_SHARDS: 2
2021-09-09T17:27:41.8784236Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2021-09-09T17:27:41.8784767Z   CONTINUE_THROUGH_ERROR: false
2021-09-09T17:27:41.8785253Z   GPU_FLAG: --gpus all
2021-09-09T17:27:41.8785661Z   SHM_SIZE: 2g
2021-09-09T17:27:41.8786030Z   PR_NUMBER: 64758
2021-09-09T17:27:41.8786436Z ##[endgroup]
2021-09-09T17:28:06.8903051Z Processing ./dist/torch-1.10.0a0+git7c08681-cp36-cp36m-linux_x86_64.whl
2021-09-09T17:28:06.9290347Z Requirement already satisfied: dataclasses in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git7c08681) (0.8)
2021-09-09T17:28:06.9295865Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git7c08681) (3.10.0.0)
2021-09-09T17:28:07.3235420Z Installing collected packages: torch
2021-09-09T17:28:17.3018638Z Successfully installed torch-1.10.0a0+git7c08681
2021-09-09T17:28:17.6671192Z ++++ dirname .jenkins/pytorch/common.sh

This comment was automatically generated by Dr. CI.


@codecov

codecov bot commented Sep 9, 2021

Codecov Report

Merging #64758 (cf38722) into master (2b41bf4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #64758   +/-   ##
=======================================
  Coverage   66.65%   66.66%           
=======================================
  Files         710      710           
  Lines       92418    92418           
=======================================
+ Hits        61601    61609    +8     
+ Misses      30817    30809    -8     

@albanD
Collaborator

albanD commented Sep 13, 2021

Doing this would break #23480 again no?

@iramazanli
Contributor

Doing this would break #23480 again no?

In this PR, the assignment changes from self.T_cur = self.last_epoch to self.T_cur = last_epoch, which is why I don't expect the same error to happen again. Previously, the error in #23480 was happening because we were setting

        self.T_cur = self.last_epoch

after the initialization of the superclass. Here, however, we do the update before that initialization.
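
In other words, the ordering change looks like this (a paraphrase of the diff, not the verbatim patch):

        # Before this PR: step() is invoked by the parent constructor
        # before T_cur exists, raising AttributeError when last_epoch >= 0.
        super(CosineAnnealingWarmRestarts, self).__init__(optimizer, last_epoch, verbose)
        self.T_cur = self.last_epoch

        # After this PR: T_cur is defined by the time step() runs.
        self.T_cur = last_epoch
        super(CosineAnnealingWarmRestarts, self).__init__(optimizer, last_epoch, verbose)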

What do you think about this @albanD ?

@BalajiAI
Contributor Author

BalajiAI commented Sep 14, 2021

Doing this would break #23480 again no?

No sir. It won't break.

@jbschlosser added the triaged label Sep 14, 2021
Collaborator

@albanD left a comment


ok

@BalajiAI
Contributor Author

@jbschlosser
Can you review, sir?

Contributor

@jbschlosser left a comment


Looks good AFAICT! From my (limited) testing, it seems to fix the problem.

@facebook-github-bot
Contributor

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser changed the title from "Bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py" to "Fix ordering bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py" Sep 22, 2021
@facebook-github-bot
Contributor

@jbschlosser merged this pull request in 32f0387.

@BalajiAI deleted the patch-2 branch September 23, 2021 01:44
@BalajiAI restored the patch-2 branch September 23, 2021 01:44
@BalajiAI deleted the patch-2 branch September 23, 2021 01:58

Labels

cla signed · Merged · open source · triaged


6 participants