
Conversation

@BalajiAI
Contributor

@BalajiAI commented Sep 9, 2021

🐛 Bug

'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
That method tries to update the object's attribute 'T_cur', which has not been defined yet, so it raises an AttributeError (the relevant step() logic is sketched below).
This only happens when the last_epoch argument is given as 0 or greater while initializing the object.

[Screenshot: Bug_in_CosineAnnealingWarmRestarts]
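
For reference, the step() logic that touches T_cur looks roughly like this (an abridged paraphrase of torch/optim/lr_scheduler.py around the 1.9 release, not the verbatim source):

        def step(self, epoch=None):
            if epoch is None and self.last_epoch < 0:
                epoch = 0
            if epoch is None:
                epoch = self.last_epoch + 1
                # During __init__ with last_epoch >= 0 we land here, and
                # self.T_cur has not been assigned yet -> AttributeError.
                self.T_cur = self.T_cur + 1
                if self.T_cur >= self.T_i:
                    self.T_cur = self.T_cur - self.T_i
                    self.T_i = self.T_i * self.T_mult
            # ... explicit-epoch branch and LR update omitted ...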

To Reproduce

Steps to reproduce the behavior:

  1. Pass last_epoch=0 to CosineAnnealingWarmRestarts, OR
  2. Pass a positive integer for last_epoch (a minimal script is shown below).
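
A minimal script that should reproduce this on torch 1.9 (the Linear model and SGD settings here are arbitrary placeholders; note that with last_epoch >= 0, _LRScheduler also expects 'initial_lr' to already be present in each param group, as a real resumed run would have):

        import torch
        from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

        model = torch.nn.Linear(2, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        # Simulate a resumed run: with last_epoch >= 0, _LRScheduler
        # requires 'initial_lr' in every param group.
        for group in optimizer.param_groups:
            group.setdefault('initial_lr', group['lr'])

        # AttributeError: 'CosineAnnealingWarmRestarts' object has no
        # attribute 'T_cur'
        scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, last_epoch=0)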

Expected behavior

I expected the 'CosineAnnealingWarmRestarts' object simply to be initialized, without raising an error.

Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10 [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

Additional context

We can solve this bug by moving the assignment to self.T_cur above the super(CosineAnnealingWarmRestarts, self).__init__() line, so that self.T_cur is defined on the object before step() runs (a sketch follows).
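
A sketch of the resulting constructor ordering (note it assigns from the last_epoch argument rather than self.last_epoch, since the latter is only set by the parent constructor; see the discussion below):

        class CosineAnnealingWarmRestarts(_LRScheduler):
            def __init__(self, optimizer, T_0, T_mult=1, eta_min=0,
                         last_epoch=-1, verbose=False):
                # ... validate and store T_0, T_i, T_mult, eta_min ...
                # Must be set before super().__init__, which calls self.step():
                self.T_cur = last_epoch
                super(CosineAnnealingWarmRestarts, self).__init__(
                    optimizer, last_epoch, verbose)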

Possibly related: #65342

@facebook-github-bot
Contributor

facebook-github-bot commented Sep 9, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit cf38722 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu) (1/1)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-09T17:27:41.8776622Z   IN_WHEEL_TEST: 1
2021-09-09T17:27:41.8777214Z   CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
2021-09-09T17:27:41.8778093Z   ALPINE_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine
2021-09-09T17:27:41.8778789Z   PR_LABELS: []
2021-09-09T17:27:41.8780416Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:3fb4365799993abcfc83e51d42c137e89cb2459a
2021-09-09T17:27:41.8782161Z   JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
2021-09-09T17:27:41.8782898Z   TEST_CONFIG: default
2021-09-09T17:27:41.8783340Z   SHARD_NUMBER: 2
2021-09-09T17:27:41.8783742Z   NUM_TEST_SHARDS: 2
2021-09-09T17:27:41.8784236Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2021-09-09T17:27:41.8784767Z   CONTINUE_THROUGH_ERROR: false
2021-09-09T17:27:41.8785253Z   GPU_FLAG: --gpus all
2021-09-09T17:27:41.8785661Z   SHM_SIZE: 2g
2021-09-09T17:27:41.8786030Z   PR_NUMBER: 64758
2021-09-09T17:27:41.8786436Z ##[endgroup]
2021-09-09T17:28:06.8903051Z Processing ./dist/torch-1.10.0a0+git7c08681-cp36-cp36m-linux_x86_64.whl
2021-09-09T17:28:06.9290347Z Requirement already satisfied: dataclasses in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git7c08681) (0.8)
2021-09-09T17:28:06.9295865Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.6/site-packages (from torch==1.10.0a0+git7c08681) (3.10.0.0)
2021-09-09T17:28:07.3235420Z Installing collected packages: torch
2021-09-09T17:28:17.3018638Z Successfully installed torch-1.10.0a0+git7c08681
2021-09-09T17:28:17.6671192Z ++++ dirname .jenkins/pytorch/common.sh

This comment was automatically generated by Dr. CI.


@codecov

codecov bot commented Sep 9, 2021

Codecov Report

Merging #64758 (cf38722) into master (2b41bf4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #64758   +/-   ##
=======================================
  Coverage   66.65%   66.66%           
=======================================
  Files         710      710           
  Lines       92418    92418           
=======================================
+ Hits        61601    61609    +8     
+ Misses      30817    30809    -8     

@albanD
Collaborator

albanD commented Sep 13, 2021

Doing this would break #23480 again no?

@iramazanli
Contributor

Doing this would break #23480 again no?

In this PR, the assignment changes from self.T_cur = self.last_epoch to self.T_cur = last_epoch, which is why I don't expect the same error to happen again. Previously, the error in #23480 was happening because we were setting

        self.T_cur = self.last_epoch

after the initialization of the superclass. Here, however, we do the update before that initialization.
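
In other words, the ordering change looks like this (a paraphrase of the diff, not the verbatim patch):

        # Before this PR: step() is invoked by the parent constructor
        # before T_cur exists, raising AttributeError when last_epoch >= 0.
        super(CosineAnnealingWarmRestarts, self).__init__(optimizer, last_epoch, verbose)
        self.T_cur = self.last_epoch

        # After this PR: T_cur is defined by the time step() runs.
        self.T_cur = last_epoch
        super(CosineAnnealingWarmRestarts, self).__init__(optimizer, last_epoch, verbose)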

What do you think about this @albanD ?

@BalajiAI
Contributor Author

BalajiAI commented Sep 14, 2021

Doing this would break #23480 again no?

No sir. It won't break.

@jbschlosser added the triaged label Sep 14, 2021
Collaborator

@albanD left a comment


ok

@BalajiAI
Contributor Author

@jbschlosser
Can you review, sir?

Contributor

@jbschlosser left a comment


Looks good AFAICT! From my (limited) testing, it seems to fix the problem.

@facebook-github-bot
Contributor

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser changed the title from "Bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py" to "Fix ordering bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py" Sep 22, 2021
@facebook-github-bot
Contributor

@jbschlosser merged this pull request in 32f0387.

@BalajiAI deleted the patch-2 branch September 23, 2021 01:44
@BalajiAI restored the patch-2 branch September 23, 2021 01:44
@BalajiAI deleted the patch-2 branch September 23, 2021 01:58

Labels

cla signed · Merged · open source · triaged


6 participants