@huydhn (Contributor) commented Dec 10, 2024

Fixes #142485

The workflow-checks lint job timed out in trunk (see https://github.com/pytorch/pytorch/actions/runs/12261226178/job/34207762939). Here is what happened:

  1. Enable py3.13 wheels for ROCm #142294 landed yesterday to build ROCm on 3.13, but the PR had a land race with [EZ] Do not checkout builder for Linux builds #142282 in the generated workflow file
  2. The trunk lint check caught that in https://github.com/pytorch/pytorch/blob/main/.github/scripts/report_git_status.sh#L2
  3. However, the script also attempted to print the difference with git diff .github/workflows. This was the command that got stuck, because git diff uses a pager by default and waits for input before displaying the next page ¯\_(ツ)_/¯
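
The hang described in step 3 can be avoided by telling git not to use a pager. Below is a minimal sketch of a status-reporting helper, not the actual contents of report_git_status.sh; the function name report_changes and its messages are hypothetical. It relies only on the standard `git --no-pager` option.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report uncommitted changes without ever invoking a pager.
set -euo pipefail

# report_changes DIR: print the status and diff of DIR; return 1 if DIR is dirty.
report_changes() {
  local dir="$1"
  local status
  status="$(git status --porcelain -- "$dir")"
  if [ -z "$status" ]; then
    echo "No changes in $dir"
    return 0
  fi
  echo "Uncommitted changes in $dir:"
  echo "$status"
  # --no-pager makes git write the diff straight to stdout instead of
  # handing it to a pager that may wait for a keypress.
  git --no-pager diff -- "$dir"
  return 1
}
```

Setting GIT_PAGER=cat in the job environment should have a similar effect for every git command in the script.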

It took so long to debug this because a Nova GHA job that times out doesn't print any progress. I'll create an issue for this.

Bonus:

This PR also fixes the broken print from the test tool lint job, which confused GitHub into reporting an annotation failure (https://github.com/pytorch/pytorch/actions/runs/12261226178): Credentials could not be loaded, please check your action inputs

pytorch-bot bot commented Dec 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142476

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9f82330 with merge base d3d1a78:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Dec 10, 2024
@huydhn huydhn changed the title from "Why does this one timeout?" to "Fix timeout check workflow lint job" Dec 10, 2024
@huydhn huydhn marked this pull request as ready for review December 10, 2024 19:42
@huydhn huydhn requested a review from a team as a code owner December 10, 2024 19:42
Contributor

Why is the generated file changed, but not the source file? Did we forget to regenerate it on a previous commit?

Contributor Author

It's a land race: #142282 removed some parts of the generated workflow file while #142294 added them back. Thus, rerunning the script generates a new workflow file.

@huydhn (Contributor Author) commented Dec 10, 2024

@pytorchbot merge -f 'Lint jobs have passed'

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Comment on lines +210 to +211
PYTHONPATH=$(pwd) pytest tools/test/test_*.py
PYTHONPATH=$(pwd) pytest .github/scripts/test_*.py
@ZainRizvi (Contributor) commented Dec 10, 2024

Isn't this only setting the PYTHONPATH env var, resulting in no test files being executed or discovered?
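
For reference, in POSIX shells a NAME=value prefix written on the same line as a command places the variable in that command's environment and then runs the command; the variable does not persist in the calling shell afterwards. A minimal sketch of that behavior, using a hypothetical variable DEMO_PATH and a hypothetical helper show_scope rather than the real pytest invocation:

```shell
# Hypothetical demo: an assignment prefix is scoped to the single command it precedes.
show_scope() {
  unset DEMO_PATH
  # The child shell runs and sees DEMO_PATH in its environment.
  DEMO_PATH="/tmp/demo" sh -c 'echo "inside: $DEMO_PATH"'
  # Back in the calling shell, DEMO_PATH is still unset.
  echo "after: ${DEMO_PATH:-unset}"
}
```

Calling show_scope prints one "inside:" line from the child command and one "after:" line from the calling shell.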

Contributor Author

mori360 pushed a commit to mori360/pytorch that referenced this pull request Dec 11, 2024
Pull Request resolved: pytorch#142476
Approved by: https://github.com/wdvr
bluenote10 pushed a commit to bluenote10/pytorch that referenced this pull request Dec 14, 2024
@github-actions github-actions bot deleted the debug-workflow-checks-lint branch January 11, 2025 02:10
Development

Successfully merging this pull request may close these issues.

UNSTABLE Lint / workflow-checks / linux-job

5 participants