@clee2000 clee2000 commented Mar 26, 2025

Disabling some tests to restore periodic

nogpu avx512 timeout:
https://hud.pytorch.org/pytorch/pytorch/commit/59f14d19aea4091c65cca2417c509e3dbf60c0ed#38492953496-box

profiler failure: https://hud.pytorch.org/pytorch/pytorch/commit/7ae0ce6360b6e4f944906502d20da24c04debee5#38461255009-box

test_accelerator failure:
https://hud.pytorch.org/pytorch/pytorch/commit/87bfd66c3c7061db6d36d8daa62f08f507f90e39#39476723746-box
origin: 146098

test_overrides failure:
https://hud.pytorch.org/pytorch/pytorch/commit/bf752c36da08871d76a66fd52ad09f87e66fc770#39484562957-box
origin: 146098

inductor cpu repro:
https://hud.pytorch.org/pytorch/pytorch/commit/bb9c4260249ea0c57e87395eff5271fb479efb6a#38447525659-box

functorch eager transforms:
https://hud.pytorch.org/pytorch/pytorch/commit/8f858e226ba81fde41d39aa34f1fd4cb4a4ecc51#39488068620-box
https://hud.pytorch.org/pytorch/pytorch/commit/f2cea01f7195e59abd154b5551213ee3e38fa40d#39555064878
https://hud.pytorch.org/pytorch/pytorch/commit/b5281a4a1806c978e34c5cfa0befd298e469b7fd#39599355600
either 148288 or 148261?

https://hud.pytorch.org/hud/pytorch/pytorch/2ec9aceaeb77176c4bdeb2d008a34cba0cd57e3c/1?per_page=100&name_filter=periodic&mergeLF=true

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot bot commented Mar 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150059

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 76 Pending

As of commit b069100 with merge base 85e4e51 (image):

NEW FAILURE - The following job has failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@clee2000 clee2000 added the ciflow/periodic label Mar 26, 2025
@clee2000 clee2000 changed the title from "[CI] Disable some tests that are failing in periodic, add another shard for nogpu_AVX512" to "[CI] Disable some tests that are failing in periodic" Mar 26, 2025
@clee2000 clee2000 closed this Mar 27, 2025
@clee2000 clee2000 reopened this Mar 27, 2025
@clee2000 clee2000 added the keep-going label Mar 27, 2025
@clee2000 clee2000 force-pushed the csl/disable_some_periodic_tests branch from 4095f65 to aa32624 Compare March 27, 2025 17:41
@clee2000 clee2000 marked this pull request as ready for review March 28, 2025 16:08
@clee2000 clee2000 requested a review from a team as a code owner March 28, 2025 16:08

@ZainRizvi ZainRizvi left a comment

Probably best to share the full list of failures on treehuggers as well (in addition to merging this PR). Someone might decide they actually care about fixing some of these tests.

Comment on lines 34 to 37
if os.getenv("ATEN_CPU_CAPABILITY") in ("default", "avx2"):
    # This test is not supported on ARM
    print("Skipping due to failing when cuda build runs on non cuda machine, see #150059 for example")
    sys.exit()

Have you tried using pytest.skip(allow_module_level=True)? That might let you properly skip all tests (and have them be marked as skipped) instead of this silent skip.

Contributor Author

I just tried this on a dummy test file. Unfortunately, it just says 1 skipped for the entire file, and I can't seem to get it to print the skip reason either.
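
For reference, the module-level skip discussed in this thread could look roughly like the sketch below. It is not the code from the PR: `module_skip_reason` and `skip_module_if_needed` are hypothetical helper names, and the capability values are copied from the check under review. The only pytest call used is `pytest.skip(reason, allow_module_level=True)`, imported lazily so the decision logic can be exercised without pytest installed.

```python
import os
from typing import Mapping, Optional

# Capabilities on which the whole file is skipped (copied from the check under review).
SKIPPED_CAPABILITIES = ("default", "avx2")


def module_skip_reason(env: Mapping[str, str]) -> Optional[str]:
    """Return a skip reason if the whole test file should be skipped, else None.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    capability = env.get("ATEN_CPU_CAPABILITY")
    if capability in SKIPPED_CAPABILITIES:
        return (
            f"ATEN_CPU_CAPABILITY={capability}: fails when a CUDA build runs "
            "on a non-CUDA machine, see https://github.com/pytorch/pytorch/pull/150059"
        )
    return None


def skip_module_if_needed() -> None:
    """Call once near the top of the test file, before any tests are defined."""
    reason = module_skip_reason(os.environ)
    if reason is not None:
        import pytest  # lazy import so the helper stays importable without pytest

        # Raises pytest's skip exception; allow_module_level=True permits
        # raising it during collection, outside of a test function.
        pytest.skip(reason, allow_module_level=True)
```

With this shape the file would be reported as skipped rather than exiting silently. Running pytest with `-rs` normally lists skip reasons in the short test summary; whether that output is visible in the CI logs for this job is not something the thread settles.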


@dtypes(torch.float)
@unittest.skipIf(
    TEST_CUDA_MEM_LEAK_CHECK, "Leaking memory, see #150059 for example"

Could you include the full link here for the benefit of future devs?
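
A minimal sketch of what the requested change could look like, using only the standard `unittest` module. `TEST_CUDA_MEM_LEAK_CHECK` is hard-coded here as a stand-in for the real flag, `ExampleTest` and `run_example` are hypothetical names, and the URL is simply the address of this PR spelled out in full.

```python
import unittest

# Stand-in for the real flag; hard-coded so the sketch runs on its own.
TEST_CUDA_MEM_LEAK_CHECK = True

# Full link instead of the bare "#150059" shorthand.
PR_URL = "https://github.com/pytorch/pytorch/pull/150059"


class ExampleTest(unittest.TestCase):
    @unittest.skipIf(
        TEST_CUDA_MEM_LEAK_CHECK, f"Leaking memory, see {PR_URL} for example"
    )
    def test_something(self):
        self.fail("never runs while the flag is set")


def run_example() -> unittest.TestResult:
    """Run the example test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The skip reason, including the full URL, is recorded alongside the skipped test in `result.skipped`, so it remains readable outside GitHub, where the `#150059` shorthand does not resolve.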

@atalman atalman left a comment

lgtm. Thank you very much for this

@clee2000

@pytorchbot merge -f "previous run passed, most recent commit mostly just comment changes + 1 more test"

@pytorchmergebot

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

atalman commented Mar 28, 2025

@pytorchbot cherry-pick --onto release/2.7 --c regression

@pytorchbot

Cherry picking #150059

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 9092dd2e828c4af67620e36b6d985b8d4ab2997b returned non-zero exit code 1

Auto-merging .github/workflows/periodic.yml
Auto-merging test/inductor/test_cpu_repro.py
CONFLICT (content): Merge conflict in test/inductor/test_cpu_repro.py
Auto-merging test/profiler/test_profiler.py
error: could not apply 9092dd2e828... [CI] Disable some tests that are failing in periodic (#150059)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

atalman pushed a commit to atalman/pytorch that referenced this pull request Mar 31, 2025
Pull Request resolved: pytorch#150059
Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/malfet
malfet added a commit that referenced this pull request Apr 2, 2025
…50059 (#150327)

* [CI] Disable some tests that are failing in periodic (#150059)


* disable_CompiledOptimizerParityTests

* Update test/inductor/test_compiled_optimizers.py

---------

Co-authored-by: Catherine Lee <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
@github-actions github-actions bot deleted the csl/disable_some_periodic_tests branch May 2, 2025 02:17
Tensor = torch.Tensor

if os.getenv("ATEN_CPU_CAPABILITY") in ("default", "avx2"):
    # This test is not supported on ARM

@clee2000 Just came across this: The comment seems wrongly pasted from somewhere else.

I assume this test is supposed to be skipped when no GPU is available. So wouldn't checking TEST_CUDA be better?
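
One way the suggestion above could be expressed is a class-level `unittest.skipUnless`, sketched below under stated assumptions. `TEST_CUDA` is hard-coded as a stand-in for the flag mentioned in the comment so the example runs without torch, and `ExampleGpuOnlyTests` and `run_example` are hypothetical names. Whether the real test should key off CUDA availability rather than `ATEN_CPU_CAPABILITY` is the open question in this thread, not something the sketch answers.

```python
import unittest

# Stand-in for the TEST_CUDA flag; hard-coded so the sketch runs without torch.
TEST_CUDA = False


@unittest.skipUnless(TEST_CUDA, "requires a CUDA device")
class ExampleGpuOnlyTests(unittest.TestCase):
    def test_first(self):
        self.fail("should not run without CUDA")

    def test_second(self):
        self.fail("should not run without CUDA")


def run_example() -> unittest.TestResult:
    """Run the example class and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleGpuOnlyTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Unlike an early `sys.exit()`, this records every test in the class as skipped together with its reason, so the skip shows up in test reports.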


Labels: ciflow/inductor, ciflow/periodic, keep-going, Merged, module: inductor, topic: not user facing
