Conversation

@huydhn (Contributor) commented Oct 7, 2024

The test runs all its combinations (512) sequentially, so it takes more than 30 minutes to finish, or times out on ASAN after one hour. Parametrizing it breaks it up so individual tests can finish and no longer need to be marked as slow.

Also, the test seems to run out of memory on a 2xlarge runner with a std::bad_alloc error. This change may also fix that issue (pending CI testing).
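
For context, parametrizing here means turning the single combinatorial test into many individually named test cases. Below is a minimal sketch (not the actual diff) using PyTorch's internal `parametrize` helper; the class, test, and option names are hypothetical and only illustrate the pattern.

```python
# Minimal sketch (not the actual diff): instead of a single test that loops
# over every combination, parametrize it so each combination runs as its own
# test case and stays well under the slow-test / ASAN time limits.
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
)

# Hypothetical option lists, for illustration only.
DTYPES = ["float32", "bfloat16"]
LAYOUTS = ["contiguous", "channels_last"]

class ExampleCpuReproTest(TestCase):
    @parametrize("dtype", DTYPES)
    @parametrize("layout", LAYOUTS)
    def test_example(self, dtype, layout):
        # Each (dtype, layout) pair is generated as a separate test with the
        # parameter values encoded in its name, so no single test runs all
        # combinations sequentially.
        ...

instantiate_parametrized_tests(ExampleCpuReproTest)

if __name__ == "__main__":
    run_tests()
```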

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@huydhn requested a review from a team on October 7, 2024 at 22:04
@pytorch-bot (bot) commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137447

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c87af06 with merge base 14b4099:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD (Collaborator) left a comment

Oh, nice refactor, thanks!

@huydhn (Contributor, Author) commented Oct 8, 2024

@pytorchbot merge

@pytorch-bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label on Oct 8, 2024
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: slow / linux-focal-cuda12.1-py3-gcc9-slow-gradcheck / test (default, 6, 8, lf.linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team. Raised by workflow job.

@huydhn (Contributor, Author) commented Oct 8, 2024

@pytorchbot merge -f 'Existing slow failures'

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@huydhn (Contributor, Author) commented Oct 8, 2024

@pytorchbot revert -m 'Need to up few more instance to 4xlarge, revert to reland' -c weird

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator)

@huydhn your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 8, 2024
This reverts commit d5493ed.

Reverted #137447 on behalf of https://github.com/huydhn due to Need to up few more instance to 4xlarge, revert to reland ([comment](#137447 (comment)))
@huydhn (Contributor, Author) commented Oct 9, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: slow / linux-focal-cuda12.1-py3-gcc9-slow-gradcheck / test (default, 2, 8, lf.linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team. Raised by workflow job.

@huydhn (Contributor, Author) commented Oct 9, 2024

@pytorchbot merge -f 'Existing slow failures'

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Oct 9, 2024
The failed test was recently moved back from slow, and it requires more RAM than is available on a 2xlarge runner. It looks OK to up the instance size to 4xlarge instead. I missed the periodic jobs in #137447

Example periodic failures https://hud.pytorch.org/pytorch/pytorch/commit/de4c2a3b4e89d96334dc678d1c3f2ae51a6630a0 (test_cpu_repro)
Pull Request resolved: #137633
Approved by: https://github.com/seemethere, https://github.com/malfet