@Mousius
Contributor

@Mousius Mousius commented Dec 2, 2024

There are a number of cases where pattern matching differs based on the presence of ACL, causing the tests to fail. This adds TEST_ACL and skipIfACL so that these tests can still run with different values or be entirely skipped if necessary.
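As an illustration of "run with different values", here is a minimal sketch; it is not the code from this PR. The helper `expected_for_backend`, the hard-coded `TEST_ACL` stand-in, and the example counts are all hypothetical.

```python
# Assumption: in the real test suite TEST_ACL would reflect whether the build
# uses ACL. Here it is a hard-coded stand-in so the sketch runs anywhere.
TEST_ACL = False


def expected_for_backend(default, with_acl, test_acl=TEST_ACL):
    """Pick the expected value for the current build (hypothetical helper)."""
    return with_acl if test_acl else default


# Hypothetical numbers: a pattern that matches twice without ACL and not at
# all when ACL is present.
expected_matches = expected_for_backend(default=2, with_acl=0)
```

A test could then assert against `expected_matches` instead of a single hard-coded count, so the same test body runs on both kinds of build.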

cc @malfet @snadampal @milpuz01 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@Mousius Mousius requested a review from a team as a code owner December 2, 2024 23:43
@pytorch-bot

pytorch-bot bot commented Dec 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141921

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit e08321e with merge base 5bc09ac:

NEW FAILURE - The following job has failed:

  • pull / linux-focal-py3.9-clang10-onnx / build (gh)
    ##[error]Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/home/ec2-user/actions-runner/_work/pytorch/pytorch/.github/actions/get-workflow-job-id'. Did you forget to run actions/checkout before running your local action?

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Mousius
Contributor Author

Mousius commented Dec 2, 2024

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Dec 2, 2024
@Mousius
Contributor Author

Mousius commented Dec 2, 2024

@pytorchbot label "module: arm"

@pytorch-bot pytorch-bot bot added the module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 label Dec 2, 2024
@malfet malfet added the ciflow/linux-aarch64 linux aarch64 CI workflow label Dec 3, 2024
Contributor

@malfet malfet left a comment

Please use skipIfXYZ only if something catastrophically fails; otherwise use xfailIfXYZ.
If the change is complete, then please add test_mkldnn_pattern_matching to the aarch64 test shard. If the test is still a work in progress, I would appreciate it if you could split it into several PRs:

  • One that tweaks existing tests to match ACL patterns
  • Another that skips/xfails non-conforming tests and explains why those should be used when ACL is present (to distinguish between incorrect tests/perf tweaks and ones that would actually result in silent correctness issues)


NOTEST_CPU = "cpu" in split_if_not_empty(os.getenv('PYTORCH_TESTING_DEVICE_EXCEPT_FOR', ''))

skipIfACL = unittest.skipIf(TEST_ACL, "ACL is not supported")
Contributor

Why not xfailIfACL?

Contributor Author

I really like this suggestion 😸 my developer senses were tingling as I wrote it.
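For readers comparing the two approaches, the sketch below contrasts a skip decorator with an expected-failure decorator using only the standard `unittest` module. It is an illustration under stated assumptions, not the code merged in this PR: `make_acl_decorators`, `run_case`, and the fake fusion count are hypothetical, and the flag is passed in explicitly rather than detected from the build.

```python
import unittest


def make_acl_decorators(test_acl):
    """Return (skipIfACL, xfailIfACL) for a given flag value (hypothetical helper)."""
    # A skipped test body never runs, so it produces no signal at all.
    skip_if_acl = unittest.skipIf(test_acl, "ACL is not supported")

    # An xfail test still runs; if it unexpectedly starts passing, unittest
    # reports an unexpected success, which flags a stale marker.
    def xfail_if_acl(func):
        return unittest.expectedFailure(func) if test_acl else func

    return skip_if_acl, xfail_if_acl


def run_case(test_acl):
    """Run a tiny example suite and return its unittest.TestResult."""
    skipIfACL, xfailIfACL = make_acl_decorators(test_acl)

    class PatternMatchingExample(unittest.TestCase):
        @skipIfACL
        def test_cannot_run_with_acl(self):
            self.assertTrue(True)

        @xfailIfACL
        def test_fusion_count(self):
            # Assumption for illustration: fusion is blocked when ACL is
            # present, so the count is 0 with ACL and 1 without.
            fused = 0 if test_acl else 1
            self.assertEqual(fused, 1)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PatternMatchingExample)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With the flag set, `run_case(True)` records one skipped test and one expected failure; with it unset, both tests run and pass normally.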

@Mousius
Contributor Author

Mousius commented Dec 3, 2024

> Please use skipIfXYZ only if something catastrophically fails; otherwise use xfailIfXYZ. If the change is complete, then please add test_mkldnn_pattern_matching to the aarch64 test shard. If the test is still a work in progress, I would appreciate it if you could split it into several PRs:
>
>   • One that tweaks existing tests to match ACL patterns
>   • Another that skips/xfails non-conforming tests and explains why those should be used when ACL is present (to distinguish between incorrect tests/perf tweaks and ones that would actually result in silent correctness issues)

Added to .ci/pytorch/test.sh. Most of the xfail tests concern fusion, which is explicitly blocked, as shown in the link in my comment above.

@cpuhrsch cpuhrsch added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 3, 2024
@Mousius Mousius force-pushed the test-mkldnn-acl-checks branch from b517f33 to 15e740c Compare December 5, 2024 17:54
@Mousius
Contributor Author

Mousius commented Dec 5, 2024

@malfet this should be up to date now 😸

Contributor

@malfet malfet left a comment

Thank you for the updates, looks good to me

@malfet
Contributor

malfet commented Dec 5, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 5, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-14)

Details for Dev Infra team Raised by workflow job

@malfet
Contributor

malfet commented Dec 6, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

@malfet
Contributor

malfet commented Dec 6, 2024

@pytorchbot merge -f "it finally looks ok"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

pytorch-bot bot pushed a commit that referenced this pull request Dec 9, 2024
There are a number of cases where pattern matching differs based on the presence of ACL, causing the tests to fail. This adds `TEST_ACL` and `skipIfACL` so that these tests can still run with different values or be entirely skipped if necessary.

Pull Request resolved: #141921
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <[email protected]>

Labels

  • ciflow/linux-aarch64 (linux aarch64 CI workflow)
  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • module: arm (Related to ARM architectures builds of PyTorch. Includes Apple M1)
  • module: inductor
  • open source
  • topic: not user facing (topic category)
  • triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

5 participants