Skip to content

Conversation

@nWEIdia
Copy link
Collaborator

@nWEIdia nWEIdia commented Jan 26, 2025

@nWEIdia nWEIdia requested a review from jeffdaily as a code owner January 26, 2025 03:30
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145696

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 6 Cancelled Jobs, 72 Pending

As of commit 52354fd with merge base b808774 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@nWEIdia
Copy link
Collaborator Author

nWEIdia commented Jan 27, 2025

I may need to get rid of lingering cu121 jobs before merging this

CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.5.2.1-archive"
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.6.2.3-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matching error would have been caught a bit easier if you made sure the last else in this chain raised an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you please clarify? cuda matching error, or cusparselt matching error?
Where is the "matching"?

Copy link
Collaborator

@Skylion007 Skylion007 Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh where no CUSPARSELT_NAME name is selected, and there it's blank. IE: None of the if else are matched

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he is referring to if none of the "if condition" matched, need to raise an CUSPARLT_NAME not defined error in a separate "else" clause.

@nWEIdia nWEIdia requested a review from a team as a code owner January 27, 2025 19:04
@nWEIdia
Copy link
Collaborator Author

nWEIdia commented Jan 27, 2025

cc @huydhn for help reviewing test_trymerge and test_filter* changes, thanks!

Copy link
Contributor

@huydhn huydhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still one failed test https://hud.pytorch.org/pr/pytorch/pytorch/145696#36245113047, but overall LGTM! Please fix the test before landing

@nWEIdia nWEIdia changed the title [CI][CUDA][cuSPARSELt] cusparselt 0.6.3 related cleanups [CI][CUDA][cuSPARSELt] cusparselt 0.6.3 and cu121 related cleanups Jan 27, 2025
(pytorch#104312) to test the API. Since
the old PR has cu121, I cannot remove cu121 occurences here.

Some other cases, it is just testing remove suffix. So it is fine to not
use cu121 but cu124, well let's just use cu126
amdfaa and others added 10 commits January 27, 2025 14:56
…#145488)

pytorch#144989

This fixes tts_angular model on torchbench for `--export-aot-inductor`

I put meta function in cpp, as shape calculation requires cudnn API calls.
I've extracted shape calculation to be used in implementation as this logic has some non-trivial actions and comments.

```
└─ $ python benchmarks/dynamo/torchbench.py --only tts_angular --accuracy --no-translation-validation --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda
loading model: 0it [00:00, ?it/s]WARNING:common:Model tts_angular does not support bfloat16, running with amp instead
loading model: 0it [00:01, ?it/s]
WARNING:common:Model tts_angular does not support bfloat16, running with amp instead
cuda eval  tts_angular
WARNING:common:Model tts_angular does not support bfloat16, running with amp instead
pass
```

Pull Request resolved: pytorch#145488
Approved by: https://github.com/eqy, https://github.com/zou3519
Let TORCHINDUCTOR_FX_GRAPH_CACHE=0 being respected in unit test. This is helpful if I want the compilation to happen for testing.   Setting INDUCTOR_TEST_DISABLE_FRESH_CACHE to 1 is not the same, since that will cause the generated wrapper file being deleted. But we may want to check those files after running a test.

Pull Request resolved: pytorch#141195
Approved by: https://github.com/masnesral, https://github.com/desertfire
…torch#145572)

Summary: I think skipping these tests is suboptimal. If we categorize as expected failures, then we'll see test failures when they start passing, which means they're more likely to be removed. As a skip, they quietly continue to skip.
Pull Request resolved: pytorch#145572
Approved by: https://github.com/Skylion007, https://github.com/zou3519
@nWEIdia
Copy link
Collaborator Author

nWEIdia commented Jan 27, 2025

Oops, sorry for the disaster. Closing.

@nWEIdia nWEIdia closed this Jan 27, 2025
pytorchmergebot pushed a commit that referenced this pull request Jan 28, 2025
…145793)

Make ci cusparselt installation be consistent with nightly binary
Remove cu121 related docker build jobs and inductor runs Update test failures relating to cu121

Retry of #145696
Pull Request resolved: #145793
Approved by: https://github.com/eqy, https://github.com/tinglvv
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.