@loscrossos
Contributor

CUDA Toolkit 12.9 has been out for a while. The build currently fails when it is installed, because the builder checks the installed version against hardcoded values. This PR adds the value 12.9. A better mechanism would be to check dynamically that the major version number is the same... maybe next time, when CUDA 13 comes out :)
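
For illustration only, the dynamic check mentioned above might look roughly like the sketch below. The function names `parse_cuda_version` and `same_major` are hypothetical and are not taken from the DeepSpeed builder; the sketch assumes both versions are available as plain `"major.minor"` strings.

```python
from typing import Tuple


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor[.patch]' string into (major, minor) integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_major(system_cuda: str, torch_cuda: str) -> bool:
    """Return True when both versions share the same major number.

    Hypothetical helper: it replaces a lookup in a hardcoded list of
    accepted minor versions with a comparison of the major component only.
    """
    return parse_cuda_version(system_cuda)[0] == parse_cuda_version(torch_cuda)[0]


# A new minor release such as 12.9 passes without any code change,
# while a major mismatch is still rejected.
print(same_major("12.9", "12.4"))
print(same_major("11.8", "12.1"))
```

With a check of this shape, no list of accepted versions would need updating when a new minor release appears.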

@loadams enabled auto-merge (squash) June 27, 2025 15:08
@loadams merged commit 59bb08b into deepspeedai:master Jun 27, 2025
10 checks passed
@stas00
Collaborator

stas00 commented Jul 3, 2025

@tjruwase, @loadams - perhaps we should stop checking minor versions. So far, for 10.x, 11.x and 12.x, we have had to make a PR to hardcode every CUDA minor release version, so users suffer with every new CUDA release.

If history tells us anything, perhaps we should switch to the reverse logic: reject a mismatch only if we know that some minor version is actually incompatible with the rest of its major version series.

p.s. a new release is needed. thank you!

@loadams
Collaborator

loadams commented Jul 8, 2025

> @tjruwase, @loadams - perhaps we should stop checking minor versions. So far, for 10.x, 11.x and 12.x, we have had to make a PR to hardcode every CUDA minor release version, so users suffer with every new CUDA release.
>
> If history tells us anything, perhaps we should switch to the reverse logic: reject a mismatch only if we know that some minor version is actually incompatible with the rest of its major version series.
>
> p.s. a new release is needed. thank you!

@stas00 - agreed on this, we probably should. I think this check came from problems we saw in CUDA 11.1 and 11.2 early on, if I recall correctly, but those were fixed with the official versions.

And I missed this earlier, but the new release should be out now.

@stas00
Collaborator

stas00 commented Jul 8, 2025

The minor version case started with 10.x already. I was the one who proposed ignoring the minor version difference, since the pre-installed system-wide CUDA was often in conflict with the CUDA version torch was built with.

So you're saying CUDA 11.1 and 11.2 actually did need to be matched? If so, what about my proposal to not require matching minor versions unless we know a particular minor version has to match?
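
As a rough sketch of this proposal (not DeepSpeed's actual code), the check could accept any minor mismatch within a major version unless one side appears in an explicit set of minors known to require an exact match. The names `is_compatible` and `strict_minors` are hypothetical, and the `(11, 1)` entry in the usage lines is a placeholder for illustration, not a confirmed incompatibility.

```python
from typing import FrozenSet, Tuple

Version = Tuple[int, int]


def parse_cuda_version(version: str) -> Version:
    """Split a 'major.minor[.patch]' string into (major, minor) integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_compatible(
    system_cuda: str,
    torch_cuda: str,
    strict_minors: FrozenSet[Version] = frozenset(),
) -> bool:
    """Accept minor mismatches by default; reject only known-bad minors.

    `strict_minors` is a hypothetical denylist of (major, minor) pairs that
    must match exactly. It is empty by default, so every minor mismatch
    within the same major version is accepted.
    """
    sys_v = parse_cuda_version(system_cuda)
    torch_v = parse_cuda_version(torch_cuda)
    if sys_v[0] != torch_v[0]:
        return False  # a major version mismatch is always rejected
    if sys_v == torch_v:
        return True  # an exact match is always accepted
    # Minor mismatch: reject only if either side is listed as strict.
    return sys_v not in strict_minors and torch_v not in strict_minors


# Placeholder denylist entry, for illustration only.
strict = frozenset({(11, 1)})
print(is_compatible("12.9", "12.4", strict))
print(is_compatible("11.1", "11.3", strict))
```

The design choice here is that the default path needs no maintenance: a new minor release is accepted automatically, and an entry is added only when a specific incompatibility is actually confirmed.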

lpnpcs pushed a commit to lpnpcs/DeepSpeed that referenced this pull request Jul 30, 2025
CUDA Toolkit 12.9 has been out for a while. The build currently fails
when it is installed as the builder checks against hardcoded values.
this PR adds the value 12.9. a better mechanism would be to check
dynamically that the major number is the same... maybe next time when
CUDA13 comes out :)

Signed-off-by: LosCrossos <[email protected]>
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
CUDA Toolkit 12.9 has been out for a while. The build currently fails
when it is installed as the builder checks against hardcoded values.
this PR adds the value 12.9. a better mechanism would be to check
dynamically that the major number is the same... maybe next time when
CUDA13 comes out :)

Signed-off-by: LosCrossos <[email protected]>