PR #22512: [XLA:GPU] Enable cuDNN kernel for NVFP4 block scaled dot

Imported from GitHub PR openxla/xla#22512

Support NVFP4 in addition to MXFP8 hardware acceleration for the "__op$block_scaled_dot" custom call.
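
For readers unfamiliar with block scaling, the following is a minimal reference sketch of what a block scaled dot computes along the contracting dimension. It is not the XLA or cuDNN implementation. The helper names `dequantize` and `block_scaled_dot` are hypothetical, plain Python floats stand in for the narrow element and scale types, and the block sizes (32 elements for MXFP8, 16 for NVFP4) are taken from the published format descriptions rather than from this PR.

```python
# Reference semantics only: plain Python floats stand in for the narrow
# FP8/FP4 element types and for the scale type. Names are hypothetical.

def dequantize(values, scales, block_size):
    """Multiply each contiguous block of `block_size` values by its scale."""
    if len(values) != len(scales) * block_size:
        raise ValueError("expected exactly one scale per block of values")
    return [v * scales[i // block_size] for i, v in enumerate(values)]


def block_scaled_dot(lhs, lhs_scales, rhs, rhs_scales, block_size):
    """Dot product of two block-scaled vectors along the contracting dimension."""
    a = dequantize(lhs, lhs_scales, block_size)
    b = dequantize(rhs, rhs_scales, block_size)
    return sum(x * y for x, y in zip(a, b))


# Assumed block sizes: 32 for MXFP8, 16 for NVFP4 (used here).
lhs = [1.0] * 32
rhs = [0.5] * 32
result = block_scaled_dot(lhs, [2.0, 4.0], rhs, [1.0, 0.5], block_size=16)
print(result)
```

A hardware kernel fuses the scaling into the matrix multiply instead of materializing the dequantized operands; the sketch only shows the arithmetic that the result should match.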

This PR also addresses some nits from the internal review (like renaming a generic `CompositeType` to a more specific `CudnnMxType`).

Copybara import of the project:

--
32e76a88b2107c079e26826417d22664cbf809a3 by Sergey Kozub <[email protected]>:

[XLA:GPU] Enable cuDNN kernel for NVFP4 block scaled dot

Merging this change closes #22512

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#22512 from openxla:skozub/block_scaling_nvfp4 32e76a88b2107c079e26826417d22664cbf809a3

PiperOrigin-RevId: 725985050

@copybara-service (bot) force-pushed the exported_pr_725943746 branch from 7fb33fe to 491cbfb on February 12, 2025 11:42
@copybara-service (bot) closed this on Feb 12, 2025
@copybara-service (bot) merged commit 491cbfb into master on Feb 12, 2025
2 checks passed
@copybara-service (bot) deleted the exported_pr_725943746 branch on February 12, 2025 11:42