[MPS][BE] Introduce MetalShaderLibary class #125550

malfet · 2024-05-05T13:52:04Z

That factors out a repeated pattern of creating a library/fetching a func from source

Typical usecase

static MetalShaderLibrary lib(SHADER_SOURCE);
...

id<MTLComputePipelineState> cplState = lib.getPipelieStateForFunc("kernel_name")

Make it possible to use with templated sources
Add scalarToMetalTypeString(const Tensor&) variant to avoid repeated scalarToMetalTypeString(t.scalar_type()) calls in the code

I.e. it makes no functional changes, but reduces MPS codebase size by 365 lines

That factors out a repeated pattern of creating a library/fetching a func from source TODO: Add templated flavor

pytorch-bot · 2024-05-05T13:52:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125550

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6d81c0a with merge base dba689b ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kulinseth

Good cleanup , thanks..

malfet · 2024-05-06T17:13:49Z

@pytorchbot merge -f "MPS and lint are green, otherwise it's an no-op"

pytorchmergebot · 2024-05-06T17:15:39Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

PyTorch MPS backend for the most part relies on MPSGraph to provide specific operations, but recently more and more often one had to implement custom kernel here that were simply embedded in the operator codebase and were compiled directly using [`- id<MTLLibrary>newLibraryWithSource:options:error:`](https://developer.apple.com/documentation/metal/mtldevice/1433431-newlibrarywithsource) (first metal kernel to MPS backend was added in #82307 ) Later on, as number of operator grew, those were refactored into `MetalShaderLibrary` convenience class (see #125550 ) But as number of kernels keeps growing, it's time to make a next step and properly compile them into `.metalib` This PR does exactly that by: - Moving shader sources into separate .metal files - Adds check on whether full Xcode installed or just DeveloperTools - If full Xcode is installed, compiles and links shaders into .metallib for Metal-3.0(Available on MacOS 13) and Metal-3.1 standard (available on MacOS 14, can use bfloat) and bundles both using `-sectcreate` linker option and `getsectiondata` API call. `metallib_dummy.cpp` file is used to properly express dependencies between metallib build and torch_cpu link stages. Logic for generating metallibraries is loosely based on https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/CMakeLists.txt. - If only DeveloperTools CLI is installed, automatically wraps .metal into `_metallib.h` that contains shader source wrapped in `MetalShaderLibrary` Bulk of changes introduced in this PR are just moving code around. I.e. for every file that contains non-templated shader definition in `aten/src/ATen/native/mps/operators` folder, corresponding `.metal` file is created in `aten/src/ATen/native/mps/kernels` folder and embedded shader definition is replaced with the following ```cpp #ifndef PYTORCH_JIT_COMPILE_SHADERS static auto& lib = MetalShaderLibrary::getBundledLibrary(); #else #include <ATen/native/mps/OpName_metallib.h> #endif ``` Some historical stats: | PyTorch Version | Number of shaders in MPS | Ops added | | ------------- | ------------- | ---- | | 1.12 | 0 | | | 1.13 | 2 | bitwise_ops and index.out | | 2.0 | 4 | cross repeat and view) | | 2.1 | 9 | unary_ops, histogram, renorm, binary_ops | | 2.2 | 11 | gamma and bucketization | | 2.3 | 12 | naive_matmul (to workaround crash) | | 2.4 | 13 | quantized_mm | | 2.5 | 14 | fused_adam | Pros: - Better code structure/readability - Eventually allows one to use shared headers (and implement something like `TensorIterator`) - Faster runtime (as compilation is done ahead of time) and perhaps better optimized compiled kernels Cons: - Build process is a bit more complicated that it used to be - Need to maintain two codepath (as our CI builders only has DeveloperTools installed) Pull Request resolved: #138636 Approved by: https://github.com/manuelcandales

PyTorch MPS backend for the most part relies on MPSGraph to provide specific operations, but recently more and more often one had to implement custom kernel here that were simply embedded in the operator codebase and were compiled directly using [`- id<MTLLibrary>newLibraryWithSource:options:error:`](https://developer.apple.com/documentation/metal/mtldevice/1433431-newlibrarywithsource) (first metal kernel to MPS backend was added in pytorch#82307 ) Later on, as number of operator grew, those were refactored into `MetalShaderLibrary` convenience class (see pytorch#125550 ) But as number of kernels keeps growing, it's time to make a next step and properly compile them into `.metalib` This PR does exactly that by: - Moving shader sources into separate .metal files - Adds check on whether full Xcode installed or just DeveloperTools - If full Xcode is installed, compiles and links shaders into .metallib for Metal-3.0(Available on MacOS 13) and Metal-3.1 standard (available on MacOS 14, can use bfloat) and bundles both using `-sectcreate` linker option and `getsectiondata` API call. `metallib_dummy.cpp` file is used to properly express dependencies between metallib build and torch_cpu link stages. Logic for generating metallibraries is loosely based on https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/CMakeLists.txt. - If only DeveloperTools CLI is installed, automatically wraps .metal into `_metallib.h` that contains shader source wrapped in `MetalShaderLibrary` Bulk of changes introduced in this PR are just moving code around. I.e. for every file that contains non-templated shader definition in `aten/src/ATen/native/mps/operators` folder, corresponding `.metal` file is created in `aten/src/ATen/native/mps/kernels` folder and embedded shader definition is replaced with the following ```cpp #ifndef PYTORCH_JIT_COMPILE_SHADERS static auto& lib = MetalShaderLibrary::getBundledLibrary(); #else #include <ATen/native/mps/OpName_metallib.h> #endif ``` Some historical stats: | PyTorch Version | Number of shaders in MPS | Ops added | | ------------- | ------------- | ---- | | 1.12 | 0 | | | 1.13 | 2 | bitwise_ops and index.out | | 2.0 | 4 | cross repeat and view) | | 2.1 | 9 | unary_ops, histogram, renorm, binary_ops | | 2.2 | 11 | gamma and bucketization | | 2.3 | 12 | naive_matmul (to workaround crash) | | 2.4 | 13 | quantized_mm | | 2.5 | 14 | fused_adam | Pros: - Better code structure/readability - Eventually allows one to use shared headers (and implement something like `TensorIterator`) - Faster runtime (as compilation is done ahead of time) and perhaps better optimized compiled kernels Cons: - Build process is a bit more complicated that it used to be - Need to maintain two codepath (as our CI builders only has DeveloperTools installed) Pull Request resolved: pytorch#138636 Approved by: https://github.com/manuelcandales

[MPS][BE] Introduce MetalShaderLibary class

343475c

That factors out a repeated pattern of creating a library/fetching a func from source TODO: Add templated flavor

malfet requested a review from Skylion007 May 5, 2024 13:52

malfet requested a review from kulinseth as a code owner May 5, 2024 13:52

pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels May 5, 2024

malfet added 5 commits May 5, 2024 06:54

Fix lint

83b3c1c

Fix typo

d0cf200

And improve it a bit further

b0ca6c8

Add those

b032b30

More cleanups

6d81c0a

kulinseth approved these changes May 6, 2024

View reviewed changes

pytorchmergebot added the merging label May 6, 2024

pytorchmergebot closed this in e30e6d3 May 6, 2024

pytorchmergebot added Merged and removed merging labels May 6, 2024

github-actions bot deleted the malfet/mps-be-refactor-repeated-code branch June 6, 2024 01:54

qqaatw mentioned this pull request Jun 13, 2024

[MPS] Fused Adam & AdamW #127242

Closed

malfet mentioned this pull request Oct 24, 2024

[MPS] Compile kernels into Metallib #138636

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MPS][BE] Introduce MetalShaderLibary class #125550

[MPS][BE] Introduce MetalShaderLibary class #125550

Uh oh!

malfet commented May 5, 2024 •

edited

Loading

Uh oh!

pytorch-bot bot commented May 5, 2024 •

edited

Loading

Uh oh!

kulinseth left a comment

Uh oh!

malfet commented May 6, 2024

Uh oh!

pytorchmergebot commented May 6, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[MPS][BE] Introduce MetalShaderLibary class #125550

[MPS][BE] Introduce MetalShaderLibary class #125550

Uh oh!

Conversation

malfet commented May 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented May 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125550

✅ No Failures

Uh oh!

kulinseth left a comment

Choose a reason for hiding this comment

Uh oh!

malfet commented May 6, 2024

Uh oh!

pytorchmergebot commented May 6, 2024

Merge started

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

malfet commented May 5, 2024 •

edited

Loading

pytorch-bot bot commented May 5, 2024 •

edited

Loading