Skip to content

Conversation

@DenisVieriu97
Copy link
Collaborator

Use bindless Argument Buffers (unbounded arrays) for advanced indexing kernels - this allows caching of the PSOs since we don't have to query anymore the main metal function for the AB size (this is filled directly now on the CPU).

* Add pso caching for advanced indexing kernels

* Fix build on macos 12

* Fix macOS Monterey build

* Fix build failure

* Use MPSAllocator buffer pool to allocate memory for indexing kernels

* Revert changes
@DenisVieriu97 DenisVieriu97 added the ciflow/mps Run MPS tests (subset of trunk) label Apr 24, 2023
@DenisVieriu97 DenisVieriu97 requested a review from razarmehr April 24, 2023 02:47
@pytorch-bot pytorch-bot bot added the release notes: mps Release notes category label Apr 24, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented Apr 24, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99855

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit eeac863:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kulinseth
Copy link
Collaborator

@pytorchbot merge -f "All tests are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

bool isMacOS13Plus(MacOSVersion version) const;

MTLFunction_t metalIndexingFunction(const std::string &kernel, MTLFunctionConstantValues_t constantValues);
MTLComputePipelineState_t metalIndexingFunction(const std::string &kernel);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DenisVieriu97, do you think it would be better to rename it as metalIndexingPSO? If so, I can propose a PR for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqaatw this sounds good! Please go ahead with the change

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DenisVieriu97 here is the PR :) #101156

pytorchmergebot pushed a commit that referenced this pull request Jan 7, 2024
As [`newFunctionWithName:`](https://developer.apple.com/documentation/metal/mtllibrary/1515524-newfunctionwithname) does not accept error argument, do not attempt to print it as it'll be guaranteed `nil` at that point, that results in a classic null pointer dereference, when `TORCH_CHECK` will attempt to construct `std::string` from it. See below backtrace for example:
```
 thread #1, queue = 'metal gpu stream', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000018a316dc4 libsystem_platform.dylib`_platform_strlen + 4
    frame #1: 0x00000001471011bc libtorch_cpu.dylib`std::__1::__constexpr_strlen[abi:v160006](__str=0x0000000000000000) at cstring:114:10
    frame #2: 0x0000000147100c24 libtorch_cpu.dylib`std::__1::char_traits<char>::length(__s=0x0000000000000000) at char_traits.h:220:12
  * frame #3: 0x0000000147100bf0 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:v160006]<std::__1::char_traits<char>>(__os=0x000000016fdfb3a0, __str=0x0000000000000000) at ostream:901:57
    frame #4: 0x0000000147100bb4 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb5d0) at StringUtil.h:55:6
    frame #5: 0x00000001471007ac libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #6: 0x0000000147101444 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #7: 0x0000000147101404 libtorch_cpu.dylib`std::__1::basic_ostream<char, std::__1::char_traits<char>>& c10::detail::_str<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char const*, char const*>(ss=0x000000016fdfb3a0, t=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:68:10
    frame #8: 0x000000014710137c libtorch_cpu.dylib`c10::detail::_str_wrapper<char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, char const*, char const* const&>::call(args=0x000000016fdfb500, args="index_select_32bit_idx32", args=0x000000016fdfb4f8, args=0x000000016fdfb5d0) at StringUtil.h:75:5
    frame #9: 0x0000000147101310 libtorch_cpu.dylib`decltype(auto) c10::str<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>(args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at StringUtil.h:111:10
    frame #10: 0x0000000147100210 libtorch_cpu.dylib`decltype(auto) c10::detail::torchCheckMsgImpl<char [53], std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, char [10], char const*>((null)="Expected indexFunction to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)", args={a\xcb\xa7H\x01\0\0\0}, args="index_select_32bit_idx32", args={\x96\xcb\xa7H\x01\0\0\0}, args=0x000000016fdfb5d0) at Exception.h:453:10
    frame #11: 0x00000001470fffe8 libtorch_cpu.dylib`at::mps::MPSDevice::metalIndexingPSO(this=0x0000600000381670, kernel="index_select_32bit_idx32") at MPSDevice.mm:62:3
```

This was introduced by #99855 that replaced `newFunctionWithName:constantValues:error:` with `newFunctionWithName:`
Pull Request resolved: #116938
Approved by: https://github.com/Skylion007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/mps Run MPS tests (subset of trunk) Merged merging open source release notes: mps Release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants