Cwrap arg assign #1102
Conversation
Force-pushed 3e8cd49 to c8ee0a1
apaszke left a comment
Looks great! I only have some suggestions about the naming, but that's it. I just want to take a look at the generated files before merging it.
tools/cwrap/cwrap.py (outdated)

```python
call_arg = ', '.join(call_arg)
for plugin in self.plugins:
    arg_unpack = plugin.process_all_unpacks(arg_unpack, option)
    call_arg = plugin.process_all_unpacks(call_arg, option)
```
tools/cwrap/cwrap.py (outdated)

```python
        result.append(res)
        return result

    def build_arg_assign(self, arguments, arg_unpack):
```
tools/cwrap/cwrap.py (outdated)

```python
        assignement = []
        call_arg = []
        # If type names needs to be changed
        arguments = self.get_formal_args(arguments)
```
The generated code looks good too!

@pytorchbot add to whitelist

Thank you!
My first cwrap PR, thorough review needed! This extends the cwrap structure to assign the arguments to variables named `arg_*` before the function call. This simplifies `before_call`, which can now access the arguments directly (the previous behaviour with `${arg0}` is still allowed for backward compatibility). It also allows `before_call` to modify the arguments! Also fixes #146, as the unpacking is done beforehand.
Sample before:
Sample after:
```c
THTensor* arg_self = ((THPTensor*)self)->cdata;
PyThreadState *_save = NULL;
try {
  long ndim = arg_self->nDimension;
  THPUtils_assert(ndim == 2, "t_() expects a 2D tensor, but self is %ldD", ndim);
  Py_UNBLOCK_THREADS;
  THTensor_(transpose)(LIBRARY_STATE arg_self, arg_self, 0, 1);
  Py_BLOCK_THREADS;
  Py_INCREF(self);
  return (PyObject*)self;
} catch (...) {
  if (_save) {
    Py_BLOCK_THREADS;
  }
  throw;
}
```
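To illustrate the idea behind the change, here is a minimal, hypothetical Python sketch of an arg-assign pass: given argument metadata and the corresponding unpack expressions, it emits one C declaration per argument binding the unpacked value to a variable named `arg_<name>`, so later template code (such as `before_call`) can refer to the arguments by those names. The function name, the argument-dict shape, and the call below are illustrative assumptions, not the PR's actual implementation.

```python
# Hypothetical sketch of an arg-assign pass (NOT the actual cwrap code):
# turn each (argument, unpack expression) pair into a C declaration
# `<type> arg_<name> = <unpack>;` and collect the new variable names
# to be used at the call site.

def build_arg_assign(arguments, arg_unpack):
    assignments = []  # C declaration lines, one per argument
    call_args = []    # names to substitute into the function call
    for arg, unpack in zip(arguments, arg_unpack):
        name = 'arg_' + arg['name']
        assignments.append('{} {} = {};'.format(arg['type'], name, unpack))
        call_args.append(name)
    return assignments, call_args

# Example: a single `self` tensor argument unpacked from a PyObject*.
assigns, calls = build_arg_assign(
    [{'name': 'self', 'type': 'THTensor*'}],
    ['((THPTensor*)self)->cdata'])
# assigns -> ['THTensor* arg_self = ((THPTensor*)self)->cdata;']
# calls   -> ['arg_self']
```

Because the unpacking happens once, up front, any later code path can both read and overwrite `arg_self` before the underlying TH function is invoked.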