
davidberard98 (Contributor) commented Sep 17, 2024

Stack from ghstack (oldest at bottom):

Fixes #136159

Prior to this PR, using cpp_wrapper without abi_compatible could result in incorrect dtypes.
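
For concreteness, the failure mode looks roughly like this (a minimal sketch, not the exact repro from #136159; the function and shapes are illustrative):

```python
# Minimal sketch of the bug, not the exact repro from #136159.
# Assumes the config knobs of this era: cpp_wrapper=True with
# abi_compatible=False.
import torch
import torch._inductor.config as inductor_config

@torch.compile
def f(x):
    # Bitcast the float32 storage to int32: a pure dtype view.
    return x.view(torch.int32)

with inductor_config.patch({"cpp_wrapper": True, "abi_compatible": False}):
    out = f(torch.ones(8))
    # Before this PR, the non-abi_compatible cpp_wrapper path generated a
    # reinterpret_tensor(...) call that ignored the requested dtype, so
    # `out` could come back as float32 instead of int32.
    assert out.dtype == torch.int32, out.dtype
```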

The following block of code, from torch/_inductor/codegen/cpp_wrapper_cpu.py (https://github.com/pytorch/pytorch/blob/f6f1504d39c92f1ab2a1ee10a7da97745593151f/torch/_inductor/codegen/cpp_wrapper_cpu.py#L1678-L1814), implements the cpp_wrapper codegen for reinterpret_view in abi_compatible mode, but not in non-abi_compatible mode.

```python
def codegen_reinterpret_view(
    self, data, size_list, stride_list, offset, writer, dtype=None
) -> str:
    dim = str(len(size_list))
    original_offset = offset
    size = self.codegen_shape_tuple(size_list)
    stride = self.codegen_shape_tuple(stride_list)
    offset = self.codegen_sizevar(offset)
    call_strs = []
    if config.abi_compatible:
        final_tmp_name = None
        final_tmp_name_is_RAIIAtenTensorHandle = False

        def create_reinterpret_call():
            tmp_name = f"tmp_tensor_handle_{next(self.tmp_tensor_id)}"
            args = [
                f"{data.get_name()}",
                dim,
                self.codegen_int_array_var(
                    size,
                    writer,
                    known_statically=self.is_statically_known_list_of_ints(
                        size_list
                    ),
                    graph=self.get_codegened_graph(),
                ),
                self.codegen_int_array_var(
                    stride,
                    writer,
                    known_statically=self.is_statically_known_list_of_ints(
                        stride_list
                    ),
                    graph=self.get_codegened_graph(),
                ),
                offset,
            ]
            call_str = (
                f"auto {tmp_name} = reinterpret_tensor_wrapper({', '.join(args)});"
            )
            return tmp_name, call_str

        def create_dtypeview_call(reinterpret_call):
            tmp_AtenTensorHandle = (
                f"tmp_{data.get_name()}_{next(self.tmp_tensor_id)}"
            )
            call_strs = [f"AtenTensorHandle {tmp_AtenTensorHandle};"]
            dtype_name = str(dtype).split(".")[-1]
            device_name = data.layout.device.type
            get_dtype_function = f"aoti_torch_dtype_{dtype_name}"
            dtypeview_function = f"aoti_torch_{device_name}_view_dtype"
            call_strs.append(
                f"AOTI_TORCH_ERROR_CODE_CHECK({dtypeview_function}"
                f"({reinterpret_call}, {get_dtype_function}(), &{tmp_AtenTensorHandle}));"
            )
            tmp_RAIIAtenTensorHandle = (
                f"tmp_{data.get_name()}_{next(self.tmp_tensor_id)}_handle"
            )
            call_strs.append(
                f"RAIIAtenTensorHandle {tmp_RAIIAtenTensorHandle}({tmp_AtenTensorHandle});"
            )
            return tmp_RAIIAtenTensorHandle, call_strs

        if (
            size_list == data.layout.size
            and stride_list == data.layout.stride
            and original_offset == data.layout.offset
        ):
            # pure dtypeview
            if dtype is not None and dtype != data.dtype:
                tmp_output_name, tmp_call_strs = create_dtypeview_call(
                    data.get_name()
                )
                call_strs.extend(tmp_call_strs)
                final_tmp_name = tmp_output_name
                final_tmp_name_is_RAIIAtenTensorHandle = True
            else:
                return f"{data.get_name()}"
        else:
            # firstly create reinterpretview
            final_tmp_name, reinterpret_call = create_reinterpret_call()
            call_strs.append(reinterpret_call)
            if dtype is not None and dtype != data.dtype:
                # wrap it with dtypeview
                final_tmp_name, tmp_call_strs = create_dtypeview_call(
                    reinterpret_call
                )
                call_strs.extend(tmp_call_strs)

        # Because the memory planning is done in two passes (see the implementation
        # of self.generate), the writeline behavior is different in the two passes.
        if writer is None:
            writer = self
        writer.writelines(call_strs)

        if (
            self.can_stack_allocate_buffer(data)
            and self.is_statically_known_list_of_ints(size_list)
            and self.is_statically_known_list_of_ints(stride_list)
            and ir.is_contiguous_strides_for_shape(stride_list, size_list)
        ):
            return final_tmp_name

        # NB, the return handle here represents a temporary tensor, which will be automatically
        # released.
        # Here's a sample usage in the cpp wrapper code:
        # ```
        # aoti_torch_addmm_out(
        #     buf1,
        #     arg1_1,
        #     RAIIAtenTensorHandle(tmp_tensor_handle_0),
        #     buf0,
        #     1L,
        #     1L));
        # ```
        # RAIIAtenTensorHandle(tmp_tensor_handle_0) will be released after the call to addmm_out.
        # This could be problematic when it's used in a different pattern, for example:
        # ```
        # AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
        # aoti_torch_proxy_executor_call_function(..., tensor_args);
        # ```
        # RAIIAtenTensorHandle(tmp_tensor_handle_2) will be invalid when it's used in the latter
        # kernel call.
        #
        # This is solved by updating the proxy_executor invocation to
        # ```
        # aoti_torch_proxy_executor_call_function(...,
        #     std::vector<AtenTensorHandle>{
        #         RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6
        #     }.data()
        # );
        # ```
        if not final_tmp_name_is_RAIIAtenTensorHandle:
            return f"wrap_with_raii_handle_if_needed({final_tmp_name})"
        else:
            return final_tmp_name
    else:
        args = [data.get_name(), size, stride, offset]
        return f"reinterpret_tensor({', '.join(args)})"
```

Added a test that verifies that we keep the view behavior, and that the returned tensors also have the correct dtypes.
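
The checks are along these lines (a sketch of the property being tested, not the verbatim test added by this PR): compile a dtype-view graph under cpp_wrapper and compare against eager for both values and dtype.

```python
# Sketch of the property the new test checks (illustrative names).
import torch
import torch._inductor.config as inductor_config

def eager(x):
    return x.view(torch.int32)

with inductor_config.patch({"cpp_wrapper": True, "abi_compatible": False}):
    compiled = torch.compile(eager)
    x = torch.randn(16)
    expected, actual = eager(x), compiled(x)
    # Same bits as the eager bitcast view...
    torch.testing.assert_close(actual, expected)
    # ...and, in particular, the output must carry the int32 dtype.
    assert actual.dtype == expected.dtype == torch.int32
```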

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented Sep 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136233

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9d00d92 with merge base 6a6f5b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…i_compatible"


Fixes #136159

Prior to this PR, using cpp_wrapper without abi_compatible could result in incorrect dtypes.

The following block of code implements cpp_wrapper codegen for reinterpret_view for abi_compatible mode, but not for non-abi_compatible mode.

https://github.com/pytorch/pytorch/blob/f6f1504d39c92f1ab2a1ee10a7da97745593151f/torch/_inductor/codegen/cpp_wrapper_cpu.py#L1678-L1814

Added a test that verifies that we keep the view behavior, but returned tensors also have correct dtypes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Sep 18, 2024

ghstack-source-id: ff22357
Pull Request resolved: #136233
davidberard98 marked this pull request as ready for review September 18, 2024 05:16
FindHao (Member) left a comment


LGTM!

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Sep 18, 2024
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

davidberard98 (Contributor, Author)

@pytorchbot merge -r

pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased gh/davidberard98/334/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/136233)

pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2024

ghstack-source-id: 4151596
Pull Request resolved: #136233
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge -f "waited for 2 days for gcp.a100 machine for CI, and the job just got rescheduled at the end of the queue - don't want to wait another 2 days as the queue is currently 1.7 days long."

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. That allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

github-actions bot deleted the gh/davidberard98/334/head branch October 21, 2024 02:08