
davidberard98 (Contributor) commented Sep 17, 2024

Stack from ghstack (oldest at bottom):

Fixes #136159

Prior to this PR, using cpp_wrapper without abi_compatible could result in incorrect dtypes.
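
For concreteness, the failure mode looks roughly like this (a minimal sketch, not the exact repro from #136159; the function and shapes are illustrative):

```python
# Minimal sketch of the bug, not the exact repro from #136159.
# Assumes the config knobs of this era: cpp_wrapper=True with
# abi_compatible=False.
import torch
import torch._inductor.config as inductor_config

@torch.compile
def f(x):
    # Bitcast the float32 storage to int32: a pure dtype view.
    return x.view(torch.int32)

with inductor_config.patch({"cpp_wrapper": True, "abi_compatible": False}):
    out = f(torch.ones(8))
    # Before this PR, the non-abi_compatible cpp_wrapper path generated a
    # reinterpret_tensor(...) call that ignored the requested dtype, so
    # `out` could come back as float32 instead of int32.
    assert out.dtype == torch.int32, out.dtype
```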

The following block of code, from torch/_inductor/codegen/cpp_wrapper_cpu.py (https://github.com/pytorch/pytorch/blob/f6f1504d39c92f1ab2a1ee10a7da97745593151f/torch/_inductor/codegen/cpp_wrapper_cpu.py#L1678-L1814), implements the cpp_wrapper codegen for reinterpret_view in abi_compatible mode, but not in non-abi_compatible mode.

```python
def codegen_reinterpret_view(
    self, data, size_list, stride_list, offset, writer, dtype=None
) -> str:
    dim = str(len(size_list))
    original_offset = offset
    size = self.codegen_shape_tuple(size_list)
    stride = self.codegen_shape_tuple(stride_list)
    offset = self.codegen_sizevar(offset)
    call_strs = []
    if config.abi_compatible:
        final_tmp_name = None
        final_tmp_name_is_RAIIAtenTensorHandle = False

        def create_reinterpret_call():
            tmp_name = f"tmp_tensor_handle_{next(self.tmp_tensor_id)}"
            args = [
                f"{data.get_name()}",
                dim,
                self.codegen_int_array_var(
                    size,
                    writer,
                    known_statically=self.is_statically_known_list_of_ints(
                        size_list
                    ),
                    graph=self.get_codegened_graph(),
                ),
                self.codegen_int_array_var(
                    stride,
                    writer,
                    known_statically=self.is_statically_known_list_of_ints(
                        stride_list
                    ),
                    graph=self.get_codegened_graph(),
                ),
                offset,
            ]
            call_str = (
                f"auto {tmp_name} = reinterpret_tensor_wrapper({', '.join(args)});"
            )
            return tmp_name, call_str

        def create_dtypeview_call(reinterpret_call):
            tmp_AtenTensorHandle = (
                f"tmp_{data.get_name()}_{next(self.tmp_tensor_id)}"
            )
            call_strs = [f"AtenTensorHandle {tmp_AtenTensorHandle};"]
            dtype_name = str(dtype).split(".")[-1]
            device_name = data.layout.device.type
            get_dtype_function = f"aoti_torch_dtype_{dtype_name}"
            dtypeview_function = f"aoti_torch_{device_name}_view_dtype"
            call_strs.append(
                f"AOTI_TORCH_ERROR_CODE_CHECK({dtypeview_function}"
                f"({reinterpret_call}, {get_dtype_function}(), &{tmp_AtenTensorHandle}));"
            )
            tmp_RAIIAtenTensorHandle = (
                f"tmp_{data.get_name()}_{next(self.tmp_tensor_id)}_handle"
            )
            call_strs.append(
                f"RAIIAtenTensorHandle {tmp_RAIIAtenTensorHandle}({tmp_AtenTensorHandle});"
            )
            return tmp_RAIIAtenTensorHandle, call_strs

        if (
            size_list == data.layout.size
            and stride_list == data.layout.stride
            and original_offset == data.layout.offset
        ):
            # pure dtypeview
            if dtype is not None and dtype != data.dtype:
                tmp_output_name, tmp_call_strs = create_dtypeview_call(
                    data.get_name()
                )
                call_strs.extend(tmp_call_strs)
                final_tmp_name = tmp_output_name
                final_tmp_name_is_RAIIAtenTensorHandle = True
            else:
                return f"{data.get_name()}"
        else:
            # firstly create reinterpretview
            final_tmp_name, reinterpret_call = create_reinterpret_call()
            call_strs.append(reinterpret_call)
            if dtype is not None and dtype != data.dtype:
                # wrap it with dtypeview
                final_tmp_name, tmp_call_strs = create_dtypeview_call(
                    reinterpret_call
                )
                call_strs.extend(tmp_call_strs)

        # Because the memory planning is done in two passes (see the implementation
        # of self.generate), the writeline behavior is different in the two passes.
        if writer is None:
            writer = self
        writer.writelines(call_strs)

        if (
            self.can_stack_allocate_buffer(data)
            and self.is_statically_known_list_of_ints(size_list)
            and self.is_statically_known_list_of_ints(stride_list)
            and ir.is_contiguous_strides_for_shape(stride_list, size_list)
        ):
            return final_tmp_name

        # NB, the return handle here represents a temporary tensor, which will be automatically
        # released.
        # Here's a sample usage in the cpp wrapper code:
        # ```
        # aoti_torch_addmm_out(
        #     buf1,
        #     arg1_1,
        #     RAIIAtenTensorHandle(tmp_tensor_handle_0),
        #     buf0,
        #     1L,
        #     1L));
        # ```
        # RAIIAtenTensorHandle(tmp_tensor_handle_0) will be released after the call to addmm_out.
        # This could be problematic when it's used in a different pattern, for example:
        # ```
        # AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
        # aoti_torch_proxy_executor_call_function(..., tensor_args);
        # ```
        # RAIIAtenTensorHandle(tmp_tensor_handle_2) will be invalid when it's used in the latter
        # kernel call.
        #
        # This is solved by updating the proxy_executor invocation to
        # ```
        # aoti_torch_proxy_executor_call_function(...,
        #     std::vector<AtenTensorHandle>{
        #         RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6
        #     }.data()
        # );
        # ```
        if not final_tmp_name_is_RAIIAtenTensorHandle:
            return f"wrap_with_raii_handle_if_needed({final_tmp_name})"
        else:
            return final_tmp_name
    else:
        args = [data.get_name(), size, stride, offset]
        return f"reinterpret_tensor({', '.join(args)})"
```

Added a test that verifies that we keep the view behavior, and that the returned tensors also have the correct dtypes.
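
The checks are along these lines (a sketch of the property being tested, not the verbatim test added by this PR): compile a dtype-view graph under cpp_wrapper and compare against eager for both values and dtype.

```python
# Sketch of the property the new test checks (illustrative names).
import torch
import torch._inductor.config as inductor_config

def eager(x):
    return x.view(torch.int32)

with inductor_config.patch({"cpp_wrapper": True, "abi_compatible": False}):
    compiled = torch.compile(eager)
    x = torch.randn(16)
    expected, actual = eager(x), compiled(x)
    # Same bits as the eager bitcast view...
    torch.testing.assert_close(actual, expected)
    # ...and, in particular, the output must carry the int32 dtype.
    assert actual.dtype == expected.dtype == torch.int32
```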

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented Sep 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136233

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9d00d92 with merge base 6a6f5b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…i_compatible"


Fixes #136159

Prior to this PR, using cpp_wrapper without abi_compatible could result in incorrect dtypes.

The following block of code implements cpp_wrapper codegen for reinterpret_view for abi_compatible mode, but not for non-abi_compatible mode.

https://github.com/pytorch/pytorch/blob/f6f1504d39c92f1ab2a1ee10a7da97745593151f/torch/_inductor/codegen/cpp_wrapper_cpu.py#L1678-L1814

Added a test that verifies that we keep the view behavior, but returned tensors also have correct dtypes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
davidberard98 added a commit that referenced this pull request Sep 18, 2024

ghstack-source-id: ff22357
Pull Request resolved: #136233
davidberard98 marked this pull request as ready for review September 18, 2024 05:16
FindHao (Member) left a comment


LGTM!

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Sep 18, 2024
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

davidberard98 (Contributor, Author)

@pytorchbot merge -r

pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator)

Successfully rebased gh/davidberard98/334/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/136233)

pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2024

ghstack-source-id: 4151596
Pull Request resolved: #136233
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens when two merge requests were issued for the same PR, or when the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

davidberard98 (Contributor, Author)

@pytorchbot merge -f "waited for 2 days for gcp.a100 machine for CI, and the job just got rescheduled at the end of the queue - don't want to wait another 2 days as the queue is currently 1.7 days long."

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. That allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

github-actions bot deleted the gh/davidberard98/334/head branch October 21, 2024 02:08