
Add checks for empty tensor list#155383

Closed
malfet wants to merge 5 commits into gh/malfet/385/base from gh/malfet/385/head

Conversation

@malfet
Contributor

@malfet malfet commented Jun 7, 2025

Stack from ghstack (oldest at bottom):

Vibe-coded with Codex, after collecting a backtrace, see https://chatgpt.com/s/cd_68438be8a1248191adbfa0a5f000e60b

Even though a check for an empty tensor list exists in at::cat, a crash might still happen while resolving a named dimension to a position via dimname_to_position(tensors[0], dim); see the backtrace below

(lldb) up
frame #1: 0x00000001101146dc libtorch_cpu.dylib`at::TensorBase::has_names(this=0x0000000000000000) const at TensorBase.h:559:10
   556 	  bool has_names() const {
   557 	    // If a user is using unnamed tensors, then we can short-circuit right here.
   558 	    // Otherwise, impl::has_names attempts to retrieve names.
-> 559 	    if (!impl_->has_named_tensor_meta()) {
   560 	      return false;
   561 	    }
   562 	    return impl::has_names(unsafeGetTensorImpl());
(lldb) up
frame #2: 0x00000001101144c4 libtorch_cpu.dylib`at::dimname_to_position(tensor=0x0000000000000000, dim=Dimname @ 0x000000016fdfe348) at NamedTensorUtils.cpp:23:3
   20  	int64_t dimname_to_position(const Tensor& tensor, Dimname dim) {
   21  	  TORCH_CHECK(dim.type() != NameType::WILDCARD,
   22  	      "Please look up dimensions by name, got: name = None.");
-> 23  	  TORCH_CHECK(tensor.has_names(),
   24  	      "Name ", dim, " not found in ", toDimnameRepr(tensor), ".");
   25  	  const auto names = tensor.names();
   26  	

TODOs:

  • Maybe move the test from test_tensor_creation.py to OpInfo (not sure which one is more readable)
  • Replace TORCH_CHECK with TORCH_CHECK_VALUE and adjust unit tests

Fixes #155306
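The failure mode above can be sketched in plain Python (no torch): the named-dimension path dereferences tensors[0] before checking that the list is non-empty. All names below are illustrative stand-ins for the C++ functions in the backtrace, not PyTorch's API.

```python
# Plain-Python sketch of the bug this PR guards against.

def dimname_to_position(tensor, dim):
    # Stand-in for at::dimname_to_position; in C++ this calls
    # tensor.has_names(), which dereferences a null TensorImpl when the
    # "tensor" came from indexing an empty list.
    return tensor["names"].index(dim)

def concat_unguarded(tensors, dim):
    # Mirrors the buggy path: tensors[0] is touched unconditionally.
    return dimname_to_position(tensors[0], dim)

def concat_guarded(tensors, dim):
    # The fix: validate the list before resolving the dimension name.
    if not tensors:
        raise ValueError("expected a non-empty list of Tensors")
    return dimname_to_position(tensors[0], dim)
```

In Python the unguarded version merely raises IndexError; in the C++ code the analogous indexing hands a null pointer to dimname_to_position, which is why the process crashes with EXC_BAD_ACCESS instead of throwing.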

[ghstack-poisoned]
@malfet malfet mentioned this pull request Jun 7, 2025
@pytorch-bot

pytorch-bot bot commented Jun 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155383

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 6fe8e6a with merge base 7e4c097:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@malfet malfet requested review from Skylion007 and ezyang June 7, 2025 00:52
@malfet malfet added release notes: python_frontend python frontend release notes category topic: bug fixes topic category labels Jun 7, 2025
malfet added 3 commits June 6, 2025 18:16
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@ezyang
Contributor

ezyang commented Jun 7, 2025

Can you confirm from the transcript that the tests fail before the change?

@ezyang
Contributor

ezyang commented Jun 7, 2025

Also the diff is kind of suspicious: all the places annotated with checks call functions I would also expect to do the non-empty check

@malfet
Contributor Author

malfet commented Jun 7, 2025

Can you confirm from the transcript that the tests fail before the change?

Yes. Run python -c "import torch; torch.concat([], dim='N')" now and it will crash. Add the empty check to concat and it will throw an exception instead.

Also the diff is kind of suspicious: all the places annotated with checks call functions I would also expect to do the non-empty check

It crashes inside dimname_to_position, which is called with tensors[0] as the first argument (see backtrace below). If it's documented somewhere that calling tensors[0] on an empty tensor list is guaranteed to return nullptr, I can move the check there.

Process 88909 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000110114940 libtorch_cpu.dylib`c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::operator->(this=0x0000000000000000) const at intrusive_ptr.h:423:12
   420 	  }
   421 	
   422 	  TTarget* operator->() const noexcept {
-> 423 	    return target_;
   424 	  }
   425 	
   426 	  operator bool() const noexcept {
Target 0: (Python) stopped.
(lldb) up
frame #1: 0x00000001101146dc libtorch_cpu.dylib`at::TensorBase::has_names(this=0x0000000000000000) const at TensorBase.h:559:10
   556 	  bool has_names() const {
   557 	    // If a user is using unnamed tensors, then we can short-circuit right here.
   558 	    // Otherwise, impl::has_names attempts to retrieve names.
-> 559 	    if (!impl_->has_named_tensor_meta()) {
   560 	      return false;
   561 	    }
   562 	    return impl::has_names(unsafeGetTensorImpl());
(lldb) up
frame #2: 0x00000001101144c4 libtorch_cpu.dylib`at::dimname_to_position(tensor=0x0000000000000000, dim=Dimname @ 0x000000016fdfe348) at NamedTensorUtils.cpp:23:3
   20  	int64_t dimname_to_position(const Tensor& tensor, Dimname dim) {
   21  	  TORCH_CHECK(dim.type() != NameType::WILDCARD,
   22  	      "Please look up dimensions by name, got: name = None.");
-> 23  	  TORCH_CHECK(tensor.has_names(),
   24  	      "Name ", dim, " not found in ", toDimnameRepr(tensor), ".");
   25  	  const auto names = tensor.names();
   26  	
(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000014007f588 libtorch_cpu.dylib`at::dimname_to_position(at::Tensor const&, at::Dimname) + 40
    frame #1: 0x00000001407b87c0 libtorch_cpu.dylib`at::native::concat(c10::ArrayRef<at::Tensor>, at::Dimname) + 36
    frame #2: 0x00000001410f0c24 libtorch_cpu.dylib`at::_ops::concat_names::call(c10::ArrayRef<at::Tensor>, at::Dimname) + 468
    frame #3: 0x00000001028cef24 libtorch_python.dylib`torch::autograd::THPVariable_concat(_object*, _object*, _object*) + 944


// torch.concat, alias for torch.cat
Tensor& concat_out(TensorList tensors, Dimname dim, Tensor& result) {
  TORCH_CHECK(!tensors.empty(), "expected a non-empty list of Tensors");
Collaborator


Suggested change
TORCH_CHECK(!tensors.empty(), "expected a non-empty list of Tensors");
TORCH_CHECK_VALUE(!tensors.empty(), "expected a non-empty list of Tensors");

These should all be ValueErrors
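The reviewer's point can be sketched in plain Python. Assumption (not stated in this thread): TORCH_CHECK surfaces to Python callers as RuntimeError, while TORCH_CHECK_VALUE surfaces as ValueError; the helpers below only model that mapping and are not PyTorch code.

```python
# Model of how the two C++ macros surface to Python callers (assumed mapping).

def torch_check(cond, *msg):
    # TORCH_CHECK -> c10::Error, seen as RuntimeError from Python.
    if not cond:
        raise RuntimeError("".join(map(str, msg)))

def torch_check_value(cond, *msg):
    # TORCH_CHECK_VALUE -> seen as ValueError from Python.
    if not cond:
        raise ValueError("".join(map(str, msg)))

def concat_out(tensors):
    # With the VALUE variant, Python callers can catch ValueError,
    # the conventional type for invalid-argument errors.
    torch_check_value(len(tensors) > 0,
                      "expected a non-empty list of Tensors")
```

This is why the suggested change matters for callers who write `except ValueError:` around argument validation.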

Contributor Author


I'll replace all of them in #155460

Contributor

@ezyang ezyang left a comment


I misunderstood the change, current location is good.

@malfet
Contributor Author

malfet commented Jun 8, 2025

@pytorchbot merge -f "Lint is green"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@github-actions github-actions bot deleted the gh/malfet/385/head branch July 13, 2025 02:22