Fix where() for NJT #141500

jbschlosser · 2024-11-25T19:17:49Z

Stack from ghstack (oldest at bottom):

Background: It's common to use scalar_tensor() in the input to where() to convert any scalars present to compatible tensors with matching options, including layout. This shows up in various places, notably including derivative formulas (example). It causes problems for NJTs because they have layout=torch.jagged and it never makes sense to create a scalar tensor with this layout. Some of the breakage only seems to happen in CI for reasons I don't fully understand (see the revert of #140736 due to softshrink's derivative formula).

This PR:

- Allows non-contiguous NJT inputs to where() + adds tests for this
- Handles scalar tensor / dense tensor inputs for condition / other + adds tests for this
  - Uses limited broadcast_tensors() / broadcast_to() support
  - Improves expand() to work on non-contig NJTs
- Changes scalar_tensor() to use torch.strided instead of torch.jagged in both eager and torch.compile (i.e. meta registration)
- Changes backward formulas for sinc, pow, special.i1, and special.i1e to use scalar_tensor() instead of e.g. zeros({})

Alternative approach: Update all problematic usages of scalar_tensor() to avoid ever passing layout=torch.jagged. This is an extensive change and includes torch.where() logic, a bunch of derivative formulas, and likely other places not yet discovered.
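To make the scalar handling described above concrete, here is a minimal pure-Python sketch. It is a toy model, not the PyTorch implementation: a nested tensor is stood in for by a list of variable-length rows, and `jagged_where` and `_expand_like` are hypothetical names invented for illustration. The point it shows is that a scalar operand carries no ragged structure of its own and is simply expanded to match the jagged input.

```python
from typing import List, Union

Row = List[Union[float, bool]]
Jagged = List[Row]  # toy stand-in for an NJT: a batch of variable-length rows


def _expand_like(value, ref: Jagged) -> Jagged:
    """Hypothetical helper: expand a plain scalar to the ragged structure of `ref`."""
    if isinstance(value, list):
        # Jagged operands must already share the reference's ragged structure.
        if [len(r) for r in value] != [len(r) for r in ref]:
            raise ValueError("ragged structures do not match")
        return value
    # A scalar has no layout of its own; repeat it per element of each row.
    return [[value] * len(row) for row in ref]


def jagged_where(condition, x, other) -> Jagged:
    """Toy where(): elementwise select over ragged rows, allowing scalar operands."""
    ref = next((a for a in (condition, x, other) if isinstance(a, list)), None)
    if ref is None:
        raise TypeError("at least one operand must be jagged")
    cond, xs, others = (_expand_like(a, ref) for a in (condition, x, other))
    return [
        [xv if cv else ov for cv, xv, ov in zip(c_row, x_row, o_row)]
        for c_row, x_row, o_row in zip(cond, xs, others)
    ]


x = [[1.0, -2.0], [3.0], [-4.0, 5.0, -6.0]]
cond = [[v > 0 for v in row] for row in x]
print(jagged_where(cond, x, 0.0))
```

In this toy model either `condition` or `other` may be a scalar, mirroring the cases listed in the PR description.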

[ghstack-poisoned]

pytorch-bot · 2024-11-25T19:17:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141500

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1f627cd with merge base efec302:

NEW FAILURE - The following job has failed:

pull / linux-focal-py3.9-clang10-onnx / test (default, 2, 2, lf.linux.2xlarge) (gh)
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67460d26-62211f7812815890433eb989;099fc7cd-c2ef-4c23-a4f0-b1c1c41f2630)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jbschlosser · 2024-11-25T21:58:09Z

@pytorchbot merge

pytorchmergebot · 2024-11-25T21:59:53Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-25T22:05:14Z

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team

Raised by workflow job

jbschlosser · 2024-11-25T22:05:36Z

@pytorchbot merge

pytorchmergebot · 2024-11-25T22:07:18Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-25T22:33:55Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-py3.12-clang10 / test (default, 2, 5, lf.linux.4xlarge)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

aten/src/ATen/native/TensorFactories.cpp

jbschlosser · 2024-11-26T17:50:01Z

@pytorchbot merge

pytorchmergebot · 2024-11-26T17:51:45Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-26T18:07:49Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-py3.9-clang10-onnx / test (default, 2, 2, lf.linux.2xlarge)

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

jbschlosser · 2024-11-26T18:08:07Z

@pytorchbot merge -i

pytorchmergebot · 2024-11-26T18:10:33Z

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-focal-py3.9-clang10-onnx / test (default, 2, 2, lf.linux.2xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

) Several activation functions were unimplemented due to missing `pointwise` tags. This PR adds them and corresponding backwards implementations. Pull Request resolved: #140736 Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch ghstack dependencies: #141500

This PR introduces `ExtraOpData`, a structure that contains op metadata regarding whether the op is a view and the dim-related args it accepts. It also populates a huge database for dim-wise / view ops with this info.

Test logic (sample input generation, references) has been updated to utilize this data. It allows for a fairly generic set of sample inputs & a reference for the class of ops that accept a single NJT and operate dim-wise (AKA "unary dimwise ops"). Testing is added over the following ops:

* `chunk()`
* `narrow()`
* `select()`
* `split()`
* `split_with_sizes()`
* `squeeze()`
* `unflatten()`
* `unsqueeze()`

Most of the above do not operate on the ragged / batch dims or on non-contiguous NJTs, so the proper xfails are added as needed.

I also slipped in a couple minor fixes (sorry):

1. The `_wrap_jagged_dim()` helper now avoids assuming that `nt._ragged_idx == 1` and allows for a batch dim to be a valid input, disambiguating the converted inner dim as necessary through an additional `operating_on_batch` return value (i.e. both dim=0 and dim=1 map to dim=0 on the inner values tensor, since that dim represents a packed ragged dim for all batch items)
2. Padded dense -> NJT conversion requires shape gymnastics to operate with the restrictive FBGEMM kernel. The gymnastics were slightly wrong for the transposed NJT case, and this PR fixes that

Pull Request resolved: #140161
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
ghstack dependencies: #141500, #140736

This PR contains three `unsqueeze()`-related fixes for NJT:

1. Adjusts the output's `_ragged_idx` when `unsqueeze()` inserts a dim before the ragged dim
2. Corrects the unbind reference for `unsqueeze()` after the last input dim. For this case, the dim kwarg canonicalization logic needs to be applied wrt `inp.dim() + 1` to account for `dim=-1` properly
3. Adds ragged dim support to `unsqueeze()`, allowing for e.g. `(B, j1, D) -> (B, 1, j1, D)`. This is okay now after #137125

Note that `unsqueeze()` still doesn't support batch dim operation, and arguably should never support this.

Pull Request resolved: #141392
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #141500, #140736, #140161
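The index bookkeeping behind the `unsqueeze()` fixes can be sketched in plain Python. This is a rough illustration under stated assumptions, not the code from #141392: `unsqueeze_ragged_idx` is a hypothetical helper, and it assumes the ragged index is expressed in terms of the NJT's outer dims (batch dim included).

```python
def unsqueeze_ragged_idx(ndim: int, ragged_idx: int, dim: int) -> int:
    """Hypothetical helper: ragged index of the output after unsqueeze(dim).

    `ndim` and `ragged_idx` describe the input; both count the batch dim.
    """
    out_ndim = ndim + 1
    if not -out_ndim <= dim < out_ndim:
        raise IndexError("dim out of range")
    # Negative dims are canonicalized against the OUTPUT rank (ndim + 1),
    # so dim=-1 means "append a new trailing dim".
    if dim < 0:
        dim += out_ndim
    if dim == 0:
        raise ValueError("unsqueeze() on the batch dim is not supported")
    # Inserting at or before the ragged dim shifts the ragged dim right by one.
    return ragged_idx + 1 if dim <= ragged_idx else ragged_idx


# (B, j1, D) with ragged_idx=1
print(unsqueeze_ragged_idx(3, 1, 1))   # (B, 1, j1, D)
print(unsqueeze_ragged_idx(3, 1, -1))  # (B, j1, D, 1)
```

Canonicalizing against `ndim + 1` rather than `ndim` is what lets `dim=-1` land after the last input dim.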

) This fixes some bugs when performing reductions / select() on dims before the ragged dim. In this case, the output NJT has a smaller number of dims, and its ragged_idx should reflect that correctly. Pull Request resolved: pytorch#141506 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: pytorch#141500, pytorch#140736, pytorch#140161, pytorch#141392

…141604) Old logic was completely wrong, returning `chunk_size` chunks instead of the intended number. The original test didn't catch this because `chunk_size == num_chunks` :p New OpInfo-based testing covers it though. Pull Request resolved: #141604 Approved by: https://github.com/soulitzer ghstack dependencies: #141500, #140736, #140161, #141392, #141506
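The distinction between the number of chunks and the chunk size is easy to see on a plain list. The sketch below is a toy model, not the NJT code: `chunk_list` is a hypothetical function that follows the usual convention of a chunk size of `ceil(n / num_chunks)`.

```python
import math
from typing import List, Sequence


def chunk_list(items: Sequence, num_chunks: int) -> List[list]:
    """Toy chunk(): split `items` into at most `num_chunks` contiguous pieces."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    # Size of each piece (the last piece may be shorter).
    chunk_size = max(1, math.ceil(len(items) / num_chunks))
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


batch = list(range(6))
pieces = chunk_list(batch, 3)
print(len(pieces), pieces)
```

With 6 items and `num_chunks=3`, the chunk size is 2, so returning `chunk_size` pieces would give 2 instead of 3. With 4 items and `num_chunks=2`, the chunk size is also 2, which is the kind of coincidence that hides the bug.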

**Background:** It's common to use `scalar_tensor()` in the input to `where()` to convert any scalars present to compatible tensors with matching options, *including layout*. This shows up in various places, notably including derivative formulas ([example](https://github.com/pytorch/pytorch/blob/78491d6afc163d1d84e81c015fad695caa8ec98a/tools/autograd/derivatives.yaml#L432-L434)). It causes problems for NJTs because they have `layout=torch.jagged` and it never makes sense to create a scalar tensor with this layout. Some of the breakage only seems to happen in CI for reasons I don't fully understand (see the revert of pytorch#140736 due to softshrink's derivative formula).

**This PR:**

* Allows non-contiguous NJT inputs to `where()` + adds tests for this
* Handles scalar tensor / dense tensor inputs for `condition` / `other` + adds tests for this
    * Uses limited `broadcast_tensors()` / `broadcast_to()` support
    * Improves `expand()` to work on non-contig NJTs
* Changes `scalar_tensor()` to use `torch.strided` instead of `torch.jagged` in both eager and torch.compile (i.e. meta registration)
* Changes backward formulas for `sinc`, `pow`, `special.i1`, and `special.i1e` to use `scalar_tensor()` instead of e.g. `zeros({})`

**Alternative approach:** Update all problematic usages of `scalar_tensor()` to avoid ever passing `layout=torch.jagged`. This is an extensive change and includes `torch.where()` logic, a bunch of derivative formulas, and likely other places not yet discovered.

Pull Request resolved: pytorch#141500
Approved by: https://github.com/malfet, https://github.com/cpuhrsch, https://github.com/soulitzer

Fix where() for NJT

778f929

[ghstack-poisoned]

This was referenced Nov 25, 2024

Forward / backward NJT support for several activation functions #140736

Closed

Initial NJT testing over dim type / views #140161

Closed

NJT unsqueeze() fixes #141392

Closed

jbschlosser added the topic: not user facing, topic: improvements, and release notes: nested tensor labels and removed the topic: not user facing label Nov 25, 2024

jbschlosser requested review from cpuhrsch and soulitzer November 25, 2024 19:26

jbschlosser mentioned this pull request Nov 25, 2024

Adjust output NJT ragged_idx for reductions and select() #141506

Closed

malfet approved these changes Nov 25, 2024

cpuhrsch approved these changes Nov 25, 2024

jbschlosser requested a review from albanD as a code owner November 25, 2024 21:53

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 25, 2024

pytorchmergebot added the merging label Nov 25, 2024

soulitzer approved these changes Nov 25, 2024

pytorchmergebot removed the merging label Nov 25, 2024

pytorchmergebot added the merging label Nov 25, 2024

pytorchmergebot removed the merging label Nov 25, 2024

Skylion007 reviewed Nov 26, 2024

aten/src/ATen/native/TensorFactories.cpp (outdated, resolved)

pytorchmergebot added the merging label Nov 26, 2024

pytorchmergebot removed the merging label Nov 26, 2024

pytorchmergebot added the merging label Nov 26, 2024

jbschlosser mentioned this pull request Nov 26, 2024

NJT: Return correct number of outputs for chunk() on the batch dim #141604

Closed

pytorchmergebot added the Merged label Nov 26, 2024

pytorchmergebot closed this in 8ba555e Nov 26, 2024

pytorchmergebot removed the merging label Nov 26, 2024

github-actions bot deleted the gh/jbschlosser/204/head branch December 27, 2024 02:06

This was referenced Feb 6, 2025

Scalar tensor fails to broadcast with shape of in-graph constructed NJT #146644

Open

Support torch.where with NJT and dense tensor #140392

Closed
