
Conversation

@laithsakka
Contributor

@laithsakka laithsakka commented Oct 23, 2024

Stack from ghstack (oldest at bottom):

Tested internally here: https://www.internalfb.com/diff/D64057744
This is a reland after previous internal failures.
The main change is:

```
if min is None and max is None:
    torch._check_is_size(size)
    return
```
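For context, here is a minimal runnable sketch of the control flow this change introduces. The helper names below (`check_is_size`, `guard_size_in_range`) are illustrative stand-ins, not the actual `torch._check_*` implementations:

```python
def check_is_size(size):
    # Stand-in for torch._check_is_size: a size must be a non-negative int.
    if not (isinstance(size, int) and size >= 0):
        raise ValueError(f"expected a size, got {size!r}")

def guard_size_in_range(size, min=None, max=None):
    # The relanded special case: when no bounds are requested, only assert
    # that `size` is a valid size and return early, instead of emitting
    # range guards for the degenerate (None, None) bounds.
    if min is None and max is None:
        check_is_size(size)
        return
    lo = 0 if min is None else min
    if size < lo or (max is not None and size > max):
        raise ValueError(f"size {size} out of range [{lo}, {max}]")
```

With both bounds omitted, only the is-size check runs, so no redundant range guards are generated.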

Partially addresses #128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation. Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments. Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.
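To make the quadratic behavior concrete, here is a small self-contained illustration (plain Python, not PyTorch or sympy code): folding N terms through a binary constructor that is O(number-of-args) per call does O(N²) total work, while a single n-ary sum is one O(N) pass.

```python
def chained_add(terms):
    # Fold with binary '+' nodes; each step rebuilds the argument list,
    # mimicking a constructor (like sympy.Add) that is O(len(args)).
    work = 0
    acc = []
    for t in terms:
        acc = acc + [t]      # copies len(acc) elements
        work += len(acc)
    return sum(acc), work

def nary_add(terms):
    # Build a single n-ary sum node in one pass.
    terms = list(terms)
    return sum(terms), len(terms)

value1, work1 = chained_add(range(1000))
value2, work2 = nary_add(range(1000))
assert value1 == value2       # same result...
assert work1 > 100 * work2    # ...but quadratic vs. linear work
```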

Signed-off-by: Edward Z. Yang [email protected]

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ezyang @SherlockNoMad @EikanWang @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec

@pytorch-bot

pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138660

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit c84b2af with merge base 72dde6e:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor, module: cpu, module: dynamo, module: inductor, and release notes: fx labels Oct 23, 2024
@laithsakka
Contributor Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 23, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 18 checks: pull / linux-focal-py3.12-clang10-experimental-split-build / test (default, 1, 3, linux.4xlarge), Lint / lintrunner-noclang / linux-job, inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_freezing_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_amp_freezing_torchbench, 2, 2, linux.16xlarge.spr), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_aot_inductor_freezing_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_aot_inductor_amp_freezing_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_aot_inductor_freezing_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (dynamic_cpu_aot_inductor_amp_freezing_torchbench, 2, 2, linux.12xlarge), inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (cpu_inductor_freezing_avx2_torchbench, 2, 2, linux.10xlarge.avx2), inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (aot_inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (dynamo_eager_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (aot_eager_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (dynamic_aot_eager_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.gcp.a100)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

laithsakka added a commit that referenced this pull request Oct 23, 2024
ghstack-source-id: a170651
Pull Request resolved: #138660
@laithsakka
Contributor Author

rebase

Comment on lines +1016 to +1046
```
# Special case for sum on tuple/list of ints
if (
    self.fn is builtins.sum
    and len(args) == 1
    and not kwargs
    and isinstance(args[0], (variables.ListVariable, variables.TupleVariable))
    and all(
        (isinstance(x, variables.ConstantVariable) and isinstance(x.value, int))
        or (isinstance(x, variables.SymNodeVariable) and x.python_type() is int)
        for x in args[0].items
    )
):
    return variables.SymNodeVariable.create(
        tx,
        tx.output.create_proxy(
            "call_function",
            torch.sym_sum,
            (tuple(a.as_proxy() for a in args[0].items),),
            {},
        ),
        sym_num=torch.sym_sum(
            [
                (
                    x.value
                    if isinstance(x, variables.ConstantVariable)
                    else x.sym_num
                )
                for x in args[0].items
            ]
        ),
    )
```
Collaborator


I'd like to create a dispatch registry to make the code easier to maintain.

```
check_fn, dispatch_fn = self.shortcuts.get(self.fn, (None, None))
if check_fn is not None and check_fn(args, kwargs):
    return dispatch_fn(self, tx, args, kwargs)
```
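For illustration, a runnable sketch of that registry pattern (the names `SHORTCUTS` and `call_function` below are hypothetical stand-ins, not the actual Dynamo API):

```python
import builtins

def _is_int_seq_sum(args, kwargs):
    # check_fn: exactly one positional list/tuple argument of plain ints
    return (
        len(args) == 1
        and not kwargs
        and isinstance(args[0], (list, tuple))
        and all(isinstance(x, int) for x in args[0])
    )

def _dispatch_sum(args, kwargs):
    # dispatch_fn: handle the recognized special case directly
    return sum(args[0])

# fn -> (check_fn, dispatch_fn); adding a new special case is one entry
SHORTCUTS = {builtins.sum: (_is_int_seq_sum, _dispatch_sum)}

def call_function(fn, args, kwargs):
    check_fn, dispatch_fn = SHORTCUTS.get(fn, (None, None))
    if check_fn is not None and check_fn(args, kwargs):
        return dispatch_fn(args, kwargs)
    return fn(*args, **kwargs)  # generic fallback path
```

The benefit of the registry is that each special case is data (one dict entry) rather than another `if` branch inline in the call-dispatch logic.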

@laithsakka
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

SamGinzburg pushed a commit that referenced this pull request Oct 28, 2024
Pull Request resolved: #138660
Approved by: https://github.com/ezyang, https://github.com/bobrenjc93
@github-actions github-actions bot deleted the gh/laithsakka/89/head branch November 23, 2024 02:06

Labels

ciflow/inductor, ciflow/trunk, fx, Merged, module: cpu, module: dynamo, module: inductor, release notes: fx

7 participants