add inductor-cu126 tag #141110
Conversation
Fixes pytorch#82159. Adds a note describing the type promotion behavior of `torch.sum`. **Test Result** **Before** **After** Pull Request resolved: pytorch#140939 Approved by: https://github.com/zou3519
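As a small illustration of the promotion behavior the new note documents (this reflects standard PyTorch semantics, not the wording of the note itself):

```python
import torch

# Integer and boolean tensors are promoted to int64 when summed,
# unless an explicit dtype is passed.
a = torch.ones(4, dtype=torch.int8)
print(a.sum().dtype)                  # torch.int64
print(a.sum(dtype=torch.int8).dtype)  # torch.int8

# Floating-point inputs keep their dtype.
b = torch.ones(4, dtype=torch.float32)
print(b.sum().dtype)                  # torch.float32
```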
) Fix pytorch#140462. Horace found that when we implicitly fall back to eager, some eager kernels may not work correctly if Inductor provides non-contiguous inputs (due to padding etc.). The original issue was found for the backward op of weight_norm. The fix in this PR is a general one: we force inputs to all implicit fallback kernels to be contiguous. I had to refactor the code a bit to make it work. Previously we applied layout constraints in `GraphLowering.run_node`, but we look for implicit fallbacks in `call_function`. The problem is, when we set up the implicit fallback in `call_function` with a layout constraint, we don't have a chance to apply the constraint. The refactor moves the code that applies layout constraints to `call_function`. Pull Request resolved: pytorch#140996 Approved by: https://github.com/jansel
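The general fix described above can be sketched in plain Python. This is a toy model of the idea only: `FakeTensor` and `make_fallback` here are illustrative stand-ins, not Inductor's actual classes or API.

```python
# Sketch: before handing inputs to a fallback eager kernel, force them
# to be contiguous so kernels that assume contiguity work correctly.

class FakeTensor:
    """A stand-in for a tensor that tracks a contiguity flag."""
    def __init__(self, data, contiguous=True):
        self.data = data
        self._contiguous = contiguous

    def is_contiguous(self):
        return self._contiguous

    def contiguous(self):
        # Return a contiguous copy if needed, otherwise self.
        return self if self._contiguous else FakeTensor(self.data, contiguous=True)


def make_fallback(kernel):
    """Wrap an eager kernel so every tensor argument is made contiguous first."""
    def wrapped(*args):
        args = tuple(
            a.contiguous() if isinstance(a, FakeTensor) else a for a in args
        )
        return kernel(*args)
    return wrapped


# Usage: the wrapped kernel never sees a non-contiguous input.
seen = []
fallback = make_fallback(lambda t: seen.append(t.is_contiguous()))
fallback(FakeTensor([1, 2, 3], contiguous=False))
print(seen)  # [True]
```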
…0084) Summary: The way we've been de/serializing sympy.Exprs is not roundtrippable in all cases (we serialize by calling `str(expr)` and deserialize by calling `sympy.sympify(expr_str)`). This has led to expressions that are mathematically equivalent but structurally different, causing issues in ValueRanges. Example issue: pytorch#136797. This PR starts to deprecate the use of `expr_str` and stores expressions in AST format instead. For BC purposes, `expr_str` deserialization is still supported, but we will always serialize to `expr_ast`. We'll remove this once the serialization upgrader design is finalized and implemented. Test Plan: test_export Differential Revision: D65638757 Pull Request resolved: pytorch#140084 Approved by: https://github.com/angelayi
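A minimal sympy sketch of the non-roundtrippability described above (assuming a stock sympy install; this mirrors the failure mode in general, not PyTorch's serializer specifically):

```python
import sympy

x = sympy.Symbol("x")

# Build an expression without automatic simplification.
expr = sympy.Add(x, x, evaluate=False)   # structurally: Add(x, x)

# Serialize via str(), deserialize via sympify(): the round trip
# re-evaluates the expression, producing a structurally different result.
roundtripped = sympy.sympify(str(expr))  # 2*x

print(sympy.srepr(expr))          # Add(Symbol('x'), Symbol('x'))
print(sympy.srepr(roundtripped))  # Mul(Integer(2), Symbol('x'))

# Mathematically equal, structurally different:
assert sympy.simplify(expr - roundtripped) == 0
assert sympy.srepr(expr) != sympy.srepr(roundtripped)
```

Storing the expression as an AST (rather than `str(expr)`) preserves the structure exactly, which is the approach this PR takes.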
…41003) Fixes `python test/inductor/test_fused_attention.py SDPAPatternRewriterCpuTests.test_pattern_fails_with_unsupported_mask_cpu` when `specialize_float=False`. You might wonder how this is related: it's because there is a "negative" test that expects us not to match. Previously it would fail on `isinstance(param, Tensor)`, but now that we tensorify the float, it did match and caused a failure. This check ensures the mask has the same shape, so that this negative test case actually fails to match. Pull Request resolved: pytorch#141003 Approved by: https://github.com/ezyang ghstack dependencies: pytorch#140983
Implement RNG Generator by falling back to CPUGeneratorImpl. Pull Request resolved: pytorch#138449 Approved by: https://github.com/ezyang
…1022) Detailed description:

The code below will raise an error:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def func(a):
    b = a + 1
    c = b.view(-1)
    c.add_(1)
    return b

input = torch.randn(2)
out = make_fx(func)(input)
```

The error looks like this:

```
...
  File "/root/Git.d/pytorch/pytorch/torch/_dynamo/codegen.py", line 34, in <module>
    from .variables.torch_function import TensorWithTFOverrideVariable
  File "/root/Git.d/pytorch/pytorch/torch/_dynamo/variables/torch_function.py", line 185, in <module>
    populate_builtin_to_tensor_fn_map()
  File "/root/Git.d/pytorch/pytorch/torch/_dynamo/variables/torch_function.py", line 146, in populate_builtin_to_tensor_fn_map
    inp0 = torch.ones(1)
  File "/root/Git.d/pytorch/pytorch/torch/fx/experimental/proxy_tensor.py", line 1240, in __torch_function__
    return func(*args, **kwargs)
  File "/root/Git.d/pytorch/pytorch/torch/utils/_stats.py", line 21, in wrapper
    return fn(*args, **kwargs)
  File "/root/Git.d/pytorch/pytorch/torch/fx/experimental/proxy_tensor.py", line 1342, in __torch_dispatch__
    return proxy_call(self, func, self.pre_dispatch, args, kwargs)
  File "/root/Git.d/pytorch/pytorch/torch/fx/experimental/proxy_tensor.py", line 907, in proxy_call
    name=proxy_mode.tracer.graph._target_to_str(func.overloadpacket.__name__),
AttributeError: 'PythonKeyTracer' object has no attribute 'graph'
...
```

Solution: import `torch._dynamo` before `dispatch_trace` is called, so that the context set up before `dispatch_trace` does not affect the `torch._dynamo` import.

Pull Request resolved: pytorch#141022 Approved by: https://github.com/ezyang
Detailed description: As the title states. Pull Request resolved: pytorch#141023 Approved by: https://github.com/Skylion007, https://github.com/ezyang
Tracking issue: pytorch#138399

This PR changes the `pow` C++ implementation, making its C++ meta kernel consistent with its Python ref implementation. The following example shows the inconsistency between the two:

```python
def run(device):
    S = (5,)
    a = torch.rand(S, device=device, dtype=torch.float32)
    b = 2
    out = torch.empty(S, device=device, dtype=torch.float64)
    return torch.pow(a, b, out=out)
```

```
>>> run("cpu")
Traceback (most recent call last):
  File "test.py", line 34, in run
    return torch.pow(a, b, out=out)
RuntimeError: Found dtype Double but expected Float
>>> run("meta")
tensor(..., device='meta', size=(5,), dtype=torch.float64)
```

**~Update:~** ~Note that this happens only for `pow.Tensor_Scalar` overloads. Therefore, this PR needed further 2 modifications:~

- ~Split the `pow` ref implementation, making `pow.Tensor_Scalar` error on mismatching output dtypes~
- ~Create a dispatch for `pow` when `_refs.pow()` is called~

**Update:** Changing the `TensorIteratorConfig` for `pow.Tensor_Scalar` was easier and, after the discussion below, more correct. The solution was to change the `TensorIteratorBase::build_output_borrowing_argument_owning_unary_op` function, setting:

- `cast_common_dtype_to_outputs`; and
- `enforce_safe_casting_to_output`.

Pull Request resolved: pytorch#140287 Approved by: https://github.com/ezyang
Pull Request resolved: pytorch#141092 Approved by: https://github.com/Skylion007, https://github.com/kulinseth ghstack dependencies: pytorch#141089, pytorch#141090
Instead of calling the `REGISTER_FUSED_ADAM_OP` macro with 7 parameters 16 times, define 4 type-parameter macros for each op and then one macro to define the quartet of ops: Adam, AdamW, and their grad functions. Pull Request resolved: pytorch#141103 Approved by: https://github.com/kulinseth ghstack dependencies: pytorch#141089, pytorch#141090, pytorch#141092
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud
Summary: **wins** on the torchrec benchmark: for 2K nodes it saves 40 seconds; with the recent sympy changes (https://www.internalfb.com/diff/D65883538) we save around 13 seconds (with the max opt on).

```
buck2 run fbcode//mode/opt fbcode//torchrec/distributed/tests:pt2_compile_benchmark -- --num-features=200
```

This diff optimizes construction of expressions of the form a+b+c+... (all unique symbols), which are very common in torchrec models.

**How**

Expressions of the form a+b+c are not optimized by add; the only optimization needed is sorting them. If we have a+b+c and we are adding (d) to it, we can do a binary search to find the position of (d) and avoid re-optimizing the new expression by passing the new order.

**Extensions**:
1. Support constant terms.
2. Support 10a+10b+... (this will give even more wins; support will be extended in a second PR).

Differential Revision: D66008482 Pull Request resolved: pytorch#140822 Approved by: https://github.com/ezyang
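The binary-search idea above can be sketched with the standard library (purely illustrative; the real change lives in the sympy Add construction path, and `add_term` is a hypothetical helper, not code from this diff):

```python
import bisect

def add_term(sorted_terms, new_term, key=str):
    """Insert new_term into an already-sorted list of sum terms without
    re-sorting (and hence without re-canonicalizing) the whole expression."""
    keys = [key(t) for t in sorted_terms]
    pos = bisect.bisect_left(keys, key(new_term))
    return sorted_terms[:pos] + [new_term] + sorted_terms[pos:]

# a + b + c kept in sorted order; adding d is an O(log n) search
# plus a splice, instead of rebuilding and re-sorting the whole sum.
terms = ["a", "b", "c"]
terms = add_term(terms, "d")
print(terms)  # ['a', 'b', 'c', 'd']
```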
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions, which are faster and do not leak the loop variables into the enclosing scope.
* List comprehensions not only often have better typing, but are 50+% faster than for loops in terms of overhead. They also preserve length information and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt.

Pull Request resolved: pytorch#140980 Approved by: https://github.com/justinchuby, https://github.com/malfet
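The kind of rewrite applied here looks like this (a generic before/after example, not a specific hunk from this PR):

```python
# Before: an accumulation loop.
squares = []
for item in range(5):
    squares.append(item * item)

# After: an equivalent list comprehension -- less per-iteration call
# overhead, and the comprehension's loop variable does not leak into
# the enclosing scope the way a for-loop variable does.
squares_lc = [n * n for n in range(5)]

print(squares_lc)  # [0, 1, 4, 9, 16]
```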
Pull Request resolved: pytorch#140250 Approved by: https://github.com/jansel ghstack dependencies: pytorch#140223
Easy fix to missing reserve calls in NT Matmul CUDA kernel to improve perf. Pull Request resolved: pytorch#141130 Approved by: https://github.com/malfet
This uploads the MPS benchmark results to the benchmark database. The data can then be queried, for example:

```sql
select benchmark, model, metric
from oss_ci_benchmark_v3
where head_sha = '99a133116fee15aa1467165f2b209b37da53f189'
  and metric.name in ['eager_peak_mem', 'dynamo_peak_mem', 'speedup']
  and model.name = 'BERT_pytorch'
```

I'm documenting the JSON format at https://github.com/pytorch/pytorch/wiki/How-to-integrate-with-PyTorch-OSS-benchmark-database

### Testing

Locally,

```
PYTHONPATH=/Users/huydo/Storage/mine/benchmark python benchmarks/dynamo/torchbench.py --performance --only resnet152 --backend eager --training --devices mps --output test/test-reports/torchbench_training.csv
```

Workflow dispatch: https://github.com/pytorch/pytorch/actions/runs/11927990520

Pull Request resolved: pytorch#141087 Approved by: https://github.com/malfet
I'm trying to make this benchmark results available on OSS benchmark database, so that people can query it from outside. The first step is to also record the results in the JSON format compatible with the database schema defined in pytorch/test-infra#5839. Existing CSV files remain unchanged. ### Testing The JSON results are uploaded as artifacts to S3 https://github.com/pytorch/pytorch/actions/runs/11809725848/job/32901411180#step:26:13, for example https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/11809725848/1/artifact/test-jsons-test-pr_time_benchmarks-1-1-linux.g4dn.metal.nvidia.gpu_32901411180.zip Pull Request resolved: pytorch#140493 Approved by: https://github.com/laithsakka
…#141145) A small fix I realized was needed after pytorch#141087. Pull Request resolved: pytorch#141145 Approved by: https://github.com/malfet
Fixes pytorch#107302 This is a clone and fix for pytorch#139199. This PR is a small step for the overall NumPy 2 support. It adds a new CI job for testing with NumPy 2 with one test file only. More tests to be fixed and added later in follow-up pull requests. Pull Request resolved: pytorch#140586 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <[email protected]>
Closing, as an accidental wrong rebase happened.
related to #138440
for testing #140793
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @mcarilli @ptrblck @leslie-fang-intel @voznesenskym @penguinwu @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn @xmfan @atalman