[AUTOGENERATED] rocm7.1_internal_testing_IFU_2025-08-22 #3

pragupta · 2025-08-22T13:54:53Z

Merged latest changes from upstream/main into rocm7.1_internal_testing on 2025-08-22

keep existing unbacked semantics unchanged, just use guard_or_false instead of guard_size_obl Pull Request resolved: pytorch#160250 Approved by: https://github.com/ColinPeppler, https://github.com/jingsh

Pull Request resolved: pytorch#160251 Approved by: https://github.com/jingsh, https://github.com/ColinPeppler ghstack dependencies: pytorch#160250

This reverts commit e0488d9. Reverted pytorch#160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](pytorch#160458 (comment)))

Which is manylinux2_28 compatible, even on the aarch64 platform. The archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works. Package `libnvshmem_host.so.3` into the gigantic aarch64+CUDA wheel. Should fix pytorch#160425 Pull Request resolved: pytorch#160458 Approved by: https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/tinglvv

…ytorch#159790) This is a similar change to pytorch#153986, this time adding flags to the hipcc command under `cpp_extension.py`. The `-Wno-ignored-attributes` flag in particular avoids about 200MB of warning spam when building torchvision, like these: ``` In file included from D:\b\vision_main\torchvision\csrc\ops\hip\deform_conv2d_kernel.hip:72: In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ATen.h:13: In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/Functions.h:386: In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ops/_sparse_softmax.h:21: D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ops/_sparse_softmax_ops.h:18:8: warning: __declspec attribute 'dllimport' is not supported [-Wignored-attributes] 18 | struct TORCH_API _sparse_softmax_int { | ^~~~~~~~~ D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\torch/headeronly/macros/Export.h:100:19: note: expanded from macro 'TORCH_API' 100 | #define TORCH_API C10_IMPORT | ^~~~~~~~~~ D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\torch/headeronly/macros/Export.h:53:31: note: expanded from macro 'C10_IMPORT' 53 | #define C10_IMPORT __declspec(dllimport) | ^~~~~~~~~ ``` The `-fms-extensions` flag just seems beneficial to include: https://clang.llvm.org/docs/MSVCCompatibility.html. See also this downstream issue where these changes were tested: ROCm/TheRock#910. Pull Request resolved: pytorch#159790 Approved by: https://github.com/jeffdaily

Summary: as title This is requested by the zoomer team so they can add stack trace information to profiler result. Test Plan: ``` buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r stack_traces ``` Rollback Plan: Differential Revision: D80050233 Pull Request resolved: pytorch#160779 Approved by: https://github.com/angelayi

) Typo mistake. This should be `dataclasses_json` https://github.com/pytorch/pytorch/actions/runs/17000197828/job/48200676725#step:10:23 Pull Request resolved: pytorch#160796 Approved by: https://github.com/yangw-dev

Pull Request resolved: pytorch#160698 Approved by: https://github.com/huydhn ghstack dependencies: pytorch#160116

This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: pytorch#160699 Approved by: https://github.com/pytorchbot

This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash. Pull Request resolved: pytorch#160797 Approved by: https://github.com/pytorchbot

…pass (pytorch#158811) Pull Request resolved: pytorch#158811 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#158810

Set dynamo=True and enable fallback. 1. Implemented the compatible behavior where BytesIO objects are accepted as `f`. 2. Updated tests to explicitly set dynamo=False. pytorch#151693 Pull Request resolved: pytorch#159646 Approved by: https://github.com/titaiwangms

Fixes pytorch#160650. I added type ignore comment to `LeafSpec` class inheritance in `torch/utils/_cxx_pytree.py` to handle `PyTreeSpec` being marked as final in optree's type stubs. Pull Request resolved: pytorch#160652 Approved by: https://github.com/Skylion007

…0635) My proposal here is to use GitHub Dependabot to make sure that `transformers` version used in CI are always up-to-date. To achieve this, this PR does 2 things: 1. Pin `transformers` version across all CI jobs to only one place at `.ci/docker/ci_commit_pins/huggingface.txt`. This file is now a regular pip requirements instead of a pinned commit text. There isn't any need to pin `transformers` to a specific commit and the file already refers to a stable version `v4.54.0` 2. Create `.github/dependabot.yml` to config the bot to update `transformers` automatically when there is a new version. Those labels will ensure that the right reviewers from torch.compile and Dev Infra are notified. I'm not sure how to test this out in PR, but it feels ok to land and test this in main. If this works, we should see a PR to update `v4.54.0` to the current latest `v4.55.0` ### Reference https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference Pull Request resolved: pytorch#160635 Approved by: https://github.com/ZainRizvi

… add aten.sym_is_contiguous. (pytorch#159197) This might cause some new DDEs at call sites that do not use is_contiguous_or_false() or sym_is_contiguous(), but we want to find those call sites and handle them properly by explicitly calling is_contiguous_or_false() instead of is_contiguous() when appropriate.
I had to fix one issue after removing the implicit size-oblivious reasoning. Here is the context: in pytorch#157472 we defined sym_is_contiguous to be the function that computes contiguity for dynamic shapes in C++. It returns a symbolic expression that represents contiguity and is guaranteed not to throw a DDE.
When people call is_contiguous, we do sym_is_contiguous().guard_bool().
When people call is_contiguous_or_false, we do sym_is_contiguous().guard_or_false().
One path was not handled well:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }
  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom and matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, we used to call is_contiguous(this, memory_format). This went through load_pyobj_interpreter and ended up calling the Python is_contiguous, which used implicit size-oblivious reasoning. Once we removed that implicit size-oblivious reasoning, the right thing to do is to call pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format); otherwise we would get a DDE even if the caller is using sym_is_contiguous. So I had to define it for the pyinterpreter, and then override it for nested tensors.
Pull Request resolved: pytorch#159197 Approved by: https://github.com/ezyang
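To make the difference between the two guard styles concrete, here is a toy Python model. It is only a sketch: `ToySymBool` and `DataDependentError` are hypothetical stand-ins invented for illustration, not the real `c10::SymBool` or PyTorch's actual error type.

```python
class DataDependentError(RuntimeError):
    """Hypothetical stand-in for the data-dependent error (DDE) discussed above."""


class ToySymBool:
    """Toy symbolic boolean whose value is True, False, or None (unknown)."""

    def __init__(self, value):
        self.value = value

    def guard_bool(self):
        # A concrete answer is required, so an unknown value is an error.
        if self.value is None:
            raise DataDependentError("contiguity cannot be decided")
        return self.value

    def guard_or_false(self):
        # Never raises: an unknown value is conservatively treated as False.
        return False if self.value is None else self.value


def is_contiguous(sym):
    # Mirrors "sym_is_contiguous().guard_bool()" in the toy model.
    return sym.guard_bool()


def is_contiguous_or_false(sym):
    # Mirrors "sym_is_contiguous().guard_or_false()" in the toy model.
    return sym.guard_or_false()
```

In this model, a call site that can tolerate a conservative answer uses `is_contiguous_or_false` and never sees the error, while `is_contiguous` surfaces it when the value is unknown.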

Differential Revision: D80201622 Pull Request resolved: pytorch#160599 Approved by: https://github.com/bdhirsh

…unner-mypy` (pytorch#160806) Like `MYPY`, linter `MYPYSTRICT` will need `--all-files` too. See also: - pytorch#160652 (comment) Pull Request resolved: pytorch#160806 Approved by: https://github.com/seemethere

Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifact Pull Request resolved: pytorch#160132 Approved by: https://github.com/xmfan ghstack dependencies: pytorch#160260

pytorch#159902) Pull Request resolved: pytorch#159902 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#159365, pytorch#159366, pytorch#159368, pytorch#159483

…ytorch#159864) Pull Request resolved: pytorch#159864 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#159365, pytorch#159366, pytorch#159368, pytorch#159483, pytorch#159902

…ts (pytorch#159865) Changes: (1) Replace UserDefinedSetVariable by UserDefinedObjectVariable in all binop calls Test plan: (1) The three tests from CPython `test_collections.py` ensures that Dynamo can trace through a dunder method (e.g. __add__, __ixor__, etc) defined in a user defined class Pull Request resolved: pytorch#159865 Approved by: https://github.com/mlazos ghstack dependencies: pytorch#159365, pytorch#159366, pytorch#159368, pytorch#159483, pytorch#159902, pytorch#159864

…0132)" This reverts commit 2603e40. Reverted pytorch#160132 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/17010600949/job/48226137423) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/2603e40be5fa4a66301e6654e34a82a67f2e4913). landrace with another PR that changed some had_cuda related things ([comment](pytorch#160132 (comment)))

…#160747) Summary: Inductor's 3.4 Triton release is the most common used variant of Triton, but if someone is working with an alternative version of Triton this may not match. This moves the version check from 3.4 Triton to any variant that has support for the TMA APIs. Test Plan: Testing the previously failing test `inductor/test_torchinductor_strided_blocks.py::TritonTensorDescriptorTestCUDA::test_welford_non_block_pointer_cuda` Rollback Plan: Differential Revision: D80348643 Pull Request resolved: pytorch#160747 Approved by: https://github.com/NikhilAPatel
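The general idea of probing for a capability instead of comparing version numbers can be sketched as follows. This is an illustration only: the attribute name `tensor_descriptor_api` and both fake modules are hypothetical and do not reflect the actual check in Inductor or any real Triton attribute.

```python
from types import SimpleNamespace


def has_tma_support(triton_like_module, attr="tensor_descriptor_api"):
    # Feature detection: ask whether the capability is present rather than
    # whether the version string equals a specific release.
    # The attribute name is a placeholder chosen for this sketch.
    return hasattr(triton_like_module, attr)


# Two fake "Triton variants" used purely for demonstration.
upstream_like = SimpleNamespace(__version__="3.4.0", tensor_descriptor_api=object())
older_fork = SimpleNamespace(__version__="3.4.0")
```

Both fake modules report the same version, yet only one of them exposes the feature, which is the case a version comparison cannot distinguish.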

This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: pytorch#160831 Approved by: https://github.com/pytorchbot

Summary: - Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation. Testing: - Add test to verify structure and contents of tlparse artifact Pull Request resolved: pytorch#160132 Approved by: https://github.com/xmfan

To a commit containing pytorch/tensorpipe#464 that fixes compilation with CUDA-13 Fixes pytorch#160104 Pull Request resolved: pytorch#160808 Approved by: https://github.com/nWEIdia, https://github.com/Skylion007, https://github.com/malfet

…ytorch#160747)" This reverts commit 8f43454. Reverted pytorch#160747 on behalf of https://github.com/malfet due to Looks like this breaks rocm, see https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=rocm%20%2F%20linux-jammy-rocm-py3.10 ([comment](pytorch#160747 (comment)))

Remove CONDA_CMAKE from `.ci/docker/build.sh` Pull Request resolved: pytorch#160832 Approved by: https://github.com/malfet

Purely a refactor, improve typing and get rid of some type errors. Make certain fields as nonnull, since in general it's not empty. The goal of this stack of PRs is to move the save/load logic of guard serialization into separate, flat phases, instead of being embedded in guard creation. This way, we can put a try/catch around it and fail safely if certain guards are not serializable. Pull Request resolved: pytorch#160530 Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007
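The "separate phase wrapped in a try/catch" goal can be illustrated with a small sketch. It assumes plain `pickle` as the serializer and a hypothetical `serialize_guards` helper; the real guard serialization code in Dynamo is not shown here and may differ substantially.

```python
import pickle


def serialize_guards(guards):
    """Serialize all guards in one flat phase.

    Returns a list of byte strings, or None if any guard cannot be
    serialized, so the caller can fall back safely instead of crashing.
    """
    try:
        return [pickle.dumps(g) for g in guards]
    except (pickle.PicklingError, TypeError, AttributeError):
        # Failing safely: report "not serializable" rather than raising.
        return None
```

Because serialization happens after guard creation, a failure here does not interrupt guard construction itself.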

Because numpy 1.22.4 had reached EOL 3 years ago. Pull Request resolved: pytorch#160836 Approved by: https://github.com/malfet

Bumps [uv](https://github.com/astral-sh/uv) from 0.8.4 to 0.8.6. - [Release notes](https://github.com/astral-sh/uv/releases) - [Changelog](https://github.com/astral-sh/uv/blob/main/CHANGELOG.md) - [Commits](astral-sh/uv@0.8.4...0.8.6) --- updated-dependencies: - dependency-name: uv dependency-version: 0.8.6 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…ch#160205) Parallelize reading of data behind thread_count argument to HFStorageReader Test plan: ensure existing tests pass and run a job successfully with these changes Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) Pull Request resolved: pytorch#160205 Approved by: https://github.com/meetv18
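A minimal sketch of reading several items behind a `thread_count` argument, using only the standard library. `read_all` and `read_one` are hypothetical names for illustration and are not the `HFStorageReader` implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def read_all(paths, read_one, thread_count=1):
    """Apply read_one to every path using up to thread_count worker threads.

    Executor.map returns results in input order, regardless of which
    read finishes first.
    """
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(read_one, paths))
```

For example, `read_all(["a", "b"], str.upper, thread_count=2)` returns `["A", "B"]`.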

Summary: att - changed one of the tests to get rid of torcharrow dep. Test Plan: ``` buck2 test //caffe2/test/cpp/nativert:layout_planner_tests Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Rollback Plan: Reviewed By: SherlockNoMad Differential Revision: D80108549 Pull Request resolved: pytorch#160942 Approved by: https://github.com/georgiaphillips, https://github.com/henryoier

This fixes an assertion we were running into in the memory planning about not having an acyclic graph. The repro is very long so hard to make local test of, but fixes repro I am looking at. Pull Request resolved: pytorch#161205 Approved by: https://github.com/IvanKobzarev, https://github.com/bdhirsh

…61185) Summary: Removed `Model`; it's not being used anywhere, so it's safe. Removed the `tensor_paths` and `constant_paths` fields in `ExportedProgram`. - BC: when the current deserializer loads a previously serialized EP (that comes with empty `tensor_paths` and `constant_paths`), it will just ignore those two fields. - FC: when the old deserializer loads a newly serialized EP (that doesn't come with `tensor_paths` and `constant_paths`), it will also ignore those two fields in `_dict_to_dataclass()`. Differential Revision: D80725094 Pull Request resolved: pytorch#161185 Approved by: https://github.com/SherlockNoMad
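The backward-compatibility case relies on a loader that ignores fields it does not know about. A minimal sketch of that pattern, assuming a hypothetical `ToyProgram` dataclass and a simplified `dict_to_dataclass` helper (not the real `_dict_to_dataclass()`):

```python
from dataclasses import dataclass, fields


@dataclass
class ToyProgram:
    # Hypothetical schema standing in for a serialized program.
    name: str
    version: int = 0


def dict_to_dataclass(cls, data):
    """Build cls from a dict, silently dropping keys that are not fields."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})
```

A payload that still carries a removed key, such as `{"name": "ep", "version": 1, "tensor_paths": {}}`, loads cleanly because the extra key is filtered out before construction.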

Pull Request resolved: pytorch#161168 Approved by: https://github.com/mikaylagawarecki, https://github.com/Skylion007

…ytorch#160373) Following up on pytorch#152951 (comment), this removes a few lines added in that pull request, fixing link errors like ``` [7019/7028] Linking CXX shared library bin\torch_hip.dll FAILED: [code=4294967295] bin/torch_hip.dll lib/torch_hip.lib C:\Windows\system32\cmd.exe /C "cd . && D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_dll --msvc-ver=1942 --intdir=caffe2\CMakeFiles\torch_hip.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100261~1.0\x64\rc.exe --mt=C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\Llvm\x64\bin\llvm-mt.exe --manifests -- D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\lld-link.exe /nologo @CMakeFiles\torch_hip.rsp /out:bin\torch_hip.dll /implib:lib\torch_hip.lib /pdb:bin\torch_hip.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO && cd ." LINK: command "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\lld-link.exe /nologo @CMakeFiles\torch_hip.rsp /out:bin\torch_hip.dll /implib:lib\torch_hip.lib /pdb:bin\torch_hip.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO /MANIFEST:EMBED,ID=2" failed (exit code 1) with the following output: lld-link: error: undefined symbol: __declspec(dllimport) class std::tuple<class at::Tensor, class at::Tensor, class at::Tensor> __cdecl at::native::transform_bias_rescale_qkv_cuda(class at::Tensor const &, class at::Tensor const &, __int64) >>> referenced by caffe2\CMakeFiles\torch_hip.dir\__\aten\src\ATen\RegisterCUDA_0.cpp.obj:(class std::tuple<class at::Tensor, class at::Tensor, class at::Tensor> __cdecl at::`anonymous namespace'::`anonymous namespace'::wrapper_CUDA___transform_bias_rescale_qkv(class 0xE9BF7323::Tensor const &, class 0xE9BF7323::Tensor const &, __int64)) >>> referenced by 
caffe2\CMakeFiles\torch_hip.dir\__\aten\src\ATen\RegisterNestedTensorCUDA_0.cpp.obj:(class std::tuple<class at::Tensor, class at::Tensor, class at::Tensor> __cdecl at::`anonymous namespace'::`anonymous namespace'::wrapper_NestedTensorCUDA___transform_bias_rescale_qkv(class 0xEFEB5304::Tensor const &, class 0xEFEB5304::Tensor const &, __int64)) ``` The `native_transformers_hip_hip` and `native_transformers_hip_cpp` sources are okay to define (and are required) even if accelerated versions of these operations are not available. I've tested downstream builds of torch with ROCm on native Windows via https://github.com/ROCm/TheRock both with and without aotriton and these changes were needed for the build to succeed in both cases. I have _not_ tested Linux, WSL, or with the HIP SDK. Pull Request resolved: pytorch#160373 Approved by: https://github.com/alugorey, https://github.com/jeffdaily

Note: Adding a unit test for this is tricky, as having errors in the specific unit test would cause test_utils.py to crash altogether. Tested as follows: 1. Added x = 1/0 after guarded_code = compile_inner(code, one_graph, hooks, transform) in convert_frame.py. 2. Printed exception_stack_trace and got: ['Traceback (most recent call last):\n File "/data/users/jovian/pytorch/torch/_dynamo/convert_frame.py", line 1207, in _compile\n x = 1/0\n ~^~\nZeroDivisionError: division by zero\n'] Pull Request resolved: pytorch#161096 Approved by: https://github.com/c00w
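The shape of the captured value (a list holding one formatted traceback string) can be reproduced with the standard `traceback` module. This is a sketch with a hypothetical `run_and_capture` helper, not the code added to convert_frame.py.

```python
import traceback


def run_and_capture(fn):
    """Run fn and return a list with the formatted traceback if it raises."""
    try:
        fn()
    except Exception:
        # format_exc() returns the full traceback of the active exception
        # as a single string.
        return [traceback.format_exc()]
    return []


captured = run_and_capture(lambda: 1 / 0)
```

Here `captured[0]` starts with `Traceback (most recent call last):` and contains `ZeroDivisionError: division by zero`, matching the structure of the output quoted above.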

…59233) Fixes pytorch#158076 Basically, the gemm template generates code like
```
cpp_CppMicroGemmRef_micro_gemm<static_cast<bool>(false), static_cast<bool>(false)>(
    &(X[static_cast<int64_t>(k_start + 196LL*m_start + 38416LL*ks_b_index)]),
    &(W[static_cast<int64_t>(200704000LL + n_start + 80LL*k_start + 15680LL*ks_b_index)]),
    &(local_acc_buf[static_cast<int64_t>(Nr*nci + ((-1LL)*Nr*nc))]),
    static_cast<int64_t>(m_end + ((-1LL)*m_start)),
    static_cast<int64_t>(Nr),
    static_cast<int64_t>(k_end + ((-1LL)*k_start)),
    static_cast<int64_t>(196LL),
    static_cast<int64_t>(80LL),
    static_cast<int64_t>(Nc_blocks*Nr)
);
```
However, when the input tensor W has a storage offset, this results in a double offset issue. That is, the resulting pointer is `2 * 200704000LL` away from `W.storage().data_ptr()`, which causes an out-of-bounds access. The storage offset of `W` is introduced by [this patch](https://github.com/pytorch/pytorch/pull/136421/files), but I think it's a reasonable fix. So `cpp_gemm_template.py` should handle input matrices with storage offsets properly. I think a good way to fix this issue is to create a new matrix that has no storage offset. When `should_block_weights` is true, `block_weight()` creates a clean new matrix, so that branch is not affected by this issue. BTW, I've also examined the FX IRs generated by `torch.compile()`, as well as the generated Python module, and they are correct. The newly added test in `test_cpu_select_algorithm.py` can reproduce the issue. With this patch, the crash is fixed. It also resolves the crash reported in pytorch#158076. I ran the CPU tests in `test_cpu_select_algorithm.py`, but many of them are skipped due to MKL and AMX. I'd appreciate it if someone could help verify the test. Pull Request resolved: pytorch#159233 Approved by: https://github.com/leslie-fang-intel, https://github.com/swolchok
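The double-offset arithmetic can be checked with plain integers. This sketch models addresses as element counts and assumes, as the description implies, that the `W` pointer handed to the kernel already points at the offset view; the variable names are made up for illustration.

```python
STORAGE_OFFSET = 200704000  # the constant that appears in the generated index

storage_start = 0                              # models W.storage().data_ptr()
view_start = storage_start + STORAGE_OFFSET    # models W's own data pointer


def element_address(base, index):
    # Address of base[index], measured in elements.
    return base + index


# Buggy: the index expression adds the storage offset a second time.
buggy = element_address(view_start, STORAGE_OFFSET + 0)
# Intended: index relative to a matrix with no storage offset.
fixed = element_address(view_start, 0)
```

`buggy - storage_start` equals `2 * STORAGE_OFFSET`, while `fixed - storage_start` equals `STORAGE_OFFSET`, which is the discrepancy behind the out-of-bounds access.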

…#161203) Summary: We use tempfile.NamedTemporaryFile to create a temporary pt2 file in `test_nativert.py` However, it is not recognized as an allowed file format and a warning will be thrown. Test Plan: CI Rollback Plan: Differential Revision: D80740916 Pull Request resolved: pytorch#161203 Approved by: https://github.com/angelayi
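One way to give a temporary file a recognizable extension is the `suffix` argument of `tempfile.NamedTemporaryFile`. The sketch below assumes the format check looks at the file name's suffix; whether the actual change in `test_nativert.py` does this is not stated above, and `make_temp_pt2` is a hypothetical helper.

```python
import os
import tempfile


def make_temp_pt2():
    """Create an empty temporary file whose name ends in .pt2 and return its path."""
    # delete=False keeps the file after the handle is closed, so the caller
    # is responsible for removing it.
    with tempfile.NamedTemporaryFile(suffix=".pt2", delete=False) as f:
        return f.name


path = make_temp_pt2()
has_expected_suffix = path.endswith(".pt2")
os.remove(path)  # clean up the file created for the demonstration
```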

This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned audio hash. Pull Request resolved: pytorch#161226 Approved by: https://github.com/pytorchbot

@anijain2305

…orch#161036) Fixes silent incorrectness for autograd function tracing, where we rely on FakeTensor metadata (requires_grad) to determine whether to HOP or not: https://github.com/pytorch/pytorch/blob/5ee464db5c4293ac09521f9069fa7d2106680a7f/torch/_dynamo/variables/misc.py#L671 Stared at this with @anijain2305 yesterday, `Tensor.__setitem__` can update tensor metadata, and we can just run the fake prop and extract the output metadata from the updated FakeTensor. FIXES pytorch#160901 It should also be the root cause behind the issue in pytorch/torchtitan#1604 @bdhirsh @ruisizhang123 Pull Request resolved: pytorch#161036 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#160805

Pull Request resolved: pytorch#160583 Approved by: https://github.com/huydhn, https://github.com/atalman

…rch#161137) It doesn't make sense for this to default to Maxwell, which is too old. Every other place in CI/CD needs to overwrite this value. IMO, it makes more sense to not set this at all and let CI/CD jobs set it for their own use cases instead. This is partly responsible for the build failure in pytorch#160988 Pull Request resolved: pytorch#161137 Approved by: https://github.com/msaroufim

Optimize [zero_grad doc](https://docs.pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) format and description. ## Test Result ### Before <img width="996" height="534" alt="image" src="https://github.com/user-attachments/assets/e1db973c-57e8-4525-90e7-0500cde2263d" /> ### After <img width="890" height="496" alt="image" src="https://github.com/user-attachments/assets/5579c4fb-a857-4030-9303-34770083d1a5" /> Pull Request resolved: pytorch#161239 Approved by: https://github.com/janeyx99

…#161196) Enable maximum compatibility with MSVC for oneAPI headers. The key context is `The /permissive- option is compatible with almost all of the header files from the latest Windows Kits` from https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170 Pull Request resolved: pytorch#161196 Approved by: https://github.com/jansel

Changes: 1. Math-related build options are not supported by MSVC, so skip them on Windows. 2. Move all math-related build options to the `_get_ffast_math_flags` function. Pull Request resolved: pytorch#161197 Approved by: https://github.com/jansel
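The pattern of gathering platform-specific flags in one function can be sketched as below. `toy_ffast_math_flags` is a hypothetical helper, and the flag list is illustrative only; it is not the contents of the real `_get_ffast_math_flags`.

```python
def toy_ffast_math_flags(is_windows):
    """Return math-related compiler flags, or none when targeting MSVC on Windows."""
    if is_windows:
        # Assumption taken from the description above: these options are
        # not supported by MSVC, so emit nothing.
        return []
    # Illustrative GCC/Clang-style flag; the real list may differ.
    return ["-ffast-math"]


def build_command(compiler, source, is_windows):
    # Every caller goes through the single helper instead of
    # hard-coding math flags in several places.
    return [compiler, *toy_ffast_math_flags(is_windows), source]
```

Keeping the decision in one function means the Windows special case is written once rather than at each call site.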

…orch#161159) Pull Request resolved: pytorch#161159 Approved by: https://github.com/eellison

# Motivation pytorch#160505 enables background threads for XPU host allocator. However, it will hang on Windows during program exit. Now disable it until we narrow down the issue. Pull Request resolved: pytorch#161242 Approved by: https://github.com/EikanWang

Removes a redundant if statement. Does not impact logic so no test changes needed. Pull Request resolved: pytorch#161215 Approved by: https://github.com/StrongerXi

…58568) Adds support for FlightRecorder in ProcessGroupXCCL. See intel/torch-xpu-ops#1867 for XCCL implementation and more details. Pull Request resolved: pytorch#158568 Approved by: https://github.com/guangyey, https://github.com/fduwjj

…#161043) As the title stated. Pull Request resolved: pytorch#161043 Approved by: https://github.com/Skylion007

Pull Request resolved: pytorch#159361 Approved by: https://github.com/anijain2305

Add magma build 13.0 for Windows Add cuda_install.bat 13.0 for Windows build pytorch#159779 Pull Request resolved: pytorch#161073 Approved by: https://github.com/atalman Co-authored-by: Andrey Talman <[email protected]>

pytorch#159779 CUDA 13.0.0 NVSHMEM 3.3.20 CUDNN 9.12.0.46 Adding x86 linux builds for CUDA 13. Adding libtorch docker. Package naming changed for CUDA 13 (removed postfix -cu13 for some packages). Preparation checklist: 1. Update index https://download.pytorch.org/whl/nightly/cu130 with pypi packages 2. Update packaging name based on https://pypi.org/project/cuda-toolkit/ metadata Pull Request resolved: pytorch#160956 Approved by: https://github.com/atalman Co-authored-by: atalman <[email protected]>

This reverts commit 523bffd. Reverted pytorch#149218 on behalf of https://github.com/atalman due to Lets not use no-cache flags on test binaries ([comment](pytorch#149218 (comment)))

…sting_IFU_2025-08-22 # Conflicts: # .ci/docker/requirements-ci.txt # aten/src/ATen/Context.cpp # aten/src/ATen/cuda/tunable/GemmHipblaslt.h # aten/src/ATen/native/Normalization.cpp # aten/src/ATen/native/cuda/Blas.cpp # requirements.txt # test/distributed/_tools/test_fsdp2_mem_tracker.py # test/dynamo/test_activation_checkpointing.py # test/dynamo/test_structured_trace.py # test/inductor/test_combo_kernels.py # test/test_matmul_cuda.py # torch/_higher_order_ops/triton_kernel_wrap.py # torch/_inductor/choices.py # torch/_inductor/codegen/triton.py # torch/testing/_internal/common_cuda.py

…rch#165479) These happen when building with CMAKE_BUILD_TYPE=RelWithAssert. This should fix two types of failures that started with pytorch#163665. Disclaimer: I used a lot of AI since I don't know how pybind works or what refcounts and pointers are, so idk if this is a good solution, or even a solution at all (fwiw the tests pass now).
The first type is (truncated):
```
default_pg, _ = _new_process_group_helper(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2096, in _new_process_group_helper
    backend_class = creator_fn(dist_backend_opts, backend_options)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/fake_pg.py", line 25, in _create_fake_pg
    return FakeProcessGroup._create_internal(
RuntimeError: new_refcount != 1 INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/c10/util/intrusive_ptr.h":319, please report a bug to PyTorch. intrusive_ptr: Cannot increase refcount after it reached zero.
Exception raised from retain_ at /var/lib/jenkins/workspace/c10/util/intrusive_ptr.h:319 (most recent call first): C++ CapturedTraceback: #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from ??:0 #7 c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) from ??:0 #8 void pybind11::class_<c10d::FakeProcessGroup, (anonymous namespace)::IntrusivePtrNoGilDestructor<c10d::FakeProcessGroup> >::init_instance<(anonymous namespace)::IntrusivePtrNoGilDestructor<c10d::FakeProcessGroup>, 0>(pybind11::detail::instance*, void const*) from init.cpp:0 #9 pybind11::detail::type_caster_generic::cast(void const*, pybind11::return_value_policy, pybind11::handle, pybind11::detail::type_info const*, void* (*)(void const*), void* (*)(void const*), void const*) from :0 #10 pybind11::cpp_function::initialize<torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::{lambda(int, int, c10::intrusive_ptr<c10d::FakeProcessGroup::Options, c10::detail::intrusive_target_default_null_type<c10d::FakeProcessGroup::Options> >)ROCm#127}, c10::intrusive_ptr<c10d::FakeProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::FakeProcessGroup> >, int, int, c10::intrusive_ptr<c10d::FakeProcessGroup::Options, c10::detail::intrusive_target_default_null_type<c10d::FakeProcessGroup::Options> >, pybind11::name, pybind11::scope, pybind11::sibling, 
pybind11::arg, pybind11::arg, pybind11::arg_v>(torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::{lambda(int, int, c10::intrusive_ptr<c10d::FakeProcessGroup::Options, c10::detail::intrusive_target_default_null_type<c10d::FakeProcessGroup::Options> >)ROCm#127}&&, c10::intrusive_ptr<c10d::FakeProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::FakeProcessGroup> > (*)(int, int, c10::intrusive_ptr<c10d::FakeProcessGroup::Options, c10::detail::intrusive_target_default_null_type<c10d::FakeProcessGroup::Options> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 ``` and I fix it here by getting rid of `DontIncreaseRefcount` and using make_intrusive to do the ref count handling instead. However, I also had to move the constructor to be public, which I think is not good, based on the reasoning of the original PR The other one type is ``` Traceback (most recent call last): File "/var/lib/jenkins/workspace/test/test_testing.py", line 2415, in test_no_warning_on_import self.assertEqual(out, "") File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4233, in assertEqual raise error_metas.pop()[0].to_error( # type: ignore[index] AssertionError: String comparison failed: "/opt/conda/envs/py_3.10/lib/python3.10/s[352 chars]):\n" != '' - /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/__init__.py:29: FutureWarning: pybind11-bound class 'torch._C._distributed_c10d.FakeProcessGroup' is using an old-style placement-new '__init__' which has been deprecated. See the upgrade guide in pybind11's docs. This message is only visible when compiled in debug mode. 
- if is_available() and not torch._C._c10d_init(): To execute this test, run the following from the base repo dir: python test/test_testing.py TestImports.test_no_warning_on_import ``` which I fix by getting rid of the `__init__` which I think is ok since it'll just error if you try to make one? Pull Request resolved: pytorch#165479 Approved by: https://github.com/ezyang

laithsakka and others added 30 commits August 16, 2025 00:54

[vllm test] add vllm.yml and additional package (pytorch#160698)

75ea934

Pull Request resolved: pytorch#160698 Approved by: https://github.com/huydhn ghstack dependencies: pytorch#160116

[Dynamo][Hierarchical Compile] Flatten tuple outputs in graph dedupe …

fb7e60b

…pass (pytorch#158811) Pull Request resolved: pytorch#158811 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#158810

[MTIA] add correct name for CFF in tlparse (pytorch#160599)

cff6def

Differential Revision: D80201622 Pull Request resolved: pytorch#160599 Approved by: https://github.com/bdhirsh

Remove unused CONDA_CMAKE option (pytorch#160832)

960c03d

Remove CONDA_CMAKE from `.ci/docker/build.sh` Pull Request resolved: pytorch#160832 Approved by: https://github.com/malfet

Use numpy 1.26.2 for Python 3.9 and 3.10 (pytorch#160836)

7a68d02

Because numpy 1.22.4 had reached EOL 3 years ago. Pull Request resolved: pytorch#160836 Approved by: https://github.com/malfet

dependabot bot and others added 27 commits August 21, 2025 15:54

[VLLM TEST]setup test workflow (pytorch#160583)

0dea191

Pull Request resolved: pytorch#160583 Approved by: https://github.com/huydhn, https://github.com/atalman

[bucketing] allow convert_element_type after fsdp reduce_scatter (pyt…

595987d

…orch#161159) Pull Request resolved: pytorch#161159 Approved by: https://github.com/eellison

Refactoring TensorImpl by using constexpr and std::is_same_v (pytorch…

2beffb3

…#161043) As the title stated. Pull Request resolved: pytorch#161043 Approved by: https://github.com/Skylion007

[BE] [dynamo] Simplify two methods in ConstDictVariable (pytorch#159361)

774b4be

Pull Request resolved: pytorch#159361 Approved by: https://github.com/anijain2305

pragupta closed this Aug 29, 2025

pragupta deleted the rocm7.1_internal_testing_IFU_2025-08-22 branch August 29, 2025 23:18


20 participants
