Commit 415b882

Update on "[inductor] [cpp] use non-temporal tile load for A"
Use the non-temporal tile load `_tile_stream_loadd` for A to keep B in L1. Verified with AMP on static and dynamic shapes on a CPU with AMX support: there is no obvious end-to-end performance gain, and no regression either. We expect a performance gain once #129348 (also in this ghstack) is added on top of this PR.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
2 parents f30b79f + 2a23495 commit 415b882

1,582 files changed: +38714 -31499 lines changed

.ci/docker/aotriton_version.txt

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 0.6b
 manylinux_2_17
-rocm6
-04b5df8c8123f90cba3ede7e971e6fbc6040d506
-3db6ecbc915893ff967abd6e1b43bd5f54949868873be60dc802086c3863e648
+rocm6.1
+7f07e8a1cb1f99627eb6d77f5c0e9295c775f3c7
+77c29fa3f3b614e187d7213d745e989a92708cee2bc6020419ab49019af399d1

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-172574a6be5910a4609e4ed1bef2b6b8475ddb3d
+c572f9e509b5ec5d56f4d218271e36269bba244f

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-01cbe5045a6898c9a925f01435c8277b2fe6afcc
+21eae954efa5bf584da70324b640288c3ee7aede

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-aac14a3b93f11d781d1d5ebc5400b15ae8df5185
+1b2f15840e0d70eec50d84c7a0575cb835524def

.ci/docker/common/install_aotriton.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ TARBALL='aotriton.tar.bz2'
 read -d "\n" VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256 < aotriton_version.txt || true
 ARCH=$(uname -m)
 AOTRITON_INSTALL_PREFIX="$1"
-AOTRITON_URL="https://github.com/ROCm/aotriton/releases/download/${VER}/aotriton-${VER}-${MANYLINUX}_${ARCH}-${ROCMBASE}.tar.bz2"
+AOTRITON_URL="https://github.com/ROCm/aotriton/releases/download/${VER}/aotriton-${VER}-${MANYLINUX}_${ARCH}-${ROCMBASE}-shared.tar.bz2"

 cd "${AOTRITON_INSTALL_PREFIX}"
 # Must use -L to follow redirects
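
As a rough illustration of how the five lines of `aotriton_version.txt` feed into the new `-shared` download URL, the sketch below reuses the variable names and the `read` line from the hunk above. The `build_aotriton_url` helper, the temporary file, and the fixed `x86_64` architecture are assumptions made for the example, not part of the commit.

```shell
#!/usr/bin/env bash
# Sketch only: build_aotriton_url and the temporary version file are
# hypothetical; the variable names and URL template come from the hunk above.

build_aotriton_url() {
  local version_file="$1" arch="$2"
  local VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256
  # read splits its input on whitespace (newlines included) and assigns one
  # word per variable. It reaches end-of-file before seeing its delimiter and
  # therefore returns non-zero, which is why the original appends "|| true".
  read -d "\n" VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256 < "$version_file" || true
  echo "https://github.com/ROCm/aotriton/releases/download/${VER}/aotriton-${VER}-${MANYLINUX}_${arch}-${ROCMBASE}-shared.tar.bz2"
}

# Recreate the new contents of aotriton_version.txt in a scratch file.
version_file="$(mktemp)"
printf '%s\n' \
  0.6b \
  manylinux_2_17 \
  rocm6.1 \
  7f07e8a1cb1f99627eb6d77f5c0e9295c775f3c7 \
  77c29fa3f3b614e187d7213d745e989a92708cee2bc6020419ab49019af399d1 > "$version_file"

url="$(build_aotriton_url "$version_file" x86_64)"
echo "$url"
rm -f "$version_file"
```

The version file and the URL template are coupled: the `rocm6` to `rocm6.1` change in the version file and the `-shared` suffix in the script both change the asset name that gets downloaded.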

.ci/docker/common/install_conda.sh

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ fi
 else
 CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"

-if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.12" ]; then
+if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.12" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.13" ]; then
 conda_install numpy=1.26.0 ${CONDA_COMMON_DEPS}
 else
 conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
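
The version check in this hunk grows by one `||` clause per Python release. A `case` statement expresses the same selection more compactly. The sketch below is an alternative formulation, not what the commit does; `numpy_pin_for` is a hypothetical helper, and it covers only the branch visible in the hunk (other Python versions may be handled earlier in the real script).

```shell
#!/usr/bin/env bash
# Sketch only: numpy_pin_for is a hypothetical helper that mirrors the
# if/else in the hunk above using a case statement instead of chained tests.

numpy_pin_for() {
  case "$1" in
    3.11|3.12|3.13) echo "numpy=1.26.0" ;;  # versions listed in the new condition
    *)              echo "numpy=1.21.2" ;;  # fallback of the else branch shown
  esac
}

numpy_pin_for 3.13
numpy_pin_for 3.10
```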

.ci/docker/requirements-ci.txt

Lines changed: 1 addition & 1 deletion
@@ -306,7 +306,7 @@ pywavelets==1.5.0 ; python_version >= "3.12"
 #Pinned versions: 1.4.1
 #test that import:

-lxml==5.0.0.
+lxml==5.0.0
 #Description: This is a requirement of unittest-xml-reporting

 # Python-3.9 binaries

.ci/pytorch/build.sh

Lines changed: 4 additions & 0 deletions
@@ -230,6 +230,10 @@ if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]
 export BUILD_STATIC_RUNTIME_BENCHMARK=ON
 fi

+if [[ "$BUILD_ENVIRONMENT" == *-debug* ]]; then
+export CMAKE_BUILD_TYPE=RelWithAssert
+fi
+
 # Do not change workspace permissions for ROCm CI jobs
 # as it can leave workspace with bad permissions for cancelled jobs
 if [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then
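
The added check relies on bash glob matching: inside `[[ ... ]]`, an unquoted right-hand side of `==` is treated as a pattern, so `*-debug*` matches any value containing `-debug`. A minimal sketch of that behaviour follows; `pick_build_type` and the environment names are made up for illustration and do not appear in the commit.

```shell
#!/usr/bin/env bash
# Sketch only: pick_build_type and the sample environment names are
# hypothetical; only the *-debug* pattern and RelWithAssert come from the hunk.

pick_build_type() {
  local build_environment="$1"
  # The pattern is deliberately unquoted: quoting it would turn the
  # comparison into a literal string match against "*-debug*".
  if [[ "$build_environment" == *-debug* ]]; then
    echo "RelWithAssert"
  else
    echo "unchanged"
  fi
}

pick_build_type "example-linux-py3-debug"   # contains "-debug"
pick_build_type "example-linux-py3"         # no "-debug" substring
pick_build_type "debug-example-linux"       # "debug" without a leading hyphen
```

Note that the pattern requires the hyphen, so a name that merely starts with `debug` is not treated as a debug build.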

.ci/pytorch/common_utils.sh

Lines changed: 2 additions & 0 deletions
@@ -222,6 +222,8 @@ function checkout_install_torchbench() {
 # to install and test other models
 python install.py --continue_on_fail
 fi
+echo "Print all dependencies after TorchBench is installed"
+python -mpip freeze
 popd
 }


.ci/pytorch/multigpu-test.sh

Lines changed: 1 addition & 1 deletion
@@ -18,8 +18,8 @@ time python test/run_test.py --verbose -i distributed/test_c10d_gloo
 time python test/run_test.py --verbose -i distributed/test_c10d_nccl
 time python test/run_test.py --verbose -i distributed/test_c10d_spawn_gloo
 time python test/run_test.py --verbose -i distributed/test_c10d_spawn_nccl
-time python test/run_test.py --verbose -i distributed/test_cuda_p2p
 time python test/run_test.py --verbose -i distributed/test_store
+time python test/run_test.py --verbose -i distributed/test_symmetric_memory
 time python test/run_test.py --verbose -i distributed/test_pg_wrapper
 time python test/run_test.py --verbose -i distributed/rpc/cuda/test_tensorpipe_agent
 # FSDP tests
