Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
9f0b88b to
de947c8
Compare
de947c8 to
6c786a5
Compare
This comment was marked as outdated.
This comment was marked as outdated.
6c786a5 to
96a18b2
Compare
96a18b2 to
71a8ed0
Compare
|
@kwryankrattiger that doesn't seem to have made anything worse, although it doesn't fix any of the above errors. |
160c192 to
4184de6
Compare
|
Requires spack/gitlab-runners#49 to build a new image with a modern GCC version. |
|
Remaining aarch64 issues are mostly known issues:
Can anyone look at the llvm-amdgpu build issue on the ROCm stack? @haampie @renjithravindrankannath @srekolam |
|
Nope, I'm the one who was missing something. I was looking in @aweits want to submit a PR? If not I can get to it later tonight or tomorrow. |
|
@spackbot run pipeline |
|
I've started that pipeline for you! |
|
@aweits looks like llvm-amdgpu has a new build error now |
|
Todd thinks we should just give up on aarch64 + ROCm, so the JAX build is higher priority. |
I'd be thrilled to see it but I don't think it exists! |
|
We could comment out JAX for now. Once #46346 is complete the issue will likely resolve itself. But that PR is pretty far from working. |
|
i should be able to get some time to poke at this again today... |
|
here's the tl;dr on a fix for arm/llvm-amdgpu - will put appropriate conditionals on this and submit a pr: [aweits@gg-00 spack]$ git diff
diff --git a/var/spack/repos/builtin/packages/llvm-amdgpu/package.py b/var/spack/repos/builtin/packages/llvm-amdgpu/package.py
index 3e39a4445ba..c342098588f 100644
--- a/var/spack/repos/builtin/packages/llvm-amdgpu/package.py
+++ b/var/spack/repos/builtin/packages/llvm-amdgpu/package.py
@@ -222,7 +222,7 @@ def cmake_args(self):
self.define("LIBCXXABI_ENABLE_STATIC", "ON"),
self.define("LIBCXXABI_INSTALL_STATIC_LIBRARY", "OFF"),
self.define("LLVM_ENABLE_RTTI", "ON"),
- self.define("LLVM_TARGETS_TO_BUILD", "AMDGPU;X86"),
+ self.define("LLVM_TARGETS_TO_BUILD", "AMDGPU;AArch64"),
self.define("LLVM_AMDGPU_ALLOW_NPI_TARGETS", "ON"),
self.define("PACKAGE_VENDOR", "AMD"),
self.define("CLANG_ENABLE_AMDCLANG", "ON"), |
|
got jax building - again, will open pr with appropriate conditionals diff --git a/var/spack/repos/builtin/packages/py-jaxlib/jaxxlatsl.patch b/var/spack/repos/builtin/packages/py-jaxlib/jaxxlatsl.patch
new file mode 100644
index 00000000000..e96cc32e263
--- /dev/null
+++ b/var/spack/repos/builtin/packages/py-jaxlib/jaxxlatsl.patch
@@ -0,0 +1,100 @@
+From 8fce7378ed8ce994107568449806cd99274ab22b Mon Sep 17 00:00:00 2001
+From: Andrew Elble <[email protected]>
+Date: Mon, 21 Oct 2024 19:42:31 -0400
+Subject: [PATCH] patchit
+
+---
+ ...ch-for-Abseil-to-fix-build-on-Jetson.patch | 68 +++++++++++++++++++
+ third_party/xla/workspace.bzl | 1 +
+ 2 files changed, 69 insertions(+)
+ create mode 100644 third_party/xla/0001-Add-patch-for-Abseil-to-fix-build-on-Jetson.patch
+
+diff --git a/third_party/xla/0001-Add-patch-for-Abseil-to-fix-build-on-Jetson.patch b/third_party/xla/0001-Add-patch-for-Abseil-to-fix-build-on-Jetson.patch
+new file mode 100644
+index 000000000000..5138a045082b
+--- /dev/null
++++ b/third_party/xla/0001-Add-patch-for-Abseil-to-fix-build-on-Jetson.patch
+@@ -0,0 +1,68 @@
++From 40da87a0476436ca1da2eafe08935787a05e9a61 Mon Sep 17 00:00:00 2001
++From: David Dunleavy <[email protected]>
++Date: Mon, 5 Aug 2024 11:42:53 -0700
++Subject: [PATCH] Add patch for Abseil to fix build on Jetson
++
++Patches in https://github.com/abseil/abseil-cpp/commit/372124e6af36a540e74a2ec31d79d7297a831f98
++
++PiperOrigin-RevId: 659627531
++---
++ .../tsl/third_party/absl/nvidia_jetson.patch | 35 +++++++++++++++++++
++ .../tsl/third_party/absl/workspace.bzl | 1 +
++ 2 files changed, 36 insertions(+)
++ create mode 100644 third_party/tsl/third_party/absl/nvidia_jetson.patch
++
++diff --git a/third_party/tsl/third_party/absl/nvidia_jetson.patch b/third_party/tsl/third_party/absl/nvidia_jetson.patch
++new file mode 100644
++index 000000000000..5328c3a0d605
++--- /dev/null
+++++ b/third_party/tsl/third_party/absl/nvidia_jetson.patch
++@@ -0,0 +1,35 @@
+++From 372124e6af36a540e74a2ec31d79d7297a831f98 Mon Sep 17 00:00:00 2001
+++From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20Bastien?= <[email protected]>
+++Date: Thu, 1 Aug 2024 12:38:52 -0700
+++Subject: [PATCH] PR #1732: Fix build on NVIDIA Jetson board. Fix #1665
+++
+++Imported from GitHub PR https://github.com/abseil/abseil-cpp/pull/1732
+++
+++Fix build on NVIDIA Jetson board. Fix #1665
+++
+++This patch is already used by the spark project.
+++I'm fixing this as this break the build of Tensorflow and JAX on Jetson board.
+++Merge 7db2d2ab9fbed1f0fabad10a6ec73533ba71bfff into 6b8ebb35c0414ef5a2b6fd4a0f59057e41beaff9
+++
+++Merging this change closes #1732
+++
+++COPYBARA_INTEGRATE_REVIEW=https://github.com/abseil/abseil-cpp/pull/1732 from nouiz:fix_neon_on_jetson 7db2d2ab9fbed1f0fabad10a6ec73533ba71bfff
+++PiperOrigin-RevId: 658501520
+++Change-Id: If502ede4efc8c877fb3fed227eca6dc7622dd181
+++---
+++ absl/base/config.h | 2 +-
+++ 1 file changed, 1 insertion(+), 1 deletion(-)
+++
+++diff --git a/absl/base/config.h b/absl/base/config.h
+++index 97c9a22a109..ab1e9860a91 100644
+++--- a/absl/base/config.h
++++++ b/absl/base/config.h
+++@@ -926,7 +926,7 @@ static_assert(ABSL_INTERNAL_INLINE_NAMESPACE_STR[0] != 'h' ||
+++ // https://llvm.org/docs/CompileCudaWithLLVM.html#detecting-clang-vs-nvcc-from-code
+++ #ifdef ABSL_INTERNAL_HAVE_ARM_NEON
+++ #error ABSL_INTERNAL_HAVE_ARM_NEON cannot be directly set
+++-#elif defined(__ARM_NEON) && !defined(__CUDA_ARCH__)
++++#elif defined(__ARM_NEON) && !(defined(__NVCC__) && defined(__CUDACC__))
+++ #define ABSL_INTERNAL_HAVE_ARM_NEON 1
+++ #endif
+++
++diff --git a/third_party/tsl/third_party/absl/workspace.bzl b/third_party/tsl/third_party/absl/workspace.bzl
++index 06f75166ce4b..9565a82c3319 100644
++--- a/third_party/tsl/third_party/absl/workspace.bzl
+++++ b/third_party/tsl/third_party/absl/workspace.bzl
++@@ -44,4 +44,5 @@ def repo():
++ system_link_files = SYS_LINKS,
++ strip_prefix = "abseil-cpp-{commit}".format(commit = ABSL_COMMIT),
++ urls = tf_mirror_urls("https://github.com/abseil/abseil-cpp/archive/{commit}.tar.gz".format(commit = ABSL_COMMIT)),
+++ patch_file = ["//third_party/absl:nvidia_jetson.patch"],
++ )
++--
++2.31.1
++
+diff --git a/third_party/xla/workspace.bzl b/third_party/xla/workspace.bzl
+index af52e7671507..70481bc970a5 100644
+--- a/third_party/xla/workspace.bzl
++++ b/third_party/xla/workspace.bzl
+@@ -29,6 +29,7 @@ def repo():
+ name = "xla",
+ sha256 = XLA_SHA256,
+ strip_prefix = "xla-{commit}".format(commit = XLA_COMMIT),
++ patch_file = ["//third_party/xla:0001-Add-patch-for-Abseil-to-fix-build-on-Jetson.patch"],
+ urls = tf_mirror_urls("https://github.com/openxla/xla/archive/{commit}.tar.gz".format(commit = XLA_COMMIT)),
+ )
+
+--
+2.31.1
+
diff --git a/var/spack/repos/builtin/packages/py-jaxlib/package.py b/var/spack/repos/builtin/packages/py-jaxlib/package.py
index 09eb522c56a..ce7f217e8fe 100644
--- a/var/spack/repos/builtin/packages/py-jaxlib/package.py
+++ b/var/spack/repos/builtin/packages/py-jaxlib/package.py
@@ -127,7 +127,7 @@ class PyJaxlib(PythonPackage, CudaPackage, ROCmPackage):
sha256="4dfb9f32d4eeb0a0fb3a6f4124c4170e3fe49511f1b768cd634c78d489962275",
when="@:0.4.25",
)
-
+ patch("jaxxlatsl.patch")
conflicts(
"cuda_arch=none",
when="+cuda", |
|
You can assign me as a reviewer on these and I will merge them |
I think we're probably best off at least separating the efforts and get this in for aarch64/cuda. Thoughts? |
|
@afzpatel helped add our x86_64 ROCm pipeline and may be willing to work on an aarch64 ROCm pipeline in a separate PR. I also want to get to ppc64le soon. |
I'm certainly willing to help with this. |
|
rocminfo is next. At this point, we basically need to hack every single package in the ROCm stack. I also want to make sure we're contributing these changes upstream. Let me remove all of the ROCm stuff from this PR and we can tackle ROCm another day. |
248abea to
c2a0c46
Compare
Hi @adamjstewart, ARM architecture based build is not supported in ROCm as of today. If there is any change in plan it will be announced through ROCm release notes. |
Reboot of #34300
Hoping there will be fewer build issues this time around.