🐛 Describe the bug
Importing torch from the torch==2.5.0 PyPI wheel logs an error message to the console on arm64 platforms.
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
>>> torch.__version__
'2.5.0'
The import succeeds and the console error does not appear to impact functionality. The issue is not reproducible after downgrading to torch==2.4.1.
Running cpuinfo directly does not return an error:
$ cpuinfo
Python Version: 3.10.12.final.0 (64 bit)
Cpuinfo Version: 9.0.0
...
Reproduced on two local arm64 systems; could not reproduce on x86.
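For anyone who wants to check this non-interactively (for example in CI or across several machines), here is a minimal sketch that imports a module in a fresh interpreter and looks for the message shown above. The helper names `import_output` and `has_cpuinfo_error` and the `MARKER` constant are hypothetical, not part of torch. stdout and stderr are merged because the report does not say which stream the message is written to.

```python
import subprocess
import sys

# Prefix of the console message shown in the report above.
MARKER = "Error in cpuinfo"


def import_output(module: str) -> tuple[int, str]:
    """Import `module` in a fresh interpreter.

    Returns the exit code and the combined stdout/stderr text.
    """
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams so the message is caught either way
        text=True,
    )
    return proc.returncode, proc.stdout


def has_cpuinfo_error(output: str) -> bool:
    """True if any captured line starts with the cpuinfo error prefix."""
    return any(line.startswith(MARKER) for line in output.splitlines())


if __name__ == "__main__":
    code, out = import_output("torch")
    print(f"exit code: {code}")
    print(f"cpuinfo error logged: {has_cpuinfo_error(out)}")
```

Running the script once with torch==2.5.0 installed and once with torch==2.4.1 should make the difference between the two versions easy to compare on the same machine.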
Versions
Running in a Docker container based on nvcr.io/nvidia/clara-holoscan/holoscan:v2.5.0-dgpu
PyTorch version: 2.5.0
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (aarch64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: Could not collect
CMake version: version 3.24.0
Libc version: glibc-2.35
Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 12.2.128
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA RTX 6000 Ada Generation
Nvidia driver version: 535.183.01
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.9.4
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.9.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: ARM
Model name: Cortex-A78AE
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 3
Stepping: r0p1
CPU max MHz: 1971.2000
CPU min MHz: 115.2000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache: 768 KiB (12 instances)
L1i cache: 768 KiB (12 instances)
L2 cache: 3 MiB (12 instances)
L3 cache: 6 MiB (3 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] onnx==1.17.0
[pip3] torch==2.5.0
[pip3] torchvision==0.20.0
[conda] Could not collect
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @malfet @snadampal @milpuz01