Description
Describe the issue:
This is CPU-architecture and NumPy-version dependent. I see these issues on an i9-10900K with NumPy 1.26.0, but not on the same i9-10900K with NumPy 1.25.2, all other factors unchanged. I see no issues at all on an i5-4300U laptop (same OS, Python, MKL, and test script, with either NumPy 1.25.2 or 1.26.0). See the context below for more detail.
OS: Ubuntu 22.04.3 LTS
Python (in a virtual env): Python 3.11.4
Note: I build both Python and NumPy with "-march=native" (among other flags) on both CPUs.
MKL: 2023.2.0
Reproduce the code example:
import numpy as np
from time import time
import sysconfig
import psutil
import re

mkl = True

# Print the numpy build configuration to see whether MKL/BLAS is available
np.show_config()
print("python compile flags")
print(sysconfig.get_config_var('CFLAGS'))

# /proc/cpuinfo pads the key with whitespace (usually a tab), so match \s*
re_cpu = re.compile(r"^model name\s*: (.*)")
with open('/proc/cpuinfo') as f:
    for line in f:
        model = re_cpu.match(line)
        if model:
            print(model.group(1))
            break

print("mkl: %s" % mkl)
# psutil.cpu_count() counts logical CPUs by default; logical=False gives physical cores
print("physical cores: %s" % psutil.cpu_count(logical=False))
print("logical cores: %s" % psutil.cpu_count(logical=True))
print("cpu min freq: %s" % psutil.cpu_freq().min)
print("cpu current freq: %s" % psutil.cpu_freq().current)
print("cpu max freq: %s" % psutil.cpu_freq().max)
print("load average: %s %s %s" % psutil.getloadavg())
print("Total memory: %s GB" % round(psutil.virtual_memory().total / 1000000000, 2))
print("Available memory: %s GB" % round(psutil.virtual_memory().available / 1000000000, 2))

np.random.seed(0)
size = 4096
A, B = np.random.random((size, size)), np.random.random((size, size))
C, D = np.random.random((size * 128,)), np.random.random((size * 128,))
E = np.random.random((int(size / 2), int(size / 4)))
F = np.random.random((int(size / 2), int(size / 2)))
F = np.dot(F, F.T)
G = np.random.random((int(size / 2), int(size / 2)))

# Matrix multiplication
N = 10
t = time()
for i in range(N):
    np.dot(A, B)
delta = time() - t
print('Dotted two %dx%d matrices in %0.2f s.' % (size, size, delta / N))
del A, B

# Vector multiplication
N = 10
t = time()
for i in range(N):
    np.dot(C, D)
delta = time() - t
print('Dotted two vectors of length %d in %0.2f ms.' % (size * 128, 1e3 * delta / N))
del C, D

# Singular Value Decomposition (SVD)
N = 3
t = time()
for i in range(N):
    np.linalg.svd(E, full_matrices=False)
delta = time() - t
print("SVD of a %dx%d matrix in %0.2f s." % (size / 2, size / 4, delta / N))
del E

# Cholesky Decomposition
N = 3
t = time()
for i in range(N):
    np.linalg.cholesky(F)
delta = time() - t
print("Cholesky decomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

# Tests continue, but on the i9-10900K with numpy 1.26 the script segfaults at "Cholesky Decomposition".

Error message:
Output with MKL_VERBOSE=1. There is no Python traceback, so here is the full output from the NumPy 1.26 environment in the hope that it helps:
MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x556cea252820,1,0x556cea252820,1) 765.28us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Build Dependencies:
blas:
detection method: cmake
found: true
name: MKL
version: 2023.2.0
lapack:
detection method: cmake
found: true
name: MKL
version: 2023.2.0
Compilers:
c:
commands: ccache, cc
linker: ld.bfd
name: gcc
version: 11.4.0
c++:
commands: ccache, c++
linker: ld.bfd
name: gcc
version: 11.4.0
cython:
commands: cython
linker: cython
name: cython
version: 3.0.2
Machine Information:
build:
cpu: x86_64
endian: little
family: x86_64
system: linux
host:
cpu: x86_64
endian: little
family: x86_64
system: linux
Python Information:
path: /tmp/build-env-3t41cf80/bin/python
version: '3.11'
SIMD Extensions:
baseline:
- SSE
- SSE2
- SSE3
- SSSE3
- SSE41
- POPCNT
- SSE42
- AVX
- F16C
- FMA3
- AVX2
not found:
- AVX512F
- AVX512CD
- AVX512_KNL
- AVX512_KNM
- AVX512_SKX
- AVX512_CLX
- AVX512_CNL
- AVX512_ICL
python compile flags
-Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O3 -march=native -O3 -march=native
Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
mkl: True
physical cores: 10
logical cores: 10
cpu min freq: 800.0
cpu current freq: 4395.1093
cpu max freq: 5300.0
load average: 1.39501953125 0.7900390625 0.330078125
Total memory: 33.48 GB
Available memory: 31.87 GB
MKL_VERBOSE DSYRK(L,T,2048,2048,0x7fff7c2069a0,0x7f9d51624010,2048,0x7fff7c2069a8,0x7f9d4f623010,2048) 18.58ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.93ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 280.20ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 285.86ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.18ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 283.32ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 279.40ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 286.43ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.06ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.67ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 285.34ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Dotted two 4096x4096 matrices in 0.29 s.
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 352.45us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 173.90us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 29.56us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.30us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.26us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 27.08us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 26.16us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.40us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 26.52us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 25.99us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Dotted two vectors of length 524288 in 0.08 ms.
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d60e1d010,8796093024256,0x7f9d61e1d010,0x7f9d61e1f010,4398046513152,0x7f9d62e1f010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d6361f010,-10) 42.27us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gesdd failed init
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d6261f010,8796093024256,0x7f9d6361f010,0x7f9d63621010,4398046513152,0x7f9d64621010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d64e21010,-10) 8.10us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gesdd failed init
Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d6261f010,8796093024256,0x7f9d6361f010,0x7f9d63621010,4398046513152,0x7f9d64621010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d64e21010,-10) 6.55us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gesdd failed init
SVD of a 2048x1024 matrix in 0.00 s.
Segmentation fault (core dumped)
Runtime information:
MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x5568bfa04550,1,0x5568bfa04550,1) 723.91us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
1.26.0
3.11.4 (main, Jul 26 2023, 10:09:23) [GCC 11.3.0]
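Because the segfault kills the interpreter, nothing after the Cholesky step of the reproduction script ever reports, and the SVD step "succeeds" in 0.00 s despite the "init_gesdd failed init" messages. A minimal sketch, not part of the original report, that runs each LAPACK-backed step in its own child process and checks its result, so a crash or a silently wrong answer in one step does not hide the others. The `STEPS` table and `run_step` helper are hypothetical names; the SVD and Cholesky sizes mirror the script (size = 4096), and the least-squares size is an arbitrary choice.

```python
import subprocess
import sys
import textwrap

# Hypothetical split of the reproduction script into independent steps.
# Each step verifies its own result, so a wrong answer fails (exit code 1)
# as well as a hard crash.
STEPS = {
    "svd": """
        import numpy as np
        np.random.seed(0)
        E = np.random.random((2048, 1024))
        u, s, vt = np.linalg.svd(E, full_matrices=False)
        assert np.allclose((u * s) @ vt, E)
    """,
    "cholesky": """
        import numpy as np
        np.random.seed(0)
        F = np.random.random((2048, 2048))
        F = np.dot(F, F.T)
        L = np.linalg.cholesky(F)
        assert np.allclose(L @ L.T, F)
    """,
    "lstsq": """
        import numpy as np
        np.random.seed(0)
        X = np.random.random((1000, 300))  # arbitrary, hypothetical size
        y = np.random.random(1000)
        x = np.linalg.lstsq(X, y, rcond=None)[0]
        # a least-squares solution satisfies the normal equations X^T (X x - y) = 0
        assert np.allclose(X.T @ (X @ x - y), 0, atol=1e-6)
    """,
}


def run_step(code):
    """Run a code snippet in a fresh interpreter and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", textwrap.dedent(code)],
        capture_output=True,  # keeps MKL_VERBOSE chatter out of the summary
        text=True,
    )


if __name__ == "__main__":
    for name, code in STEPS.items():
        rc = run_step(code).returncode
        # POSIX only: a negative code -N means the child died from signal N,
        # so a segfault (SIGSEGV) shows up as -11.
        status = "ok" if rc == 0 else f"failed (return code {rc})"
        print(f"{name}: {status}")
```

Running each step in a separate interpreter costs a NumPy import per step, but it is the simplest way to survive a segfault, which no `try`/`except` in the same process can catch.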
Context for the issue:
Note that the SuiteSparse 7.2.0 demos (non-Python), built against the same MKL installation, all complete without error on the i9-10900K. The scikit-sparse and scikits.odes tests (Python-based and linked against the same SuiteSparse libraries) now also error and segfault with NumPy 1.26.0, but not with NumPy 1.25.2.
EDIT: scikit-sparse shows different errors, e.g.:
nose2 -v sksparse
... output truncated ...
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360b8,4,0x55820160dee8,4) 91ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c0,4,0x55820160def0,4) 61ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c8,4,0x55820160def8,4) 50ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
ok
sksparse.test_cholmod.test_cholesky_matrix_market ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:189: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
answer = np.linalg.lstsq(X.todense(), y)[0]
Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELSD.
MKL_VERBOSE DGELSD(1374389535753,4294967616,72151610872037377,0x7f0d8b96c010,1033,0x7f0d8bbf1a10,140728898421769,0x7f0d8bbf3a58,0x7f0da1f65d80,139696528745944,0x7ffee9469e40,139698106269695,0x7ffee9469de0,-5) 31.53us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gelsd failed init
FAIL
sksparse.test_cholmod.test_cholesky_smoke_test ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:66: CholmodTypeConversionWarning: converting matrix of class dia_matrix to CSC format
f = cholesky(sparse.eye(10, 10))
dense
sparse
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
csr
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:76: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert sparse.issparse(f(s_csr))
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:77: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert_allclose(f(s_csr).todense(), s_csr.todense())
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
extract
ok
sksparse.test_cholmod.test_complex ... MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de50,4,0x55820160dee0,4) 341ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de58,4,0x55820160dee8,4) 147ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de60,4,0x55820160def0,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de68,4,0x55820160def8,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Segmentation fault (core dumped)
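The MKL banner in both outputs reports an `ilp64` interface ("Lnx 3.70GHz ilp64 intel_thread"), and the integer arguments MKL rejects look like pairs of 32-bit values. The sketch below assumes, without confirming it, an LP64/ILP64 integer-width mismatch (32-bit integers written by the caller, read as 64-bit by MKL) and splits a logged value into its low and high 32-bit halves. Under that assumption, DGESDD's M = 4398046513152 becomes (2048, 1024), matching the 2048x1024 SVD input, and LDVT (parameter 10) = -4294966272 becomes (1024, -1), consistent with LDVT = 1024 followed by -1, the LWORK value LAPACK uses for a workspace-size query. DGELSD's M = 1374389535753 becomes (1033, 320), which would explain why LDA = 1033 is rejected as parameter 5 (LAPACK requires LDA >= max(1, M)). This reading does not by itself explain why the i5-4300U machine is unaffected.

```python
# Minimal sketch: split a logged 64-bit integer argument into two 32-bit halves.
# Assumption (not confirmed): an LP64 caller wrote int32 values that an
# ILP64 library then read as int64.


def to_int32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement int32."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def split_int64(value: int) -> tuple[int, int]:
    """Return the (low, high) signed int32 halves of a 64-bit integer.

    On little-endian x86_64 the low half sits at the lower address, i.e. it
    is the int32 that was stored first.
    """
    u = value & 0xFFFFFFFFFFFFFFFF  # reinterpret as unsigned 64-bit
    return to_int32(u), to_int32(u >> 32)


if __name__ == "__main__":
    # Values copied from the MKL_VERBOSE lines in this report.
    logged = {
        "DGESDD M": 4398046513152,
        "DGESDD N": 8796093023232,
        "DGESDD LDVT": -4294966272,
        "DGELSD M": 1374389535753,
        "DGELSD N": 4294967616,
    }
    for name, value in logged.items():
        print(name, value, "->", split_int64(value))
```

Running it prints each logged value next to its two 32-bit halves.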