
BUG: mkl errors and segfaults with numpy 1.26.0 on i9-10900K #24846

@sn6526

Description


Describe the issue:

This depends on both the CPU architecture and the numpy version. I see these issues on an i9-10900K with numpy 1.26.0. I do not see them on the same i9-10900K with numpy 1.25.2, with all other factors unchanged. I see no issues at all on an i5-4300U laptop using the same OS, Python, MKL and test script with either numpy 1.25.2 or 1.26.0. See the context below for more detail.

OS: Ubuntu 22.04.3 LTS
Python (in a virtual env): Python 3.11.4
MKL: 2023.2.0
Note that I use "-march=native", among other flags, for both the Python and numpy builds on both CPUs.

Reproduce the code example:

import numpy as np
from time import time
import sysconfig
import psutil
import re

mkl = True  # set by hand to label this run's output
# Print the numpy build configuration to see whether MKL/BLAS is available
np.show_config()
print("python compile flags")
print(sysconfig.get_config_var('CFLAGS'))

# /proc/cpuinfo separates keys from values with whitespace (usually a tab)
re_cpu = re.compile(r"^model name\s*: (.*)")
with open('/proc/cpuinfo') as f:
    for line in f:
        model = re_cpu.match(line)
        if model:
            print(model.group(1))
            break

print("mkl: %s" % mkl)
print("physical cores: %s" % psutil.cpu_count(logical=False))
print("logical cores: %s" % psutil.cpu_count(logical=True))
print("cpu min freq: %s" % psutil.cpu_freq().min)
print("cpu current freq: %s" % psutil.cpu_freq().current)
print("cpu max freq: %s" % psutil.cpu_freq().max)
print("load average: %s %s %s" % psutil.getloadavg())
print("Total memory: %s GB" % round(psutil.virtual_memory().total / 1e9, 2))
print("Available memory: %s GB" % round(psutil.virtual_memory().available / 1e9, 2))

np.random.seed(0)

size = 4096
A, B = np.random.random((size, size)), np.random.random((size, size))
C, D = np.random.random((size * 128,)), np.random.random((size * 128,))
E = np.random.random((size // 2, size // 4))
F = np.random.random((size // 2, size // 2))
F = np.dot(F, F.T)  # symmetric and, for a random full-rank F, positive definite
G = np.random.random((size // 2, size // 2))

# Matrix multiplication
N = 10
t = time()
for i in range(N):
    np.dot(A, B)
delta = time() - t
print('Dotted two %dx%d matrices in %0.2f s.' % (size, size, delta / N))
del A, B

# Vector multiplication
N = 10
t = time()
for i in range(N):
    np.dot(C, D)
delta = time() - t
print('Dotted two vectors of length %d in %0.2f ms.' % (size * 128, 1e3 * delta / N))
del C, D

# Singular Value Decomposition (SVD)
N = 3
t = time()
for i in range(N):
    np.linalg.svd(E, full_matrices=False)
delta = time() - t
print("SVD of a %dx%d matrix in %0.2f s." % (size // 2, size // 4, delta / N))
del E

# Cholesky Decomposition
N = 3
t = time()
for i in range(N):
    np.linalg.cholesky(F)
delta = time() - t
print("Cholesky decomposition of a %dx%d matrix in %0.2f s." % (size // 2, size // 2, delta / N))

# The tests continue, but on the i9-10900K with numpy 1.26 they segfault at "Cholesky Decomposition".
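Because the crash is a segmentation fault rather than a Python exception, the script above stops at the first failure and gives no traceback. A minimal sketch, assuming a POSIX system and the same virtual environment, runs each operation in its own child interpreter so that one crash does not hide the others. On POSIX, a negative return code -N means the child was killed by signal N (-11 for SIGSEGV). The helper `run_isolated` and the `SNIPPETS` table are hypothetical; the svd and cholesky sizes mirror the script, the other sizes are arbitrary. An exit status of 0 does not prove the result is correct: in the log below, the SVD call returned after "init_gesdd failed init" without raising.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> int:
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX, a negative value -N means the child was terminated by
    signal N (for example -11 for SIGSEGV).
    """
    proc = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return proc.returncode


# Hypothetical per-operation snippets; numpy is imported only in the children
SNIPPETS = {
    "dot": "import numpy as np; a = np.random.random((512, 512)); np.dot(a, a)",
    "svd": "import numpy as np; np.linalg.svd(np.random.random((2048, 1024)), full_matrices=False)",
    "cholesky": "import numpy as np; f = np.random.random((2048, 2048)); np.linalg.cholesky(f @ f.T)",
    "lstsq": "import numpy as np; np.linalg.lstsq(np.random.random((1000, 300)), np.random.random(1000), rcond=None)",
}

if __name__ == "__main__":
    for name, code in SNIPPETS.items():
        status = run_isolated(code)
        print(f"{name}: exit status {status}")
```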

Error message:

Run with MKL_VERBOSE=1. There is no Python traceback; the full output from the numpy 1.26 environment is included below in case it helps.

MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x556cea252820,1,0x556cea252820,1) 765.28us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
Build Dependencies:
  blas:
    detection method: cmake
    found: true
    name: MKL
    version: 2023.2.0
  lapack:
    detection method: cmake
    found: true
    name: MKL
    version: 2023.2.0
Compilers:
  c:
    commands: ccache, cc
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  c++:
    commands: ccache, c++
    linker: ld.bfd
    name: gcc
    version: 11.4.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.2
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /tmp/build-env-3t41cf80/bin/python
  version: '3.11'
SIMD Extensions:
  baseline:
  - SSE
  - SSE2
  - SSE3
  - SSSE3
  - SSE41
  - POPCNT
  - SSE42
  - AVX
  - F16C
  - FMA3
  - AVX2
  not found:
  - AVX512F
  - AVX512CD
  - AVX512_KNL
  - AVX512_KNM
  - AVX512_SKX
  - AVX512_CLX
  - AVX512_CNL
  - AVX512_ICL

python compile flags
-Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O3 -march=native -O3 -march=native
Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
mkl: True
physical cores: 10
logical cores: 10
cpu min freq: 800.0
cpu current freq: 4395.1093
cpu max freq: 5300.0
load average: 1.39501953125 0.7900390625 0.330078125
Total memory: 33.48 GB
Available memory: 31.87 GB
MKL_VERBOSE DSYRK(L,T,2048,2048,0x7fff7c2069a0,0x7f9d51624010,2048,0x7fff7c2069a8,0x7f9d4f623010,2048) 18.58ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.93ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 280.20ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 285.86ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.18ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 283.32ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 279.40ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 286.43ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.06ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 281.67ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DGEMM(N,N,4096,4096,4096,0x7fff7c2069b8,0x7f9d54e28010,4096,0x7f9d5ce29010,4096,0x7fff7c2069c0,0x7f9d1ffff010,4096) 285.34ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
Dotted two 4096x4096 matrices in 0.29 s.
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 352.45us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 173.90us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 29.56us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.30us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.26us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 27.08us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 26.16us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 28.40us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 26.52us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
MKL_VERBOSE DDOT(524288,0x7f9d54a27010,1,0x7f9d54626010,1) 25.99us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
Dotted two vectors of length 524288 in 0.08 ms.

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d60e1d010,8796093024256,0x7f9d61e1d010,0x7f9d61e1f010,4398046513152,0x7f9d62e1f010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d6361f010,-10) 42.27us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
init_gesdd failed init

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d6261f010,8796093024256,0x7f9d6361f010,0x7f9d63621010,4398046513152,0x7f9d64621010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d64e21010,-10) 8.10us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
init_gesdd failed init

Intel MKL ERROR: Parameter 10 was incorrect on entry to DGESDD.
MKL_VERBOSE DGESDD(S,4398046513152,8796093023232,0x7f9d6261f010,8796093024256,0x7f9d6361f010,0x7f9d63621010,4398046513152,0x7f9d64621010,-4294966272,0x7fff7c204cd0,8944236082153652223,0x7f9d64e21010,-10) 6.55us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:10
init_gesdd failed init
SVD of a 2048x1024 matrix in 0.00 s.
Segmentation fault (core dumped)
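The huge integer arguments in the MKL_VERBOSE DGESDD lines look like two adjacent 32-bit integers read as a single 64-bit integer. A short sketch, assuming little-endian x86-64 and the standard LAPACK DGESDD argument order (JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, IWORK, INFO), splits each logged value into its low and high signed 32-bit halves. The low halves recover the expected 2048x1024 problem, and parameter 10 (LDVT, the one MKL rejects) decodes to 1024 with -1 in the high half, which would match an LWORK = -1 workspace query stored next to it. This is consistent with, though not proof of, a caller passing 32-bit (LP64) integers to the ILP64 MKL interface that the MKL_VERBOSE banner reports. The helper name `split_int64` is hypothetical.

```python
def split_int64(value: int) -> tuple[int, int]:
    """Return the (low, high) signed 32-bit halves of a 64-bit integer."""
    u = value & 0xFFFF_FFFF_FFFF_FFFF  # two's-complement view of the 64 bits

    def to_int32(x: int) -> int:
        # Reinterpret an unsigned 32-bit value as a signed one
        return x - (1 << 32) if x & 0x8000_0000 else x

    return to_int32(u & 0xFFFF_FFFF), to_int32(u >> 32)


# Integer arguments copied from the first MKL_VERBOSE DGESDD line above
DGESDD_INTS = {
    "M (2)": 4398046513152,
    "N (3)": 8796093023232,
    "LDA (5)": 8796093024256,
    "LDU (8)": 4398046513152,
    "LDVT (10)": -4294966272,
    "LWORK (12)": 8944236082153652223,
}

if __name__ == "__main__":
    for name, raw in DGESDD_INTS.items():
        low, high = split_int64(raw)
        print(f"{name:>11}: low={low:>11} high={high:>11}")
```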

Runtime information:

MKL_VERBOSE oneMKL 2023.0 Update 2 Product build 20230613 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.70GHz ilp64 intel_thread
MKL_VERBOSE SDOT(2,0x5568bfa04550,1,0x5568bfa04550,1) 723.91us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
1.26.0
3.11.4 (main, Jul 26 2023, 10:09:23) [GCC 11.3.0]

Context for the issue:

Note that the SuiteSparse 7.2.0 demos (not Python) built with the same MKL installation all complete without error on the i9-10900K. The scikit-sparse and scikits.odes tests (Python-based and linked against the same SuiteSparse libraries) now also error and segfault with numpy 1.26.0, but not with numpy 1.25.2.

EDIT: scikit-sparse shows different errors, e.g.:

nose2 -v sksparse
... output truncated ...
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360b8,4,0x55820160dee8,4) 91ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c0,4,0x55820160def0,4) 61ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x5582018360c8,4,0x55820160def8,4) 50ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
ok
sksparse.test_cholmod.test_cholesky_matrix_market ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:189: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
answer = np.linalg.lstsq(X.todense(), y)[0]

Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELSD.
MKL_VERBOSE DGELSD(1374389535753,4294967616,72151610872037377,0x7f0d8b96c010,1033,0x7f0d8bbf1a10,140728898421769,0x7f0d8bbf3a58,0x7f0da1f65d80,139696528745944,0x7ffee9469e40,139698106269695,0x7ffee9469de0,-5) 31.53us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
init_gelsd failed init
FAIL
sksparse.test_cholmod.test_cholesky_smoke_test ... /home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:66: CholmodTypeConversionWarning: converting matrix of class dia_matrix to CSC format
f = cholesky(sparse.eye(10, 10))
dense
sparse
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 1.05us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
csr
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:76: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert sparse.issparse(f(s_csr))
/home/ul/.pyenv/versions/3.11.4/envs/p3114.mkl/lib/python3.11/site-packages/sksparse/test_cholmod.py:77: CholmodTypeConversionWarning: converting matrix of class csr_matrix to CSC format
assert_allclose(f(s_csr).todense(), s_csr.todense())
MKL_VERBOSE DAXPY(20,0x7ffee946acc8,0x5582018e0040,1,0x558201858840,1) 292ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
extract
ok
sksparse.test_cholmod.test_complex ... MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de50,4,0x55820160dee0,4) 341ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de58,4,0x55820160dee8,4) 147ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de60,4,0x55820160def0,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
MKL_VERBOSE DAXPY(4,0x7ffee946ace8,0x55820160de68,4,0x55820160def8,4) 37ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:10
Segmentation fault (core dumped)
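The large M and N values in the DGELSD line from the scikit-sparse run look like adjacent 32-bit integers read as one 64-bit integer. A sketch using only the standard `struct` module, assuming little-endian byte order, packs consecutive int32 values the way a 32-bit-integer (LP64) caller would store them and reads 8 bytes back as one int64, the way an ILP64 routine would. With M = 1033, N = 320 and NRHS = 1 (values inferred from the low halves of the logged arguments, not stated in the report), it reproduces the first two logged arguments exactly. That would also explain why parameter 5 is rejected: LAPACK requires LDA >= max(1, M), and the logged LDA of 1033 is far smaller than the M that MKL reads. The function name `read_as_ilp64` is hypothetical.

```python
import struct


def read_as_ilp64(ints32: list[int], index: int) -> int:
    """Pack `ints32` as contiguous little-endian int32 values (LP64 layout)
    and read 8 bytes starting at element `index` as one int64 (ILP64 read)."""
    buf = struct.pack(f"<{len(ints32)}i", *ints32)
    (value,) = struct.unpack_from("<q", buf, 4 * index)
    return value


# Hypothetical LP64 argument block: M, N, NRHS for the failing DGELSD call
lp64_args = [1033, 320, 1]

if __name__ == "__main__":
    print(read_as_ilp64(lp64_args, 0))  # M as seen through a 64-bit read
    print(read_as_ilp64(lp64_args, 1))  # N as seen through a 64-bit read
```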
