This repository was archived by the owner on Nov 17, 2023. It is now read-only.
Assertion fail when creating very large matrix #10158
Closed
Description
MXNet crashes with an assertion failure when creating a matrix with more than 4 billion entries.
MXNetError: [17:43:16] include/mxnet/././tensor_blob.h:276: Check failed: this->shape_.Size() == shape.Size() (4352000000 vs. 57032704) TBlob.get_with_shape: new and old shape do not match total elements
Environment info (Required)
----------Python Info----------
Version : 3.6.4
Compiler : GCC 7.2.0
Build : ('default', 'Jan 16 2018 18:10:19')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 9.0.1
Directory : /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.1.0
Directory : /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash : 07a83a0325a3d782513a04f47d711710972cb144
----------System Info----------
Platform : Linux-4.4.0-1052-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-14-183
release : 4.4.0-1052-aws
version : #61-Ubuntu SMP Mon Feb 12 23:05:58 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.984
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.10
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single retpoline kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0018 sec, LOAD: 1.3588 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0643 sec, LOAD: 0.1102 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2234 sec, LOAD: 0.1722 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0266 sec, LOAD: 0.1238 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0093 sec, LOAD: 0.1161 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0105 sec, LOAD: 0.0586 sec.
Package used (Python/R/Scala/Julia):
(I'm using ...)
Error Message:
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
<ipython-input-2-4d3e062d9a75> in <module>()
----> 1 print(mx.nd.zeros(shape=(34000000,128)))
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in __repr__(self)
180 """Returns a string representation of the array."""
181 shape_info = 'x'.join(['%d' % x for x in self.shape])
--> 182 return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
183 self.__class__.__name__,
184 shape_info, self.context)
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py in asnumpy(self)
1791 self.handle,
1792 data.ctypes.data_as(ctypes.c_void_p),
-> 1793 ctypes.c_size_t(data.size)))
1794 return data
1795
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
144 """
145 if ret != 0:
--> 146 raise MXNetError(py_str(_LIB.MXGetLastError()))
147
148
MXNetError: [17:43:16] include/mxnet/././tensor_blob.h:276: Check failed: this->shape_.Size() == shape.Size() (4352000000 vs. 57032704) TBlob.get_with_shape: new and old shape do not match total elements
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276938) [0x7f2820f26938]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276d48) [0x7f2820f26d48]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a61c8) [0x7f2820f561c8]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x496e80) [0x7f2821146e80]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25cbd8c) [0x7f282327bd8c]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x259f54d) [0x7f282324f54d]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(MXNDArraySyncCopyToCPU+0xa) [0x7f282303dd3a]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f28b2a06ec0]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f28b2a0687d]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f28b2c1bdee]
Minimum reproducible example
import mxnet as mx
print(mx.nd.zeros(shape=(34000000,128)))
Steps to reproduce
There seems to be a problem instantiating a matrix with more than 4B entries. I've tried mx.nd.zeros and mx.random.uniform, and both fail in about the same way. If the number of entries is less than 2^32, it works fine.
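For what it's worth, the two numbers in the error message (4352000000 vs. 57032704) differ by exactly 2^32, which would be consistent with the element count being truncated to a 32-bit unsigned integer somewhere along the copy path. That is an inference from the arithmetic alone, not something confirmed against the MXNet source. A minimal sketch in plain Python (no MXNet needed; the helper names `element_count` and `wrapped_count` are made up for illustration):

```python
from functools import reduce
from operator import mul

UINT32_RANGE = 2 ** 32  # number of distinct values in an unsigned 32-bit integer


def element_count(shape):
    """Exact number of elements in a shape (Python ints do not overflow)."""
    return reduce(mul, shape, 1)


def wrapped_count(shape):
    """Element count as it would look if truncated to an unsigned 32-bit integer.

    Assumption: this only models the arithmetic; it does not call into MXNet.
    """
    return element_count(shape) % UINT32_RANGE


shape = (34000000, 128)
print(element_count(shape))                   # 4352000000, the left-hand value in the error
print(wrapped_count(shape))                   # 57032704, the right-hand value in the error
print(element_count(shape) >= UINT32_RANGE)   # True: this shape crosses the 2^32 boundary
```

The same check can be run on any shape before allocating it, to see whether it falls on the failing side of the 2^32 boundary.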