BLD: use BUFFERSIZE=20 in OpenBLAS #17759
|
Thanks Matti, let's see how it goes. The relevant wheels should get built tonight. |
|
The wheels are up at https://anaconda.org/scipy-wheels-nightly/numpy. @moylop260, @MarkBel could you test them out? |
|
FYI, our CI is reproducing an error. I have not checked the output yet, but I think it is related. If you have a script to build the package the way the wheels are built, I can run it before the release if you want. Or, if you want SSH access, just write to me. Regards! Actually, let me first check whether the error is related, since the installed numpy version is numpy-1.19.4-cp36-cp36m-manylinux2010_x86_64.whl |
|
@moylop260 You can install the nightly wheels like so: |
|
Result: |
|
Reverting to |
|
@moylop260 When it segfaults, is the Docker container using all the memory allocated to it? |
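One way to answer this from inside the container is to compare the memory cgroup's current usage with its limit. This is a hypothetical helper, not part of NumPy or Docker: `container_memory` is an illustrative name, and it assumes the container's own cgroup is visible at `/sys/fs/cgroup`, reading the cgroup v2 files `memory.max`/`memory.current` or, failing that, the cgroup v1 files `memory.limit_in_bytes`/`memory.usage_in_bytes`.

```python
from pathlib import Path

# (limit file, usage file) pairs relative to the cgroup mount point.
# Assumption: the container's own cgroup is mounted at /sys/fs/cgroup.
CGROUP_FILES = (
    ("memory.max", "memory.current"),  # cgroup v2
    ("memory/memory.limit_in_bytes", "memory/memory.usage_in_bytes"),  # cgroup v1
)


def container_memory(root="/sys/fs/cgroup"):
    """Return (usage_bytes, limit_bytes) for the container's memory cgroup.

    limit_bytes is None when cgroup v2 reports "max" (no limit set).
    Returns None if neither layout is found under root.
    """
    base = Path(root)
    for limit_name, usage_name in CGROUP_FILES:
        limit_file, usage_file = base / limit_name, base / usage_name
        if limit_file.is_file() and usage_file.is_file():
            raw_limit = limit_file.read_text().strip()
            usage = int(usage_file.read_text().strip())
            limit = None if raw_limit == "max" else int(raw_limit)
            return usage, limit
    return None


if __name__ == "__main__":
    print(container_memory())
```

Polling this from a separate process while reproducing the crash would show whether usage approaches the limit just before the process dies.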
|
Also - could you try |
without environment variables
The result was after
[Thread 0x7fcfbcba8700 (LWP 28452) exited]
[Thread 0x7fcf78ba0700 (LWP 28460) exited]
[New Thread 0x7fced738d700 (LWP 28484)]
Unhandled exception in thread started by <bound method Thread._bootstrap of <Thread(odoo.service.httpd, initial daemon)>>
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
MemoryError
libgcc_s.so.1 must be installed for pthread_cancel to work
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fced738d700 (LWP 28484)]
0x00007fcfd9cf6c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
with
|
|
You have SSH access with your GitHub public keys. You can use the following command to connect:
To reproduce the error, just run the following command:
After 3 seconds you will see the error. NOTE: You can use the following image: But I can't reproduce it on other kinds of processors, even with the same Docker version and so on. But you are lucky and can reproduce it, so you can use: Inside the container: |
|
Thanks. I tried it out. The machine seems to have only ~8.5 GB of free memory. With so little memory available, you should limit the number of threads |
|
Running with the environment variable
Thank you! The weird part here is that we are not using numpy directly; it is just imported. Is this expected behaviour? |
|
We have another server where all processors were used at 100% just importing
What environment variables should I set to entirely fix the processor and memory overload (considering that we are using just
Is it possible to set the lowest possible values by default? I mean, I don't know; I'm just trying to work around our production errors. We have auto-scaling server deployments but using
Thanks in advance! |
Sounds like there is some pre-allocation going on. |
That would be a real loss IMO for most users, who expect BLAS in NumPy to be multithreaded by default. It may be possible to consider something more modest, like 8, similar to what NumExpr does.
One solution to your problem is to build NumPy from source without BLAS. This will have minimal memory usage and will be just as fast as wheels if you don't use BLAS, which it sounds like you may not. |
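The "more modest" default suggested above can be sketched as a small policy function. This is a hypothetical illustration, not how OpenBLAS or NumPy actually choose their thread count: `default_thread_count` is an illustrative name, and the cap of 8 is taken from the NumExpr comparison in the reply. An explicit `OPENBLAS_NUM_THREADS` still takes precedence.

```python
import os

# Hypothetical cap, mirroring the "something more modest like 8" suggestion.
DEFAULT_CAP = 8


def default_thread_count(cpu_count=None, environ=None, cap=DEFAULT_CAP):
    """Pick a BLAS thread count under a hypothetical capped-default policy.

    1. If OPENBLAS_NUM_THREADS is set, respect it (minimum 1).
    2. Otherwise use min(number of cores, cap).
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("OPENBLAS_NUM_THREADS")
    if explicit:
        return max(1, int(explicit))
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cpu_count, cap))


if __name__ == "__main__":
    print(default_thread_count())
```

With this policy, `default_thread_count(cpu_count=56, environ={})` returns 8 for a 56-core server like the one reported in this thread, while `environ={"OPENBLAS_NUM_THREADS": "1"}` still yields 1.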
Currently, if numpy is available in the modules, even if you are not using it,
Odoo tries to compile and the system goes down, only on one type of processor.
Currently we know of 2 servers reproducing the error:
B&F-production
Runbot
More info:
numpy/numpy#17674
numpy/numpy#17759
It is reproduced in the following MR:
https://git.vauxoo.com/vauxoo/lasec/-/merge_requests/197
Check the following discussion: https://odoo-community.org/groups/contributors-15/contributors-186006?mode=thread&date_begin=&date_end=
OpenBLAS creates a number of threads equal to the number of core threads available (56 in my case, on the production server),
so it quickly reached limit_memory_hard
and the process was killed (SIGSEGV).
Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.
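Because OpenBLAS reads its thread-count variable when the shared library is loaded, the setting has to be in the environment before `numpy` (or any module that imports it, such as websocket-client here) is first imported. A minimal sketch, assuming you control the Python entry point; `limit_blas_threads` is a hypothetical helper, and `OMP_NUM_THREADS`/`MKL_NUM_THREADS` are included only on the assumption that other BLAS builds may be in use.

```python
import os

# OPENBLAS_NUM_THREADS is the variable from this thread. The other two are an
# assumption: they cover OpenMP-based and MKL-based BLAS builds and are simply
# ignored by backends that do not read them.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n=1, environ=os.environ):
    """Set BLAS thread-count variables without overriding values already set.

    Call this before numpy, or anything that imports numpy, is imported:
    OpenBLAS reads the variable once, when the shared library is loaded.
    Returns the effective values.
    """
    for name in THREAD_ENV_VARS:
        environ.setdefault(name, str(n))
    return {name: environ[name] for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    print(limit_blas_threads(1))
    # Only after this point is it safe to import numpy (directly or indirectly).
```

Setting the variable in the image environment instead, e.g. with an `ENV OPENBLAS_NUM_THREADS=1` line in a Dockerfile, has the same effect without changing any Python code.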
The reason websocket-client was deactivated is that numpy has the following issue:
- numpy/numpy#13059
It is a corner case involving a certain kind of processor, Docker, and Python 3.
More info:
- numpy/numpy#17674
- numpy/numpy#17759
But who is using numpy? Several projects use libraries that depend on numpy:
./web/requirements.txt:2:bokeh==1.1.0
./reporting-engine/requirements.txt:1:altair
./icm/requirements.txt:1:pandas
./maintainer-quality-tools/requirements.txt:7:websocket-client
So we ran odoo-bin with loglevel=debug to find the last line executed before the crash. This was the path:
- Last logging: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/loading.py#L152
- Using pdb, I traced the following line: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/module.py#L368
- The last module imported was `resource`: https://github.com/odoo/odoo/tree/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource
- I removed all imports and reproduced the error again and again until one change fixed the issue: commenting out the following import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource/tests/common.py#L4
- I debugged this file line by line and finally found the problematic import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/tests/common.py#L50
- But why does this raise the error? Because the following line imports numpy if you are using Python 3: https://github.com/websocket-client/websocket-client/blob/29c15714ac9f5272e1adefc9c99b83420b409f63/websocket/_abnf.py#L34 and numpy is installed because of the requirements.txt files above, so the disaster happened.
We could have removed all numpy requirements, but there are a lot of them.
Instead, we decided that the better (faster) option was to avoid the websocket line that imports numpy by not installing websocket-client. However, after researching, we found that:
OpenBLAS creates a number of threads equal to the number of core threads available, so it quickly reached limit_memory_hard and the process was killed (SIGSEGV).
Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.
We tested this by building an image that reproduces the error and setting that environment variable, and the error was fixed. That change was applied in the following PRs:
- Vauxoo/docker-ubuntu-base#89
- Vauxoo/docker-ubuntu-base#90
With the change applied in docker-ubuntu-base, it is no longer necessary to avoid importing websocket-client (so the JS tests work again); we are covered by the env var OPENBLAS_NUM_THREADS.
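The pdb session above boils down to one question: which import first pulls in numpy? A stdlib-only alternative is to register a finder on `sys.meta_path` that records the call stack the first time a watched module is requested. This is a sketch, not an Odoo or websocket-client feature; `ImportTracer` and `install` are hypothetical names, and it only sees imports that happen after it is installed and while the module is not yet in `sys.modules`.

```python
import importlib.abc
import sys
import traceback


class ImportTracer(importlib.abc.MetaPathFinder):
    """Record the call stack the first time any watched module is imported.

    It never loads anything itself: find_spec() always returns None, so the
    regular finders further down sys.meta_path still perform the real import.
    """

    def __init__(self, watched):
        self.watched = set(watched)
        self.stacks = {}

    def find_spec(self, fullname, path, target=None):
        if fullname in self.watched and fullname not in self.stacks:
            # The innermost frames belong to the import machinery; the outer
            # frames show which module triggered the import.
            self.stacks[fullname] = traceback.format_stack()
        return None


def install(watched):
    """Put a tracer first on sys.meta_path and return it."""
    tracer = ImportTracer(watched)
    sys.meta_path.insert(0, tracer)
    return tracer


if __name__ == "__main__":
    tracer = install({"numpy"})
    try:
        import websocket  # replace with the import you suspect
    except ImportError:
        pass
    for name, stack in tracer.stacks.items():
        print(f"{name} was first imported from:")
        print("".join(stack))
```

Installed at the very top of the entry point, before addons are loaded, the recorded stack for `numpy` would point straight at an import such as the `_abnf.py` line identified above.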




xref OpenMathLib/OpenBLAS#2970, where it was suggested to compile OpenBLAS with BUFFERSIZE=20 to revert the memory footprint to what it was in OpenBLAS 0.3.9 (we now use 0.3.12). This was done in MacPython/openblas-libs#46, and this PR uses it in NumPy.
xref issues gh-17674 and gh-17684, which triggered the discussion. Once we have wheels that use this, we should ask the reporters on those issues, @moylop260 and @MarkBel, to try it out.
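To get a feel for why the buffer size matters on many-core machines, here is a back-of-the-envelope estimate. It rests on assumptions not stated in this PR: that each OpenBLAS work buffer is `32 << BUFFERSIZE` bytes (so `BUFFERSIZE=20` gives the 32 MiB associated with 0.3.9 here), that `BUFFERSIZE=22` is a reasonable stand-in for the larger post-0.3.9 default, and that one buffer is reserved per thread. OpenBLAS's real allocation logic is more involved, so treat the numbers as an order of magnitude only.

```python
MIB = 1 << 20


def buffer_bytes(buffersize):
    """Size of one work buffer, ASSUMING BUFFER_SIZE = 32 << BUFFERSIZE."""
    return 32 << buffersize


def estimated_footprint(threads, buffersize):
    """Rough estimate, ASSUMING one buffer is reserved per thread."""
    return threads * buffer_bytes(buffersize)


if __name__ == "__main__":
    # 56 threads: the production-server core count reported in this thread.
    # BUFFERSIZE=22 is an assumed stand-in for the larger post-0.3.9 default.
    for exp in (20, 22):
        mib = estimated_footprint(56, exp) // MIB
        print(f"56 threads, BUFFERSIZE={exp}: ~{mib} MiB")
```

Under these assumptions, 56 threads come to about 1792 MiB with BUFFERSIZE=20 versus about 7168 MiB with BUFFERSIZE=22, which is consistent with a per-process cap such as Odoo's limit_memory_hard being hit on large machines.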