
BLD: use BUFFERSIZE=20 in OpenBLAS#17759

Merged
charris merged 1 commit into numpy:master from mattip:openblas-buffersize
Nov 14, 2020

Conversation

@mattip
Member

@mattip mattip commented Nov 12, 2020

xref OpenMathLib/OpenBLAS#2970 where it was suggested to compile OpenBLAS with BUFFERSIZE=20 to revert the memory footprint to what it was in OpenBLAS 0.3.9 (we now use 0.3.12). This was done in MacPython/openblas-libs#46, and this PR uses it in NumPy.
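For context, BUFFERSIZE is an OpenBLAS make-time variable; a minimal build sketch follows (the clone URL and surrounding steps are illustrative, the authoritative invocation lives in the MacPython/openblas-libs build scripts):

```shell
# Build OpenBLAS with the smaller per-thread GEMM buffer.
# BUFFERSIZE=20 restores the 0.3.9-era buffer size; later releases
# default to a larger value, inflating the per-thread memory footprint.
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make BUFFERSIZE=20               # pass the buffer-size override to the build
make PREFIX=/opt/openblas install
```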

xref issue gh-17674, gh-17684 which triggered the discussion. Once we have wheels that use this, we should ask the reporters on those issues @moylop260 and @MarkBel to try it out.

@charris charris merged commit c231355 into numpy:master Nov 14, 2020
@charris
Member

charris commented Nov 14, 2020

Thanks Matti, let's see how it goes. The relevant wheels should get built tonight.

@charris charris added the 03 - Maintenance, 36 - Build and 09 - Backport-Candidate labels and removed the 03 - Maintenance label Nov 14, 2020
@mattip
Member Author

mattip commented Nov 15, 2020

@moylop260

moylop260 commented Nov 15, 2020

FYI, our CI is reproducing an error.

I have not checked the output yet, but I think it is related.

If you have a script to build the package (e.g. the wheel), I can run it before the release if you want.

Or if you want SSH access, just write me.

Regards!

Better, let me check whether the error is related, since the installed numpy version is:

numpy-1.19.4-cp36-cp36m-manylinux2010_x86_64.whl

@charris
Member

charris commented Nov 16, 2020

@moylop260 You can install the nightly wheels like so:

python3 -mpip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy

@moylop260

python3 -mpip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy

  • numpy==1.20.0.dev0+a645106

Result:

Starting program: /.repo_requirements/virtualenv/python3.6/bin/python3 /home/odoo/odoo-12.0/odoo-bin -d openerp_template -i benandfrank --xmlrpc-port=18069 --logfile=out.txt --workers=0 --max-cron-threads=0
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f5d94966700 (LWP 26925)]
[New Thread 0x7f5d8cb35700 (LWP 26927)]
[New Thread 0x7f5d8c334700 (LWP 26928)]
[New Thread 0x7f5d83b33700 (LWP 26929)]
[New Thread 0x7f5d7b332700 (LWP 26930)]
[New Thread 0x7f5d72b31700 (LWP 26931)]
[New Thread 0x7f5d6a330700 (LWP 26932)]
[New Thread 0x7f5d61b2f700 (LWP 26933)]
[New Thread 0x7f5d5932e700 (LWP 26934)]
[New Thread 0x7f5d50b2d700 (LWP 26935)]
[New Thread 0x7f5d4832c700 (LWP 26936)]
[New Thread 0x7f5d3fb2b700 (LWP 26937)]
[New Thread 0x7f5d3732a700 (LWP 26938)]
[New Thread 0x7f5d2eb29700 (LWP 26939)]
[New Thread 0x7f5d26328700 (LWP 26940)]
[New Thread 0x7f5d1db27700 (LWP 26941)]
[New Thread 0x7f5d15326700 (LWP 26942)]
[New Thread 0x7f5d0cb25700 (LWP 26943)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5d15326700 (LWP 26942)]
0x0000000000000000 in ?? ()
(gdb)

@moylop260

moylop260 commented Nov 17, 2020

Reverting to numpy==1.19.4, it works well again.

@mattip
Member Author

mattip commented Nov 17, 2020

@moylop260 when it segfaults, is the Docker container using all the memory allocated to it?

@mattip
Member Author

mattip commented Nov 17, 2020

Also, could you try `export OPENBLAS_CORETYPE=Haswell` or `export OPENBLAS_CORETYPE=Prescott` to reduce the CPU features used? Maybe the CPU detection code is not working correctly.
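A note for anyone trying this: OpenBLAS reads these variables when the shared library is loaded, i.e. at the first import numpy, so they must be in the environment before Python starts. A minimal smoke-test sketch (the dot product is only there to exercise the loaded library):

```shell
# Must be exported before the Python process starts; setting it after
# numpy is imported has no effect on the already-loaded library.
export OPENBLAS_CORETYPE=Prescott   # or Haswell, to limit kernel selection
python3 -c "import numpy; print(numpy.dot([1.0, 2.0], [3.0, 4.0]))"
```

If the crash is in the CPU-detection or kernel-dispatch path, pinning an older core type such as Prescott should make it disappear.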

@moylop260

moylop260 commented Nov 18, 2020

Without environment variables:

python3 -m pip install memory_profiler

mprof run gdb --args python3 ~/odoo-12.0/odoo-bin

The result, after `Program received signal SIGSEGV, Segmentation fault.`, was:

[Thread 0x7fcfbcba8700 (LWP 28452) exited]
[Thread 0x7fcf78ba0700 (LWP 28460) exited]
[New Thread 0x7fced738d700 (LWP 28484)]
Unhandled exception in thread started by <bound method Thread._bootstrap of <Thread(odoo.service.httpd, initial daemon)>>
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
MemoryError
libgcc_s.so.1 must be installed for pthread_cancel to work

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fced738d700 (LWP 28484)]
0x00007fcfd9cf6c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6

With `export OPENBLAS_CORETYPE=Haswell`:

[Thread 0x7f6f656c9700 (LWP 28393) exited]
[Thread 0x7f6f6e6cb700 (LWP 28391) exited]
[New Thread 0x7f6e776ad700 (LWP 28426)]
Unhandled exception in thread started by <bound method Thread._bootstrap of <Thread(odoo.service.httpd, initial daemon)>>
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
MemoryError
libgcc_s.so.1 must be installed for pthread_cancel to work

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f6e776ad700 (LWP 28426)]
0x00007f6f7a016c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6


With `export OPENBLAS_CORETYPE=Prescott`:

[Thread 0x7f72e8d88700 (LWP 28736) exited]
[Thread 0x7f72e8587700 (LWP 28737) exited]
[New Thread 0x7f71f1d6a700 (LWP 28771)]
Unhandled exception in thread started by <bound method Thread._bootstrap of <Thread(odoo.service.httpd, initial daemon)>>
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
MemoryError
libgcc_s.so.1 must be installed for pthread_cancel to work

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f71f1d6a700 (LWP 28771)]
0x00007f72f46d3c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6


@moylop260

moylop260 commented Nov 18, 2020

Using numpy==1.19.4

No error (I just forced an exit(1)):

[Thread 0x7f78a4292700 (LWP 29320) exited]
[Thread 0x7f789ca8f700 (LWP 29323) exited]
[New Thread 0x7f7860a77700 (LWP 29352)]
[Thread 0x7f7860a77700 (LWP 29352) exited]
[Inferior 1 (process 29312) exited with code 01]
(gdb) q


@moylop260

@charris @mattip

You have SSH access with your GitHub public keys.

You can use the following command to connect:

To reproduce the error, just run the following command:

  • gdb --args python3 ~/odoo-13.0/odoo-bin -i account_loan

After about 3 seconds you will see the error.

NOTE:
You can install new packages using pip install..., since it is a virtualenv (sudo is not required).
You can uninstall or re-install whatever you want, since the docker image is backed up.

You can use the following image:

I can't reproduce it using other kinds of processors, even if they use the same Docker version and so on.

If you are lucky and can reproduce it, you can use:
docker pull vauxoo/numpy_memerror
docker run -it --name=numpy_memerror --entrypoint=bash vauxoo/numpy_memerror

Inside the container:
/etc/init.d/postgresql start
gdb --args python3 ~/odoo-13.0/odoo-bin -i account_loan

@mattip
Member Author

mattip commented Nov 18, 2020

Thanks. I tried it out. The machine seems to have only ~8.5GB of memory free. With that little memory available, you should limit the number of threads: `OMP_NUM_THREADS=8 python3 ~/odoo-13.0/odoo-bin -i account_loan`, which allows the program to run for more than 10 secs.

$ free -h
             total       used       free     shared    buffers     cached
Mem:          251G       243G       8.5G       7.5G        29G       141G
-/+ buffers/cache:        72G       179G
Swap:         2.2G       1.2G       1.0G

@moylop260

@mattip

Running with the environment variable OMP_NUM_THREADS=8, it runs fine.

Thank you!

The weird part here is that we are not using numpy directly; it is just imported (import numpy), and at that point the memory is already overloaded.

Is this expected behaviour?

@moylop260

moylop260 commented Nov 18, 2020

We have another server where all processors were used at 100% just by importing numpy.

What environment variables should I set to entirely avoid the processor and memory overload (considering that we only use import numpy)?

Is it possible to set the lowest possible values by default?

I mean, OMP_NUM_THREADS=1 by default, to be compatible with all devices; if you want to use more resources, a customization of environment variables would be required. (This is how most database managers work, e.g. postgresql ships with the lowest values by default.)

I don't know; I'm just trying to work around our production errors.

We have auto-scaling server deploys, but with import numpy consuming a lot of memory and processors, many servers will be deployed even when numpy is not actually used.

Thanks in advance!
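There is no numpy-side switch for this, but the effect can be approximated in the deployment environment itself; a sketch with conservative, illustrative values (tune per host):

```shell
# Export in the service environment (init script, Docker ENV, etc.)
# before Odoo/Python starts, so OpenBLAS picks the values up on import:
export OPENBLAS_NUM_THREADS=1   # cap OpenBLAS's own thread pool
export OMP_NUM_THREADS=1        # cap OpenMP-built BLAS variants too
python3 ~/odoo-12.0/odoo-bin    # numpy now initializes with a minimal pool
```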

@charris
Member

charris commented Nov 18, 2020

we are not using numpy directly; it is just imported (import numpy), and at that point the memory is overloaded

Sounds like there is some pre-allocation going on.

@bashtage
Contributor

I mean, OMP_NUM_THREADS=1 by default, to be compatible with all devices; if you want to use more resources, a customization of environment variables would be required. (This is how most database managers work, e.g. postgresql ships with the lowest values by default.)

That would be a real loss, IMO, for most users, who expect BLAS in NumPy to be multithreaded by default. It might be possible to consider something more modest, like 8 threads, similar to what NumExpr does.

@bashtage
Contributor

We have auto-scaling server deploys, but with import numpy consuming a lot of memory and processors, many servers will be deployed even when numpy is not actually used.

One solution to your problem is to build NumPy from source without BLAS. This will have minimal memory usage and will be just as fast as the wheels if you don't use BLAS, which it sounds like you may not.
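A sketch of one long-documented way to do that; the BLAS=None/LAPACK=None/ATLAS=None switches come from numpy's build documentation of that era, and --no-binary forces a source build instead of the manylinux wheel:

```shell
# Build numpy from source with no external BLAS/LAPACK backend; numpy
# falls back to its bundled routines (slower for large linear algebra,
# but with no OpenBLAS thread pool or per-thread buffers at all).
BLAS=None LAPACK=None ATLAS=None \
    python3 -m pip install --no-binary numpy numpy
```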

fernandahf added a commit to vauxoo-dev/docker-ubuntu-base that referenced this pull request Feb 3, 2021
Currently, if numpy is available in the modules, Odoo tries to load it even if you are not using it, and the system goes down, but only on one type of processor.

Currently we know of 2 servers reproducing the error:

    B&F-production
    Runbot

More info:

numpy/numpy#17674
numpy/numpy#17759

It is reproducible in the following MR:

https://git.vauxoo.com/vauxoo/lasec/-/merge_requests/197

Check the following discussion: https://odoo-community.org/groups/contributors-15/contributors-186006?mode=thread&date_begin=&date_end=

OpenBLAS creates a number of threads equal to the number of core threads available (56 in my case, on the production server), so it quickly reached limit_memory_hard and the process was killed (SIGSEGV). Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.
@mattip mattip deleted the openblas-buffersize branch April 8, 2021 11:13
fernandahf added a commit to vauxoo-dev/docker-ubuntu-base that referenced this pull request Apr 15, 2021
fernandahf added a commit to vauxoo-dev/maintainer-quality-tools that referenced this pull request Apr 19, 2021
The reason websocket-client was deactivated is:

numpy has the following issue:
 - numpy/numpy#13059

It is a corner case involving a particular kind of processor, Docker, and Python 3.

More info:

 - numpy/numpy#17674
 - numpy/numpy#17759

But who is using numpy?

Several projects use libraries that depend on numpy:
./web/requirements.txt:2:bokeh==1.1.0
./reporting-engine/requirements.txt:1:altair
./icm/requirements.txt:1:pandas
./maintainer-quality-tools/requirements.txt:7:websocket-client

So we ran odoo-bin with loglevel=debug to find the last line before the crash.

This was the path:
 - Last log line: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/loading.py#L152
 - Using pdb, I traced it to the following line: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/modules/module.py#L368
 - The last module imported was `resource`: https://github.com/odoo/odoo/tree/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource
 - I removed imports one by one and reproduced the error again and again, until one change fixed it: commenting out the following import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/addons/resource/tests/common.py#L4
 - I then debugged that file line by line, and finally found the problematic import: https://github.com/odoo/odoo/blob/92ef3b2dd4655913198d10d06598b799fdcae6d0/odoo/tests/common.py#L50
 - So why does this raise the error? Because the following line: https://github.com/websocket-client/websocket-client/blob/29c15714ac9f5272e1adefc9c99b83420b409f63/websocket/_abnf.py#L34
   imports numpy when you are using Python 3. numpy is installed because of the requirements.txt files above, and that is where the disaster happened.

We could have removed all the numpy requirements, but there are a lot of them. We decided the better option was to avoid the websocket line that imports numpy (the faster solution) by not installing websocket-client.

However, after researching, we found that:

OpenBLAS creates a number of threads equal to the number of core threads available, so it quickly reached limit_memory_hard and the process was killed (SIGSEGV). Forcing OPENBLAS_NUM_THREADS=1 fixed the issue.

After building a test image to reproduce the error and setting that environment variable, it was fixed.

That change was applied in the following PRs:

 - Vauxoo/docker-ubuntu-base#89
 - Vauxoo/docker-ubuntu-base#90

With the change applied in docker-ubuntu-base, it is no longer necessary to avoid importing websocket-client (JS tests work again); we are covered by the env var OPENBLAS_NUM_THREADS.
moylop260 pushed a commit to Vauxoo/maintainer-quality-tools that referenced this pull request Apr 19, 2021
fernandahf added a commit to vauxoo-dev/docker-odoo-image that referenced this pull request Apr 19, 2021
fernandahf added a commit to vauxoo-dev/docker-odoo-image that referenced this pull request Apr 19, 2021
moylop260 pushed a commit to Vauxoo/docker-odoo-image that referenced this pull request Apr 19, 2021