Scipy packaging #211

Merged: mdboom merged 22 commits into pyodide:master from rth:scipy on Dec 3, 2018

rth (Member) commented Oct 5, 2018

This is a continuation of #75 that eventually aims to partially fix #72.

This still needs a lot of work, so this PR mostly aims to facilitate discussion of intermediate results.

As outlined in #184, f2c does not work for the f90 code used in scipy 1.1. Here I have reverted to scipy 0.17.1 (from 2 years ago), which AFAIK is the last version known to be f77 only. The remaining issue is that the .tar.gz from PyPI includes files Cythonized with an old Cython version that don't work on Python 3.7, so here we download the sources from GitHub and cythonize them manually.

I can confirm that CLAPACK-WA builds, here with,

make packages/scipy/CLAPACK-WA/lapack_WA.bc

and produces lapack_WA.bc and blas_WA.bc files; however, I'm not yet sure how to make numpy.distutils detect them properly during scipy installation. For now I removed every setup.py that explicitly requires LAPACK to see which parts of scipy we can build without it.

In terms of packaging, some of what I have done here is not very clean -- it is only meant as a temporary solution until we have something that actually builds.

Comments or suggestions along the way would be very welcome.
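
A minimal sketch of the manual cythonize step mentioned above, assuming a plain `cython <file>` invocation is sufficient. The helper name `cython_commands` is hypothetical and not part of this PR; the actual build may need extra flags or a different driver script.

```python
import os


def cython_commands(src_root):
    """Return one `cython` command line per .pyx file found under src_root."""
    commands = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if name.endswith(".pyx"):
                # Assumption: a plain `cython <file>` call is enough; the real
                # build may need extra flags (e.g. for C++ output).
                commands.append(["cython", os.path.join(dirpath, name)])
    # Sort so that the order does not depend on directory traversal order.
    return sorted(commands)
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` from the unpacked source tree.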

rth (Member, Author) commented Oct 5, 2018

Currently (locally for me) this errors with invalid call target: $_malloc when running asm2wasm to generate scipy/optimize/_zeros.cpython-37m-x86_64-linux-gnu.wasm.

Details
emcc -O3 -s BINARYEN_METHOD=native-wasm -Werror -s EMULATED_FUNCTION_POINTERS=1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -s SIDE_MODULE=1 -s WASM=1 -s BINARYEN_TRAP_MODE=clamp --memory-init-file 0 -pthread -shared build/temp.linux-x86_64-3.7/scipy/optimize/zeros.bc -Lbuild/temp.linux-x86_64-3.7 -lrootfind -o build/lib.linux-x86_64-3.7/scipy/optimize/_zeros.cpython-37m-x86_64-linux-gnu.wasm
invalid call target: $_malloc
ERROR:root:'/home/rth/src/pyodide/emsdk/emsdk/binaryen/tag-1.38.12_64bit_binaryen/bin/asm2wasm build/lib.linux-x86_64-3.7/scipy/optimize/_zeros.cpython-37m-x86_64-linux-gnu.asm.js --total-memory=16777216 --trap-mode=clamp -O3 --mem-init=build/lib.linux-x86_64-3.7/scipy/optimize/_zeros.cpython-37m-x86_64-linux-gnu.wasm.mem --table-max=-1 --mem-max=-1 -o build/lib.linux-x86_64-3.7/scipy/optimize/_zeros.cpython-37m-x86_64-linux-gnu.wasm' failed
Traceback (most recent call last):
  File "/home/rth/.miniconda3/envs/pyodide-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/rth/.miniconda3/envs/pyodide-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/rth/src/pyodide/pyodide_build/__main__.py", line 25, in <module>
    main()
  File "/home/rth/src/pyodide/pyodide_build/__main__.py", line 21, in main
    args.func(args)
  File "/home/rth/src/pyodide/pyodide_build/buildpkg.py", line 185, in main
    build_package(path, args)
  File "/home/rth/src/pyodide/pyodide_build/buildpkg.py", line 157, in build_package
    compile(path, srcpath, pkg, args)
  File "/home/rth/src/pyodide/pyodide_build/buildpkg.py", line 96, in compile
    '--target', args.target], check=True)
  File "/home/rth/.miniconda3/envs/pyodide-env/lib/python3.7/subprocess.py", line 468, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/rth/src/pyodide/cpython/build/3.7.0/host/bin/python3', '-m', 'pyodide_build', 'pywasmcross', '--cflags', ' -I../../CLAPACK-WA/F2CLIBS/libf2c/ -Wno-implicit-function-declaration', '--ldflags', '-O3 -s BINARYEN_METHOD=native-wasm -Werror -s EMULATED_FUNCTION_POINTERS=1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -s SIDE_MODULE=1 -s WASM=1 -s BINARYEN_TRAP_MODE=clamp --memory-init-file 0 ', '--host', '/home/rth/src/pyodide/cpython/build/3.7.0/host', '--target', '/home/rth/src/pyodide/cpython/installs/python-3.7.0']' returned non-zero exit status 1.
make: *** [Makefile:224: package] Error 1

This might be related to the issue with dynamic linking under the SIDE_MODULE=1 option discussed in emscripten-core/emscripten#6047

pip install pytest pytest-xdist pytest-instafail selenium PyYAML flake8

# Download BLAS/LAPACK
git clone https://github.com/adrianbg/CLAPACK-WA.git packages/scipy/CLAPACK-WA
Review comment (Collaborator):

Eventually, this should probably happen as part of the Scipy package build, or maybe its own package that Scipy is dependent on.

// can't just detect this automatically in the module we see.)
-static const int NUM_PARAMS = 15;
+static const int NUM_PARAMS = 32;
+static const int NUM_PARAMS = 37;
Review comment (Collaborator):

For CI, we'll need to bump the emsdk cache value in .circleci/config.yaml

mdboom (Collaborator) commented Oct 5, 2018

> Currently (locally for me) this errors with invalid call target: $_malloc when running asm2wasm to generate scipy/optimize/_zeros.cpython-37m-x86_64-linux-gnu.wasm.

I've worked around this in other packages where this happens by forcing the use of malloc somewhere that will definitely get called, like in the init function. See: https://github.com/iodide-project/pyodide/blob/master/packages/matplotlib/patches/force_malloc_free.patch

rth (Member, Author) commented Oct 7, 2018

Thanks a lot for the suggestions, @mdboom -- great that you have already encountered that malloc issue!

rth (Member, Author) commented Oct 9, 2018

The malloc fix did help. Now I am running into errors of the form,

In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:104:9: error: no member named 'isalnum' in the global namespace; did you mean 'iswalnum'?
using ::isalnum;

(see more detailed log below)

Details
em++ -I../../CLAPACK-WA/F2CLIBS/libf2c/ -Wno-implicit-function-declaration -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -D__STDC_FORMAT_MACROS=1 -Iscipy/sparse/sparsetools -I/home/rth/src/pyodide/cpython/build/3.7.0/host/lib/python3.7/site-packages/numpy-1.15.1-py3.7-linux-x86_64.egg/numpy/core/include -I/home/rth/src/pyodide/cpython/installs/python-3.7.0/include/python3.7 -c scipy/sparse/sparsetools/sparsetools.cxx -o build/temp.linux-x86_64-3.7/scipy/sparse/sparsetools/sparsetools.bc -MMD -MF build/temp.linux-x86_64-3.7/scipy/sparse/sparsetools/sparsetools.o.d
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:104:9: error: no member named 'isalnum' in the global namespace; did you mean 'iswalnum'?
using ::isalnum;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:159:11: note: 'iswalnum' declared here
int       iswalnum(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:105:9: error: no member named 'isalpha' in the global namespace; did you mean 'iswalpha'?
using ::isalpha;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:160:11: note: 'iswalpha' declared here
int       iswalpha(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:106:9: error: no member named 'isblank' in the global namespace; did you mean 'iswblank'?
using ::isblank;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:161:11: note: 'iswblank' declared here
int       iswblank(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:107:9: error: no member named 'iscntrl' in the global namespace; did you mean 'iswcntrl'?
using ::iscntrl;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:162:11: note: 'iswcntrl' declared here
int       iswcntrl(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:108:9: error: no member named 'isdigit' in the global namespace
using ::isdigit;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:109:9: error: no member named 'isgraph' in the global namespace; did you mean 'iswgraph'?
using ::isgraph;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:164:11: note: 'iswgraph' declared here
int       iswgraph(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:110:9: error: no member named 'islower' in the global namespace; did you mean 'iswlower'?
using ::islower;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:165:11: note: 'iswlower' declared here
int       iswlower(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:111:9: error: no member named 'isprint' in the global namespace; did you mean 'iswprint'?
using ::isprint;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:166:11: note: 'iswprint' declared here
int       iswprint(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:112:9: error: no member named 'ispunct' in the global namespace; did you mean 'iswpunct'?
using ::ispunct;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:167:11: note: 'iswpunct' declared here
int       iswpunct(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:113:9: error: no member named 'isspace' in the global namespace; did you mean 'iswspace'?
using ::isspace;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:168:11: note: 'iswspace' declared here
int       iswspace(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:114:9: error: no member named 'isupper' in the global namespace; did you mean 'iswupper'?
using ::isupper;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:169:11: note: 'iswupper' declared here
int       iswupper(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:115:9: error: no member named 'isxdigit' in the global namespace
using ::isxdigit;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:116:9: error: no member named 'tolower' in the global namespace; did you mean 'towlower'?
using ::tolower;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:172:11: note: 'towlower' declared here
wint_t    towlower(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:29:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/string:481:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwchar:107:
In file included from /home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cwctype:54:
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libcxx/cctype:117:9: error: no member named 'toupper' in the global namespace; did you mean 'towupper'?
using ::toupper;
      ~~^
/home/rth/src/pyodide/emsdk/emsdk/emscripten/tag-1.38.12/system/include/libc/wchar.h:173:11: note: 'towupper' declared here
wint_t    towupper(wint_t);
          ^
In file included from scipy/sparse/sparsetools/sparsetools.cxx:34:
In file included from /home/rth/src/pyodide/cpython/build/3.7.0/host/lib/python3.7/site-packages/numpy-1.15.1-py3.7-linux-x86_64.egg/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /home/rth/src/pyodide/cpython/build/3.7.0/host/lib/python3.7/site-packages/numpy-1.15.1-py3.7-linux-x86_64.egg/numpy/core/include/numpy/ndarraytypes.h:1821:
/home/rth/src/pyodide/cpython/build/3.7.0/host/lib/python3.7/site-packages/numpy-1.15.1-py3.7-linux-x86_64.egg/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
1 warning and 14 errors generated.
ERROR:root:compiler frontend failed to generate LLVM bitcode, halting
make: *** [Makefile:228: sandbox] Error 1

Have you encountered this before, @mdboom? sparsetools.cxx:29 just does a,

#include <string>

If not, I'll ask about it on the emscripten channels, as my attempts to fix it so far have not been successful.
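
A minimal sketch for condensing a log like the one above into a short list of affected headers and names, which may be easier to paste into an upstream report. The function name `summarize_missing_members` is hypothetical, and the regular expression only assumes the clang message format shown in this log.

```python
import re

# Matches lines such as:
#   /path/libcxx/cctype:104:9: error: no member named 'isalnum' in the global namespace; ...
MISSING_MEMBER = re.compile(
    r"^(?P<header>\S+?):\d+:\d+: error: "
    r"no member named '(?P<name>\w+)' in the global namespace"
)


def summarize_missing_members(log_text):
    """Map each header path to the sorted names reported as missing from `::`."""
    missing = {}
    for line in log_text.splitlines():
        match = MISSING_MEMBER.match(line)
        if match:
            missing.setdefault(match.group("header"), set()).add(match.group("name"))
    return {header: sorted(names) for header, names in missing.items()}
```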

mdboom (Collaborator) commented Oct 9, 2018

Sorry, I haven't run into that. It seems like something is incomplete in the C++ stdlib, but I'm not sure...

rth (Member, Author) commented Oct 9, 2018

Thanks for the confirmation, @mdboom!

rth (Member, Author) commented Oct 11, 2018

So the status on this is that, by aggressively patching to skip all compilations that fail, we currently have a scipy build that somewhat passes. Out of a total of 69 .so files, 29 are currently included in the resulting package,

Details
 cluster/_hierarchy.so
 cluster/_vq.so
 integrate/_test_multivariate.so
 interpolate/_ppoly.so
 interpolate/interpnd.so
 io/matlab/mio5_utils.so
 io/matlab/mio_utils.so
 io/matlab/streams.so
 ndimage/_nd_image.so
 ndimage/_ni_label.so
 optimize/_group_columns.so
 optimize/_lsq/givens_elimination.so
 optimize/_zeros.so
 optimize/moduleTNC.so
 signal/_max_len_seq_inner.so
 signal/_spectral.so
 signal/sigtools.so
 signal/spline.so
 sparse/_csparsetools.so
 sparse/csgraph/_min_spanning_tree.so
 sparse/csgraph/_reordering.so
 sparse/csgraph/_shortest_path.so
 sparse/csgraph/_tools.so
 sparse/csgraph/_traversal.so
 sparse/linalg/dsolve/_superlu.so
 spatial/_distance_wrap.so
 spatial/qhull.so
 stats/_rank.so
 stats/vonmises_cython.so
while 40 more are still missing,
Details
-fftpack/_fftpack.so
-fftpack/convolve.so
-integrate/_dop.so
-integrate/_odepack.so
-integrate/_quadpack.so
-integrate/_test_odeint_banded.so
-integrate/lsoda.so
-integrate/vode.so
-interpolate/_fitpack.so
-interpolate/_interpolate.so
-interpolate/dfitpack.so
-linalg/_calc_lwork.so
-linalg/_decomp_update.so
-linalg/_fblas.so
-linalg/_flapack.so
-linalg/_flinalg.so
-linalg/_interpolative.so
-linalg/_solve_toeplitz.so
-linalg/cython_blas.so
-linalg/cython_lapack.so
-odr/__odrpack.so
-optimize/_cobyla.so
-optimize/_lbfgsb.so
-optimize/_minpack.so
-optimize/_nnls.so
-optimize/_slsqp.so
-optimize/minpack2.so
-sparse/_sparsetools.so
-sparse/linalg/eigen/arpack/_arpack.so
-sparse/linalg/isolve/_iterative.so
-spatial/ckdtree.so
-special/_ellip_harm_2.so
-special/_ufuncs.so
-special/_ufuncs_cxx.so
-special/specfun.so
-stats/mvn.so
-stats/statlib.so

For future reference, the above results were obtained with the script below,

Details
import scipy as sp
import os

base_dir = os.path.dirname(sp.__file__)

for (dirpath, dirnames, filenames) in os.walk(base_dir):
    for path in filenames:
        if path.endswith('.so'):
            rel_path = os.path.relpath(dirpath, base_dir)
            print(os.path.join(rel_path, path))
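
The "missing" list above reads like a diff between two such listings. A minimal sketch of that comparison, assuming both listings use the same relative names (for example `optimize/_zeros.so`); the helper name `missing_modules` is hypothetical.

```python
def missing_modules(expected, built):
    """Return the sorted module paths present in `expected` but absent from `built`."""
    return sorted(set(expected) - set(built))
```

For example, `expected` could come from running the script above against a native scipy install and `built` from the WebAssembly build.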

The resulting scipy.data takes 14 MB (compared to 22 MB for pandas), so when all .so files are included (as well as BLAS) it will be larger, but maybe not that much larger than pandas. The good news is that the build time is currently around 10 min, which means that the build step would take 36 min instead of 26 min now with the new docker setup, which remains somewhat manageable. It will increase somewhat, though, once BLAS is compiled (currently a subset of LAPACK vendored inside scipy is built), along with the skipped linalg modules.

The major directions that need more work are,

  • Currently the Fortran files are built with f2c and emcc is run on them, but somehow they don't end up in the resulting wasm package: I feel like some step is missing in the current build setup. => Found the issue: lines with .so and without .f were skipped.

  • The issue with the global cpp namespace (libcxx/cctype: error: no member named 'isalnum' in the global namespace emscripten-core/emscripten#7253). It affects only,

    interpolate/_interpolate.so
    sparse/_sparsetools.so
    spatial/ckdtree.so
    special/_ufuncs_cxx.so
    

    but these, and particularly the last one, are imported nearly everywhere, so it's a real blocker.

  • We still need to link against BLAS; for LAPACK I think it's fine to use the vendored one.

  • There is some issue that affects stats/mvn.so,

    Details
    emcc -I../../CLAPACK-WA/F2CLIBS/libf2c/ -Wno-implicit-function-declaration -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops -Ibuild/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/stats -I/home/rth/src/pyodide/cpython/build/3.7.0/host/lib/python3.7/site-packages/numpy-1.15.1-py3.7-linux-x86_64.egg/numpy/core/include -I/home/rth/src/pyodide/cpython/installs/python-3.7.0/include/python3.7 -c -c build/src.linux-x86_64-3.7/scipy/stats/mvn-f2pywrappers.c -o build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/stats/mvn-f2pywrappers.bc
    clang-6.0: warning: argument unused during compilation: '-ffixed-form' [-Wunused-command-line-argument]
    clang-6.0: warning: argument unused during compilation: '-fno-second-underscore' [-Wunused-command-line-argument]
    mvndst.f:
      mvnun:
    Error on line 76 of mvndst.f: Declaration error for rho: adjustable dimension on non-argument
    Error on line 76 of mvndst.f: Declaration error for infin: adjustable dimension on non-argument
    Error on line 76 of mvndst.f: Declaration error for stdev: adjustable dimension on non-argument
    Error on line 76 of mvndst.f: Declaration error for nlower: adjustable dimension on non-argument
    Error on line 76 of mvndst.f: Declaration error for nupper: adjustable dimension on non-argument
    Error on line 76 of mvndst.f: wr_ardecls:  nonconstant array size
    Error on line 76 of mvndst.f: wr_ardecls:  nonconstant array size
    Error on line 76 of mvndst.f: wr_ardecls:  nonconstant array size
    Error on line 76 of mvndst.f: wr_ardecls:  nonconstant array size
    Error on line 76 of mvndst.f: wr_ardecls:  nonconstant array size
    

    related to this discussion, but that's not too critical.

@rth rth force-pushed the scipy branch 3 times, most recently from 2a088d8 to e47d762 Compare October 13, 2018 21:29
- ./emsdk/emsdk
- ~/.ccache
key: v1-emsdk-{{ checksum "emsdk/Makefile" }}-v8-{{ .BuildNum }}
key: v1-emsdk-{{ checksum "emsdk/Makefile" }}-v11-{{ .BuildNum }}
Review comment (Member, Author):

This shift is because it took a few iterations of rebuilding emsdk to find a sufficiently large NUM_PARAMS: it's now 61 (as opposed to 32 before), as required by scipy's fitpack (~47) and odrpack (61). I'm not sure whether it can have an adverse effect (e.g. on performance).
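
A minimal sketch of how the largest argument count could be estimated up front rather than by repeated emsdk rebuilds, assuming fixed-form Fortran 77 sources. The helper names are hypothetical, and the result may be a lower bound, since f2c can append hidden length arguments for CHARACTER parameters.

```python
import re

# Matches "SUBROUTINE NAME(ARGS)" or "... FUNCTION NAME(ARGS)" in a joined statement.
DECL = re.compile(r"\b(subroutine|function)\s+(\w+)\s*\(([^)]*)\)", re.IGNORECASE)


def logical_statements(source):
    """Join fixed-form continuation lines and drop comment lines."""
    statements = []
    for raw in source.splitlines():
        # Blank lines and lines with a comment marker in column 1 are skipped.
        if not raw.strip() or raw[0] in "cC*!":
            continue
        # A continuation line has blank columns 1-5 and a non-blank, non-zero column 6.
        is_continuation = len(raw) > 5 and raw[:5].strip() == "" and raw[5] not in " 0"
        body = raw[6:72]  # statement text lives in columns 7-72
        if is_continuation and statements:
            statements[-1] += body
        else:
            statements.append(body)
    return statements


def max_parameter_count(source):
    """Return (count, name) for the declaration with the most arguments."""
    best = (0, None)
    for statement in logical_statements(source):
        match = DECL.search(statement)
        if match:
            args = [a for a in match.group(3).split(",") if a.strip()]
            if len(args) > best[0]:
                best = (len(args), match.group(2).lower())
    return best
```

Running `max_parameter_count` over each `.f` file and taking the overall maximum would give a starting value for NUM_PARAMS.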

@rth rth force-pushed the scipy branch 3 times, most recently from f358d4a to 6c8dcee Compare October 19, 2018 12:49
rth (Member, Author) commented Oct 19, 2018

The linking of Fortran files was fixed, and the global cpp namespace issue somehow disappeared when moving to dockerized builds (I'm happy enough with that and have not investigated further; it was probably something different in my environment). So now we have ~46 .so modules included out of 60, and the total WASM package size is 31 MB compressed (80 MB uncompressed). It loads fine in Firefox (less so in Chrome, same as pandas) and the submodules 'constants', 'fftpack', 'odr', 'sparse' import successfully; for the rest we get import errors due to missing parts of scipy.linalg (due to missing BLAS/LAPACK).

Regarding the last major remaining point, linking to BLAS/LAPACK, I was hoping that @mdboom (or possibly @jakirkham) would have some suggestions. As suggested earlier, I'm using the CLAPACK setup with the Makefiles patched to work for WebAssembly: https://github.com/rth/CLAPACK-WA. Because it uses a Makefile without a configure step, one can't simply run make / emmake make to get one version for the host environment and one for the target WebAssembly environment from the same setup. Instead, one has to physically change the Makefile in each case. This means that to build CLAPACK, I'm currently using two separate branches in a fork of that repo: master for host and wasm for target. That's certainly not ideal. The alternative might be to use CLAPACK with cmake, which could work better with emmake, but I'm not sure I want to go there, and for now I'm just trying to get a successful build.

Regarding linking itself, I have read https://github.com/kripken/emscripten/wiki/Linking and have tried,

Linking dynamically

This setup is contained in https://github.com/rth/pyodide/tree/scipy-dynamic-link-blas which includes 1 commit in addition to this PR.

CLAPACK needs some adaptation to produce a .so. For the host this works fine; for the target I'm still not sure how this is expected to work: I get lapack_WA.bc and blas_WA.bc, and it is possible to convert those to .wasm with,

emcc $(SIDE_LDFLAGS) libf2c.bc blas_WA.bc -o libblas_WA.wasm

however then the linking during scipy compilation fails,

emcc -O3 -s BINARYEN_METHOD=native-wasm -Werror -s EMULATED_FUNCTION_POINTERS=1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -s SIDE_MODULE=1 -s WASM=1 -s BINARYEN_TRAP_MODE=clamp --memory-init-file 0 -Wall -g -Wall -g -shared build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/linalg/_fblasmodule.bc build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/linalg/fortranobject.bc build/temp.linux-x86_64-3.7/src/packages/scipy/build/scipy-0.17.1/scipy/_build_utils/src/wrap_dummy_g77_abi.bc build/temp.linux-x86_64-3.7/src/packages/scipy/build/scipy-0.17.1/scipy/_build_utils/src/wrap_dummy_accelerate.bc build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/linalg/_fblas-f2pywrappers.bc -L../../CLAPACK-WA/build/target/ -Lbuild/temp.linux-x86_64-3.7 -lf2c -lblas_WA -llapack_WA -lgfortran -o build/lib.linux-x86_64-3.7/scipy/linalg/_fblas.cpython-37m-x86_64-linux-gnu.wasm

with,

WARNING:root:emcc: cannot find library "gfortran"
WARNING:root:emcc: cannot find library "lapack_WA"
WARNING:root:emcc: cannot find library "gfortran"
WARNING:root:emcc: cannot find library "blas_WA"
[wasm-validator error in function $_wsdot_] 4 != 3: set_local type must match function, on 
[none] (set_local $5
 [f32] (f32.demote/f64
  [f64] (call $_sdot_)
 )
)
[wasm-validator error in function $_wsasum_] 4 != 3: set_local type must match function, on 
[none] (set_local $3
 [f32] (f32.demote/f64
  [f64] (call $_sasum_)
 )
)
[...]
Fatal: error in validating output

I'm not sure if this error is due to missing symbols or to something else. I would have expected missing symbols (e.g. because blas/lapack wasn't properly linked) to trigger an error at runtime. The ../../CLAPACK-WA/build/target/ folder does contain,

$ ls build/target
libblas_WA.bc  libblas_WA.wasm  libblas_WA.wasm.pre  libf2c.bc  liblapack_WA.bc  liblapack_WA.wasm  liblapack_WA.wasm.pre

and the relative path is correct. FWIW, I did try to generate .so instead of .wasm,

emcc -O3 -s "BINARYEN_METHOD='native-wasm'" -Werror -s EMULATED_FUNCTION_POINTERS=1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -s SIDE_MODULE=1 -s WASM=1 -s "BINARYEN_TRAP_MODE='clamp'" --memory-init-file 0 packages/scipy/CLAPACK-WA//build/target/libblas_WA.bc  \
        packages/scipy/CLAPACK-WA//build/target/libf2c.bc -o packages/scipy/CLAPACK-WA//build/target/libblas_WA.so
ERROR:root:SIDE_MODULE must only be used when compiling to an executable shared library, and not when emitting LLVM bitcode. That is, you should be emitting a .wasm file (for wasm) or a .js file (for asm.js). Note that when compiling to a typical native suffix for a shared library (.so, .dylib, .dll; which many build systems do) then Emscripten emits an LLVM bitcode file, which you should then compile to .wasm or .js with SIDE_MODULE.

Renaming the .bc files to .so and trying to link those (again, just to see) produces,

WARNING:root:ignoring dynamic library libblas_WA.so because not compiling to JS or HTML, remember to link it when compiling to JS or HTML at the end

Even assuming that this step succeeds, I'm not sure how to include the resulting blas_WA.wasm into the final package. In the long term BLAS should probably be a separate package, but for now including it inside scipy might be enough (even though it does increase the overall package size).

Linking statically

Not convinced this makes much sense for BLAS/LAPACK, but since I had trouble with the previous approach I thought it was worth a try.
This setup is contained in https://github.com/rth/pyodide/tree/scipy-dynamic-link-blas which includes 1 commit in addition to this PR.

In scipy-blas-static-link I patched pyodide_build/pywasmcross.py to replace -L<link_dir> + -lblas_WA with the actual path to libblas_WA.bc. This should work, except

emcc -O3 -s BINARYEN_METHOD=native-wasm -Werror -s EMULATED_FUNCTION_POINTERS=1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -s SIDE_MODULE=1 -s WASM=1 -s BINARYEN_TRAP_MODE=clamp --memory-init-file 0 -Wall -g -Wall -g -shared build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/linalg/_fblasmodule.bc build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/linalg/fortranobject.bc build/temp.linux-x86_64-3.7/src/packages/scipy/build/scipy-0.17.1/scipy/_build_utils/src/wrap_dummy_g77_abi.bc build/temp.linux-x86_64-3.7/src/packages/scipy/build/scipy-0.17.1/scipy/_build_utils/src/wrap_dummy_accelerate.bc build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/linalg/_fblas-f2pywrappers.bc -Lbuild/temp.linux-x86_64-3.7 ../../CLAPACK-WA/build/target/libf2c.bc ../../CLAPACK-WA/build/target/libblas_WA.bc ../../CLAPACK-WA/build/target/liblapack_WA.bc -lgfortran -o build/lib.linux-x86_64-3.7/scipy/linalg/_fblas.cpython-37m-x86_64-linux-gnu.wasm
WARNING:root:emcc: cannot find library "gfortran"
error: Linking globals named 'xerbla_': symbol multiply defined!

and indeed that symbol is redundantly defined in libblas_WA.bc and liblapack.bc: I can see that with llvm-nm, but haven't yet figured out how to strip that symbol from one of them with an equivalent of strip. In any case, for static linking with emscripten the symbols need to be exactly right, which is a bit painful. Also, as far as I understood, emscripten strips unused symbols by default, making static linking manageable, but I guess this only works for applications, not shared libraries. Generally, I'm looking for a second opinion on whether static linking is a direction worth exploring in this context.
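
A minimal sketch for listing every symbol defined in more than one file, given the text output of llvm-nm for each file. It assumes the usual nm-style layout where the last two columns are the one-letter type and the symbol name; the helper names are hypothetical, and the exact output for bitcode files may differ.

```python
from collections import defaultdict


def defined_symbols(nm_output):
    """Return the names that an nm-style listing reports as strongly defined."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        sym_type, name = parts[-2], parts[-1]
        # Skip undefined (U) and weak (W/w/V/v) entries, which do not clash at link time.
        if len(sym_type) == 1 and sym_type not in "UuWwVv":
            symbols.add(name)
    return symbols


def duplicate_definitions(nm_outputs):
    """Map each symbol defined in several files to the sorted list of those files."""
    owners = defaultdict(list)
    for filename, text in nm_outputs.items():
        for name in defined_symbols(text):
            owners[name].append(filename)
    return {name: sorted(files) for name, files in owners.items() if len(files) > 1}
```

The input dictionary could be built by capturing the output of `llvm-nm <file>` for each bitcode file.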

@mdboom
Collaborator

mdboom commented Oct 22, 2018

Because it uses a Makefile without a configure step, one can't simply run make / emmake make to get a version for the host environment and one for target webassembly environment with the same setup.

Why do you need a version for the host environment? Is that just to support pywasmcross and the way it's building Scipy? Maybe you could step around that (at least temporarily) by installing a system blas/lapack that scipy will find?

You are right that when statically linking things together into a shared object, emscripten can't remove much stuff (because it doesn't know what the application it's eventually linked into will use).

However, I don't know how the linking from a Scipy extension module to a dynamically linked blas/lapack should work. I assume in the native case, it's using ldd to find the blas module? emscripten doesn't have that: it would need to dlopen it explicitly, I think.

An alternative not explored above might be to statically link blas and lapack into the main executable (this is how libpng and libfreetype are currently handled, even though they are used only by the optional matplotlib package). You could look at how emscripten handles the libraries it "ships" and maybe implement that?

For these errors:

[wasm-validator error in function $_wsdot_] 4 != 3: set_local type must match function, on 
[none] (set_local $5
 [f32] (f32.demote/f64
  [f64] (call $_sdot_)
 )
)

I've never seen this before, but maybe someone in the emscripten project can at least point in the right direction?

@rth
Member Author

rth commented Oct 22, 2018

Why do you need a version for the host environment? Is that just to support pywasmcross and the way it's building Scipy?

Yes, that was the reason.

Maybe you could step around that (at least temporarily) by installing a system blas/lapack that scipy will find?

I was also wondering if we really needed it. Thanks for the suggestion. I think I can also just strip the -lblas -llapack flags when compiling for the target. It should still compile fine, I think, though the scipy.linalg .so might not be able to load, but that shouldn't matter too much for the target, I guess.
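A rough sketch of what stripping those flags could look like in a compiler wrapper. The function name `strip_link_flags` and the example command line are hypothetical, not the actual pywasmcross code, and only the joined `-l<name>` form is handled.

```python
def strip_link_flags(args, libs=("blas", "lapack")):
    """Return a copy of args without the -l<lib> flags for the given libs."""
    unwanted = {"-l" + lib for lib in libs}
    return [arg for arg in args if arg not in unwanted]


# Hypothetical link command, shortened for illustration.
command = ["emcc", "-shared", "_fblas.bc", "-lblas", "-llapack",
           "-o", "_fblas.wasm"]
print(strip_link_flags(command))
```

All other arguments keep their order, so the rest of the command line is passed through unchanged.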

I assume in the native case, it's using ldd to find the blas module? emscripten doesn't have that: it would need to dlopen it explicitly, I think.

Hah, I had not thought about that. Well, I guess if it did raise an error at run time that some symbols are missing, it could have been possible to dlopen some specific .so, but we are not there yet, since even the compilation fails.

An alternative not explored above might be to statically link blas and lapack into the main executable (this is how libpng and libfreetype are currently handled, even though they are used only by the optional matplotlib package). You could look at how emscripten handles the libraries it "ships" and maybe implement that?

Interesting idea. Thanks for the suggestion, and for the confirmation that static linking may be the way to go.

I'll try to ask some questions on the emscripten mailing list as well.

@rth rth mentioned this pull request Oct 23, 2018
@rth rth force-pushed the scipy branch 2 times, most recently from 500d145 to 2a6e042 Compare October 24, 2018 15:35
navytux added a commit to navytux/emscripten that referenced this pull request Nov 15, 2018
Currently Emscripten allows creating shared libraries (DSOs) and linking
them to the main module. However, a shared library itself cannot be linked to
another shared library.

The lack of support for DSO -> DSO linking becomes problematic in cases when
there are several shared libraries that all need to use another should-be
shared functionality, while linking that should-be shared functionality to main
module is not an option for size reasons. My particular use-case is SciPy
support for Pyodide:

	pyodide/pyodide#211,
	pyodide/pyodide#240

where several of `*.so` scipy modules need to link to LAPACK. If we link to
LAPACK statically from all those `*.so` - it just blows up compiled size

	pyodide/pyodide#211 (comment)

and if we link in LAPACK statically to main module, the main module size is
also increased ~2x, which is not an option, since LAPACK functionality is not
needed by every Pyodide user.

So here we add support for DSO -> DSO linking:

1. similarly to how it is already working for main module -> side module
   linking, when building a side module it is now possible to specify

	-s RUNTIME_LINKED_LIBS=[...]

   with list of shared libraries that side module needs to link to.

2. to store that information, for asm.js, similarly to how it is currently
   handled for main module (which always has js part), we transform
   RUNTIME_LINKED_LIBS to

	libModule.dynamicLibraries = [...]

   (see src/preamble_sharedlib.js)

3. for wasm module, in order to store the information about to which libraries
   a module links, we could in theory use "module" attribute in wasm imports.
   However currently emscripten almost always uses just "env" for that "module"
   attribute, e.g.

	(import "env" "abortStackOverflow" (func $fimport$0 (param i32)))
	(import "env" "_ffunc1" (func $fimport$1))
	...

   and this way we have to embed the information about required libraries for
   the dynamic linker somewhere else.

   What I came up with is to extend "dylink" section with information about
   which shared libraries a shared library needs. This is similar to DT_NEEDED
   entries in ELF.

   (see tools/shared.py)

4. then, the dynamic linker (loadDynamicLibrary) is reworked to handle that information:

   - for asm.js, after loading a libModule, we check libModule.dynamicLibraries
     and post-load them recursively. (it would be better to load needed modules
     before the libModule, but for js we currently have to first eval whole
     libModule's code to be able to read .dynamicLibraries)

   - for wasm the needed libraries are loaded before the wasm module in
     question is instantiated.

     (see changes to loadWebAssemblyModule for details)

5. since we also have to teach dlopen() to handle needed libraries, and since dlopen
   was already duplicating loadDynamicLibrary() code in many ways, instead of
   adding more duplication, dlopen is now reworked to use loadDynamicLibrary
   itself.

   This moves functionality to keep track of loaded DSO, their handles,
   refcounts, etc into the dynamic linker itself, with loadDynamicLibrary now
   accepting various flags (global/nodelete) to handle e.g.
   RTLD_LOCAL/RTLD_GLOBAL and RTLD_NODELETE dlopen cases (RTLD_NODELETE
   semantic is needed for initially-linked-in libraries).

   Also, since dlopen was using FS to read libraries, and loadDynamicLibrary was
   previously using Module['read'] and friends, loadDynamicLibrary now also
   accepts fs interface, which if provided, is used as FS-like interface to load
   library data, and if not - native loading capabilities of the environment
   are still used.

   Another aspect related to deduplication is that loadDynamicLibrary now also
   uses preloaded/precompiled wasm modules, that were previously only used by
   dlopen (see a5866a5 "Add preload plugin to compile wasm side modules async
   (emscripten-core#6663)").

   (see changes to dlopen and loadDynamicLibrary)

6. The functionality to asynchronously load dynamic libraries is also
   integrated into loadDynamicLibrary.

   Libraries were asynchronously preloaded for the case when
   Module['readBinary'] is absent (browser, see 3446d2a "preload wasm dynamic
   libraries when we can't load them synchronously").

   Since this codepath was also needed to be taught of DSO -> DSO dependency,
   the most straightforward thing to do was to teach loadDynamicLibrary to do
   its work asynchronously (under flag) and to switch the preloading to use

	loadDynamicLibrary(..., {loadAsync: true})

   (see changes to src/preamble.js and loadDynamicLibrary)

7. A test is added for verifying linking/dlopening a DSO with other needed library.

   browser.test_dynamic_link is also amended to verify linking to DSO with
   dependencies.
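As a toy illustration of the wasm path in steps 4 and 5 (needed libraries are loaded before the library that needs them, with refcounts and a nodelete flag), here is a small Python model. It is only a sketch: `ToyLinker` and the library names are hypothetical, and nothing here mirrors the real loadDynamicLibrary code.

```python
class ToyLinker:
    """Toy dynamic linker that loads DT_NEEDED-style dependencies first."""

    def __init__(self, needed):
        self.needed = needed    # library name -> names it links against
        self.refcount = {}      # loaded library name -> number of users
        self.nodelete = set()   # libraries that must never be unloaded
        self.load_order = []    # order in which libraries were instantiated

    def load(self, name, nodelete=False, _stack=()):
        if name in _stack:
            chain = " -> ".join(_stack + (name,))
            raise ValueError("dependency cycle: " + chain)
        if nodelete:
            self.nodelete.add(name)
        if name in self.refcount:
            # Already loaded: only record one more user.
            self.refcount[name] += 1
            return
        # Load everything this library needs before the library itself.
        for dep in self.needed.get(name, ()):
            self.load(dep, _stack=_stack + (name,))
        self.refcount[name] = 1
        self.load_order.append(name)

    def unload(self, name):
        if name in self.nodelete:
            return
        self.refcount[name] -= 1
        if self.refcount[name] == 0:
            del self.refcount[name]
            # Release the dependencies this library was holding.
            for dep in self.needed.get(name, ()):
                self.unload(dep)


# Hypothetical dependency graph for illustration.
linker = ToyLinker({
    "_flapack.so": ["liblapack.so"],
    "_fblas.so": ["libblas.so"],
    "liblapack.so": ["libblas.so"],
})
linker.load("_flapack.so")
linker.load("_fblas.so")
print(linker.load_order)
```

After `linker.unload("_flapack.so")`, `libblas.so` stays loaded because `_fblas.so` still holds a reference to it.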

With the patch I've made sure that all core tests (including test_dylink_* and
test_dlfcn_*) are passing for asm{0,1,2} and binaryen{0,1,2}.

However, since I cannot get the full browser tests to pass even on pristine incoming
(1.38.19-2-g77246e0c1 as of today; there are many failures for both Firefox
63.0.1 and Chromium 70.0.3538.67), I did not try to verify the full browser tests
with my patch. But I've made sure that

	browser.test_preload_module
	browser.test_dynamic_link

are passing.

The "other" kind of tests also do not pass on pristine incoming for me, so
I did not try to verify "other" with my patch.

Thanks beforehand,
Kirill

P.S.

This is the first time I have done anything with WebAssembly/Emscripten, and only
the second time with JavaScript, so please forgive me if I missed something.

P.P.S.

I can split the patch into smaller steps, if it will help review.

/cc @kripken, @juj, @sbc100, @max99x, @junjihashimoto, @mdboom, @rth
@rth
Member Author

rth commented Nov 15, 2018

I rebased to fix a merge conflict.

Now that iodide-project/iodide#1122 is merged, @mdboom please let me know if there is anything else I need to do to get this accepted.

@rth
Member Author

rth commented Nov 15, 2018

BTW, after the rebase, I get a RangeError: Maximum call stack size exceeded in test/test_python.py::test_cpython_core[test_descr-chrome]. I don't think it's due to the changes in this PR.

@mdboom
Collaborator

mdboom commented Nov 15, 2018

This is now blocked by iodide-project/iodide#1178, but I don't think there's anything remaining in this PR.

I agree, the chrome failure seems unrelated. Probably need to tweak fixRecursionLimit again.

navytux added a commit to navytux/emscripten that referenced this pull request Nov 16, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 16, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 18, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 20, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 21, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 27, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 29, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Nov 29, 2018
Currently Emscripten allows to create shared libraries (DSO) and link
them to main module. However a shared library itself cannot be linked to
another shared library.

The lack of support for DSO -> DSO linking becomes problematic in cases when
there are several shared libraries that all need to use another should-be
shared functionality, while linking that should-be shared functionality to main
module is not an option for size reasons. My particular use-case is SciPy
support for Pyodide:

        pyodide/pyodide#211,
        pyodide/pyodide#240

where several of `*.so` scipy modules need to link to LAPACK. If we link to
LAPACK statically from all those `*.so` - it just blows up compiled size

        pyodide/pyodide#211 (comment)

and if we link in LAPACK statically to main module, the main module size is
also increased ~2x, which is not an option, since LAPACK functionality is not
needed by every Pyodide user.

This way we are here to add support for DSO -> DSO linking:

1. similarly to how it is already working for main module -> side module
   linking, when building a side module it is now possible to specify

        -s RUNTIME_LINKED_LIBS=[...]

   with list of shared libraries that side module needs to link to.

2. to store that information, for asm.js, similarly to how it is currently
   handled for main module (which always has js part), we transform
   RUNTIME_LINKED_LIBS to

        libModule.dynamicLibraries = [...]

   (see src/preamble_sharedlib.js)

3. for wasm module, in order to store the information about to which libraries
   a module links, we could in theory use "module" attribute in wasm imports.
   However currently emscripten almost always uses just "env" for that "module"
   attribute, e.g.

        (import "env" "abortStackOverflow" (func $fimport$0 (param i32)))
        (import "env" "_ffunc1" (func $fimport$1))
        ...

   and this way we have to embed the information about required libraries for
   the dynamic linker somewhere else.

   What I came up with is to extend "dylink" section with information about
   which shared libraries a shared library needs. This is similar to DT_NEEDED
   entries in ELF.

   (see tools/shared.py)

4. The dynamic linker (loadDynamicLibrary) is then reworked to handle that
   information:

   - for asm.js, after loading a libModule, we check libModule.dynamicLibraries
     and post-load them recursively. (It would be better to load needed modules
     before the libModule, but for JS we currently have to eval the whole
     libModule's code first to be able to read .dynamicLibraries.)

   - for wasm, the needed libraries are loaded before the wasm module in
     question is instantiated.

     (see changes to loadWebAssemblyModule for details)
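
The difference in load order between the two cases in step 4 can be sketched as follows. This is a minimal Python model, not Emscripten's actual JavaScript loader: the function names `load_wasm_style` and `load_asmjs_style`, the `NEEDED` table, and the library file names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dependency table: library name -> libraries it needs.
# In the real loader this information would come from the "dylink" section
# (wasm) or from libModule.dynamicLibraries (asm.js).
NEEDED: Dict[str, List[str]] = {
    "_flapack.so": ["liblapack.so"],
    "_fblas.so": ["liblapack.so"],
    "liblapack.so": [],
}


def load_wasm_style(
    lib: str,
    needed: Dict[str, List[str]],
    loaded: Optional[Set[str]] = None,
    order: Optional[List[str]] = None,
) -> List[str]:
    """Load the needed libraries first, then the library itself."""
    loaded = set() if loaded is None else loaded
    order = [] if order is None else order
    if lib in loaded:
        return order  # already loaded (or in progress): nothing to do
    loaded.add(lib)  # mark early so a dependency cycle cannot recurse forever
    for dep in needed.get(lib, []):
        load_wasm_style(dep, needed, loaded, order)
    order.append(lib)  # "instantiate" only after all dependencies
    return order


def load_asmjs_style(
    lib: str,
    needed: Dict[str, List[str]],
    loaded: Optional[Set[str]] = None,
    order: Optional[List[str]] = None,
) -> List[str]:
    """Evaluate the library first, then post-load what it lists."""
    loaded = set() if loaded is None else loaded
    order = [] if order is None else order
    if lib in loaded:
        return order
    loaded.add(lib)
    order.append(lib)  # the code must be evaluated before its list is readable
    for dep in needed.get(lib, []):
        load_asmjs_style(dep, needed, loaded, order)
    return order


if __name__ == "__main__":
    print(load_wasm_style("_flapack.so", NEEDED))
    print(load_asmjs_style("_flapack.so", NEEDED))
```

Passing the same `loaded` set across calls models the point of the whole change: a second module that needs the same shared library does not load it again.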
@rth
Member Author

rth commented Dec 3, 2018

Gentle ping @mdboom :)

It looks like the deployment PR was merged, and I think it would help to have this in master so that future improvements can be based on it.

@mdboom mdboom merged commit 04603d5 into pyodide:master Dec 3, 2018
@rth
Member Author

rth commented Dec 3, 2018

Thanks a lot @mdboom !

@rth rth deleted the scipy branch December 3, 2018 15:41
@rth rth restored the scipy branch December 3, 2018 15:41
navytux added a commit to navytux/emscripten that referenced this pull request Dec 3, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Dec 5, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Dec 5, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Dec 5, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Dec 6, 2018
navytux added a commit to navytux/emscripten that referenced this pull request Dec 8, 2018
kripken pushed a commit to emscripten-core/emscripten that referenced this pull request Dec 9, 2018
@rth rth deleted the scipy branch November 1, 2020 16:51