WIP: Porting to LLVM upstream #531

Closed
mdboom wants to merge 1 commit into pyodide:master from mdboom:llvm-upstream

@mdboom
Collaborator

mdboom commented Oct 10, 2019

Ref: #476

This doesn't work yet due to some bugs in emscripten. But hopefully this will be useful to others tackling the same problem.

This is based on work started by @rth, but I had to rearrange the history and his original commits were lost. I intend to come back and give him authorship credit on those commits before this is finally merged.

@@ -1,12 +1,11 @@
-export EMSCRIPTEN_VERSION = 1.38.30
+export EMSCRIPTEN_VERSION = 1.38.42
Member

Looks like 1.38.47 is latest now...

@rth
Member

rth commented Oct 10, 2019

Thanks for opening the PR @mdboom!

> I intend to come back and give him authorship credit on those commits before this is finally merged.

Don't worry about it, we are squashing PRs anyway.. :)

@rth
Member

rth commented Oct 10, 2019

> Looks like 1.38.47 is latest now...

In particular the CircleCI build error:

wasm-ld: /b/s/w/ir/cache/builder/emscripten-releases/llvm-project/lld/wasm/Symbols.cpp:115: void lld::wasm::Symbol::setGOTIndex(uint32_t): Assertion `gotIndex == INVALID_INDEX' failed.

might have been fixed in 1.38.46 (emscripten-core/emscripten#9317).

@mdboom
Collaborator Author

mdboom commented Oct 10, 2019

Nice. I'll probably see what upgrading emscripten does for us now...

@mdboom
Collaborator Author

mdboom commented Oct 10, 2019

With emscripten 1.38.47, compiling sqlite crashes the compiler:

/bin/sh ./libtool  --tag=CC   --mode=link /home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/emscripten/emcc -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE  -DSQLITE_ENABLE_EXPLAIN_COMMENTS -DSQLITE_ENABLE_DBPAGE_VTAB -DSQLITE_ENABLE_STMTVTAB -DSQLITE_ENABLE_DBSTAT_VTAB  -fPIC   -o sqlite3 sqlite3-shell.o sqlite3-sqlite3.o  
libtool: link: /home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/emscripten/emcc -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_EXPLAIN_COMMENTS -DSQLITE_ENABLE_DBPAGE_VTAB -DSQLITE_ENABLE_STMTVTAB -DSQLITE_ENABLE_DBSTAT_VTAB -fPIC -o sqlite3 sqlite3-shell.o sqlite3-sqlite3.o 
wasm-ld: /b/s/w/ir/cache/builder/emscripten-releases/llvm-project/llvm/include/llvm/Support/Casting.h:264: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = lld::wasm::FunctionSymbol, Y = const lld::wasm::Symbol]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
Stack dump:
0.      Program arguments: /home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld -o sqlite3 --allow-undefined --lto-O0 sqlite3-shell.o sqlite3-sqlite3.o -L/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/emscripten/system/local/lib -L/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/emscripten/system/lib -L/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/.emscripten_cache/wasm-obj --import-memory --import-table -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --export __wasm_call_ctors --export __data_end --export main --export malloc --export free --export setThrew --export __errno_location --export fflush -z stack-size=5242880 --initial-memory=16777216 --no-entry --max-memory=16777216 --global-base=1024 --relocatable 
 #0 0x00007fcd16c250b4 PrintStackTraceSignalHandler(void*) (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/../lib/libLLVM-10svn.so+0x7000b4)
 #1 0x00007fcd16c22e3e llvm::sys::RunSignalHandlers() (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/../lib/libLLVM-10svn.so+0x6fde3e)
 #2 0x00007fcd16c25368 SignalHandler(int) (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/../lib/libLLVM-10svn.so+0x700368)
 #3 0x00007fcd19bacc60 __restore_rt (/lib64/libpthread.so.0+0x12c60)
 #4 0x00007fcd1603ce35 raise (/lib64/libc.so.6+0x37e35)
 #5 0x00007fcd16027895 abort (/lib64/libc.so.6+0x22895)
 #6 0x00007fcd16027769 _nl_load_domain.cold (/lib64/libc.so.6+0x22769)
 #7 0x00007fcd16035566 (/lib64/libc.so.6+0x30566)
 #8 0x00000000006e19f0 lld::wasm::GlobalSection::writeBody() (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x6e19f0)
 #9 0x00000000006d22aa lld::wasm::SyntheticSection::finalizeContents() (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x6d22aa)
#10 0x00000000006cc638 (anonymous namespace)::Writer::run() (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x6cc638)
#11 0x00000000006c51a1 lld::wasm::writeResult() (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x6c51a1)
#12 0x00000000006a79bd (anonymous namespace)::LinkerDriver::link(llvm::ArrayRef<char const*>) (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x6a79bd)
#13 0x00000000006a24d8 lld::wasm::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&) (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x6a24d8)
#14 0x000000000041f45b main (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x41f45b)
#15 0x00007fcd16028f43 __libc_start_main (/lib64/libc.so.6+0x23f43)
#16 0x000000000041efe9 _start (/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld+0x41efe9)
shared:ERROR: '/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/bin/wasm-ld -o sqlite3 --allow-undefined --lto-O0 sqlite3-shell.o sqlite3-sqlite3.o -L/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/emscripten/system/local/lib -L/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/upstream/emscripten/system/lib -L/home/mdboom/Work/builds/compiling/pyodide/emsdk/emsdk/.emscripten_cache/wasm-obj --import-memory --import-table -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --export __wasm_call_ctors --export __data_end --export main --export malloc --export free --export setThrew --export __errno_location --export fflush -z stack-size=5242880 --initial-memory=16777216 --no-entry --max-memory=16777216 --global-base=1024 --relocatable' failed (-6)
make[2]: *** [Makefile:506: sqlite3] Error 1

@mdboom
Collaborator Author

mdboom commented Oct 10, 2019

Taking sqlite3 out, it does seem to successfully build the Python interpreter, but then it fails again similarly compiling lz4.

@rth
Member

rth commented Oct 11, 2019

@manumartin Are you trying to build this PR or master? If you updated binaryen, you also need to update the path in emsdk/patches/num_params.patch where NUM_PARAMS is patched.

@manumartin

manumartin commented Oct 15, 2019

Hi, sorry for the noise. I think all of the previous errors were caused by a misconfiguration on my part (I'm still learning how the build system works). I have removed all the previous comments related to that.

I think my build is now on par with @mdboom's. I have seen the SIGABRT problem when wasm-ld was trying to link while building sqlite, lz4, and also libf2c inside CLAPACK.

I managed to build pyodide.asm.js by disabling sqlite and lz4 but then CLAPACK started building and libf2c failed:

/src/emsdk/emsdk/upstream/bin/wasm-ld -o libf2c.bc --allow-undefined --lto-O0 f77vers.o i77vers.o -L/src/emsdk/emsdk/upstream/emscripten/system/local/lib s_rnge.o -L/src/emsdk/emsdk/upstream/emscripten/system/lib abort_.o -L/src/emsdk/emsdk/.emscripten_cache/wasm-obj exit_.o getarg_.o iargc_.o getenv_.o signal_.o s_stop.o s_paus.o system_.o cabs.o ctype.o derf_.o derfc_.o erf_.o erfc_.o sig_die.o uninit.o pow_ci.o pow_dd.o pow_di.o pow_hh.o pow_ii.o pow_ri.o pow_zi.o pow_zz.o c_abs.o c_cos.o c_div.o c_exp.o c_log.o c_sin.o c_sqrt.o z_abs.o z_cos.o z_div.o z_exp.o z_log.o z_sin.o z_sqrt.o r_abs.o r_acos.o r_asin.o r_atan.o r_atn2.o r_cnjg.o r_cos.o r_cosh.o r_dim.o r_exp.o r_imag.o r_int.o r_lg10.o r_log.o r_mod.o r_nint.o r_sign.o r_sin.o r_sinh.o r_sqrt.o r_tan.o r_tanh.o d_abs.o d_acos.o d_asin.o d_atan.o d_atn2.o d_cnjg.o d_cos.o d_cosh.o d_dim.o d_exp.o d_imag.o d_int.o d_lg10.o d_log.o d_mod.o d_nint.o d_prod.o d_sign.o d_sin.o d_sinh.o d_sqrt.o d_tan.o d_tanh.o i_abs.o i_dim.o i_dnnt.o i_indx.o i_len.o i_len_trim.o i_mod.o i_nint.o i_sign.o lbitbits.o lbitshft.o i_ceiling.o h_abs.o h_dim.o h_dnnt.o h_indx.o h_len.o h_mod.o h_nint.o h_sign.o l_ge.o l_gt.o l_le.o l_lt.o hl_ge.o hl_gt.o hl_le.o hl_lt.o ef1asc_.o ef1cmc_.o f77_aloc.o s_cat.o s_cmp.o s_copy.o backspac.o close.o dfe.o dolio.o due.o endfile.o err.o fmt.o fmtlib.o ftell_.o iio.o ilnw.o inquire.o lread.o lwrite.o open.o rdfmt.o rewind.o rsfe.o rsli.o rsne.o sfe.o sue.o typesize.o uio.o util.o wref.o wrtfmt.o wsfe.o wsle.o wsne.o xwsne.o dtime_.o etime_.o --import-memory --import-table -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --export __wasm_call_ctors --export __data_end --export main --export malloc --export free --export setThrew --export __errno_location --export fflush -z stack-size=5242880 --initial-memory=16777216 --no-entry --max-memory=16777216 --global-base=1024 --relocatable 
 #0 0x00007f1dcc7cb0b4 PrintStackTraceSignalHandler(void*) (/src/emsdk/emsdk/upstream/bin/../lib/libLLVM-10svn.so+0x7000b4)
 #1 0x00007f1dcc7c8e3e llvm::sys::RunSignalHandlers() (/src/emsdk/emsdk/upstream/bin/../lib/libLLVM-10svn.so+0x6fde3e)
 #2 0x00007f1dcc7cb368 SignalHandler(int) (/src/emsdk/emsdk/upstream/bin/../lib/libLLVM-10svn.so+0x700368)
 #3 0x00007f1dcf7528e0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x128e0)
 #4 0x00007f1dcbc12f3b raise (/lib/x86_64-linux-gnu/libc.so.6+0x35f3b)
 #5 0x00007f1dcbc142f1 abort (/lib/x86_64-linux-gnu/libc.so.6+0x372f1)
 #6 0x00007f1dcbc0ba8a (/lib/x86_64-linux-gnu/libc.so.6+0x2ea8a)
 #7 0x00007f1dcbc0bb02 (/lib/x86_64-linux-gnu/libc.so.6+0x2eb02)
 #8 0x00000000006e19f0 lld::wasm::GlobalSection::writeBody() (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x6e19f0)
 #9 0x00000000006d22aa lld::wasm::SyntheticSection::finalizeContents() (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x6d22aa)
#10 0x00000000006cc638 (anonymous namespace)::Writer::run() (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x6cc638)
#11 0x00000000006c51a1 lld::wasm::writeResult() (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x6c51a1)
#12 0x00000000006a79bd (anonymous namespace)::LinkerDriver::link(llvm::ArrayRef<char const*>) (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x6a79bd)
#13 0x00000000006a24d8 lld::wasm::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&) (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x6a24d8)
#14 0x000000000041f45b main (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x41f45b)
#15 0x00007f1dcbbffb17 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x22b17)
#16 0x000000000041efe9 _start (/src/emsdk/emsdk/upstream/bin/wasm-ld+0x41efe9)

I also had to remove these arguments from LDFLAGS:

-s USE_FREETYPE=1
-s USE_LIBPNG=1
-lstdc++ \

It appears that either freetype or libpng depends on zlib, and emscripten was failing to compile its zlib port because it was trying to execute wasm-ld commands with the "-pie" and "--relocatable" arguments at the same time, and it looks like they are incompatible.

The -lstdc++ flag failed because the library wasn't found. The emscripten documentation says it should automatically take care of the C++ standard libraries for you, so perhaps this flag isn't really needed? I don't know.

@manumartin

manumartin commented Oct 15, 2019

Someone from the LLVM IRC channel pointed out a possible solution to the GlobalSection::writeBody() issue, here is the full conversation: https://pastebin.com/vqZjTf9j I'm trying to check if it works.

@manumartin

manumartin commented Oct 16, 2019

Hi, the problem has been fixed in this commit: https://reviews.llvm.org/rL372779
I have built LLVM with that change and the abort error no longer appears. lz4, sqlite and libf2c are now compiling.

I still have several problems and several temporary workarounds:

  • The main pyodide build is breaking when trying to build libpng and libfreetype because it invokes wasm-ld commands with the "-pie" and the "--relocatable" arguments at the same time and they are incompatible.
  • This Makefile "CLAPACK/CLAPACK_WA/SRC/Makefile" is trying to link against some .o files like "../INSTALL/slamch.o" which are not built at that point. This is breaking the build, removing the references to slamch.o fixes the build.
  • The -lgfortran flag is appearing on some emcc commands and is breaking the scipy/fftpack build. I am dropping it with a modification on pywasmcross.py:
     # Go through and adjust arguments
     for arg in line[1:]:
+        if arg == "-lgfortran":
+            continue
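
The workaround in the last bullet amounts to filtering one flag out of the argument list before it reaches emcc. A minimal standalone sketch of that idea follows; `drop_flags` and `UNSUPPORTED_FLAGS` are hypothetical names used for illustration and are not the actual pywasmcross.py code.

```python
# Hypothetical helper illustrating the workaround; not the real pywasmcross.py logic.
UNSUPPORTED_FLAGS = {"-lgfortran"}


def drop_flags(line, unsupported=UNSUPPORTED_FLAGS):
    """Return a copy of the command line without the unsupported flags.

    line[0] is the compiler executable and is always kept.
    """
    new_args = [line[0]]
    # Go through the arguments and skip anything we cannot link against
    for arg in line[1:]:
        if arg in unsupported:
            continue
        new_args.append(arg)
    return new_args


if __name__ == "__main__":
    cmd = ["emcc", "-O3", "-lgfortran", "-lquadpack", "-o", "out.wasm"]
    print(" ".join(drop_flags(cmd)))
```

Keeping the dropped flags in a set makes it easy to add further entries later if other system-specific flags turn up.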

I have a new error on scipy/quadpack:

emcc -O3 -Werror -s SIDE_MODULE=1 -s WASM=1 --memory-init-file 0 -s EMULATE_FUNCTION_POINTER_CASTS -s TOTAL_MEMORY=1073741824 -s ALLOW_MEMORY_GROWTH=1 -s LINKABLE=1 -s EXPORT_ALL=1 -Wall -g -Wall -g -shared build/temp.linux-x86_64-3.7/scipy/integrate/_quadpackmodule.bc -Lbuild/temp.linux-x86_64-3.7 -lquadpack -lmach -o build/lib.linux-x86_64-3.7/scipy/integrate/_quadpack.cpython-37m-x86_64-linux-gnu.wasm
emscripten:WARNING: Wasm source map won't be usable in a browser without --source-map-base
Fatal: FuncCastEmulation::NUM_PARAMS needs to be at least Fatal: FuncCastEmulation::NUM_PARAMS needs to be at least 21

I haven't looked into it much, but it seems related to the NUM_PARAMS issue that @rth mentioned.

@mdboom
Collaborator Author

mdboom commented Oct 16, 2019

> The main pyodide build is breaking when trying to build libpng and libfreetype because it invokes wasm-ld commands with the "-pie" and the "--relocatable" arguments at the same time and they are incompatible.

Can you be more specific? Pyodide doesn't actually build these libraries, but uses the pre-built ones from emscripten by using the -s USE_LIBPNG=1 etc. flags. Hopefully we can track where the -pie and --relocatable flags are coming from and fix them with the appropriate environment variables.

> This Makefile "CLAPACK/CLAPACK_WA/SRC/Makefile" is trying to link against some .o files like "../INSTALL/slamch.o" which are not built at that point. This is breaking the build, removing the references to slamch.o fixes the build.

Probably fine to move that forward. But eventually we will have to determine whether the functions in those objects are required by Scipy and if so fix the build ordering (or what have you) so this works...

> The -lgfortran flag is appearing on some emcc commands and is breaking the scipy/fftpack build. I am dropping it with a modification on pywasmcross.py

This seems fine as a workaround, since we are explicitly using -lf2c here. There must be some environmental/system difference that makes this appear on your system and not mine or CI. That would be "nice" to get to the bottom of at some point, but not critical.

> FuncCastEmulation::NUM_PARAMS needs to be at least 21

On master, we patch this in binaryen and rebuild it. That patch no longer applies directly with LLVM upstream (since binaryen is no longer being used), but I'm sure it can be ported to the LLVM upstream compiler. I was kind of hoping we wouldn't have to do this with LLVM upstream (and skip the need for building a custom compiler build), but it looks like it's still necessary.

So, we'll need to update our build script so it downloads emsdk with the sources for LLVM, patches them to increase the NUM_PARAMS value, and then builds LLVM from source. You can look at master for how this was done with binaryen as a guide, but that was ripped out of this branch.

https://github.com/iodide-project/pyodide/blob/master/emsdk/patches/num_params.patch
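
When porting the patch, it helps to know how large the value needs to be for a given build. The following is a rough sketch that scans build output for the fatal message quoted above and reports the largest value requested; the function name `required_num_params` is hypothetical, and the only thing assumed about the log is the message wording seen in this thread.

```python
import re

# Matches the fatal message as it appears in this thread's build output.
_NUM_PARAMS_RE = re.compile(r"NUM_PARAMS needs to be at least (\d+)")


def required_num_params(log_text):
    """Return the largest NUM_PARAMS value requested in the log, or None if absent."""
    values = [int(number) for number in _NUM_PARAMS_RE.findall(log_text)]
    return max(values) if values else None


if __name__ == "__main__":
    sample = (
        "Fatal: FuncCastEmulation::NUM_PARAMS needs to be at least "
        "Fatal: FuncCastEmulation::NUM_PARAMS needs to be at least 21"
    )
    print(required_num_params(sample))
```

Taking the maximum over the whole log means one pass over a full build gives a single value to patch in, rather than raising it one failure at a time.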

@manumartin

manumartin commented Oct 17, 2019

-pie & --relocatable problem:
This happens in one execution of wasm-ld when it is building zlib which is a dependency of libpng and freetype.
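
To track down which invocation carries both flags, one option is to scan the logged wasm-ld command lines. This is a minimal sketch under the assumption that the commands are available as plain strings; `find_conflicts` and `CONFLICTING` are hypothetical names, and the only conflicting pair listed is the one reported in this thread.

```python
import shlex

# Flag pairs reported as incompatible in this thread, stored without leading dashes.
CONFLICTING = [("pie", "relocatable")]


def find_conflicts(command):
    """Return the conflicting flag pairs that appear together in one command string."""
    # Strip leading dashes so "-pie" and "--pie" are treated the same way
    flags = {tok.lstrip("-") for tok in shlex.split(command) if tok.startswith("-")}
    return [pair for pair in CONFLICTING if pair[0] in flags and pair[1] in flags]


if __name__ == "__main__":
    cmd = "wasm-ld -o libz.o --allow-undefined -pie --relocatable adler32.o"
    print(find_conflicts(cmd))
```

Running this over every wasm-ld line in a build log should narrow the problem down to the single zlib link step described above.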

-lgfortran problem:
I haven't seen the -lf2c flag you mention. I have seen that it links to libf2c like this, though:
../../../../CLAPACK/CLAPACK-WA/F2CLIBS/libf2c.bc

NUM_PARAMS problem:
I have fixed this. The problem is happening (and always happened) inside wasm-opt, a tool from binaryen. Even though we are now using LLVM, binaryen is still used in some parts, and "emsdk install xxx-upstream" actually installs binaries from both binaryen and LLVM. The wasm-opt binary that gets installed obviously doesn't have our NUM_PARAMS change.

I have solved this by downloading binaryen, patching it and compiling it. Sadly, I haven't found a way to use the normal ./emsdk install mechanism because the install command downloads and builds binaryen all at once and doesn't give you a chance to patch the EmulateFuncCast.cpp file. So my emsdk Makefile now downloads & builds binaryen manually. After building it you need to point the BINARYEN_ROOT variable to the binaryen build directory. This variable is inside the emsdk/emsdk/.emscripten file. This is normally done by "./emsdk activate" but again I couldn't use that mechanism because I needed a special version of binaryen which isn't listed in the available ones with "./emsdk list".
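
Pointing the variable at a custom build can be scripted instead of edited by hand. The sketch below assumes the .emscripten file consists of plain `NAME = 'value'` assignment lines; `set_config_var` is a hypothetical helper and the paths in the usage example are placeholders.

```python
import re


def set_config_var(text, name, value):
    """Return config text with `name = '<value>'` set.

    An existing assignment is replaced in place; otherwise one is appended.
    """
    new_line = f"{name} = {value!r}"
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", flags=re.MULTILINE)
    if pattern.search(text):
        # A callable replacement avoids backslash escapes being interpreted in the path
        return pattern.sub(lambda _match: new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


if __name__ == "__main__":
    config = "LLVM_ROOT = '/opt/llvm/bin'\nBINARYEN_ROOT = '/opt/binaryen'\n"
    # Placeholder path standing in for a locally patched binaryen build
    print(set_config_var(config, "BINARYEN_ROOT", "/src/binaryen/build"))
```

Working on the file contents as a string keeps the helper easy to test; reading and writing the real file is left to the caller.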

The reason I needed a special binaryen version is that I encountered an additional problem. Emscripten relies on another binaryen tool: wasm-emscripten-finalize. In the latest versions of binaryen (1.38.31 and 1.38.32) this tool expects a parameter called --initial-stack-pointer, but emscripten 1.38.47 is not passing it. If the parameter isn't passed, the tool exits with a fatal error and breaks the build.

There is a PR on binaryen about this:
WebAssembly/binaryen#2201

At some point they removed the requirement for this parameter in binaryen, but I have only found it fixed in the latest development tag: version_89

So now I'm using this binaryen version and the problem doesn't appear anymore.

@manumartin

manumartin commented Oct 17, 2019

BTW I'm also pointing LLVM_ROOT inside emsdk/emsdk/.emscripten to my build of LLVM that contains the fix the LLVM guy pushed yesterday.

I'm now having another problem when building scipy/linalg though, several errors like this are appearing:

wasm-ld: error: function signature mismatch: dlamch_
>>> defined as (i32, i32) -> f64 in build/temp.linux-x86_64-3.7/scipy/linalg/src/lapack_deprecations/dgegv.bc
>>> defined as (i32) -> f64 in ../../../../CLAPACK/CLAPACK-WA/lapack_WA.bc

I have checked the sources and indeed the function definitions are different.

and one happens with blas and libf2c:
wasm-ld: warning: function signature mismatch: s_copy
>>> defined as (i32, i32, i32, i32) -> i32 in ../../../../CLAPACK/CLAPACK-WA/blas_WA.bc
>>> defined as (i32, i32, i32, i32) -> void in ../../../../CLAPACK/CLAPACK-WA/F2CLIBS/libf2c.bc

@manumartin

manumartin commented Oct 21, 2019

OK, those error messages were somewhat misleading. I had to look into the wasm-ld source (https://github.com/llvm/llvm-project/blob/67b055841f3b64efd1e92bde3ed7aeeb493c1182/lld/wasm/SymbolTable.cpp#L710) to understand them. There are two "function signature mismatch" cases:

  • For the one that ends in a warning: Two mismatching function declarations are found and the function body is also available. In this case wasm-ld takes one symbol and discards the other and keeps going.

  • For the one that ends in an error: The same as before but in this case the function body is not available. This is happening because in a previous patch we removed some object files from the lapack makefile. This was done to avoid some duplication errors, but it seems too many of them were removed; some are not really duplicated, and since they are missing at link time, the link fails.

Also, I'm starting to think the majority of these warnings are normal, in particular the ones where two functions are found and the mismatch is in the return type (void vs. int). I think this happens because the Cython wrappers aren't using the same return method as the functions inside CLAPACK; instead they use the first parameter of every function to pass an object by reference where the return value is put.
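
To separate the two cases in a long build log, the reports can be parsed and the return-type-only mismatches flagged. This is a minimal sketch that assumes the message layout shown in this thread; `parse_mismatches` and `differs_only_in_return_type` are hypothetical helper names, not part of wasm-ld or pyodide.

```python
import re

HEADER_RE = re.compile(r"wasm-ld: (warning|error): function signature mismatch: (\S+)")
# The ">>> " prefix is treated as optional
DEFINED_RE = re.compile(r"^(?:>>>\s*)?defined as \((.*?)\) -> (\S+) in (\S+)")


def parse_mismatches(log_text):
    """Collect signature-mismatch reports as a list of dictionaries."""
    reports = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER_RE.search(line)
        if header:
            current = {"severity": header.group(1), "symbol": header.group(2), "defs": []}
            reports.append(current)
            continue
        defined = DEFINED_RE.match(line)
        if defined and current is not None:
            params, ret, path = defined.groups()
            current["defs"].append({"params": params, "ret": ret, "path": path})
    return reports


def differs_only_in_return_type(report):
    """True when all definitions take the same parameters but return types differ."""
    params = {d["params"] for d in report["defs"]}
    rets = {d["ret"] for d in report["defs"]}
    return len(params) == 1 and len(rets) > 1
```

Reports with severity "error" are the ones that stop the link, while warnings for which `differs_only_in_return_type` is true correspond to the void vs. int pattern described above.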

@manumartin

manumartin commented Oct 21, 2019

Problem with scipy/sparse/linalg/dsolve:

I have seen that @rth modified setup.py files from other scipy packages, for example scipy/sparse/linalg/isolve, so that these packages would link to LAPACK. My build is breaking on sparse/linalg/dsolve because it does not find a BLAS symbol called "scopy". I don't know why this wasn't failing with binaryen. I modified the setup.py of sparse/linalg/dsolve in the same way as the other setup.py files, and after that this specific issue is solved and the symbol is found.

However, I now have duplicate symbol errors because some functions inside sparse/linalg/dsolve/SuperLU seem to be copy-pasted from blas/libf2c.

Again, @rth has solved other duplicate symbol errors by entirely removing object files from LAPACK, but in this case the duplicated functions seem fairly common and I think if I remove them other stuff can break.

I don't know how to proceed with this; all of these errors seem to be caused by the static linking.

@manumartin

manumartin commented Oct 21, 2019

I declared those functions from SuperLU as extern and now it works, but there are more errors:

/src/emsdk/emsdk/../../llvm-project/cbuild/bin/wasm-ld 
-o /tmp/emscripten_temp_kazgenec/_arpack.cpython-37m-x86_64-linux-gnu.wasm
--allow-undefined --lto-O0 --whole-archive 
build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/sparse/linalg/eigen/arpack/_arpackmodule.bc 
build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/sparse/linalg/eigen/arpack/fortranobject.bc 
-L/src/emsdk/emsdk/upstream/emscripten/system/local/lib build/temp.linux-x86_64-3.7/build/src.linux-x86_64-3.7/build/src.linux-x86_64-3.7/scipy/sparse/linalg/eigen/arpack/_arpack-f2pywrappers.bc 
-L/src/emsdk/emsdk/upstream/emscripten/system/lib ../../../../CLAPACK/CLAPACK-WA/F2CLIBS/libf2c.bc 
-L/src/emsdk/emsdk/.emscripten_cache/wasm-obj-pic ../../../../CLAPACK/CLAPACK-WA/blas_WA.bc ../../../../CLAPACK/CLAPACK-WA/lapack_WA.bc 
-Lbuild/temp.linux-x86_64-3.7 
-larpack_scipy
--no-whole-archive --import-memory --import-table -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --no-gc-sections --export-dynamic --export-all --export __wasm_call_ctors --export __data_end --export main --export malloc --export free --export setThrew --export __errno_location -shared


wasm-ld: error: duplicate symbol: debug_
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cgetv0.bc)
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cnaitr.bc)

wasm-ld: error: duplicate symbol: timing_
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cgetv0.bc)
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cnaitr.bc)

wasm-ld: error: duplicate symbol: debug_
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cnaitr.bc)
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cnapps.bc)

@manumartin

I have built pyodide with emscripten 1.38.47 and the master LLVM version from a few days ago, with all packages except numpy disabled. I just wanted to see if it ran properly, and I'm getting this error:

Uncaught (in promise) RangeError: WebAssembly.Instance(): Out of memory: wasm memory
    at convertJsFunctionToWasm (VM1976 pyodide.asm.js:8)
    at addFunctionWasm (VM1976 pyodide.asm.js:8)
    at addFunction (VM1976 pyodide.asm.js:8)
    at Module._fp$_ZNSt3__213basic_istreamIwNS_11char_traitsIwEEED0Ev$vi (VM1976 pyodide.asm.js:8)
    at _fp$_ZNSt3__213basic_istreamIwNS_11char_traitsIwEEED0Ev$vi (VM1976 pyodide.asm.js:8)
    at :9000/wasm-function[15145]:0x667bc6
    at Module.___assign_got_enties (VM1976 pyodide.asm.js:8)
    at func (VM1976 pyodide.asm.js:8)
    at callRuntimeCallbacks (VM1976 pyodide.asm.js:8)
    at initRuntime (VM1976 pyodide.asm.js:8)

@mdboom
Collaborator Author

mdboom commented Oct 23, 2019

I'm not sure what might be causing that. Does tweaking the TOTAL_MEMORY parameter have any effect? I don't know what convertJsFunctionToWasm does... I wonder if that's new to LLVM upstream usage.

Since this backtrace is in JS, you should be able to add --minify 0 to LDFLAGS in the main Makefile, delete the build directory and rebuild (shouldn't take long). Then you can use the browser's JavaScript debugger to see what the values are at various stages. Could be there's something off there...

@manumartin

It seems to be an issue with Chrome DevTools: it only happens when I refresh with DevTools open. I don't see it in Firefox.

I have another error now:
RuntimeError: function signature mismatch

I can see a wasm stack trace, and if I enable debugging symbols and source maps with -g4 I can also see this C++ stack trace:

VM877 wasm-0466c812:15436 Uncaught (in promise) RuntimeError: function signature mismatch
    at std::__2::(anonymous namespace)::__fake_bind::operator()() const (wasm-function[15435]:0x797d5d)
    at decltype(std::__2::forward<std::__2::(anonymous namespace)::__fake_bind>(fp)()) std::__2::__invoke<std::__2::(anonymous namespace)::__fake_bind>(std::__2::(anonymous namespace)::__fake_bind&&) (wasm-function[15434]:0x797ca3)
    at void std::__2::__call_once_param<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >::__execute<>(std::__2::__tuple_indices<>) (wasm-function[15431]:0x797c8b)
    at std::__2::__call_once_param<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >::operator()() (wasm-function[15429]:0x797c6b)
    at void std::__2::__call_once_proxy<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >(void*) (wasm-function[14884]:0x78e510)
    at byn$fpcast-emu$void std::__2::__call_once_proxy<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >(void*) (wasm-function[23715]:0x7e0180)
    at std::__2::__call_once(unsigned long volatile&, void*, void (*)(void*)) (wasm-function[15662]:0x799a47)
    at void std::__2::call_once<std::__2::(anonymous namespace)::__fake_bind>(std::__2::once_flag&, std::__2::(anonymous namespace)::__fake_bind&&) (wasm-function[14874]:0x78e35e)
    at std::__2::locale::id::__get() (wasm-function[14709]:0x78bd83)
    at void std::__2::locale::__imp::install<std::__2::collate<char> >(std::__2::collate<char>*) (wasm-function[14645]:0x78b86c)

and it says the problem happens on python/Modules/xxsubtype.c:94 which contains this:

static PyObject *
spamlist_state_get(spamlistobject *self)
{
    return PyLong_FromLong(self->state);
}

That doesn't look at all like the calls in the stack trace. I have also tried to put a breakpoint there, but because of the out-of-memory bug I have to restart the dev tools every time I try. In Firefox I can't get the source maps to work.

From looking directly at the webassembly it seems like there is a "call xxx" instruction without setting any parameters first. xxx has parameters so it seems reasonable that a runtime error happens when calling it without them.

@manumartin

Yes, the source maps definitely aren't right. This is the full stack trace:


std::__2::(anonymous namespace)::__fake_bind::operator()() const | @ | xxsubtype.c:94
-- | -- | --
  | decltype(std::__2::forward<std::__2::(anonymous namespace)::__fake_bind>(fp)()) std::__2::__invoke<std::__2::(anonymous namespace)::__fake_bind>(std::__2::(anonymous namespace)::__fake_bind&&) | @ | xxsubtype.c:94
  | void std::__2::__call_once_param<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >::__execute<>(std::__2::__tuple_indices<>) | @ | xxsubtype.c:94
  | std::__2::__call_once_param<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >::operator()() | @ | xxsubtype.c:94
  | void std::__2::__call_once_proxy<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >(void*) | @ | xxsubtype.c:94
  | byn$fpcast-emu$void std::__2::__call_once_proxy<std::__2::tuple<std::__2::(anonymous namespace)::__fake_bind&&> >(void*) | @ | xxsubtype.c:94
  | std::__2::__call_once(unsigned long volatile&, void*, void (*)(void*)) | @ | xxsubtype.c:94
  | void std::__2::call_once<std::__2::(anonymous namespace)::__fake_bind>(std::__2::once_flag&, std::__2::(anonymous namespace)::__fake_bind&&) | @ | xxsubtype.c:94
  | std::__2::locale::id::__get() | @ | xxsubtype.c:94
  | void std::__2::locale::__imp::install<std::__2::collate<char> >(std::__2::collate<char>*) | @ | xxsubtype.c:94
  | std::__2::locale::__imp::__imp(unsigned long) | @ | xxsubtype.c:94
  | std::__2::locale::__imp& std::__2::(anonymous namespace)::make<std::__2::locale::__imp, unsigned int>(unsigned int) | @ | xxsubtype.c:94
  | std::__2::locale::__imp::make_classic() | @ | xxsubtype.c:94
  | std::__2::locale::classic() | @ | xxsubtype.c:94
  | std::__2::locale::__imp::make_global() | @ | xxsubtype.c:94
  | std::__2::locale::__global() | @ | xxsubtype.c:94
  | std::__2::locale::locale() | @ | xxsubtype.c:94
  | std::__2::basic_streambuf<char, std::__2::char_traits<char> >::basic_streambuf() | @ | xxsubtype.c:94
  | std::__2::__stdinbuf<char>::__stdinbuf(_IO_FILE*, __mbstate_t*) | @ | xxsubtype.c:94
  | std::__2::ios_base::Init::Init() | @ | xxsubtype.c:94
  | __cxx_global_var_init | @ | xxsubtype.c:94
  | _GLOBAL__I_000101 | @ | xxsubtype.c:94
  | __wasm_call_ctors | @ | xxsubtype.c:94
  | Module.___wasm_call_ctors | @ | pyodide.asm.js:98135
  | func | @ | pyodide.asm.js:2276
  | callRuntimeCallbacks | @ | pyodide.asm.js:1758
  | initRuntime | @ | pyodide.asm.js:1793
  | doRun | @ | pyodide.asm.js:157266
  | run | @ | pyodide.asm.js:157287
  | runCaller | @ | pyodide.asm.js:157186
  | removeRunDependency | @ | pyodide.asm.js:1968
  | receiveInstance | @ | pyodide.asm.js:2113
  | (anonymous) | @ | pyodide.js:313
  | Promise.then (async) |   |  
  | Module.instantiateWasm | @ | pyodide.js:313
  | createWasm | @ | pyodide.asm.js:2168
  | (anonymous) | @ | pyodide.asm.js:42207
  | (anonymous) | @ | pyodide.js:363
  | script.onload | @ | pyodide.js:83
  | load (async) |   |  
  | loadScript | @ | pyodide.js:83
  | (anonymous) | @ | pyodide.js:359
  | script.onload | @ | pyodide.js:83
  | load (async) |   |  
  | loadScript | @ | pyodide.js:83
  | (anonymous) | @ | pyodide.js:357
  | (anonymous) | @ | pyodide.js:5

@manumartin

manumartin commented Oct 24, 2019

Someone has a very similar stack trace here: google/sanitizers#947

He is also using clang, but not to compile to wasm.

It looks like the problem is related to libc++ initialization: "Now, this seems like an initialization problem (e.g. similar to issue #30) with the C++ standard stream objects (std::cout, etc.), since the reference call in the stack traces is std::ios_base::Init."

@pmp-p
Contributor

pmp-p commented Oct 25, 2019

Hi, upstream is currently (very?) unstable with -fPIC. There was some improvement on tot-upstream a while ago, but it seems there are regressions.
If not using dlfcn, the only simple way is to remove "-fPIC" from the emcc.py arguments, as described here: emscripten-core/emscripten#9317 (comment)

@manumartin

Hi @pmp-p, thanks for the answer. The thing is, I'm interested in using -fPIC because we want to use dynamic linking. Scipy has several libraries that currently link statically against LAPACK, and this generates a very big object with ~5 copies of LAPACK.

I'm wondering if perhaps we could link statically but use some of the link time optimizations that LLVM supports.

Currently I can't even make a C++ hello world work using emscripten-1.38.47 + LLVM master + "-fPIC". It builds and it executes, but it prints weird stuff. When I remove -fPIC it works. On top of that, the DWARF debugging information inside the .wasm file seems wrong or incomplete, which means the source maps generated by emscripten also have missing parts, since they are generated from the DWARF information.

I don't know if I'm doing something wrong, but I didn't expect to be unable to build, run and debug a simple C++ "hello world".

I can currently build pyodide (without packages like scipy) with this setup, but it is throwing some weird runtime error inside libcxx when running.

@manumartin

For anyone interested, I managed to make the source maps work by setting the -g4 flag when building the system libraries (libstdc++). I had to edit the emscripten tools/system_libs.py script and change a cflags variable there. It's kind of a hack, but I haven't found the proper way of doing this.

@rth
Member

rth commented Oct 27, 2019

Thanks for all your work on this @manumartin !

> I can currently build pyodide (Without packages like scipy) with this setup, but it is throwing some weird runtime error inside libcxx when running.

It's great that you at least managed to make it build. To make things a bit more incremental, I think it could make sense to create a separate branch (e.g. llvm-upstream) and add your changes there. If we can manage to make this work, even without scipy and its dependencies, that would already be significant. I can create that branch so you can create your PR, if that works for you?

> again @rth has solved other duplicate symbol errors by entirely removing object files from LAPACK, but in this case the duplicated functions seem fairly common and I think if I remove them other stuff can break.

Yeah, that part is a mess. There are multiple duplicated symbols in general in scipy and LAPACK/BLAS. With dynamic linking it doesn't seem to matter, but it will error when linked statically. I haven't found a good approach for it short of disabling modules entirely.
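
Before deciding which objects or modules to drop, it can help to see which objects define each duplicated symbol. The following is a rough sketch that groups the "duplicate symbol" reports posted earlier in this thread; `duplicate_symbol_sites` is a hypothetical helper name and the parsing assumes the message layout seen in those logs.

```python
import re
from collections import defaultdict

DUP_RE = re.compile(r"wasm-ld: error: duplicate symbol: (\S+)")
# The ">>> " prefix is treated as optional
DEFINED_IN_RE = re.compile(r"^(?:>>>\s*)?defined in (\S+)")


def duplicate_symbol_sites(log_text):
    """Map each duplicated symbol to the sorted list of objects that define it."""
    sites = defaultdict(set)
    symbol = None
    for raw in log_text.splitlines():
        line = raw.strip()
        dup = DUP_RE.search(line)
        if dup:
            symbol = dup.group(1)
            continue
        where = DEFINED_IN_RE.match(line)
        if where and symbol is not None:
            sites[symbol].add(where.group(1))
    return {name: sorted(paths) for name, paths in sites.items()}


if __name__ == "__main__":
    sample = """wasm-ld: error: duplicate symbol: debug_
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cgetv0.bc)
>>> defined in build/temp.linux-x86_64-3.7/libarpack_scipy.a(cnaitr.bc)
"""
    for name, objects in duplicate_symbol_sites(sample).items():
        print(name, objects)
```

A symbol that shows up in many members of the same archive is a different situation from one copied between two libraries, so grouping the reports this way can guide which fix to try first.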

@manumartin

Hi @rth, I'm OK with that. However, to make it work I'm currently using:

emscripten 1.38.47
LLVM master <- needed to fix the crashes while compiling sqlite, lz4 etc..
binaryen version_89 <- I needed this because the interface of one of the LLVM tools changed breaking the build with the binaryen version paired with emscripten 1.38.47.

Those versions can't be installed by the normal "emsdk install / activate" method, so I have installed them manually. I could modify emsdk/Makefile temporarily so that it downloads and builds them until emsdk supports versions with the needed changes.

@rth
Member

rth commented Apr 17, 2020

@manumartin Thanks again for your work on this! In your attempts what branch did you use? I imagine you have additional changes on top of this PR? I'm trying to give this another go in #637

@rth
Member

rth commented May 27, 2020

Just for future reference, there is a more incremental approach for updating emscripten in #480
