Skip to content

Comments

Emscripten 2.0.12 Upstream backend#1102

Merged
rth merged 61 commits intopyodide:masterfrom
joemarshall:upstream_backend
Feb 6, 2021
Merged

Emscripten 2.0.12 Upstream backend#1102
rth merged 61 commits intopyodide:masterfrom
joemarshall:upstream_backend

Conversation

@joemarshall
Copy link
Contributor

@joemarshall joemarshall commented Jan 10, 2021

What it says. Emscripten 2.0.9 upstream (non-fastcomp, compiles straight to wasm), plus patches for binaryen and emscripten so that dynamic linking and passing of function pointers between side modules and main modules works.

I'm part way through pull requests for binaryen and emscripten so that the patches there aren't needed.

All other patches to emscripten and binaryen have been dropped - I'm not sure if they were all things that are fixed in the new backend - none of them worked with new backend anyway

Closes #531
Closes #476

@joemarshall
Copy link
Contributor Author

I need to fix the tests, will do it later if I have a chance

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @joemarshall, really looking forward to this!



class EnvironmentRewritingArgument(argparse.Action):
def __call__(self, parser, namespace, values, option_string=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some type annotations would help readability.

Copy link
Contributor

@dalcde dalcde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I'm really excited about this. I've left some minor comments

@dalcde
Copy link
Contributor

dalcde commented Jan 10, 2021

We should try not reviewing simultaneously :). Glad that we mostly caught distinct things though.

@joemarshall
Copy link
Contributor Author

Current status on this - I've commited some of the changes suggested above. In doing so I had to move to emscripten 2.0.12 to get that packager fix in. It needed some more fixes to emscripten dynamic linking to make the tests there work (plus some modifying of the fpcast test so that it actually tried calling something with the wrong signature!). Right now there's a bunch of tests in src that I need to work through to see what is broken. Updating seems to have broken python output (at least from print in console.html), I suspect this is causing a bunch of the fails.

@joemarshall
Copy link
Contributor Author

Fixed a bunch of tests now, but there's some stuff in scipy that I think is to do with yet more fortran return types being inconsistent which causes some test fails when sub modules don't import. This also breaks astropy and some things that use scipy submodules.

Also, wasm in new emscripten breaks sometimes on access to uninitialised variables I discover, whereas it appears asm.js initializes to zero. There was one in the jsproxy that caused all sort of weird intermittent errors.

@rth
Copy link
Member

rth commented Jan 12, 2021

yet more fortran return types being inconsistent which causes some test fails when sub modules don't import

BTW, apparently there are a number of known issues around it #72 (comment) and scipy/scipy#10335

There was one in the jsproxy that caused all sort of weird intermittent errors.

Was or is? As there are lots of core (and package) test are still failing with RuntimeError: unreachable executed. It would be great to make those work first. Maybe @hoodmane would have some insights on debugging jsproxy related issues..

Could you also please merge upstream master in to resolve conflicts?

@joemarshall
Copy link
Contributor Author

Oops, didn't push the unreachable thing, which was that uninitialized variable in jsproxy.

There's still weirdness with function parameters to lapack code in scipy which I am debugging by building it with function parameter and implicit function definition warnings set to error

@hoodmane
Copy link
Member

hoodmane commented Jan 12, 2021

Looking into this, at a first glance these all seem to be assertion errors fairly deep inside of CPython. I took a look at the core-chrome tests.

Most failures (380) are coming from pyproxy_apply ==> PyObject_VectorCall ==> "function signature mismatch". These could be issues related to pyproxy, but I would suspect something about the vectorcall protocol might be amiss since we see so many of these failures (and no examples of failures in the ordinary nonvector "Call" protocol, which pyproxy calls into regularly as well). VectorCall looks up a function pointer based on an offset stored in the PyType struct. Does "function signature mismatch" mean it actually found a function pointer, or is it possible that it didn't even get a correct function pointer at all.

In 5 cases it fails inside of the ceval.c method import_all_from which is the lowest layer of the import mechanism. In one case it fails at

from js import TEST
from js import a
import micropip (x3)

I think this is happening at _PyObject_LookupAttrId(v, &PyId___dict__, &dict) on line 5269 of ceval.c. There are a couple of cases while executing the LOAD_ATTR opcode (2966 of ceval).

3 times it fails in ___sys_wait4 with "abort(cannot wait on child processes)", that may need a patch?

@hoodmane
Copy link
Member

hoodmane commented Jan 12, 2021

Anyways, there's a clear first thing to look into since VectorCall failures account for the vast majority of problems. I don't think most of these tests are run in selenium_standalone so it is likely that the other failures are caused by state corruption that occurred in the VectorCall panics.

@joemarshall
Copy link
Contributor Author

I'll try to reproduce one of those errors in browser, there's probably something similar to the proxy error that I pushed a patch for, some kind of assumption that function types don't matter because they don't in C, or that things are initialized to zero...

@hoodmane
Copy link
Member

hoodmane commented Jan 12, 2021

In the firefox version, it gives unreadable stack traces (no function name symbols) so you can't tell that it's inside of VectorCall but the errors read RuntimeError: indirect call to null, so that suggests that whatever the vectorcall function pointers are actually all NULL somehow? Or it is possible that indirect call to null is just what Firefox says when any bad function call happens.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. I think we can merge the rest in this PR. Thanks again for all you work @joemarshall !

for (let name of imports) {
if (name in packageNames) {
packages.add(name);
packages.add(packageNames[name]);
Copy link
Member

@rth rth Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we didn't have enough tests for this. (No action needed in this PR).

@rth
Copy link
Member

rth commented Feb 5, 2021

The only thing is that pillow on chrome seems to frequently timeout (or there are timeouts in other packages as well). Though I have also seen time outs on master recently, so I would be OK merging this as is, and try to iron out unreliable tests in follow up PRs.

@rth
Copy link
Member

rth commented Feb 5, 2021

I pushed a few commits to fix review comments as I'm very much looking forward to see this merged :)

@rth
Copy link
Member

rth commented Feb 5, 2021

BTW it looks like with the upstream backend in this PR pure Python code in pyodide code runs 25 to 30% faster on average (ref #1120)

Below are benchmark results for Firefox aggregated over 3 CI runs in this PR and on master. Chrome is comparable. Benchmarks on the top are mostly pure python code, and at the bottom mostly using numpy,

image

@rth
Copy link
Member

rth commented Feb 5, 2021

The only concern is that Chrome does seem to time out more often here and we removed the custom timeout in conftest.py which was a few min, while the current default timeout of 10 or 20 min is too long.

@dalcde
Copy link
Contributor

dalcde commented Feb 6, 2021 via email

@rth rth merged commit 1bd8380 into pyodide:master Feb 6, 2021
@rth
Copy link
Member

rth commented Feb 6, 2021

Also actually timeouts for packages in Chrome might have something to do with the fact that this removes xfail on scipy dependent packages. Previously it's not that they weren't working, but had a non negligible change of failing (or I guess now time out) hence the xfail. Maybe it's just more about RAM usage when a few of such packages are loaded in parallel.

@rth
Copy link
Member

rth commented Feb 8, 2021

Thanks again for your work on this @joemarshall and @dalcde for improving the build setup earlier to make it possible! This fixed a number of scipy related issues (linked above) and reduced the scipy size significantly #240 (comment), which was a blocker for a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate to upstream LLVM for emscripten backend

4 participants