Skip to content

Add Pyodide support and CI jobs for statsmodels#9270

Merged
bashtage merged 78 commits intostatsmodels:mainfrom
agriyakhetarpal:add-emscripten-ci
Aug 27, 2024
Merged

Add Pyodide support and CI jobs for statsmodels#9270
bashtage merged 78 commits intostatsmodels:mainfrom
agriyakhetarpal:add-emscripten-ci

Conversation

@agriyakhetarpal
Copy link
Contributor

@agriyakhetarpal agriyakhetarpal commented Jun 6, 2024

This PR adds a GitHub Actions CI job in emscripten.yml that builds and tests WASM wheels for statsmodels (Pyodide version 0.26.0, Python 3.12), compiled using the Emscripten toolchain (version 3.1.58). The pytest test suite is then executed against the built wheel using the in-house statsmodels.test() API – the tests have been fixed or skipped in order to adapt to the limitations of Pyodide in the browser, such as not being able to use threading/multiprocessing from the standard library or the fact that the bitness for the Python interpreter that is powered by Pyodide is 32-bit.

A Node.js-based test suite that calls Pyodide from JS is used instead of the Pyodide-managed special virtual environment due to differences in being able to load shared objects, where there are some problems for statsmodels regarding the visibility of symbols most likely coming from OpenBLAS, please see previous discussion(s) on #9166 for more information.

Additional context

Here are a few xrefs to some other PRs that have been implemented for packages in the Scientific Python ecosystem:

The greater, long-term goal towards this change is to implement Sphinx-based interactive documentation via JupyterLite and WASM-powered, in-browser kernels, as referenced in Quansight-Labs/czi-scientific-python-mgmt#19, see also: Quansight-Labs/czi-scientific-python-mgmt#18. A pilot that can be readily tried out is available for the "API reference" pages under the PyWavelets documentation. This will be preceded by configuring a place to publish these WebAssembly/Emscripten wheels nightly or on a scheduled cadence for statsmodels (which could be a third-party, PyPI-like index such as Anaconda.org) and then integrating Sphinx extensions such as jupyterlite-sphinx for hosted documentation. Work towards these areas is going on in-line with the above tracking issue(s).


Details

Notes:

  • It is essential that you add a test when making code changes. Tests are not
    needed for doc changes.
  • When adding a new function, test values should usually be verified in another package (e.g., R/SAS/Stata).
  • When fixing a bug, you must add a test that would produce the bug in main and
    then show that it is fixed with the new code.
  • New code additions must be well formatted. Changes should pass flake8. If on Linux or OSX, you can
    verify your changes are well formatted by running
    git diff upstream/main -u -- "*.py" | flake8 --diff --isolated
    
    assuming flake8 is installed. This command is also available on Windows
    using the Windows System for Linux once flake8 is installed in the
    local Linux environment. While passing this test is not required, it is good practice and it helps
    improve code quality in statsmodels.
  • Docstring additions must render correctly, including escapes and LaTeX.

@agriyakhetarpal agriyakhetarpal marked this pull request as draft June 6, 2024 11:26
@josef-pkt
Copy link
Member

changes in unit test looks good to me
The only silenced "failure" seems to be in the base tests.

    @pytest.mark.xfail(is_wasm(), reason="Fails in WASM, not sure what to do with this yet")
    def test_zero_collinear(self):

Question (I never looked at this)
Is the numeric code, numpy, scipy, linear algebra, run in 32 or 64 bit precision?

@agriyakhetarpal
Copy link
Contributor Author

Hi, @bashtage and @josef-pkt! Is it okay to tag you both to seek another review on this PR, since it has been a month since my last review request? It is complete and ready to go from my end, and I've merged the latest changes from main now and resolved the conflicts that came up – thanks! I'm glad to make this PR easier to review if needed, so if anything is required to be done, please let me know and I'll take a look at that.

@bashtage
Copy link
Member

Thanks for the work. I'll get around to it next week.

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Jul 22, 2024

Sure, thanks, @bashtage! Could you trigger the workflows in the meantime here, please?

@agriyakhetarpal
Copy link
Contributor Author

Hi again, @bashtage – here is a gentle reminder since you mentioned you would get around to it the week after. There is no hurry here whatsoever – whenever you get the time. Thanks!

@bashtage
Copy link
Member

bashtage commented Aug 1, 2024

I am away this week but will absolutely take a look when I'm back.

Copy link
Member

@bashtage bashtage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small change and one quesation. Overall LGTM. I can't really comment n the wasm/js stuff.

@pytest.mark.matplotlib
@pytest.mark.skipif(
PYTHON_IMPL_WASM,
reason="Matplotlib uses different backend in WASM/Pyodide"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the backend matter for these and the others that are marked to skip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean the functionality of the plots, I don't think so. Normally, one will not need to adapt the code to use matplotlib-pyodide to use the HTML5/WASM canvases, since it gets installed when one installs matplotlib using micropip. The details are a little murky since it has been some time for me, but I received the error: 'Line2D' object has no attribute "other", so it is probably a missing feature in matplotlib-pyodide at the time of writing.

For example, https://github.com/statsmodels/statsmodels/blob/main/examples/python/categorical_interaction_plot.py (I can't find any other failing example right now) can be tried out in the Pyodide console (https://pyodide.org/en/stable/console.html), though it won't display any of the matplotlib.figure.Figure objects, of course, lacking a canvas for the renderer – they can be viewed in a JupyterLite notebook (https://jupyter.org/try-jupyter/lab/) and they should all work without any code changes. Not all of the tests marked with @pytest.mark.matplotlib have been skipped, and the ones skipped here are those that were failing because of the error noted above. I could investigate further as a follow-up if you would want me to do so.

@bashtage
Copy link
Member

Going to merge this and we'll see if other changes are needed.

@bashtage
Copy link
Member

I'll let it run one more time to make sure there is nothing stange

Copy link
Member

@bashtage bashtage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve

@bashtage bashtage merged commit 629a436 into statsmodels:main Aug 27, 2024
@agriyakhetarpal agriyakhetarpal deleted the add-emscripten-ci branch August 27, 2024 09:23
@agriyakhetarpal
Copy link
Contributor Author

Thank you for the merge!

agriyakhetarpal added a commit to agriyakhetarpal/pyodide that referenced this pull request Sep 5, 2024
@agriyakhetarpal agriyakhetarpal restored the add-emscripten-ci branch September 16, 2024 13:50
@agriyakhetarpal agriyakhetarpal deleted the add-emscripten-ci branch September 16, 2024 15:35
bashtage added a commit that referenced this pull request Sep 17, 2024
Backport of #9270: add Pyodide support and CI jobs for v0.14.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: out-of-tree Pyodide builds for statsmodels

4 participants