
Better integration with conda/conda-forge for building packages #795

@rth

The idea of relying on conda-forge to build Python packages for WebAssembly has been mentioned for a while now (#38 (comment), conda/conda#7619, regro/cf-scripts#1052 (comment)). In this issue I want to start a discussion about the current situation and the challenges of moving in that direction, from the perspective of pyodide as I understand it (please correct me if needed).

First, the main motivation: the present way of building all the packages in one repo is not sustainable as the number of packages, and with it the CI time, keeps growing. Resolving this ourselves would take significant development resources, which we don't have. Even if we did, it would amount to rebuilding many things (including a community) that already exist and work great at conda-forge, which wouldn't make sense.

Now, as to the challenges (it's a long post):

1. Updating emscripten

With a single repo it's relatively fast to rebuild all packages with a different version of the emsdk toolchain (emscripten, binaryen, ...) or with different options. We currently still apply a couple of patches to emscripten, and ideally we also need to update emscripten frequently to benefit from improvements and fixes (unfortunately we are currently 1.5 years behind the latest release). In conda-forge, rebuilding all the packages with a new compiler would take longer (though this got better recently, see regro/cf-scripts#1052 (comment)). Also, I think the following scenario would be really impractical there: a) we update the emscripten version, b) some package fails to build, c) we have to go back and change some global emscripten settings.

This will hopefully become less of an issue over time as emscripten becomes more stable, but it's still an issue now (see e.g. #480 (comment)).

2. Build approach

As far as I understand, cross-compiling scientific Python packages (based on distutils) is difficult, even on Linux between different architectures (scipy/scipy#8571, numpy/numpy#17620).

I'm not sure if this was the reason, but pyodide doesn't do cross-compilation in the classical sense. Instead it compiles the package with the host compilers, stores a log of all executed compilation commands, and re-runs those commands with the emscripten compiler.
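The record-and-replay idea described above can be illustrated with a minimal sketch. This is not pyodide's actual implementation: the mapping `HOST_TO_EMSCRIPTEN`, the helper names, and the sample log are all hypothetical, and a real replay would also need to filter or translate compiler flags. The sketch only shows the core step of swapping the host compiler driver for `emcc` / `em++` in each logged command.

```python
import shlex

# Hypothetical mapping from host compiler drivers to Emscripten drivers.
HOST_TO_EMSCRIPTEN = {"gcc": "emcc", "cc": "emcc", "g++": "em++", "c++": "em++"}


def rewrite_for_emscripten(command):
    """Swap the compiler in one logged command; return None for non-compiler calls."""
    args = shlex.split(command)
    if not args:
        return None
    # Compare on the executable name only, so absolute paths also match.
    compiler = args[0].rsplit("/", 1)[-1]
    if compiler not in HOST_TO_EMSCRIPTEN:
        return None
    return [HOST_TO_EMSCRIPTEN[compiler]] + args[1:]


def replay_plan(logged_commands):
    """Build the list of commands to re-run with the Emscripten toolchain."""
    plan = []
    for command in logged_commands:
        rewritten = rewrite_for_emscripten(command)
        if rewritten is not None:
            plan.append(rewritten)
    return plan


# Illustrative log of commands captured during the host build.
log = [
    "gcc -O2 -fPIC -c src/foo.c -o build/foo.o",
    "mkdir -p build/lib",
    "/usr/bin/g++ -O2 -c src/bar.cpp -o build/bar.o",
]

for cmd in replay_plan(log):
    print(" ".join(cmd))
```

Each entry of the plan could then be passed to `subprocess.run`, which is left out here so that the sketch runs without an Emscripten installation.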

3. Shared package specifications

Package specifications were chosen to be as close as possible to the meta.yaml in conda, and hopefully the package index will soon use the same format as well (#791).

4. Artifacts format

Currently each package consists of two separate files (.data and .js), which we distribute via jsDelivr. These would probably not fit as conda artifacts, which means we would likely need to handle some of this ourselves in any case.

5. Dependency resolution in the browser

There are two use cases for pyodide:

  1. Interactive use (notebooks, etc.), where having a dependency resolver in the browser (e.g. mamba) would be great.
  2. Python applications, where dependencies are known in advance and we certainly don't want to do dependency resolution at each page load. There, a precomputed list of packages is likely the way to go.
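For the second use case, a precomputed list could be produced at build time along these lines. This is only a sketch under assumptions: the function name `precompute_load_order` and the dependency map are hypothetical, version constraints are ignored entirely, and the result is simply the transitive closure of the requested packages with dependencies listed first.

```python
def precompute_load_order(requested, dependencies):
    """Return requested packages plus all transitive dependencies, dependencies first."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        # Sort for a deterministic result regardless of input ordering.
        for dep in sorted(dependencies.get(name, [])):
            visit(dep)
        order.append(name)

    for name in sorted(requested):
        visit(name)
    return order


# Illustrative dependency map; a real one would come from the package index.
deps = {
    "pandas": ["numpy", "python-dateutil", "pytz"],
    "python-dateutil": ["six"],
    "numpy": [],
    "pytz": [],
    "six": [],
}

print(precompute_load_order(["pandas"], deps))
```

The resulting list could be shipped with the application, so that the page only has to fetch the listed packages in order without resolving anything at load time.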

Either way, we also need to install pure Python wheels (from PyPI or another custom location), so we still have this duality between conda/pyodide packages and Python wheels. This means we have to maintain a minimal pip (micropip) in pyodide.

I haven't closely followed the WebAssembly-related developments at conda-forge, so maybe I am missing something.
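To illustrate the pure Python side of this duality: a wheel that contains no compiled code carries the ABI tag `none` and the platform tag `any` in its filename, so an installer can tell from the name alone whether a wheel is usable without a WebAssembly build. The helper below is a hypothetical sketch based on the standard wheel filename layout, not micropip's actual code.

```python
def is_pure_python_wheel(filename):
    """Check the wheel filename tags: pure Python wheels end in '-none-any.whl'."""
    if not filename.endswith(".whl"):
        return False
    # Layout: name-version[-build]-pythontag-abitag-platformtag.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        return False
    abi_tag, platform_tag = parts[-2], parts[-1]
    return abi_tag == "none" and platform_tag == "any"


# Illustrative filenames.
print(is_pure_python_wheel("six-1.15.0-py2.py3-none-any.whl"))
print(is_pure_python_wheel("numpy-1.19.4-cp38-cp38-manylinux1_x86_64.whl"))
```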

cc @wolfv @jakirkham
