Skip to content

Better package cross compilation#2238

Merged
hoodmane merged 37 commits intopyodide:mainfrom
hoodmane:better-cross-build
Mar 10, 2022
Merged

Better package cross compilation#2238
hoodmane merged 37 commits intopyodide:mainfrom
hoodmane:better-cross-build

Conversation

@hoodmane
Copy link
Member

@hoodmane hoodmane commented Mar 2, 2022

Instead of trying to convince setup.py to do what we want, the wheel
package empowers us to unpack and directly manipulate the wheel.
In particular this is important because --skip-build really does
not work.

First we build a native wheel. If skip_host is True, then this
"native" wheel has defective empty files as binaries, if it's False,
we build a real native wheel. If skip_host is False, then we use
pip to install our native wheel into host directory.

Then we unpack the wheel, replace the platform tag with
wasm32_emscripten, remove the non-metadata files, and copy in the
new files with the cross compiled binaries.

Of course, this is all only necessary for binary extensions.

Hopefully this will help get us towards using alternative build
backends. The one concern is that this approach depends on
implementation details of where the build backend puts the files
into the directory tree before copying them into the wheel. If
the build backend moved the linked binaries after building them,
we would be in trouble. Or worse if the build backend deleted the
files after packing them into the wheel, we'd be in trouble.

Basically, it seems clear that we can only support alternative
build backends on a case-by-case basis.

@henryiii I'm curious what your thoughts are cross compilation + arbitrary build backends.

hoodmane added 3 commits March 1, 2022 18:25
We also update it to handle .so files that are not in the MEMFS.
(In this case we have to copy the file into memory before loading it.)
Instead of trying to convince setup.py to do what we want, the wheel
package empowers us to unpack and directly manipulate the wheel.
In particular this is important because `--skip-build` really does
not work.

First we build a native wheel. If skip_host is True, then this
"native" wheel has defective empty files as binaries, if it's False,
we build a real native wheel. If skip_host is False, then we use
pip to install our native wheel into host directory.

Then we unpack the wheel, replace the platform tag with
wasm32_emscripten, remove the non-metadata files, and copy in the
new files with the cross compiled binaries.

Of course, this is all only necessary for binary extensions.

Hopefully this will help get us towards using alternative build
backends. The one concern is that this approach depends on
implementation details of where the build backend puts the files
into the directory tree before copying them into the wheel. If
the build backend moved the linked binaries after building them,
we would be in trouble. Or worse if the build backend deleted the
files after packing them into the wheel, we'd be in trouble.

Basically, it seems clear that we can only support alternative
build backends on a case-by-case basis.
This was referenced Mar 2, 2022
@henryiii
Copy link
Contributor

henryiii commented Mar 3, 2022

I'd have to understand the cross-compile process a little better. Where is the "cross-compiled" run being done? I'm guessing it's picking up the compile commands, then reprocessing them, or something like that?

@hoodmane
Copy link
Member Author

hoodmane commented Mar 3, 2022

I'm guessing it's picking up the compile commands, then reprocessing them, or something like that?

Yes. On this branch, we run python setup.py bdist_wheel with symlinks installed so that attempts to invoke the C compiler toolchain are recorded into $pkg_root/build/$src_dir/build.log. If skip_host is not false, we just emit an empty output file rather than actually invoking the compiler. Then we delete these "native" artifacts.
This is done by the pywasmcross.capture_compile function. capture_make_command_wrapper_symlinks places the symlinks into the path.

Then we replay the compilation with pywasmcross.replay_compile which iterates over build.log and calls replay_command, which does some rewrite process to remove flags that don't make sense or replace them with other flags, rearrange which libraries are loaded, remove or reroute system libraries or includes into emscripten ones etc. (This is quite similar to what happens in emcc.py which is after a lot of the same stuff.)

Then we unpack the wheel, copy the new binary files into the wheel, rewrite the platform tag, and repack.

@hoodmane hoodmane mentioned this pull request Mar 8, 2022
Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @hoodmane! Generally LGTM.

It would be great if rth can review this too, as I haven't reviewed our build system very much. But he is doing more important things so I'm fine with merging this as there are other PRs waiting for this. (We can discuss our build system more in #2256 anyway)

@hoodmane
Copy link
Member Author

Thanks for the review @ryanking13!

I haven't reviewed our build system very much

It's very confusing and we have a steep learning curve, but thankfully we seem to be gradually increasing how much we understand the build system. Probably it would be good to devote some time to improving the dev docs or in line comments on this topic to reduce the confusingness.

I'm fine with merging this as there are other PRs waiting for this

To be clear, my opinion is that this replaces the hack we already have with a new hack which is marginally simpler and more predictable. But this is enough to allow us to update to Python 3.10 =)

#2256 will dramatically improve predictability and get us away from being tied to setuptools. It also removes slightly more than 400 loc on net.

@hoodmane
Copy link
Member Author

Okay I guess I'll merge this then.

@hoodmane hoodmane merged commit c01ab54 into pyodide:main Mar 10, 2022
@hoodmane hoodmane deleted the better-cross-build branch March 10, 2022 04:34
hoodmane added a commit to hoodmane/pyodide that referenced this pull request Mar 10, 2022
Co-authored-by: Gyeongjae Choi <[email protected]>

Instead of trying to convince setup.py to do what we want, the wheel
package empowers us to unpack and directly manipulate the wheel.
In particular this is important because --skip-build really does
not work.

First we build a native wheel. If skip_host is True, then this
"native" wheel has defective empty files as binaries, if it's False,
we build a real native wheel. If skip_host is False, then we use
pip to install our native wheel into host directory.

Then we unpack the wheel, replace the platform tag with
wasm32_emscripten, remove the non-metadata files, and copy in the
new files with the cross compiled binaries.

Of course, this is all only necessary for binary extensions.

Hopefully this will help get us towards using alternative build
backends. The one concern is that this approach depends on
implementation details of where the build backend puts the files
into the directory tree before copying them into the wheel. If
the build backend moved the linked binaries after building them,
we would be in trouble. Or worse if the build backend deleted the
files after packing them into the wheel, we'd be in trouble.

Basically, it seems clear that we can only support alternative
build backends on a case-by-case basis.
hoodmane added a commit that referenced this pull request Mar 13, 2022

Our package build process currently has a significant flaw: we first run setup.py, recording all compilation commands, then we rewrite these compilation commands to invoke emcc and replay them, and then we pray that the cross compiled executables ended up in the right place to go into the wheel. This is not a good strategy because the build script is allowed to implement arbitrary logic, and if it moves, renames, etc any of the output files then we lose track of them. This has repeatedly caused difficulty for us.

However, we also make no particularly significant use of the two pass approach. We can just do the simpler thing: capture the compiler commands as they occur, modify them as needed, and then run the fixed command.

I also added a patch to fix the numpy feature detection for wasm so that we don't have to include _npyconfig.h and config.h, numpy can generate them in the way it would for a native build. I opened a numpy PR that would fix the detection for us upstream:
numpy/numpy#21154

This clears the way for us to switch to using pypa/build (as @henryiii has suggested) by removing our dependence on specific setuptools behavior.

This is on top of #2238.
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hoodmane !

I agree it's a good idea to get rid of setup.py install there, and the general direction sounds good to me. This does depend on some of the implementation details, as you mentionned, but we have a wide enough variety of packages to catch most common issues I think. So as long as it builds it should be OK.

A few minor comments below, mostly about docs (feel fee to aggregate them into a single PR, I'm going to have a look at some other recent PRs as well).

WHEEL.write_text("\n".join(result_lines))


def update_wheel_platform(build_dir: Path, wheel_dir: Path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a docstring explaing what exactly we are updating and where?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function was removed in #2256.

del bash_runner.env["_PYTHON_HOST_PLATFORM"]


def set_host_platform(wheel_dir: Path, platform: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, a docstring with a sentence saying that this edits *.dist-info/WHEEL to replace the Tag value with the platform, and linking to the PEP or documentation listing valid values there wouldn't hurt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, removed in #2256

sys.executable,
"-m",
"pip",
"install",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing is pip will often use build isolation to install build requirements from pyproject.toml and not whatever dependencies we indicated in our build system.
In most cases, that's probably fine, but I wonder if this won't be an issue if the build dependency is patched. Say you build a package that needs numpy at build time, and it will use a different (and unpatched) numpy version to install it on the host when skip_host=False.
Again I'm not sure if it's really an issue, but we could at least mention it in a comment that in some cases it might be appropriate to disable build isolation here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, this was also removed in #2256 (along with the entire skip_host key). Only numpy was setting skip_host to false too, so if there were problems on this code path we probably wouldn't know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants