Better package cross compilation#2238
Conversation
We also update it to handle .so files that are not in the MEMFS. (In this case we have to copy the file into memory before loading it.)
Instead of trying to convince setup.py to do what we want, the wheel package empowers us to unpack and directly manipulate the wheel. In particular this is important because `--skip-build` really does not work. First we build a native wheel. If skip_host is True, then this "native" wheel has defective empty files as binaries, if it's False, we build a real native wheel. If skip_host is False, then we use pip to install our native wheel into host directory. Then we unpack the wheel, replace the platform tag with wasm32_emscripten, remove the non-metadata files, and copy in the new files with the cross compiled binaries. Of course, this is all only necessary for binary extensions. Hopefully this will help get us towards using alternative build backends. The one concern is that this approach depends on implementation details of where the build backend puts the files into the directory tree before copying them into the wheel. If the build backend moved the linked binaries after building them, we would be in trouble. Or worse if the build backend deleted the files after packing them into the wheel, we'd be in trouble. Basically, it seems clear that we can only support alternative build backends on a case-by-case basis.
This reverts commit 53900b0.
|
I'd have to understand the cross-compile process a little better. Where is the "cross-compiled" run being done? I'm guessing it's picking up the compile commands, then reprocessing them, or something like that? |
Yes. On this branch, we run Then we replay the compilation with Then we unpack the wheel, copy the new binary files into the wheel, rewrite the platform tag, and repack. |
Co-authored-by: Gyeongjae Choi <[email protected]>
|
Thanks for the review @ryanking13!
It's very confusing and we have a steep learning curve, but thankfully we seem to be gradually increasing how much we understand the build system. Probably it would be good to devote some time to improving the dev docs or in line comments on this topic to reduce the confusingness.
To be clear, my opinion is that this replaces the hack we already have with a new hack which is marginally simpler and more predictable. But this is enough to allow us to update to Python 3.10 =) #2256 will dramatically improve predictability and get us away from being tied to setuptools. It also removes slightly more than 400 loc on net. |
|
Okay I guess I'll merge this then. |
Co-authored-by: Gyeongjae Choi <[email protected]> Instead of trying to convince setup.py to do what we want, the wheel package empowers us to unpack and directly manipulate the wheel. In particular this is important because --skip-build really does not work. First we build a native wheel. If skip_host is True, then this "native" wheel has defective empty files as binaries, if it's False, we build a real native wheel. If skip_host is False, then we use pip to install our native wheel into host directory. Then we unpack the wheel, replace the platform tag with wasm32_emscripten, remove the non-metadata files, and copy in the new files with the cross compiled binaries. Of course, this is all only necessary for binary extensions. Hopefully this will help get us towards using alternative build backends. The one concern is that this approach depends on implementation details of where the build backend puts the files into the directory tree before copying them into the wheel. If the build backend moved the linked binaries after building them, we would be in trouble. Or worse if the build backend deleted the files after packing them into the wheel, we'd be in trouble. Basically, it seems clear that we can only support alternative build backends on a case-by-case basis.
Our package build process currently has a significant flaw: we first run setup.py, recording all compilation commands, then we rewrite these compilation commands to invoke emcc and replay them, and then we pray that the cross compiled executables ended up in the right place to go into the wheel. This is not a good strategy because the build script is allowed to implement arbitrary logic, and if it moves, renames, etc any of the output files then we lose track of them. This has repeatedly caused difficulty for us. However, we also make no particularly significant use of the two pass approach. We can just do the simpler thing: capture the compiler commands as they occur, modify them as needed, and then run the fixed command. I also added a patch to fix the numpy feature detection for wasm so that we don't have to include _npyconfig.h and config.h, numpy can generate them in the way it would for a native build. I opened a numpy PR that would fix the detection for us upstream: numpy/numpy#21154 This clears the way for us to switch to using pypa/build (as @henryiii has suggested) by removing our dependence on specific setuptools behavior. This is on top of #2238.
rth
left a comment
There was a problem hiding this comment.
Thanks @hoodmane !
I agree it's a good idea to get rid of setup.py install there, and the general direction sounds good to me. This does depend on some of the implementation details, as you mentionned, but we have a wide enough variety of packages to catch most common issues I think. So as long as it builds it should be OK.
A few minor comments below, mostly about docs (feel fee to aggregate them into a single PR, I'm going to have a look at some other recent PRs as well).
| WHEEL.write_text("\n".join(result_lines)) | ||
|
|
||
|
|
||
| def update_wheel_platform(build_dir: Path, wheel_dir: Path): |
There was a problem hiding this comment.
Would you mind adding a docstring explaing what exactly we are updating and where?
| del bash_runner.env["_PYTHON_HOST_PLATFORM"] | ||
|
|
||
|
|
||
| def set_host_platform(wheel_dir: Path, platform: str): |
There was a problem hiding this comment.
Same, a docstring with a sentence saying that this edits *.dist-info/WHEEL to replace the Tag value with the platform, and linking to the PEP or documentation listing valid values there wouldn't hurt.
| sys.executable, | ||
| "-m", | ||
| "pip", | ||
| "install", |
There was a problem hiding this comment.
The only thing is pip will often use build isolation to install build requirements from pyproject.toml and not whatever dependencies we indicated in our build system.
In most cases, that's probably fine, but I wonder if this won't be an issue if the build dependency is patched. Say you build a package that needs numpy at build time, and it will use a different (and unpatched) numpy version to install it on the host when skip_host=False.
Again I'm not sure if it's really an issue, but we could at least mention it in a comment that in some cases it might be appropriate to disable build isolation here.
There was a problem hiding this comment.
Oh yeah, this was also removed in #2256 (along with the entire skip_host key). Only numpy was setting skip_host to false too, so if there were problems on this code path we probably wouldn't know.
Instead of trying to convince setup.py to do what we want, the wheel
package empowers us to unpack and directly manipulate the wheel.
In particular this is important because
--skip-buildreally doesnot work.
First we build a native wheel. If skip_host is True, then this
"native" wheel has defective empty files as binaries, if it's False,
we build a real native wheel. If skip_host is False, then we use
pip to install our native wheel into host directory.
Then we unpack the wheel, replace the platform tag with
wasm32_emscripten, remove the non-metadata files, and copy in the
new files with the cross compiled binaries.
Of course, this is all only necessary for binary extensions.
Hopefully this will help get us towards using alternative build
backends. The one concern is that this approach depends on
implementation details of where the build backend puts the files
into the directory tree before copying them into the wheel. If
the build backend moved the linked binaries after building them,
we would be in trouble. Or worse if the build backend deleted the
files after packing them into the wheel, we'd be in trouble.
Basically, it seems clear that we can only support alternative
build backends on a case-by-case basis.
@henryiii I'm curious what your thoughts are cross compilation + arbitrary build backends.