
python3.pkgs.mmcv: fix build#251132

Closed
benxiao wants to merge 1 commit into NixOS:master from benxiao:rx/mmcv-fix-build


@benxiao
Contributor

@benxiao benxiao commented Aug 24, 2023

Description of changes

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added the 6.topic: python label Aug 24, 2023
@benxiao benxiao requested a review from happysalada August 24, 2023 10:10
@benxiao benxiao marked this pull request as draft August 24, 2023 10:23
@ofborg ofborg bot added the 11.by: package-maintainer, 10.rebuild-darwin: 1-10, and 10.rebuild-linux: 1-10 labels Aug 24, 2023
@happysalada
Contributor

The build works without this change (on linux-x86_64). Could you elaborate a little more on what is broken?

@benxiao
Contributor Author

benxiao commented Aug 25, 2023

Two things are broken.
One is similar to #251151 (comment).

The other is a dependency collision for protobuf: one copy comes from torch and another from onnxruntime. torch uses v4, while onnxruntime uses v3, and the torch CI currently uses v3. I'm thinking we should downgrade torch to v3 because of that, but that might cause other packages to break. I'm not sure what the best way to fix it is; any advice is welcome. @happysalada
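A minimal sketch of what such a collision means, assuming (the thread does not say so explicitly) that the failure comes from nixpkgs' check for duplicate Python packages, which rejects a build when the same package shows up with more than one version among its dependencies. The function `find_conflicts` and the input data are hypothetical; the sketch only looks at one level of propagated dependencies, whereas a real check walks the whole closure.

```python
from __future__ import annotations

from collections import defaultdict


def find_conflicts(propagated: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Report every package name that is propagated with more than one version.

    `propagated` maps a dependency (e.g. "torch") to the packages it propagates,
    given as {name: version}. The result maps each conflicting name to
    {version: [dependencies that propagate that version]}.
    """
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for dependent, deps in propagated.items():
        for name, version in deps.items():
            seen[name][version].append(dependent)
    # Keep only names that appear with two or more distinct versions.
    return {
        name: {version: sorted(sources) for version, sources in versions.items()}
        for name, versions in seen.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    # Hypothetical data mirroring the description above: torch brings protobuf v4,
    # onnxruntime brings protobuf v3; numpy agrees and is therefore not reported.
    print(find_conflicts({
        "torch": {"protobuf": "4", "numpy": "1"},
        "onnxruntime": {"protobuf": "3", "numpy": "1"},
    }))
    # → {'protobuf': {'4': ['torch'], '3': ['onnxruntime']}}
```

Either downgrading the protobuf that torch propagates or moving onnxruntime to protobuf v4 would make the two versions agree, which is exactly the trade-off discussed in the rest of the thread.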

@happysalada
Copy link
Contributor

How can I trigger the failure locally? Does the failure only happen on staging-next or on another branch?

@benxiao
Copy link
Contributor Author

benxiao commented Aug 25, 2023

git checkout master (430b94c3a247de57e76abcf4a6c851748e075533)
nix-build -A python3.pkgs.mmcv
  File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/nix/store/fhy4qq6d1xkcxybf0q63b0ryszxj74sm-python3.10-setuptools-67.4.0/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/nix/store/fhy4qq6d1xkcxybf0q63b0ryszxj74sm-python3.10-setuptools-67.4.0/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/nix/store/fhy4qq6d1xkcxybf0q63b0ryszxj74sm-python3.10-setuptools-67.4.0/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/nix/store/fhy4qq6d1xkcxybf0q63b0ryszxj74sm-python3.10-setuptools-67.4.0/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
/nix/store/fmv92wqvryrn9xmd3farz5lb03z65z46-stdenv-linux/setup: line 1596: pop_var_context: head of shell_variables not a function context
error: builder for '/nix/store/kdivya8yszggy1f7kw4dj4g05fv3yfcb-python3.10-mmcv-2.0.0.drv' failed with exit code 1;
       last 10 log lines:
       >   File "/nix/store/fhy4qq6d1xkcxybf0q63b0ryszxj74sm-python3.10-setuptools-67.4.0/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
       >     objects = self.compiler.compile(
       >   File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
       >     _write_ninja_file_and_compile_objects(
       >   File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
       >     _run_ninja_build(
       >   File "/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
       >     raise RuntimeError(message) from e
       > RuntimeError: Error compiling objects for extension
       > /nix/store/fmv92wqvryrn9xmd3farz5lb03z65z46-stdenv-linux/setup: line 1596: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/kdivya8yszggy1f7kw4dj4g05fv3yfcb-python3.10-mmcv-2.0.0.drv'.

FAILED: /build/source/build/temp.linux-x86_64-cpython-310/mmcv/ops/csrc/pytorch/pybind.o 
g++ -MMD -MF /build/source/build/temp.linux-x86_64-cpython-310/mmcv/ops/csrc/pytorch/pybind.o.d -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -I/nix/store/6683g7wm2yd44vybf580xfxxh09vjp9b-libxcrypt-4.4.36/include -fPIC -I/build/source/mmcv/ops/csrc/common -I/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include -I/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include/TH -I/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include/THC -I/nix/store/wlxpsdzfvdanfzh704qmgyzb42qvy4fr-python3-3.10.12/include/python3.10 -c -c /build/source/mmcv/ops/csrc/pytorch/pybind.cpp -o /build/source/build/temp.linux-x86_64-cpython-310/mmcv/ops/csrc/pytorch/pybind.o -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=_ext -D_GLIBCXX_USE_CXX11_ABI=1
In file included from /nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
                 from /nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include/torch/extension.h:6,
                 from /build/source/mmcv/ops/csrc/pytorch/pybind.cpp:2:
/nix/store/s32f5jgq9mxfd0l7y7kawxxj7yv258pk-python3.10-torch-2.0.1/lib/python3.10/site-packages/torch/include/torch/csrc/Exceptions.h:14:10: fatal error: pybind11/pybind11.h: No such file or directory
   14 | #include <pybind11/pybind11.h>
      |          ^~~~~~~~~~~~~~~~~~~~~

Once you add pybind11, compilation completes and you will see the protobuf collision. I marked this PR as a draft because I am not sure yet how best to deal with the protobuf collision.
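The fatal error above means g++ did not find `pybind11/pybind11.h`: torch's `Exceptions.h` includes it, but none of the `-I` paths on the command line point at pybind11. A small diagnostic sketch in Python (the helpers `include_dirs` and `find_header` are hypothetical, not part of torch, setuptools, or nixpkgs) that splits such a command with `shlex` and checks each explicit `-I` directory for the header. It deliberately ignores include paths the compiler or the Nix cc-wrapper adds on its own, so a `None` result only says the header is not reachable through the `-I` flags.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def include_dirs(command: str) -> list[str]:
    """Collect directories passed to the compiler as `-I<dir>` or `-I <dir>`."""
    tokens = shlex.split(command)
    dirs: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Separate form: "-I" followed by the directory as the next token.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Joined form: "-I/some/dir" (case-sensitive, so "-isystem" is skipped).
            dirs.append(tok[2:])
        i += 1
    return dirs


def find_header(command: str, header: str = "pybind11/pybind11.h") -> str | None:
    """Return the first explicit -I directory that contains `header`, or None.

    None corresponds to the "No such file or directory" case, as far as the
    explicit -I flags are concerned; system and wrapper-added include paths
    are not modelled here.
    """
    for d in include_dirs(command):
        if (Path(d) / header).is_file():
            return d
    return None


if __name__ == "__main__":
    # Hypothetical, shortened command line in the style of the failing one above.
    cmd = "g++ -I/build/source/mmcv/ops/csrc/common -c pybind.cpp -o pybind.o"
    print(include_dirs(cmd))
    print(find_header(cmd))
```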

@kirillrdy
Member

kirillrdy commented Aug 25, 2023

> How can I trigger the failure locally? Does the failure only happen on staging-next or on another branch?

@happysalada I can also reproduce the build issue on x86_64-linux.

@happysalada
Contributor

Got it after the latest merge of staging-next; I wasn't on the latest commit.

I've looked a bit at the build process of onnxruntime, and it's intense.
They are on protobuf 21, and any trick to make it use a different version makes the compilation fail.
I'd love to be proven wrong here, but I'm guessing we have to wait until onnxruntime updates their dependencies.

I think we can merge the pybind11 addition in the meantime. It doesn't fix the build completely, but it will be required in the future no matter what.

@benxiao
Contributor Author

benxiao commented Aug 25, 2023

What about downgrading torch to use protobuf3_21, since that is the version used in their CI?

@happysalada
Contributor

Looking a bit further, it seems onnx has switched to Python protobuf 4:
https://github.com/onnx/onnx/pulls?q=is%3Apr+protobuf+is%3Aclosed
However, trying to build master with protobuf 4 fails with a weird abseil error (after adding abseil to the list of dependencies).

Downgrading the protobuf that torch propagates as a build input might break dependents. I don't know how many it would break, or whether it would break any at all. Downgrading the version feels a little backwards, but it might be an acceptable temporary solution.

@happysalada
Contributor

Here is the CI reference for torch: https://github.com/pytorch/pytorch/blob/c2ac0da445cfe3d848342926f9cd4422bd35bfe2/.ci/docker/requirements-ci.txt#L133
(Better to put it here for reference.)

@cbourjau cbourjau mentioned this pull request Oct 2, 2023
@wegank wegank added the 2.status: stale and 2.status: merge conflict labels Mar 19, 2024
@kirillrdy
Member

Builds at 6814225:

nix-build -A python3.pkgs.mmcv
/nix/store/lhbqkvii64kppvi4ffxj0dsq0yppyfsv-python3.11-mmcv-2.1.0

@kirillrdy kirillrdy closed this Apr 17, 2024

4 participants