onnx: Fix build#281011

Merged
happysalada merged 1 commit into NixOS:master from cbourjau:fix-onnx-darwin
Jan 20, 2024

@cbourjau
Contributor

@cbourjau cbourjau commented Jan 14, 2024

The ONNX package has been broken since #271586. The breakage is caused by protobuf issues and by missing dependencies for running the test suite.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the "6.topic: python" label Jan 14, 2024
@Scrumplex Scrumplex self-requested a review January 15, 2024 06:40
@Scrumplex
Member

@ofborg eval

@Scrumplex Scrumplex requested a review from acairncross January 15, 2024 08:22
@ofborg ofborg bot added the "10.rebuild-darwin: 11-100" and "10.rebuild-linux: 11-100" labels Jan 15, 2024
@Scrumplex
Member

Result of nixpkgs-review pr 281011 run on x86_64-linux

2 packages marked as broken and skipped:
  • piper-train
  • piper-train.dist
37 packages failed to build:
  • aitrack
  • deface
  • deface.dist
  • livecaptions
  • monado
  • obs-studio-plugins.obs-backgroundremoval
  • onnxruntime
  • onnxruntime.dev
  • onnxruntime.dist
  • opencomposite-helper
  • piper-phonemize
  • piper-tts
  • python311Packages.fastembed
  • python311Packages.fastembed.dist
  • python311Packages.faster-whisper
  • python311Packages.faster-whisper.dist
  • python311Packages.ffcv
  • python311Packages.ffcv.dist
  • python311Packages.insightface
  • python311Packages.insightface.dist
  • python311Packages.mmcv
  • python311Packages.mmcv.dist
  • python311Packages.onnxruntime
  • python311Packages.onnxruntime.dist
  • python311Packages.piper-phonemize
  • python311Packages.piper-phonemize.dist
  • python311Packages.pytorch-pfn-extras
  • python311Packages.pytorch-pfn-extras.dist
  • python311Packages.tf2onnx
  • python311Packages.tf2onnx.dist
  • python312Packages.piper-phonemize
  • python312Packages.piper-phonemize.dist
  • rocmPackages.mivisionx (rocmPackages.mivisionx-hip)
  • rocmPackages.mivisionx-cpu
  • rocmPackages.mivisionx-opencl
  • whisper-ctranslate2
  • whisper-ctranslate2.dist
13 packages built:
  • easyocr (python311Packages.easyocr)
  • easyocr.dist (python311Packages.easyocr.dist)
  • python311Packages.onnx
  • python311Packages.onnx.dist
  • python311Packages.onnxconverter-common
  • python311Packages.onnxconverter-common.dist
  • python311Packages.onnxmltools
  • python311Packages.onnxmltools.dist
  • python311Packages.onnxruntime-tools
  • python311Packages.onnxruntime-tools.dist
  • python311Packages.skl2onnx
  • python311Packages.skl2onnx.dist
  • rocmPackages.migraphx

Comment on lines 49 to 50
Member

I don't think we need to propagate both. Also, doesn't the Python package already include the CLI?

Contributor Author

When only the Python package is used, the build fails with absl errors like the following:

In file included from /tmp/nix-build-python3.11-onnx-1.15.0.drv-1/source/.setuptools-cmake-build/onnx/onnx-operators-ml.pb.cc:4:
In file included from /tmp/nix-build-python3.11-onnx-1.15.0.drv-1/source/.setuptools-cmake-build/onnx/onnx-operators-ml.pb.h:24:
In file included from /nix/store/glnr34k9z7qwrhcpnrp77b83h3mgfhiq-protobuf-24.4/include/google/protobuf/io/coded_stream.h:134:
In file included from /nix/store/mid900708j8vgfrkd5yfxn48f7x4pspc-abseil-cpp-20230125.3/include/absl/strings/cord.h:78:
In file included from /nix/store/mid900708j8vgfrkd5yfxn48f7x4pspc-abseil-cpp-20230125.3/include/absl/container/inlined_vector.h:53:
In file included from /nix/store/mid900708j8vgfrkd5yfxn48f7x4pspc-abseil-cpp-20230125.3/include/absl/container/internal/inlined_vector.h:30:
In file included from /nix/store/mid900708j8vgfrkd5yfxn48f7x4pspc-abseil-cpp-20230125.3/include/absl/container/internal/compressed_tuple.h:40:
/nix/store/mid900708j8vgfrkd5yfxn48f7x4pspc-abseil-cpp-20230125.3/include/absl/utility/utility.h:164:12: error: no member named 'in_place_t' in namespace 'std'

Using only protobuf_21 won't work either, since protobuf is used from Python in the final package.

@happysalada
Contributor

@Scrumplex if you have some details on the build failures, I'm curious.

@cbourjau
Contributor Author

Thanks for the feedback! There is still something strange going on with protobuf. More context can be found in #281065

@Scrumplex
Member

> @Scrumplex if you have some details on the build failures, I'm curious.

Some of them were unrelated.

pythonPackages.pytorch-pfn-extras doesn't like the updated version of setuptools; its build requires setuptools<64:

error: builder for '/nix/store/fk18mj5nrqjysbzibzxm7lb6wvawjl0r-python3.11-pytorch-pfn-extras-0.7.4.drv' failed with exit code 1;
       last 10 log lines:
       > writing dependency_links to pytorch_pfn_extras.egg-info/dependency_links.txt
       > writing requirements to pytorch_pfn_extras.egg-info/requires.txt
       > writing top-level names to pytorch_pfn_extras.egg-info/top_level.txt
       > writing manifest file 'pytorch_pfn_extras.egg-info/SOURCES.txt'
       > reading manifest file 'pytorch_pfn_extras.egg-info/SOURCES.txt'
       > adding license file 'LICENSE'
       > writing manifest file 'pytorch_pfn_extras.egg-info/SOURCES.txt'
       >
       > ERROR Missing dependencies:
       >      setuptools<64

rocmPackages.mivisionx-cpu and rocmPackages.mivisionx-hip fail to build

error: builder for '/nix/store/88y8azigm4q0chj4smz9ajkjggdh95wz-mivisionx-cpu-5.7.1.drv' failed with exit code 2;
       last 10 log lines:
       > /build/source/rocAL/rocAL/include/loaders/image/node_fused_jpeg_crop.h:55:14: warning: private field '_num_attempts' is not used [-Wunused-private-field]
       >     unsigned _num_attempts;
       >              ^
       > 1 warning generated.
       > 1 warning generated.
       > 1 warning generated.
       > 2 warnings generated.
       > 2 warnings generated.
       > make[1]: *** [CMakeFiles/Makefile2:399: rocAL/rocAL/CMakeFiles/rocal.dir/all] Error 2
       > make: *** [Makefile:166: all] Error 2

Looking at the full log, the following errors appear:

[ 79%] Building CXX object rocAL/rocAL/CMakeFiles/rocal.dir/source/loaders/image/cifar10_data_loader.cpp.o
In file included from /build/source/rocAL/rocAL/source/decoders/video/hardware_video_decoder.cpp:25:
In file included from /build/source/rocAL/rocAL/include/decoders/video/hardware_video_decoder.h:25:
In file included from /build/source/rocAL/rocAL/include/decoders/video/video_decoder.h:31:
In file included from /nix/store/jsj68rdhxkjnl5dz014aalrqz02mk70p-ffmpeg-4.4.4-dev/include/libavutil/imgutils.h:30:
In file included from /nix/store/jsj68rdhxkjnl5dz014aalrqz02mk70p-ffmpeg-4.4.4-dev/include/libavutil/avutil.h:296:
/nix/store/jsj68rdhxkjnl5dz014aalrqz02mk70p-ffmpeg-4.4.4-dev/include/libavutil/common.h:30:2: error: missing -D__STDC_CONSTANT_MACROS / #define __STDC_CONSTANT_MACROS
#error missing -D__STDC_CONSTANT_MACROS / #define __STDC_CONSTANT_MACROS
 ^
[ 80%] Building CXX object rocAL/rocAL/CMakeFiles/rocal.dir/source/loaders/image/image_loader.cpp.o
[ 80%] Building CXX object rocAL/rocAL/CMakeFiles/rocal.dir/source/loaders/image/image_loader_sharded.cpp.o
In file included from /build/source/rocAL/rocAL/source/decoders/video/ffmpeg_video_decoder.cpp:25:
In file included from /build/source/rocAL/rocAL/include/decoders/video/ffmpeg_video_decoder.h:25:
In file included from /build/source/rocAL/rocAL/include/decoders/video/video_decoder.h:31:
In file included from /nix/store/jsj68rdhxkjnl5dz014aalrqz02mk70p-ffmpeg-4.4.4-dev/include/libavutil/imgutils.h:30:
In file included from /nix/store/jsj68rdhxkjnl5dz014aalrqz02mk70p-ffmpeg-4.4.4-dev/include/libavutil/avutil.h:296:
/nix/store/jsj68rdhxkjnl5dz014aalrqz02mk70p-ffmpeg-4.4.4-dev/include/libavutil/common.h:30:2: error: missing -D__STDC_CONSTANT_MACROS / #define __STDC_CONSTANT_MACROS
#error missing -D__STDC_CONSTANT_MACROS / #define __STDC_CONSTANT_MACROS
 ^
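
The error text itself names the missing define. As a minimal, untested sketch of one possible workaround (not something done in this PR), the define could be passed through CMake. The choice of `cmakeFlags` and of `CMAKE_CXX_FLAGS` is an assumption; if the project's CMakeLists.txt overwrites `CMAKE_CXX_FLAGS`, this would have no effect.

```nix
# Hypothetical workaround, not part of this PR and not verified to fix the build.
# Assumption: the project honours CMAKE_CXX_FLAGS passed on the command line.
{ pkgs ? import <nixpkgs> { } }:

pkgs.rocmPackages.mivisionx-cpu.overrideAttrs (old: {
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [
    # Define the macro that libavutil/common.h asks for when included from C++.
    "-DCMAKE_CXX_FLAGS=-D__STDC_CONSTANT_MACROS"
  ];
})
```

If saved as a file, an expression like this could be built with `nix-build` to check whether the rocAL sources get past the ffmpeg headers.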

onnxruntime fails to build

[ 27%] Building CXX object CMakeFiles/onnxruntime_session.dir/build/source/onnxruntime/core/session/inference_session.cc.o
/build/source/onnxruntime/core/session/environment.cc: In lambda function:
/build/source/onnxruntime/core/session/environment.cc:283:61: error: 'all_tensor_types_with_bfloat' is not a member of 'onnx::OpSchema'
  283 |       std::vector<std::string> all_tensor_types = OpSchema::all_tensor_types_with_bfloat();
      |                                                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 27%] Building CXX object CMakeFiles/onnxruntime_framework.dir/build/source/onnxruntime/core/framework/feeds_fetches_manager.cc.o

pythonPackages.mmengine freezes during its checkPhase for me. I let the nixpkgs-review above run for 9 hours on a machine with a 5800X3D CPU until I finally killed the pytest process.

@Scrumplex
Member

Apparently, fixing onnxruntime would be as easy as updating the package. The failure comes from b12ff1d, which updated pythonPackages.onnx; the new version seemingly removed all_tensor_types_with_bfloat from onnx::OpSchema.

While naively updating onnxruntime does build fine, several tests fail. It seems like we need to wait for onnxruntime 1.17.0 for a definitive fix.

Diff to update onnxruntime to 1.16.3:

diff --git a/pkgs/development/libraries/onnxruntime/default.nix b/pkgs/development/libraries/onnxruntime/default.nix
index 7a8b8570f62c..27d48e0b8c53 100644
--- a/pkgs/development/libraries/onnxruntime/default.nix
+++ b/pkgs/development/libraries/onnxruntime/default.nix
@@ -33,8 +33,8 @@ let
   eigen = fetchFromGitLab {
     owner = "libeigen";
     repo = "eigen";
-    rev = "d10b27fe37736d2944630ecd7557cefa95cf87c9";
-    sha256 = "sha256-Lmco0s9gIm9sIw7lCr5Iewye3RmrHEE4HLfyzRkQCm0=";
+    rev = "e7248b26a1ed53fa030c5c459f7ea095dfd276ac";
+    sha256 = "sha256-uQ1YYV3ojbMVfHdqjXRyUymRPjJZV3WHT36PTxPRius=";
   };
 
   mp11 = fetchFromGitHub {
@@ -78,13 +78,13 @@ let
 in
 stdenv.mkDerivation rec {
   pname = "onnxruntime";
-  version = "1.15.1";
+  version = "1.16.3";
 
   src = fetchFromGitHub {
     owner = "microsoft";
     repo = "onnxruntime";
     rev = "v${version}";
-    sha256 = "sha256-SnHo2sVACc++fog7Tg6f2LK/Sv/EskFzN7RZS7D113s=";
+    sha256 = "sha256-bTW9Pc3rvH+c8VIlDDEtAXyA3sajVyY5Aqr6+SxaMF4=";
     fetchSubmodules = true;
   };
 

@Scrumplex
Member

pythonPackages.pytorch-pfn-extras can be patched to remove the version check. Doing so results in a successful build, but the checks fail yet again, as some of its tests now seem to require pythonPackages.onnxruntime.

So that one also won't work until onnxruntime 1.17.0 is released or someone backports the relevant changes.
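
For illustration, removing the version check could look roughly like the sketch below. This is not the patch that was actually tried: the file name `pyproject.toml` and the exact spelling `setuptools<64` inside it are assumptions based only on the error message above.

```nix
# Hypothetical sketch; file name and match string are assumptions.
{ pkgs ? import <nixpkgs> { } }:

pkgs.python311Packages.pytorch-pfn-extras.overridePythonAttrs (old: {
  postPatch = (old.postPatch or "") + ''
    # Relax the upper bound on setuptools in the build requirements.
    substituteInPlace pyproject.toml \
      --replace "setuptools<64" "setuptools"
  '';
})
```

As noted above, this only gets the package past the build step; the test suite would still need a working pythonPackages.onnxruntime.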

@happysalada
Contributor

Thank you for taking the time!

@Zahrun Zahrun mentioned this pull request Jan 17, 2024
12 tasks
@Scrumplex
Member

I don't think anything blocks this PR specifically. It should be good to merge. The other packages require their own fixes.

@happysalada happysalada merged commit 85acf99 into NixOS:master Jan 20, 2024
@cbourjau cbourjau deleted the fix-onnx-darwin branch January 20, 2024 01:11

Labels

  • 6.topic: python (Python is a high-level, general-purpose programming language.)
  • 10.rebuild-darwin: 11-100 (This PR causes between 11 and 100 packages to rebuild on Darwin.)
  • 10.rebuild-linux: 11-100 (This PR causes between 11 and 100 packages to rebuild on Linux.)

4 participants