Skip to content

Opencv4 tests and blas provider#208351

Merged
Artturin merged 2 commits intoNixOS:stagingfrom
gbtb:opencv4_tests_and_blas_provider
Jan 28, 2023
Merged

Opencv4 tests and blas provider#208351
Artturin merged 2 commits intoNixOS:stagingfrom
gbtb:opencv4_tests_and_blas_provider

Conversation

@gbtb
Copy link
Member

@gbtb gbtb commented Dec 30, 2022

Description of changes

This PR is derived from my attempt to fix #196653 . I've figured out that by default blas provider uses openblas under the hood, so the change was trivial albeit a little ugly because of blas.provider.pname check. Everything seemed fine, but I decided to throw in some unit tests just to be sure I didn't break anything. Initially for testing I overrided blas.provider with mkl, ran tests with it and managed to enable 80% of the opencv's tests. After that I switched blas.provider to openblas, ran tests and ... they broke, hung indefinitely, to be precise 😑 . I tried to set --test_threads=1 , turned off several test groups from testNames but different tests hung anyway. Ok, I've tried to ran tests without my blas.provider changes - result was the same 🤯 . Then I saw singleThreaded in openblas, and soon found link to the openblas docs and matched it with -DWITH_OPENMP=ON flag we use here. I've made experiments with this two flags, and results are quite logical - you can either use singleThreaded openblas, or build opencv with WITH_OPENMP=OFF, and then tests pass. So, I've figured out we have 3 options:

  1. Delete tests and just use blas.provider. Nobody had any complaints so far, right? 🙈
  2. Use singleThreaded openblas
  3. build opencv with WITH_OPENMP=OFF

I've ruled out 1st option, because concurrency bugs could be really hard to spot and reproduce, but when they finally hit you it could cause you a lot of headache. Between 2nd and 3rd option I've intuitively inclined toward 2nd option, because I think that linear algebra calculations should be only a small part of what opencv does. Moreover, it seems that blas support was initially added to fix perfomance bottlenecks, but later optimizations were applied and the need has vanished. But anyway, I decided to enable perf tests as well and compare both variants - in my unscientific benchmark (6C 12T i5-11400F + 32Gb) difference of total time between two versions was negligible. So, I decided that 2nd option is "good enough" (however, documentation on multithreading support of opencv and openblas contain hints to other possible approaches with pluggable MT backends, if anyone interested to dig this rabbit hole even more)

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot requested review from basvandijk and mdaiter December 30, 2022 12:26
@ofborg ofborg bot added 10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 501-1000 This PR causes many rebuilds on Linux and should normally target the staging branches. labels Dec 30, 2022
@gbtb gbtb marked this pull request as draft December 30, 2022 12:53
@gbtb
Copy link
Member Author

gbtb commented Dec 30, 2022

Seems like xvfb-run is unsupported on darwin, and one tests fails on arm 😞

@gbtb gbtb force-pushed the opencv4_tests_and_blas_provider branch from 3a1c153 to 0472758 Compare December 31, 2022 04:14
@ofborg ofborg bot added the 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. label Dec 31, 2022
@gbtb gbtb force-pushed the opencv4_tests_and_blas_provider branch 2 times, most recently from dd72636 to 5b00354 Compare January 1, 2023 02:02
@gbtb gbtb mentioned this pull request Jan 1, 2023
13 tasks
@gbtb gbtb removed the 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. label Jan 1, 2023
@gbtb gbtb force-pushed the opencv4_tests_and_blas_provider branch from 5b00354 to 222fe87 Compare January 1, 2023 05:58
@gbtb
Copy link
Member Author

gbtb commented Jan 1, 2023

Ok, x86_64-darwin first had a timeout while building openblas, 2nd try it had a timeout during run of opencv tests. I suspect something is not right with build agent itself and I would disable tests for this platform, because I don't have a machine to test them myself.

@gbtb gbtb marked this pull request as ready for review January 1, 2023 08:34
@gbtb gbtb requested a review from Artturin January 1, 2023 08:35
Copy link
Member

@Artturin Artturin Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
testRunner = if stdenv.isDarwin then "" else "${xvfb-run}/bin/xvfb-run -a ";
testRunner = if stdenv.isDarwin then "" else "${lib.getExe xvfb-run} -a ";

also for other reviewers, testRunner is only used when build == host so no need to use buildPackages

@gbtb gbtb changed the base branch from master to staging January 3, 2023 11:46
@gbtb gbtb marked this pull request as draft January 4, 2023 07:38
@gbtb gbtb force-pushed the opencv4_tests_and_blas_provider branch from 222fe87 to 9c647f0 Compare January 4, 2023 07:40
@gbtb
Copy link
Member Author

gbtb commented Jan 4, 2023

aarch64-linux failed because of cargo:

error: builder for '/nix/store/ap2shazwiv9lzc6vfk5shbwiqpkab7cp-rustc-1.66.0.drv' failed with exit code 2;
       last 10 log lines:
       >           /nix/store/4xvwn9mz1zkw07nngc7bjzxxb22z5f3i-binutils-2.39/bin/ld: /build/rustc-1.66.0-src/build/aarch64-unknown-linux-gnu/stage0-rustc/aarch64-unknown-linux-gnu/release/deps/librustc_driver-2b892224648aedac.so: undefined reference to `__aarch64_cas1_acq'
       >           collect2: error: ld returned 1 exit status
       >           
       >   = help: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
       >   = note: use the `-l` flag to specify native libraries to link
       >   = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
       >
       > error: could not compile `rustc-main` due to previous error
       > Build completed unsuccessfully in 0:04:25
       > make: *** [Makefile:12: all] Error 1
       For full logs, run 'nix log /nix/store/ap2shazwiv9lzc6vfk5shbwiqpkab7cp-rustc-1.66.0.drv'.
....
error: 1 dependencies of derivation '/nix/store/wfdcmgcl1x7cs3fhb0zb82kpzsk1i9pi-cargo-1.66.0.drv' failed to build
error: 1 dependencies of derivation '/nix/store/6h1nmygpy7y53rdpvsjz26snksnm2p30-cargo-build-hook.sh.drv' failed to build
error: 1 dependencies of derivation '/nix/store/0kn3lh8yf25iqykzfx1sh0x7wjiqrh9y-cargo-check-hook.sh.drv' failed to build
error: 3 dependencies of derivation '/nix/store/w52b5361mkq4hbm1vs8qk0gmkb22ra0y-cargo-auditable-0.6.0.drv' failed to build
error: 2 dependencies of derivation '/nix/store/9yy1prql699nha4f3wlibnnjxf0vz52p-cargo.drv' failed to build
error: 4 dependencies of derivation '/nix/store/q3gmw7p57biyjwj6hz8p1igxv47g6hnl-cargo-auditable-0.6.0.drv' failed to build
error: 2 dependencies of derivation '/nix/store/jjfinj7y42imbsh4axn0d4d4dw7g8gm9-cargo.drv' failed to build
error: 2 dependencies of derivation '/nix/store/b5xhrawxawqaa27bw1cilv7sncbf6hfy-librsvg-2.55.1.drv' failed to build

Not sure what happened with darwin builds, they displayed as queued and no direct link to logs present. But I've found this log link https://logs.nix.ci/?attempt_id=f8174375-f5f7-4aee-be64-1ea47874c035&key=nixos%2Fnixpkgs.208351 which probably is the last commits build, and it failed because of qimgv-1.0.3-alpha not supported on darwin and also because because of llvm build timeout. The same problem with qimv is also present for aarch64-darwin - https://logs.nix.ci/?attempt_id=a5ce9a8a-281c-4a10-a47e-6c7754ee5c51&key=nixos%2Fnixpkgs.208351 . So, looks like that qimv should not be included in darwin builds. But I don't understand how PR with opencv 4.7 and qimv got merged in the first place, it should've failed CI for the same reason ?

@gbtb gbtb marked this pull request as ready for review January 4, 2023 12:08
@gbtb gbtb requested a review from Artturin January 4, 2023 12:08
@Artturin
Copy link
Member

Artturin commented Jan 4, 2023

as far as i can tell the darwin failures are unrelated

@Artturin
Copy link
Member

Artturin commented Jan 4, 2023

this seems to double the build time of opencv so i think it would be better to run these tests in passthru.tests

@gbtb gbtb force-pushed the opencv4_tests_and_blas_provider branch 3 times, most recently from c016192 to 77a0192 Compare January 5, 2023 13:09
@gbtb gbtb marked this pull request as draft January 5, 2023 14:02
@gbtb gbtb force-pushed the opencv4_tests_and_blas_provider branch 2 times, most recently from 3bec2cc to 6781d43 Compare January 7, 2023 12:10
@gbtb
Copy link
Member Author

gbtb commented Jan 8, 2023

Oookey, I've refactored out tests into a separate runCommand derivation. Main derivation now have two outputs - out and package_tests with binaries and some data for tests. I've also remove qimgv from passthru.tests on darwin since it's not available for this platform (though it's still failing because of clang build timeout). I think now it's ready for review. @Artturin Can you look again, pls ?

@gbtb gbtb marked this pull request as ready for review January 8, 2023 00:25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove mdaiter?

Copy link
Member Author

@gbtb gbtb Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

He is not active anymore. In the nixpkgs and elsewhere over github, it seems. Last PR in nixpkgs in 2017, no activity in his github profile over the last year. Personally, I think we better to not decieve ourselves and remove maintainers after some period (e.g. a year) of inactivity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment about removing him to the commit msg so people don't think it was an accident

@gbtb gbtb requested a review from Artturin January 9, 2023 01:36
@Artturin Artturin force-pushed the opencv4_tests_and_blas_provider branch from 6781d43 to e3a8995 Compare January 10, 2023 03:50
gbtb added 2 commits January 10, 2023 05:51
removed mdaiter from maintainers, his last PR in nixpkgs was in 2017
@Artturin Artturin force-pushed the opencv4_tests_and_blas_provider branch from e3a8995 to 66efeba Compare January 10, 2023 03:51
@gbtb gbtb requested a review from Artturin January 28, 2023 11:38
@gbtb
Copy link
Member Author

gbtb commented Jan 28, 2023

Hi, @Artturin 👋 . Is there something else left preventing this PR from being merged ? I'd like to make a new release of my Stable Diffusion flake soon, hopefully with this fixes present in nixpkgs 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

10.rebuild-darwin: 101-500 This PR causes between 101 and 500 packages to rebuild on Darwin. 10.rebuild-linux: 501-1000 This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

python3Packages.opencv4 depends on multiple versions of OpenBLAS, segfaults when used with scipy

2 participants