Skip to content

python3Packages.tensorflow-bin: Use CUDA 12, add Jetson support#334996

Merged
SomeoneSerge merged 8 commits intoNixOS:masterfrom
hacker1024:package/tensorflow-bin-jetson
Sep 19, 2024
Merged

python3Packages.tensorflow-bin: Use CUDA 12, add Jetson support#334996
SomeoneSerge merged 8 commits intoNixOS:masterfrom
hacker1024:package/tensorflow-bin-jetson

Conversation

@hacker1024
Copy link
Member

@hacker1024 hacker1024 commented Aug 16, 2024

Description of changes

NVIDIA provide prebuilt copies of TensorFlow for Jetsons, and these can be used with the existing tensorflow-bin package. Unfortunately, only Python 3.10 builds are available.

No TensorRT support for now. No JAX either, though I'm not sure why that's a dependency in the first place.

tensorflow-bin has also been changed to use CUDA 12 for all platforms, as this is what versions >= 2.15.0 expect (https://www.tensorflow.org/install/source#gpu).

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@NixOS/cuda-maintainers @jyp @abbradar

@github-actions github-actions bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Aug 16, 2024
@hacker1024
Copy link
Member Author

The formatting issues exist already.

@ofborg ofborg bot requested a review from abbradar August 16, 2024 02:37
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Aug 16, 2024
@hacker1024 hacker1024 marked this pull request as draft August 16, 2024 02:45
@hacker1024 hacker1024 changed the title python3Packages.tensorflow-bin: Add Jetson support python3Packages.tensorflow-bin: Use CUDA 12, add Jetson support Aug 16, 2024
@hacker1024 hacker1024 marked this pull request as ready for review August 16, 2024 04:29
@hacker1024 hacker1024 requested a review from natsukium as a code owner August 16, 2024 04:29
inherit (pkgs.config) cudaSupport;
# https://www.tensorflow.org/install/source#gpu
cudaPackages = pkgs.cudaPackages_11;
cudaPackages = pkgs.cudaPackages_12;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we drop this as cudaPackages is cudaPackges_12 ?

Copy link
Member Author

@hacker1024 hacker1024 Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we could, but as this package is precompiled and hardcoded to use a specific CUDA version (e.g. libcudart.so.12), I'm not sure what the benefit would be.

The only time the CUDA version can be changed without breaking the package is when the package is updated, and when that happens it can be bumped like this. If we use cudaPackages, and that gets bumped without updating this package, this package breaks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess we should get rid of all -bin packages already pin -bin packages to concrete cuda releases

@SomeoneSerge
Copy link
Contributor

@ofborg eval

@ofborg ofborg bot added 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. and removed 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. labels Sep 16, 2024
Comment on lines +114 to +118
for f in tensorflow-*+nv*.whl; do
# e.g. *nv24.07* -> *nv24.7*
mv "$f" "$(sed -E 's/(nv[0-9]+)\.0*([0-9]+)/\1.\2/' <<< "$f")"
done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this logic necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file name from NVIDIA does not exactly match the name in the wheel metadata. wheel unpack does not like this inconsistency, and refuses to run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, let's please make a note of this in a code comment placed outside of the nix string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk was previously discussed at #334996 (comment). I agree this deserves a better note but also I don't think we should keep blocking this PR, it's been well over a month now

# We keep this binary build for three reasons:
# - the source build doesn't work on Darwin.
# - the source build is currently brittle and not easy to maintain
# - the source build doesn't work on NVIDIA Jetson platforms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue link?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#271077, though will probably be other issues that arise once that is resolved too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's include a link to that in the comment

Copy link
Contributor

@SomeoneSerge SomeoneSerge Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I think it's OK to add the note in a follow-up PR. This discussion is discoverable via git blame as long as github is online too. Besides we should be generally more aggressive about removing stale comments from code

@SomeoneSerge SomeoneSerge merged commit bb84917 into NixOS:master Sep 19, 2024
@SomeoneSerge
Copy link
Contributor

@samuela I hope you don't mind

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: python Python is a high-level, general-purpose programming language. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants