
Update ollama to 0.3.9 and add +cuda variant #46204

Merged
alalazo merged 8 commits into spack:develop from brettviren:develop
Sep 27, 2024

Conversation

@brettviren
Member

No description provided.

@brettviren
Member Author

This does build with and without CUDA, but I am unable to figure out how to make a model actually use my GPUs (2x 4090). I suspect it has to do with not finding all of the CUDA libraries from the Spack cuda package.

FWIW, I can download the binary from ollama's GitHub release page (which includes its own copy of CUDA libs) and get a model running on the GPU.

@teaguesterling
Contributor

@spackbot fix style

@spackbot-app

spackbot-app bot commented Sep 7, 2024

Let me see if I can fix that for you!

@spackbot-app

spackbot-app bot commented Sep 7, 2024

I was able to run spack style --fix for you!

spack style --fix
==> Running style checks on spack
  selected: isort, black, flake8, mypy
==> Modified files
  var/spack/repos/builtin/packages/ollama/package.py
==> Running isort checks
  isort checks were clean
==> Running black checks
reformatted var/spack/repos/builtin/packages/ollama/package.py
All done! ✨ 🍰 ✨
1 file reformatted.
  black checks were clean
==> Running flake8 checks
  flake8 checks were clean
==> Running mypy checks
lib/spack/spack/version/version_types.py:145: error: Argument 2 to "StandardVersion" has incompatible type "*Tuple[Tuple[Any, ...], Tuple[Any, ...]]"; expected "Tuple[Tuple[Any, ...], Tuple[Any, ...]]"  [arg-type]
lib/spack/spack/version/version_types.py:452: error: Argument 2 to "StandardVersion" has incompatible type "*Tuple[Tuple[Any, ...], Tuple[Any, ...]]"; expected "Tuple[Tuple[Any, ...], Tuple[Any, ...]]"  [arg-type]
lib/spack/spack/version/version_types.py:481: error: Argument 2 to "StandardVersion" has incompatible type "*Tuple[Tuple[Any, ...], Tuple[Any, ...]]"; expected "Tuple[Tuple[Any, ...], Tuple[Any, ...]]"  [arg-type]
Found 3 errors in 1 file (checked 620 source files)
  mypy found errors
Keep in mind that I cannot fix your flake8 or mypy errors, so if you have any you'll need to fix them and update the pull request. If I was able to push to your branch, you will need to pull from your updated branch before pushing further changes.

I've updated the branch with style fixes.

@teaguesterling
Contributor

Thanks for taking this on! I was hoping to get back to this and get a GPU build going. I don't have many good CUDA resources available for testing on my dev system, but I'll see if I can help sort out the build details.

@brettviren
Member Author

I think the CUDA runtime "problem" I had was a mirage, caused by a mix of wrong expectations about Spack's cuda package and misleading "warning" messages from ollama serve.

I had thought Spack's cuda actually provided libcuda.so but in fact the package.py says:

Note: This package does not currently install the drivers necessary
to run CUDA. These will need to be installed manually. See:
https://docs.nvidia.com/cuda/ for details.

The stubs/libcuda.so that is provided holds dummy implementations to satisfy link-time dependencies on systems that lack a "real" libcuda.so.

With that knowledge in hand, I let ollama serve pick up Debian's libcuda.so and can now see the GPUs finally being used:
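The distinction between the stub and a real driver library can be checked at runtime. Here is a minimal sketch (my own, not part of the package) that tries to load the driver's libcuda.so.1 with only the Python standard library; the function name `probe_cuda_driver` is hypothetical, and the behavior of the stub's dummy implementations is an assumption:

```python
import ctypes

def probe_cuda_driver():
    """Try to load the CUDA driver library and query its version.

    Returns the driver version as an int (e.g. 12050 for CUDA 12.5),
    or None if no usable driver library is available.
    """
    try:
        # A real driver installation ships libcuda.so.1; Spack's
        # stubs/libcuda.so only satisfies link-time dependencies.
        lib = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    version = ctypes.c_int(0)
    # cuDriverGetVersion returns CUDA_SUCCESS (0) on a working driver;
    # a stub's dummy implementation is not expected to report success.
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value

print(probe_cuda_driver())
```

On a machine with the Debian driver packages installed this should report a nonzero version; on a driver-less build host it returns None, which is exactly the situation the stub library exists to paper over at link time.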

$ spack install ollama+cuda
$ spack load ollama
$ ollama serve
$ ollama run llama3.1
$ nvidia-smi|grep ollama
|    0   N/A  N/A   1097881      C   ...unners/cuda_v12/ollama_llama_server       6142MiB |

There was also some prior confusion due to WARN level messages from ollama serve:

time=2024-09-12T11:56:49.553-04:00 level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2024-09-12T11:56:49.553-04:00 level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"

However, with OLLAMA_DEBUG=true ollama serve, in addition to the WARN I get more comforting DEBUG messages:

time=2024-09-12T11:57:22.128-04:00 level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[libcuda.so* /home/wcwc/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-09-12T11:57:22.131-04:00 level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.555.42.06]
CUDA driver version: 12.5
time=2024-09-12T11:57:22.255-04:00 level=DEBUG source=gpu.go:119 msg="detected GPUs" count=2 library=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.555.42.06
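The "gpu library search" DEBUG line above is just a series of filesystem globs. As a rough re-creation for anyone debugging discovery on their own box, this sketch replays the absolute patterns from that log (the working-directory and $HOME entries are omitted, and `find_cuda_libraries` is a name of my own invention, not ollama's):

```python
import glob

# Absolute glob patterns copied from ollama's DEBUG output above.
CUDA_LIB_GLOBS = [
    "/usr/local/cuda*/targets/*/lib/libcuda.so*",
    "/usr/lib/*-linux-gnu/nvidia/current/libcuda.so*",
    "/usr/lib/*-linux-gnu/libcuda.so*",
    "/usr/lib/wsl/lib/libcuda.so*",
    "/usr/lib/wsl/drivers/*/libcuda.so*",
    "/opt/cuda/lib*/libcuda.so*",
    "/usr/local/cuda/lib*/libcuda.so*",
    "/usr/lib*/libcuda.so*",
    "/usr/local/lib*/libcuda.so*",
]

def find_cuda_libraries():
    """Return every libcuda.so* path matched by the glob patterns."""
    found = []
    for pattern in CUDA_LIB_GLOBS:
        found.extend(glob.glob(pattern))
    return sorted(set(found))

print(find_cuda_libraries())
```

Note that none of these patterns look inside a Spack prefix, which is consistent with the conclusion above: the driver library has to come from the host system, not from the Spack cuda package.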

So in the end, all seems good. A fresh push is imminent.

@teaguesterling
Contributor

Sorry to keep dragging you along here! I'm not as familiar with CUDA packages as I am with other things. I took a bit more time to review this morning and had a few more suggestions:

  • The CudaPackage class actually adds both variant("cuda",...) and depends_on("cuda", when="+cuda", ...) for you, so we can actually remove those. (It also adds a few other things).
  • The package auditing standards appear to have changed (just in the last few days, it seems) and now the setup_build_environment method needs to be added to the Builder class. I've tested this and just moving the method works as expected.
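Taken together, the two suggestions above point at a recipe shaped roughly like this sketch. It is illustrative only, not the merged package.py: the build-system base class, URLs, and the `CUDA_LIB_DIR` environment variable are assumptions, and versions/checksums are elided.

```python
# Illustrative sketch of a Spack recipe; names marked below are assumptions.
import spack.build_systems.go
from spack.package import *


class Ollama(GoPackage, CudaPackage):
    """Run large language models locally (sketch)."""

    homepage = "https://ollama.com"
    git = "https://github.com/ollama/ollama.git"

    # No explicit variant("cuda") or depends_on("cuda", when="+cuda"):
    # the CudaPackage mixin contributes both, along with cuda_arch
    # handling and related conflicts.


class GoBuilder(spack.build_systems.go.GoBuilder):
    # Per the newer package-audit rules, setup_build_environment
    # belongs on the Builder class rather than the package class.
    def setup_build_environment(self, env):
        if self.spec.satisfies("+cuda"):
            # Hypothetical variable name: point the build at the
            # Spack-provided CUDA toolkit libraries.
            env.set("CUDA_LIB_DIR", self.spec["cuda"].libs.directories[0])
```

The point of the move is purely structural: the method body is unchanged, it just lives on the builder so the audit can find it where multi-build-system packages expect it.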


@teaguesterling (Contributor) left a comment


Sorry to give you the run-around on these changes. I'll open a PR into your branch with them as well to make it easier to implement.

@teaguesterling
Contributor

All stylistic considerations aside: I was able to more or less confirm that this works for building with CUDA. I have an ancient CUDA-compatible card in a machine. It was able to compile with CUDA support and got as far as detecting that my card was a dinosaur before dropping back to CPU.

Fixing style audits and simplifying dependencies
@teaguesterling
Contributor

@spackbot fix style

@spackbot-app

spackbot-app bot commented Sep 16, 2024

Let me see if I can fix that for you!

@teaguesterling
Contributor

@spackbot rerun pipeline

@teaguesterling
Contributor

@spackbot rerun pipeline

@brettviren something seems broken in the CI unrelated to the package. Rerunning again in hopes it's resolved.

@teaguesterling
Contributor

@spackbot rerun pipeline

@spack spack deleted a comment from spackbot-app bot Sep 24, 2024
@spack spack deleted a comment from spackbot-app bot Sep 24, 2024
@spack spack deleted a comment from spackbot-app bot Sep 24, 2024
@teaguesterling
Contributor

Lgtm!

@alalazo alalazo merged commit 3637c08 into spack:develop Sep 27, 2024


3 participants