Ollama errors on Orin Nano

I’m running LLMs in a terminal using Ollama on an NVIDIA Jetson Orin Nano. Sometimes I get weird outputs, like repeated 3’s or a repeated <unused31> token.

What is wrong? It usually happens when a longer prompt is used (one that works fine on slower hardware like a Raspberry Pi 5). When the models do run, I notice they are faster than on my Pi, so they do seem to be using the GPU…

*** Please note that this reply is generated by LLM automatically ***

The issue you’re experiencing with weird outputs, such as repeated characters, when running LLMs on your NVIDIA Jetson Orin Nano using Ollama could be related to several factors. Given that the models work fine on slower hardware like a Raspberry Pi 5, it’s likely that the problem is specific to the Jetson Orin Nano or its configuration. Here are a few potential causes and solutions to consider:

  1. GPU Utilization and Memory: The Jetson Orin Nano has a more powerful GPU compared to the Raspberry Pi 5, which could lead to differences in how models are executed. Ensure that the GPU is being utilized properly for the models. You can check GPU usage with tools like nvidia-smi. If the GPU memory is insufficient for the model, especially with longer prompts, it might cause errors. Consider optimizing the model or prompt to reduce memory requirements.

  2. Thermal Throttling: The Jetson Orin Nano, like many embedded systems, can throttle performance if it gets too hot. This could lead to unexpected behavior or errors. Ensure your device is properly cooled. You can monitor the temperature with nvidia-smi or tegrastats and adjust your setup or reduce the load if necessary.

  3. Software Compatibility: The issue might be related to the software stack or the specific version of Ollama you’re using. Ensure that Ollama and any dependencies are up-to-date and compatible with the Jetson Orin Nano. Sometimes, reverting to a previous version or updating to a newer one can resolve compatibility issues.

  4. Model Optimization: Models are optimized for different hardware configurations. If the model you’re using is not optimized for the Jetson Orin Nano’s GPU architecture, it might lead to inefficiencies or errors. Look for models specifically optimized for NVIDIA GPUs or the Jetson series.

  5. Prompt Length and Complexity: Longer prompts can indeed cause issues, especially if the model is not designed to handle them efficiently. Try breaking down longer prompts into smaller, more manageable parts, or look into models that are known to handle longer input sequences better.

  6. System Configuration: Sometimes, system-level configurations can affect how applications run. Ensure that your Jetson Orin Nano is configured correctly for GPU-accelerated applications. This includes setting up the NVIDIA drivers correctly and ensuring that the system recognizes and utilizes the GPU.

  7. Comparison with Raspberry Pi 5: The Raspberry Pi 5, being slower, might not encounter the same issues due to its lower performance ceiling. However, this also means it might not utilize the GPU in the same way, potentially masking issues that appear on more powerful hardware like the Jetson Orin Nano.

To troubleshoot, you could try the following steps:

  • Monitor System Resources: Use nvidia-smi (or tegrastats on Jetson) for GPU usage and top or htop for CPU and memory usage to see if there are any bottlenecks; see the sketch after this list.
  • Adjust Model Parameters: If possible, adjust the model’s parameters to reduce computational requirements.
  • Cooling: Ensure the device is properly cooled.
  • Software Updates: Keep your software stack up-to-date.
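
For the monitoring item above, a minimal sketch of what that looks like on a Jetson (tegrastats ships with JetPack; the ollama service name assumes the native systemd install):

# Watch GPU/CPU/RAM and temperatures once per second while a prompt runs
sudo tegrastats --interval 1000
# System RAM is shared with the GPU on Jetson, so also keep an eye on:
free -h
# If Ollama was installed natively, follow the server log for runner errors:
journalctl -u ollama -f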

If none of these steps resolve the issue, it might be helpful to consult more specific forums or support channels for Ollama or the Jetson Orin Nano, as the problem could be more nuanced or specific to your setup.

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

Hi,

There is a similar known issue reported before:

Did you install it natively on the Jetson?
If yes, could you try it with the container to see if that solves your issue as well?

Thanks.

In my case the output is 3333333 or a repeated <unused31>. It does this with several Ollama LLMs.

Good to hear that the Docker method seems to solve issues. Is there a clear step-by-step instruction for this?

I do hope people are working on fixing the issue, because running Ollama natively should not be a big task, as weaker hardware copes just fine (albeit with slower performance).

The tutorial on https://www.youtube.com/watch?v=R0PjKr4d-gU seems to have all the steps explained.
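
For reference, the container route boils down to something like this (a sketch, assuming the stock ollama/ollama image and the NVIDIA container runtime that JetPack configures; the video may instead use a Jetson-specific image such as dustynv/ollama):

# Start the Ollama server in a container with GPU access
docker run -d --runtime nvidia --name ollama \
  -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
# Pull and run a model inside the container (model name is just an example)
docker exec -it ollama ollama run llama3.2:3b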

Hi,

Do you get the issue solved after switching to the container?
Thanks.

Unfortunately I got another looped <unused31> response with a longer prompt that does work on weaker hardware…

The other problem is that when the LLMs do run, they make weird statements that hint at them ‘forgetting’ parts of what was input, or not applying things they ‘know’. And again, compared to the same models running on another system, these problems occur only on the Jetson, and not in similar circumstances on the other (weaker) hardware like a Pi 5.
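
One thing that may be worth ruling out for the ‘forgetting’ behaviour is the context window: Ollama defaults to a fairly small num_ctx, so long prompts can get silently truncated. A sketch of raising it, assuming a recent Ollama build (model name and value are just examples, and a bigger context uses more of the Orin Nano’s shared memory):

# Interactively, inside the REPL:
ollama run qwen2.5:7b
>>> /set parameter num_ctx 8192

# Or bake it into a derived model via a Modelfile:
cat > Modelfile <<'EOF'
FROM qwen2.5:7b
PARAMETER num_ctx 8192
EOF
ollama create qwen2.5-8k -f Modelfile
ollama run qwen2.5-8k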

And finally, when installing a slightly bigger model that is stated to work on this hardware, it gives errors both natively and in Docker.

Update: I decided to try a clean install with only the official Ubuntu image and only the steps from the working instructions.

Unfortunately, it still gives the same error, although the llama3.1 model I also tried just started to output rather random words and parts of words that it tried to organise into sentences.

I have the same issue with Ollama on the Nano Super. I also have a system throttled error every time, even though the temp stats are just fine and nothing else is running. But if I quit the Ubuntu GUI (sudo init 3), then launch “ollama serve” and go through SSH, the issue disappears. Running this way is also faster by about 20%.
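
A sketch of that headless workflow, assuming a native install with the systemd service the installer creates:

# On the Jetson: drop to text mode so the desktop stops holding shared GPU memory
sudo init 3
# From another machine:
ssh user@<jetson-ip>
# Restart the service (or run "ollama serve" manually) and use it over SSH
sudo systemctl restart ollama
ollama run llama3.2:3b    # model name is just an example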

I’ve seen some improvement in the TUI, but a model like gemma2:9b still has memory errors, though that is a separate issue from this thread.

Is it solved? The same issue happened on my Orin. I tested qwen2.5:7b, qwen3:8b, llama3:8b and deepseek-r1:1.5b; they all have this issue with a long prompt. The only model that works is gemma3:1b, for some unknown reason.


Glad you replied because I was getting worried the thread would be archived.

No, I have not seen a solution yet, and multiple models, from small ones to big ones, eventually stop working. On my current install I have not set up any swap file, so caching issues there are not at fault.

I hope work is being done to fix this so we can unleash the nano for one of its intended purposes.

The Docker solution does not seem to help and was just a workaround to begin with (and as Docker itself also uses memory, it might be detrimental).

No weird error messages so far with the older Ollama 0.10.0, but the qwen2.5:7b and llama3.2:3b models keep making the same mistake about instructions over and over again, even after repeated corrections, so the models seem to lobotomise themselves after a couple of prompts.
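
In case anyone wants to reproduce the downgrade: the official install script honours an OLLAMA_VERSION override, so pinning an older release can be done roughly like this (a sketch; check that the version string matches a published release):

# Reinstall a specific Ollama release instead of the latest
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.10.0 sh
ollama --version    # confirm which build is actually running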

For example, ask them the number of squares between A3 and D6 on a chessboard and they will start giving different answers (even when prompted to count diagonal steps as 1). And then they forget what a chessboard is and start hallucinating, for example about square J9, which does not exist!

I will have to restore my backup and forbid any NVIDIA/CUDA updates, as they make Ollama in Docker unusable:

Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model

I even tried installing ollama natively again, but that ALSO gives an error.

NVIDIA, please provide a fix, as this makes the board unusable or un-updatable…


I tried to run ollama natively after the updates:

curl -fsSL https://ollama.com/install.sh | sh

Installing ollama to /usr/local
[sudo] password for orinnano:
Downloading Linux arm64 bundle
######################################################################## 100,0%
Downloading JetPack 6 components
######################################################################## 100,0%
Creating ollama user…
Adding ollama user to render group…
Adding ollama user to video group…
Adding current user to ollama group…
Creating ollama systemd service…
Enabling and starting ollama service…
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
NVIDIA JetPack ready.
The Ollama API is now available at 127.0.0.1:11434.
Install complete. Run “ollama” from the command line.

This seemed promising; however, when running a model I got the same error message:

ollama run gemma:2b
Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer
llama_model_load_from_file_impl: failed to load model
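
When the native install fails like this, the underlying loader error is usually visible in the service log; a quick way to look, assuming the systemd service created by the install script:

# Last lines of the Ollama server log after a failed "ollama run"
journalctl -u ollama --no-pager -n 100
# Free/used memory at the moment of failure (GPU memory is shared system RAM on Jetson)
free -h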

Update 1:

Apparently after rebooting again, some more steps of the updates were applied (even though there was no message about a reboot being required). After that, the native install seems to work. I will try the Docker install too after the update.

Update 2:

I tried the Docker install. That led to the error message: Error: 500 Internal Server Error: llama runner process has terminated: error loading model: unable to allocate CUDA0 buffer

But the weird thing is, I tried the docker exec command again for some reason and now it WORKED!

However, when exiting one LLM and running another, the CUDA0 error returns. The command ‘docker restart ollama’ works sometimes, but at other times I had to log out and log in again, and even then got the error “Error: 500 Internal Server Error: llama runner process has terminated: cudaMalloc failed: out of memory”, which only disappeared after running ‘docker restart ollama’ again.

Another error sometimes crops up alongside the CUDA0 error: “llama_model_load_from_file_impl: failed to load model“
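
For completeness, a sketch of the restart-and-retry loop described above, plus two checks that show whether memory is really exhausted (container name ‘ollama’ assumed):

docker restart ollama
sleep 5
docker exec -it ollama ollama run gemma2:2b
# If the CUDA0 / cudaMalloc errors return, check how much memory is actually free:
free -h
docker stats --no-stream ollama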

Weird things are happening with this update!

I can only confirm I am seeing the same issues and error messages. Ollama used to work fine. Now sometimes it works, sometimes not. I reinstalled the Orin Nano from scratch, as I was afraid I had messed something up, but I still have the same issues after a full install.


Things have gotten worse.

Outside Docker only tinyllama works; gemma:2b seems too big and gives errors.

Inside Docker things deteriorated:
docker exec -it ollama ollama run gemma2:2b
Error response from daemon: container X is not running
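
For what it’s worth, the usual way to see why the daemon thinks the container is not running (names assumed to match the earlier docker run):

docker ps -a --filter name=ollama    # shows the Exited status and exit code
docker logs --tail 50 ollama         # last server log lines before it stopped
docker start ollama                  # bring it back up before exec'ing into it again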

It seems the only way to have this working is to return to the stock OS image, disable updates and use old versions, and even then there is some weird behaviour in some LLMs… This is NOT what I imagined I would be using this board for when I bought it!

Hi,

Sorry for the late update.
On the Orin Nano, we recommend running a model with fewer than 4B weights, since memory is limited.

Regarding the CUDA0 buffer error: has your environment been upgraded to r36.4.7 recently?
If yes, did this command work normally when the system was on r36.4.4?

docker exec -it ollama ollama run gemma2:2b

Thanks.
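
For anyone unsure which L4T revision their board actually ended up on after the apt updates, a quick check (both of these ship with JetPack):

cat /etc/nv_tegra_release            # e.g. "# R36 (release), REVISION: 4.x"
dpkg-query --show nvidia-l4t-core    # the package version also encodes the L4T revision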

In the stock SD-card image install of your official latest JetPack (downloaded about a month ago), Ollama works both outside of Docker and inside of Docker. It did seem more stable after downgrading to the older Ollama version, as stated before.

After the Ubuntu updater’s updates are allowed to be installed (including several NVIDIA items as well as Docker), things become broken as described and only work intermittently. Even rebooting does not guarantee even one session without the CUDA0 error.

Recently (without any changes or updates) the following error crops up: Error response from daemon: container is not running. I’ve tried removing the ollama docker files and reinstalling.

I looked up the details of the downloaded image file:

jetson-orin-nano-devkit-super-SD-image-JP6.2.1.zip, downloaded 11-Sep-25, containing the sd-blob.img file dated 18-06-25.

To recreate: install that image, allow the updates Ubuntu shows, and follow the above steps to install Ollama natively and in Docker.
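
Condensed into commands, the repro roughly looks like this (the apt upgrade stands in for the GUI updater; image name and model names as used earlier in the thread):

# 1. Flash jetson-orin-nano-devkit-super-SD-image-JP6.2.1 to the SD card and boot
# 2. Accept all updates the Ubuntu updater offers (NVIDIA and Docker packages included)
sudo apt update && sudo apt full-upgrade
sudo reboot
# 3. Native install
curl -fsSL https://ollama.com/install.sh | sh
ollama run gemma:2b
# 4. And/or the container route
docker run -d --runtime nvidia --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
docker exec -it ollama ollama run gemma2:2b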