I am running gpt-oss:20b by following Ollama - NVIDIA Jetson AI Lab step by step.
However, while chatting with the LLM, it always gets stuck or shows an unexpected EOF. How can I solve this? At the same time, dmesg shows NVRM: nvAssertFailed: Assertion failed: 0 @ g_kern_bus_nvoc.h:2706
First, ensure your Jetson is operating at its maximum performance profile. This can prevent bottlenecks related to CPU, GPU, and memory clock speeds.
Bash
sudo jetson_clocks
Second, there appears to be a known issue on some Jetson platforms where Ollama may not release memory correctly, leading to instability. To mitigate this, you can manually clear the system's page cache, dentries, and inodes right before running the model, freeing up cached memory.
Bash
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
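To see how much memory this can actually reclaim, you can inspect the page cache first (readable without root; a quick check, not from the original tutorial):

```shell
# "Cached" is the page cache that `echo 3 > /proc/sys/vm/drop_caches`
# releases (along with dentries and inodes); comparing MemAvailable
# before and after the drop shows the effect.
grep -E '^(MemTotal|MemAvailable|Cached):' /proc/meminfo
```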
Recommended Workflow:
Run sudo jetson_clocks once after booting up.
Right before you start the model with ollama run ..., execute the drop_caches command to ensure maximum available memory.
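The two steps above can be combined into a small pre-launch script. This is a hypothetical helper (the name and the `RUN` flag are my own, not an official tool): by default it only prints each command so the sequence can be checked safely; set `RUN=1` on the Jetson to actually execute it.

```shell
#!/bin/sh
# prelaunch.sh (hypothetical name): run the recommended steps in order.
# With RUN unset it only echoes each command; set RUN=1 to execute.

step() {
    if [ "${RUN:-0}" = "1" ]; then
        sudo "$@"
    else
        echo "would run: sudo $*"
    fi
}

step jetson_clocks                              # pin clocks to maximum
step sh -c 'echo 3 > /proc/sys/vm/drop_caches'  # free cached memory
```

Then start the model as usual with ollama run gpt-oss:20b.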
Hope this helps solve the issue!
Thanks, I'll try it.
Hi,
Thanks for reporting this issue.
We are checking it internally.
We will provide more information later.
Hi,
Thanks for your patience.
We ran some tests but were not able to reproduce this issue in our environment.
Our Thor is set up with JetPack 7.0 + this ollama container.
The test of gpt-oss:20b runs smoothly and normally.
$ sudo docker run --ipc=host --net=host --gpus=all --runtime=nvidia --privileged -it --rm -u 0:0 --name=ollama ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
root@tegra-ubuntu:/# ollama run gpt-oss:20b
pulling manifest
...
success
>>> test
Thinking...
The user just sent "test". They might be testing. Likely they want a response confirming receipt. We should respond politely and ask if they need help.
...done thinking.
Got it! How can I assist you today?
>>> Which number is larger, 9.11 or 9.8?
Thinking...
We need to answer which is larger: 9.11 or 9.8. Clearly 9.8 is larger.
...done thinking.
9.8 is larger.
>>> hello
Thinking...
User says "hello". They didn't ask a question. We should respond in a friendly manner. Possibly ask how to help.
...done thinking.
Hi there! How can I help you today?
>>> Send a message (/? for help)
In case there was an issue when downloading the model, could you re-run it to see if it works?
Thanks.
I have found the following also works:
sudo jetson_clocks
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
sudo systemctl restart ollama
free -h
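After restarting the service, a quick sanity check like the following (my own sketch, not from this thread) can confirm enough memory is free before loading the model; `NEED_KB` is an assumed placeholder you would set to roughly your model's footprint:

```shell
#!/bin/sh
# Check MemAvailable against an assumed threshold before `ollama run`.
# NEED_KB is a hypothetical placeholder (default 1 GiB); adjust it to
# the approximate in-memory size of the model you are loading.

NEED_KB=${NEED_KB:-1048576}

avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
if [ "$avail_kb" -ge "$NEED_KB" ]; then
    echo "ok: ${avail_kb} kB available"
else
    echo "low memory: ${avail_kb} kB available, want ${NEED_KB} kB"
fi
```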
Hi,
Thanks for your information.
Just for your reference.
We also have a cache-cleaner script in the VSS release:
Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.