I am considering acquiring a second-hand Jetson AGX Xavier and using it for LLM inference. Does anyone have any benchmarks? The only thing I have found so far was in this forum (Llama 3.1 8B, about 8.4 t/s token generation and 13.4 t/s prompt processing). Any other benchmarks or experiences would be appreciated!
I ran llama.cpp with gpt-oss-20b-Q4_K_M.gguf on the NVIDIA Jetson AGX Xavier, and the test results are as follows:
Command: ~/llama.cpp/build/bin/llama-server -m "$selected_gguf" --host 0.0.0.0 --port 1234 -c 12288 -b 256 -ub 128 --flash-attn 0 --no-warmup --jinja -a "$(basename "$selected_gguf" .gguf)"
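(In case anyone copies this: $selected_gguf is just a shell variable holding the model path, set by a wrapper script that isn't shown. A minimal, hypothetical way to set it; the picker below is my assumption, only the llama-server invocation above is from the actual test:)

```bash
# Hypothetical helper: let the user pick a .gguf from ~/models.
# The directory and picker logic are assumptions, not part of the original setup.
select selected_gguf in "$HOME"/models/*.gguf; do
  [ -n "$selected_gguf" ] && break
done
echo "Selected: $selected_gguf"
```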
Result: 1770/12288 context tokens used (14%), 1681 output tokens, 12.9 tokens/sec generation.
Conclusion: 12288 is about as large as the context can go on this board; setting it any higher crashes the server, presumably because the KV cache exhausts the Xavier's shared CPU/GPU memory.
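If anyone wants numbers directly comparable to the pp/tg figures quoted in the question, llama.cpp ships a dedicated benchmark tool, llama-bench, which reports prompt processing and token generation throughput separately. A minimal sketch (the model path is an assumption, point it at your own .gguf):

```bash
# Reports pp (prompt processing) and tg (token generation) throughput.
# Model path below is assumed; adjust -m to your actual .gguf file.
~/llama.cpp/build/bin/llama-bench \
  -m ~/models/gpt-oss-20b-Q4_K_M.gguf \
  -p 512 -n 128
```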