Ollama C++ or Python Example

I’m running LLMs with Ollama on my Orin Nano devkit with JetPack 6.2.1 (my system info is attached, 611 Bytes). I did a Native Install and it was surprisingly seamless compared to other NVIDIA projects.

I’m running Ollama on the command line. The following models work very well:

  • gemma3:4b
  • llama3.2:3b
  • phi4-mini-reasoning:latest
  • falcon3:3b
  • phi4-mini:3.8b
  • smollm:1.7b
  • starcoder2:3b
  • phi3:3.8b
  • deepseek-r1:1.5b
  • qwen3:4b

I want to move away from the command line and instead run these models inside a C++ or Python program, feeding them prompts from within the program. Is there a working C++ or Python example for the Orin Nano?

*** Please note that this reply is generated automatically by an LLM ***

It appears that you are running various Large Language Models (LLMs) using Ollama on your Jetson Orin Nano devkit with JetPack 6.2.1. You’ve successfully installed Ollama using the Native Install method, which is great to hear.

The models you’ve listed that work well on your setup are:

  1. gemma3:4b
  2. llama3.2:3b
  3. phi4-mini-reasoning:latest
  4. falcon3:3b
  5. phi4-mini:3.8b
  6. smollm:1.7b
  7. starcoder2:3b
  8. phi3:3.8b
  9. deepseek-r1:1.5b
  10. qwen3:4b

These models likely perform well because the Jetson Orin Nano is designed and optimized for AI and machine learning workloads.

In comparison to other Nvidia projects, the seamless installation experience you had with Ollama is likely due to the improved package management and dependency resolution in JetPack 6.2.1.

It’s also worth noting that the Jetson Orin Nano has a powerful GPU and a large amount of memory, which enables it to run complex models like LLMs efficiently. The Orin Nano’s GPU is based on the Ampere architecture, which provides a significant boost in performance and power efficiency compared to previous generations.

Overall, it seems that you have a well-optimized setup for running LLMs on your Jetson Orin Nano, and you should be able to achieve good performance with the models you’ve listed. If you have any further questions or need help with optimizing your setup, feel free to ask!

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

Hi,

You can find an example below:

It shows how to interact with services via curl or APIs.
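The linked example isn’t included here, but as a rough sketch of that kind of call (assuming Ollama’s default REST endpoint on localhost port 11434 and one of the models listed above; the prompt is just a placeholder I chose), a Python version might look like this:

# Sketch only: query a locally running Ollama server over its REST API.
# Assumes the default port 11434 and that gemma3:4b is already pulled.
import json
import urllib.request

payload = json.dumps({
    "model": "gemma3:4b",      # any model shown by `ollama list`
    "prompt": "Explain the Jetson Orin Nano in one sentence.",
    "stream": False,           # return one JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result["response"])

This request goes to localhost only, so it works without an internet connection.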

Thanks.

To clarify my original question: I want to run the LLMs natively on my Orin Nano devkit without an internet connection, so no external APIs or curl requests. I want to send prompts to the natively installed LLMs from within a C++ or Python program, rather than through the command line with ollama. I’m looking for a working C++ or Python example that accomplishes this.

My project will run autonomously without an internet connection, so no external APIs or curl requests. I am NOT using Docker/containers, to reduce complexity and memory usage. The LLMs are natively installed and currently receive prompts from the command line; now I need them to be prompted from a C++ or Python program.

Hi,

In the above tutorial, there is a section about Python API:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get('OPENAI_BASE_URL', 'http://0.0.0.0:9000/v1'),  # local server
    api_key='foo',  # placeholder key for the local server
)

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    ...

The API connects to localhost so it doesn’t need an internet connection.
Does this work for you?
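For reference, a complete call against that same local endpoint might look roughly like the sketch below; the messages and max_tokens values are placeholders I added, not part of the tutorial:

# Sketch of a full request to the local OpenAI-compatible server.
# The prompt text and token limit are illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get('OPENAI_BASE_URL', 'http://0.0.0.0:9000/v1'),
    api_key='foo',
)

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[
        {"role": "user", "content": "Summarize what the Jetson Orin Nano is."},
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)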

Or maybe you can check Ollama for more Python samples:
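For example, the official ollama Python package (pip install ollama) talks to the locally running Ollama service, so it needs no internet connection. A minimal sketch (the model name and prompt are just placeholders taken from the list above) could look like this:

# Minimal sketch using the ollama Python package, which sends the
# request to the local Ollama service rather than an external API.
import ollama

response = ollama.chat(
    model="llama3.2:3b",   # any locally pulled model from `ollama list`
    messages=[
        {"role": "user", "content": "Write one sentence about the Orin Nano."},
    ],
)

print(response["message"]["content"])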

Thanks.
