Ollama C++ or Python Example

I’m running LLMs with Ollama on my Orin Nano devkit with JetPack 6.2.1 (my system info is attached, 611 Bytes). I did a Native Install and it was surprisingly seamless compared to other NVIDIA projects.

I’m running Ollama on the command line. The following models work very well:

  • gemma3:4b
  • llama3.2:3b
  • phi4-mini-reasoning:latest
  • falcon3:3b
  • phi4-mini:3.8b
  • smollm:1.7b
  • starcoder2:3b
  • phi3:3.8b
  • deepseek-r1:1.5b
  • qwen3:4b

I want to move away from the command line and instead run these models inside a C++ or Python program, feeding them prompts from within the program. Is there a working C++ or Python example for the Orin Nano?

*** Please note that this reply is generated automatically by an LLM ***

It appears that you are running various Large Language Models (LLMs) using Ollama on your Jetson Orin Nano devkit with JetPack 6.2.1. You’ve successfully installed Ollama using the Native Install method, which is great to hear.

The models you’ve listed that work well on your setup are:

  1. gemma3:4b
  2. llama3.2:3b
  3. phi4-mini-reasoning:latest
  4. falcon3:3b
  5. phi4-mini:3.8b
  6. smollm:1.7b
  7. starcoder2:3b
  8. phi3:3.8b
  9. deepseek-r1:1.5b
  10. qwen3:4b

These models likely perform well because the Jetson Orin Nano is designed and optimized for AI and machine learning workloads.

In comparison to other Nvidia projects, the seamless installation experience you had with Ollama is likely due to the improved package management and dependency resolution in JetPack 6.2.1.

It’s also worth noting that the Jetson Orin Nano has a powerful GPU and a large amount of memory, which enables it to run complex models like LLMs efficiently. The Orin Nano’s GPU is based on the Ampere architecture, which provides a significant boost in performance and power efficiency compared to previous generations.

Overall, it seems that you have a well-optimized setup for running LLMs on your Jetson Orin Nano, and you should be able to achieve good performance with the models you’ve listed. If you have any further questions or need help with optimizing your setup, feel free to ask!

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

Hi,

You can find an example below:

It shows how to interact with services via curl or APIs.
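The linked example isn’t included here, but as a rough sketch of that kind of call (assuming Ollama’s default REST endpoint on localhost port 11434 and one of the models listed above; the prompt is just a placeholder I chose), a Python version might look like this:

# Sketch only: query a locally running Ollama server over its REST API.
# Assumes the default port 11434 and that gemma3:4b is already pulled.
import json
import urllib.request

payload = json.dumps({
    "model": "gemma3:4b",      # any model shown by `ollama list`
    "prompt": "Explain the Jetson Orin Nano in one sentence.",
    "stream": False,           # return one JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result["response"])

This request goes to localhost only, so it works without an internet connection.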

Thanks.

To clarify my original question: I want to run the LLMs natively on my Orin Nano devkit without an internet connection, so no external APIs or curl requests. I want to send prompts to the natively installed LLMs from within a C++ or Python program, rather than through the command line with ollama. I’m looking for a working C++ or Python example that accomplishes this.

My project will run autonomously without an internet connection, so no external APIs or curl requests. I am NOT using Docker/containers, to reduce complexity and memory usage. The LLMs are natively installed and currently receive prompts from the command line; now I need them to be prompted from a C++ or Python program.

Hi,

In the above tutorial, there is a section about Python API:

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get('OPENAI_BASE_URL', 'http://0.0.0.0:9000/v1'),  # local server
    api_key='foo',  # placeholder key for the local server
)

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    ...

The API connects to localhost so it doesn’t need an internet connection.
Does this work for you?
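For reference, a complete call against that same local endpoint might look roughly like the sketch below; the messages and max_tokens values are placeholders I added, not part of the tutorial:

# Sketch of a full request to the local OpenAI-compatible server.
# The prompt text and token limit are illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get('OPENAI_BASE_URL', 'http://0.0.0.0:9000/v1'),
    api_key='foo',
)

response = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[
        {"role": "user", "content": "Summarize what the Jetson Orin Nano is."},
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)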

Or maybe you can check Ollama for more Python samples:
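For example, the official ollama Python package (pip install ollama) talks to the locally running Ollama service, so it needs no internet connection. A minimal sketch (the model name and prompt are just placeholders taken from the list above) could look like this:

# Minimal sketch using the ollama Python package, which sends the
# request to the local Ollama service rather than an external API.
import ollama

response = ollama.chat(
    model="llama3.2:3b",   # any locally pulled model from `ollama list`
    messages=[
        {"role": "user", "content": "Write one sentence about the Orin Nano."},
    ],
)

print(response["message"]["content"])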

Thanks.
