I fine-tuned llama-3.1-1b-instruct into llama-3.1-1b-instruct-ft, but when I try to chat with llama-3.1-1b-instruct-ft I always get a 502. Do I need to deploy llama-3.1-1b-instruct-ft via NeMo deployment? If so, how?
Hi @whitewizard
Which version of Nemo (Microservices?) are you using?
For NeMo Microservices, there is an example here that creates a deployment configuration and deploys a model using the NeMo Deployment Management Service. Although it deploys an embedding model, the steps can be used as a reference.
You can also find out more in the NeMo DMS API reference here.
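To make the referenced steps concrete, here is a minimal sketch of creating a deployment through the DMS. Everything here is an assumption to be checked against your DMS API reference: the base URL, namespace, image name/tag, and the exact payload field names (which follow the pattern used in the NeMo Microservices tutorials but may differ between versions). The request is built but not sent.

```python
import json
import urllib.request

# Placeholders -- replace with your own NeMo Microservices setup (assumptions).
NEMO_URL = "http://nemo.test"           # base URL of the NeMo platform
NAMESPACE = "default"                   # model namespace
DEPLOYMENT_NAME = "llama-3.1-1b-instruct-ft"

def build_deployment_request(name: str, namespace: str,
                             model: str, gpus: int = 1) -> dict:
    """Assemble a deployment configuration payload for the NeMo
    Deployment Management Service. Field names are assumptions based on
    the tutorial pattern -- verify against the DMS API reference."""
    return {
        "name": name,
        "namespace": namespace,
        "config": {
            "model": model,
            "nim_deployment": {
                # Base NIM image and tag are placeholders (assumptions).
                "image_name": "nvcr.io/nim/meta/llama-3.1-8b-instruct",
                "image_tag": "1.8.3",
                "gpu": gpus,
                "pvc_size": "25Gi",
            },
        },
    }

payload = build_deployment_request(
    DEPLOYMENT_NAME, NAMESPACE, f"{NAMESPACE}/{DEPLOYMENT_NAME}")

# The actual call would be a POST to the DMS deployment endpoint
# (path taken from the tutorial linked above; verify it in your
# DMS API reference). Built here without executing it:
req = urllib.request.Request(
    f"{NEMO_URL}/v1/deployment/model-deployments",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
```

Once the deployment is created, the DMS handles spinning up the NIM pod; the model only becomes reachable for chat after that pod reports ready.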
Thanks
I upgraded NMP to the latest version, 25.11.0, but I still get the error.
I followed this demo: GenerativeAIExamples/nemo/data-flywheel/tool-calling/2_finetuning_and_inference.ipynb at main · NVIDIA/GenerativeAIExamples
At step 3.2, "Send an Inference Call to NIM", it throws the error.
Attached pod.log (132.0 KB), the error log from the meta/llama-3.1-1b-instruct NIM pod.
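For context, the inference step in that notebook sends a chat completion to the NIM's OpenAI-compatible endpoint. A minimal sketch of such a call follows; the base URL and model name are placeholders for your deployment, not values from the notebook. A 502 at this point typically comes from the gateway in front of the NIM pod (pod not ready or crash-looping), so the pod log is the right place to look. The request is constructed but not sent.

```python
import json
import urllib.request

NIM_URL = "http://nim.test"                  # placeholder base URL (assumption)
MODEL = "default/llama-3.1-1b-instruct-ft"   # placeholder fine-tuned model name

def build_chat_request(base_url: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion
    request, which is the API surface NIM exposes for chat models."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(NIM_URL, MODEL, "Hello!")
# Sending would be: resp = urllib.request.urlopen(req)
# A 502 in the response comes from the proxy/gateway, not the model itself.
print(req.full_url)
```

If this endpoint returns 502 while the base model's endpoint works, comparing the two pods' states (`kubectl get pods` and the attached pod.log) usually shows the fine-tuned deployment failing to start.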
