Hi, I have the original FP16 version working in the “run everywhere” container. I’ve quantized the model to FP8 and would now like to package it into a NIM container so I can run it the same way I run the original FP16 model. Could you help me set this up, please?
The Llama 3.3 70B Instruct NIM already includes prebuilt FP8 engines, so you can select one of those profiles instead of packaging your own quantized checkpoint: Llama-3.3-70b-Instruct | NVIDIA NGC
You can find the full list of supported profiles here: Supported Models for NVIDIA NIM for LLMs — NVIDIA NIM for Large Language Models (LLMs)
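In case it helps, here is a minimal sketch of selecting an FP8 profile with the prebuilt image. The image tag and the profile ID are placeholders; `list-model-profiles` and `NIM_MODEL_PROFILE` are the standard NIM mechanisms for inspecting and pinning a profile, but double-check the exact commands against the NIM docs for your release:

```bash
# Requires an NGC API key with access to nvcr.io (docker login nvcr.io first).
export NGC_API_KEY=<your NGC API key>
export IMG=nvcr.io/nim/meta/llama-3.3-70b-instruct:latest   # tag is a placeholder

# 1. List the engine profiles this image supports on your GPUs.
#    FP8 profiles only show up on GPUs with FP8 support (e.g. Hopper, Ada).
docker run --rm --gpus all -e NGC_API_KEY "$IMG" list-model-profiles

# 2. Launch the NIM pinned to the FP8 profile ID printed in step 1
#    (the NIM_MODEL_PROFILE value below is a placeholder).
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<fp8-profile-id> \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  "$IMG"
```

Once the container is up, it serves the same OpenAI-compatible API on `http://localhost:8000/v1` as the FP16 deployment, so your existing client code should work unchanged.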