Hi, I have the original FP16 version working in the “run everywhere” container. I’ve quantized the model to FP8 and would now like to package it into a NIM container so I can run it the same way I run the original FP16 model. Could you help me set this up, please?
The Llama 3.3 70B Instruct NIM already includes prebuilt FP8 engines, so you can select one of those profiles instead of packaging your own quantized checkpoint: Llama-3.3-70b-Instruct | NVIDIA NGC
You can find the full list of supported profiles here: Supported Models for NVIDIA NIM for LLMs — NVIDIA NIM for Large Language Models (LLMs)
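In case it helps, here is a minimal sketch of selecting an FP8 profile with the prebuilt image. The image tag and the profile ID are placeholders; `list-model-profiles` and `NIM_MODEL_PROFILE` are the standard NIM mechanisms for inspecting and pinning a profile, but double-check the exact commands against the NIM docs for your release:

```bash
# Requires an NGC API key with access to nvcr.io (docker login nvcr.io first).
export NGC_API_KEY=<your NGC API key>
export IMG=nvcr.io/nim/meta/llama-3.3-70b-instruct:latest   # tag is a placeholder

# 1. List the engine profiles this image supports on your GPUs.
#    FP8 profiles only show up on GPUs with FP8 support (e.g. Hopper, Ada).
docker run --rm --gpus all -e NGC_API_KEY "$IMG" list-model-profiles

# 2. Launch the NIM pinned to the FP8 profile ID printed in step 1
#    (the NIM_MODEL_PROFILE value below is a placeholder).
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<fp8-profile-id> \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  "$IMG"
```

Once the container is up, it serves the same OpenAI-compatible API on `http://localhost:8000/v1` as the FP16 deployment, so your existing client code should work unchanged.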