Description
Dear Community,
I'm quite new to the world of pre-trained Transformer models, and I'm looking for the right GPU for my workstation to train and run LLMs for various purposes.
I'm not a hardware expert, so I need your help choosing a suitable GPU for my needs. I want to train and use open-source models such as:
- llama.cpp / Vicuna 13B
- Llama 2 70B
- Zephyr 7B
- Mistral 7B
- Claude
- GPT
- Wizard
But I don't know which criteria to apply to choose the right GPU with the best performance/price ratio.
Simple questions:
- NVIDIA Titan, Quadro, Tesla, or GeForce?
- 8 GB, 16 GB, 24 GB, or more (I would even like to train 70B models)
- Which parameters do I need to take carefully into account?
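To make the VRAM question concrete, here is a rough back-of-the-envelope sketch of how much GPU memory is needed just to hold a model's weights at different precisions. The helper `weights_vram_gb` and the byte-per-parameter table are my own illustration, not from any library; real usage adds significant overhead (KV cache, activations, and for training, gradients and optimizer state, which can multiply the footprint several times over).

```python
# Approximate bytes per parameter at common precisions (assumption:
# weights only -- no KV cache, activations, or optimizer state).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weights_vram_gb(n_params_billion: float, precision: str) -> float:
    """Return the approximate GiB needed just to store the weights."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[precision] / 2**30

for model, size in [("Mistral 7B", 7), ("Vicuna 13B", 13), ("Llama 2 70B", 70)]:
    for prec in ("fp16", "int4"):
        print(f"{model} @ {prec}: ~{weights_vram_gb(size, prec):.0f} GiB")
```

The takeaway under these assumptions: a 7B model in fp16 fits on a 16 GB card for inference, 13B needs roughly 24 GB, and 70B exceeds any single consumer GPU in fp16 (about 130 GiB), which is why 70B work typically means aggressive quantization or multi-GPU setups, and full training needs far more than inference.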
I'm quite lost here, and I would appreciate an expert shedding some light on this.
Thanks so much in advance.
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered