# Model Quantization Techniques
## Overview
LightX2V supports quantized inference for DIT, T5, and CLIP models, reducing memory usage and improving inference speed by lowering model precision.
## Quantization Modes
| Quantization Mode | Weight Quantization | Activation Quantization | Compute Kernel | Supported Hardware |
|---|---|---|---|---|
| `fp8-vllm` | FP8 channel symmetric | FP8 channel dynamic symmetric | vLLM | H100/H200/H800, RTX 40 series, etc. |
| `int8-vllm` | INT8 channel symmetric | INT8 channel dynamic symmetric | vLLM | A100/A800, RTX 30/40 series, etc. |
| `fp8-sgl` | FP8 channel symmetric | FP8 channel dynamic symmetric | SGLang | H100/H200/H800, RTX 40 series, etc. |
| `int8-sgl` | INT8 channel symmetric | INT8 channel dynamic symmetric | SGLang | A100/A800, RTX 30/40 series, etc. |
| `fp8-q8f` | FP8 channel symmetric | FP8 channel dynamic symmetric | Q8F | RTX 40 series, L40S, etc. |
| `int8-q8f` | INT8 channel symmetric | INT8 channel dynamic symmetric | Q8F | RTX 40 series, L40S, etc. |
| `int8-torchao` | INT8 channel symmetric | INT8 channel dynamic symmetric | TorchAO | A100/A800, RTX 30/40 series, etc. |
| `int4-g128-marlin` | INT4 group symmetric | FP16 | Marlin | H200/H800/A100/A800, RTX 30/40 series, etc. |
| `fp8-b128-deepgemm` | FP8 block symmetric | FP8 group symmetric | DeepGEMM | H100/H200/H800, RTX 40 series, etc. |
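The channel-symmetric schemes above all follow the same basic recipe: each weight matrix gets one static scale per output channel, while activation scales are computed dynamically at runtime (here, per token). Below is a minimal PyTorch sketch of the INT8 case; it is purely illustrative and not LightX2V's actual fused kernels:

```python
import torch

def quantize_weight_per_channel_int8(w: torch.Tensor):
    # Symmetric per-channel quantization: one scale per output channel (row).
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-128, 127).to(torch.int8)
    return q, scale

def quantize_activation_dynamic_int8(x: torch.Tensor):
    # "Dynamic" = scales are computed at runtime from the live activations,
    # here one scale per token (row).
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(x / scale).clamp(-128, 127).to(torch.int8)
    return q, scale

w = torch.randn(4096, 4096)   # linear-layer weight [out_features, in_features]
x = torch.randn(8, 4096)      # activations [tokens, in_features]
qw, sw = quantize_weight_per_channel_int8(w)
qx, sx = quantize_activation_dynamic_int8(x)

# INT8 matmul followed by dequantization; real kernels fuse this into the GEMM.
y = (qx.float() @ qw.float().T) * (sx * sw.T)
print((y - x @ w.T).abs().max())  # quantization error
```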
## Obtaining Quantized Models

### Method 1: Download Pre-Quantized Models
Download pre-quantized models from LightX2V model repositories:
**DIT Models**
Download pre-quantized DIT models from Wan2.1-Distill-Models:
```bash
# Download the DIT FP8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models \
    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```
**Encoder Models**
Download pre-quantized T5 and CLIP models from Encoders-LightX2V:
```bash
# Download the T5 FP8 quantized model
huggingface-cli download lightx2v/Encoders-Lightx2v \
    --local-dir ./models \
    --include "models_t5_umt5-xxl-enc-fp8.pth"

# Download the CLIP FP8 quantized model
huggingface-cli download lightx2v/Encoders-Lightx2v \
    --local-dir ./models \
    --include "models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth"
```
### Method 2: Self-Quantize Models
For detailed quantization tool usage, refer to: Model Conversion Documentation
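For intuition about what the conversion step produces, here is a toy per-tensor FP8 (e4m3) converter. This is not the official LightX2V tool (see the documentation referenced above); it is only a sketch showing that quantization emits low-precision weights plus the scales needed to dequantize them. The input filename is a placeholder, and PyTorch ≥ 2.1 is assumed for the float8 dtype:

```python
import torch
from safetensors.torch import load_file, save_file

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3

def to_fp8(w: torch.Tensor):
    # One scale for the whole tensor; real tools typically use per-channel
    # or per-block scales, as listed in the table above.
    scale = w.abs().amax().clamp(min=1e-8) / FP8_MAX
    return (w / scale).to(torch.float8_e4m3fn), scale.reshape(1)

state = load_file("model.safetensors")  # placeholder input checkpoint
out = {}
for name, w in state.items():
    if w.ndim == 2:  # quantize only 2-D (linear) weights in this toy example
        out[name], out[name + ".scale"] = to_fp8(w.float())
    else:
        out[name] = w
save_file(out, "model_fp8.safetensors")
```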
## Using Quantized Models

### DIT Model Quantization

#### Supported Quantization Modes
DIT quantization modes (`dit_quant_scheme`) support: `fp8-vllm`, `int8-vllm`, `fp8-sgl`, `int8-sgl`, `fp8-q8f`, `int8-q8f`, `int8-torchao`, `int4-g128-marlin`, `fp8-b128-deepgemm`.
#### Configuration Example

```json
{
  "dit_quantized": true,
  "dit_quant_scheme": "fp8-sgl",
  "dit_quantized_ckpt": "/path/to/dit_quantized_model"  // Optional
}
```
> 💡 **Tip**: When there's only one DIT model in the script's `model_path`, `dit_quantized_ckpt` doesn't need to be specified separately.
### T5 Model Quantization

#### Supported Quantization Modes
T5 quantization modes (`t5_quant_scheme`) support: `int8-vllm`, `fp8-sgl`, `int8-q8f`, `fp8-q8f`, `int8-torchao`.
#### Configuration Example

```json
{
  "t5_quantized": true,
  "t5_quant_scheme": "fp8-sgl",
  "t5_quantized_ckpt": "/path/to/t5_quantized_model"  // Optional
}
```
> 💡 **Tip**: When a T5 quantized model exists in the script's specified `model_path` (such as `models_t5_umt5-xxl-enc-fp8.pth` or `models_t5_umt5-xxl-enc-int8.pth`), `t5_quantized_ckpt` doesn't need to be specified separately.
### CLIP Model Quantization

#### Supported Quantization Modes

CLIP quantization modes (`clip_quant_scheme`) support: `int8-vllm`, `fp8-sgl`, `int8-q8f`, `fp8-q8f`, `int8-torchao`.
#### Configuration Example

```json
{
  "clip_quantized": true,
  "clip_quant_scheme": "fp8-sgl",
  "clip_quantized_ckpt": "/path/to/clip_quantized_model"  // Optional
}
```
> 💡 **Tip**: When a CLIP quantized model exists in the script's specified `model_path` (such as `models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth` or `models_clip_open-clip-xlm-roberta-large-vit-huge-14-int8.pth`), `clip_quantized_ckpt` doesn't need to be specified separately.
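The three switches are independent, so DIT, T5, and CLIP quantization can be enabled together in a single config file. Below is a minimal sketch assembling such a combined config; the scheme choices and output filename are only examples:

```python
import json

# Combine the per-model keys from the examples above into one config.
config = {
    "dit_quantized": True,
    "dit_quant_scheme": "fp8-sgl",
    "t5_quantized": True,
    "t5_quant_scheme": "fp8-sgl",
    "clip_quantized": True,
    "clip_quant_scheme": "fp8-sgl",
}

with open("quant_config.json", "w") as f:
    json.dump(config, f, indent=2)
```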
## Performance Optimization Strategy

If memory is still insufficient, you can combine quantization with parameter offloading to further reduce memory usage. Refer to the Parameter Offload Documentation:

- Wan2.1 configuration: refer to the offload config files
- Wan2.2 configuration: refer to the wan22 config files with the `4090` suffix
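As a sketch of how the two features compose, offload settings live in the same config file as the quantization keys. The offload field names below are illustrative placeholders, so check the referenced offload config files for the exact names used by your version:

```python
import json

config = {
    "dit_quantized": True,
    "dit_quant_scheme": "fp8-sgl",
    # Placeholder offload fields -- verify against the offload config files
    # referenced above before use.
    "cpu_offload": True,
    "offload_granularity": "block",
}

with open("quant_offload_config.json", "w") as f:
    json.dump(config, f, indent=2)
```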