@RichardWooSJTU
Collaborator

This PR adds a precision trick for DeepGEMM quantization. When the environment variable PER_TOKEN_QUANT_FP8_USE_FINEGRAINED_RANGE=1 is exported, the trick is applied to all activation quantization for DeepGEMM. In detail, the per-token maximum value is multiplied by a constant coefficient of 7.0 before it is used for quantization. This keeps the quantized values more concentrated in the high-precision region of the fp8 data type.
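For illustration only, here is a minimal NumPy sketch of how such a coefficient could enter per-token scale computation. It is not the kernel from this PR. It assumes the fp8 format is E4M3 with a largest finite value of 448 and that the scale is the (optionally enlarged) per-token maximum divided by that value; the function names and the epsilon clamp are hypothetical. NumPy has no fp8 dtype, so the sketch only shows the value range that would be handed to the fp8 cast.

```python
import os

import numpy as np

FP8_E4M3_MAX = 448.0     # assumption: largest finite value of fp8 E4M3
FINEGRAINED_COEFF = 7.0  # constant coefficient named in the PR description


def use_finegrained_range() -> bool:
    """Read the switch from the environment (variable name from the PR)."""
    return os.getenv("PER_TOKEN_QUANT_FP8_USE_FINEGRAINED_RANGE", "0") == "1"


def per_token_scale(x: np.ndarray, finegrained: bool) -> np.ndarray:
    """Hypothetical helper: one scale per token (row) of x."""
    amax = np.abs(x).max(axis=-1, keepdims=True)
    amax = np.maximum(amax, 1e-4)  # hypothetical clamp to avoid division by zero
    if finegrained:
        # The trick: enlarge the per-token maximum by the coefficient.
        amax = amax * FINEGRAINED_COEFF
    return amax / FP8_E4M3_MAX


def scale_to_fp8_range(x: np.ndarray, finegrained: bool):
    """Return the values that would be cast to fp8, plus the per-token scale."""
    scale = per_token_scale(x, finegrained)
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale


if __name__ == "__main__":
    x = np.array([[1.0, -2.0, 4.0], [0.5, 0.25, -8.0]])
    q, scale = scale_to_fp8_range(x, use_finegrained_range())
    print(np.abs(q).max(axis=-1))
```

Under these assumptions, enabling the switch shrinks the largest scaled magnitude per token from 448 to 448 / 7 = 64, while dequantization (q * scale) still recovers the original range.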

@yuanlehome yuanlehome merged commit e39159f into PaddlePaddle:develop Aug 5, 2025
12 of 14 checks passed
