Add switch to apply fine-grained per token quant fp8 #3192

RichardWooSJTU · 2025-08-04T09:26:46Z

A precision trick for DeepGEMM quantization. When the environment variable PER_TOKEN_QUANT_FP8_USE_FINEGRAINED_RANGE=1 is exported, the trick is applied to all DeepGEMM activation quantization. Specifically, the per-token maximum value is multiplied by a constant coefficient of 7.0. This ensures that the quantized values are more concentrated in the high-precision region of the fp8 data type.
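A minimal pure-Python sketch of the idea described above, not the code from this PR. The function names `use_finegrained_range`, `per_token_scales`, and `scale_tokens` are hypothetical, and it assumes the target format is fp8 e4m3 (largest finite value 448) and that the scale is computed as the (optionally enlarged) per-token maximum divided by that limit. Only the environment variable name and the 7.0 coefficient come from the description.

```python
import os

FP8_E4M3_MAX = 448.0      # largest finite fp8 e4m3 value (assumed target format)
FINEGRAINED_COEFF = 7.0   # constant coefficient named in the PR description


def use_finegrained_range() -> bool:
    """Hypothetical helper: read the switch from the environment."""
    return os.getenv("PER_TOKEN_QUANT_FP8_USE_FINEGRAINED_RANGE", "0") == "1"


def per_token_scales(tokens, finegrained=None, eps=1e-10):
    """Return one scale per token (row); assumed formula, not the PR's kernel."""
    if finegrained is None:
        finegrained = use_finegrained_range()
    scales = []
    for row in tokens:
        # Per-token absolute maximum, clamped to avoid division by zero.
        amax = max(max(abs(v) for v in row), eps)
        if finegrained:
            # Enlarge the maximum so scaled values occupy a narrower range.
            amax *= FINEGRAINED_COEFF
        scales.append(amax / FP8_E4M3_MAX)
    return scales


def scale_tokens(tokens, scales):
    """Divide each token by its scale; the cast to fp8 is not simulated here."""
    return [[v / s for v in row] for row, s in zip(tokens, scales)]
```

Under these assumptions, a token's scaled values span up to ±448 with the switch off, and up to ±64 (448 / 7) with it on.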

Add switch to apply fine-grained per token quant fp8

12305e8

heavengate approved these changes Aug 4, 2025


yuanlehome approved these changes Aug 5, 2025


yuanlehome merged commit e39159f into PaddlePaddle:develop Aug 5, 2025
12 of 14 checks passed


4 participants
