Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method #4115

ckl117 · 2025-09-15T12:23:06Z

Fix CUDA error 700 raised by the noaux_tc op under CUDAGraph (GLM45-Air)
-- Add a renormalize input to the noaux_tc op that specifies whether top-k normalization is required.
-- Support stable sort.
Optimize the per_token_quant_fp8 kernel; performance improved by 50%.
Support Wfp8Afp8MoEMethod (channel-wise weight quantization).
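
For readers unfamiliar with the routing step, here is a minimal NumPy reference sketch of what a renormalize switch and a stable sort mean for top-k expert selection. It is not the FastDeploy kernel: the function name, argument names, and the bias and scaling-factor parameters are assumptions for illustration, and group-limited selection is omitted.

```python
import numpy as np

def noaux_tc_reference(scores, bias, top_k, renormalize=True,
                       routed_scaling_factor=1.0):
    """Hypothetical reference for top-k routing with a correction bias.

    scores: [tokens, experts] gating scores
    bias:   [experts] correction bias, used only to choose experts
    """
    scores = np.asarray(scores, dtype=np.float32)
    biased = scores + np.asarray(bias, dtype=np.float32)
    # Stable descending sort: equal scores keep the lower expert index first,
    # so the selection is deterministic across runs.
    order = np.argsort(-biased, axis=-1, kind="stable")
    topk_ids = order[:, :top_k]
    # Weights come from the unbiased scores of the selected experts.
    topk_weights = np.take_along_axis(scores, topk_ids, axis=-1)
    if renormalize:
        # Optional normalization so the selected weights sum to 1 per token.
        denom = topk_weights.sum(axis=-1, keepdims=True) + 1e-20
        topk_weights = topk_weights / denom
    return topk_weights * routed_scaling_factor, topk_ids
```

With tied scores such as `[[0.4, 0.4, 0.1, 0.1]]` and `top_k=2`, the stable sort always picks experts 0 and 1; `renormalize=False` returns the raw scores of those experts instead of weights that sum to 1.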

python -m fastdeploy.entrypoints.openai.api_server \
    --model ${model_path} \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --tensor-parallel-size 1 \
    --load_choices "default_v1" \
    --quantization wfp8afp8
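
As a rough illustration of the scaling scheme the description names (per-token activation scales, channel-wise weight scales), the NumPy sketch below computes both kinds of scale and applies them in a matmul. It is an assumption-laden simulation, not FastDeploy code: all function names are hypothetical, 448 is the largest finite value of the FP8 E4M3 ("fn") format, and values are only scaled and clipped, not rounded to real FP8.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value of FP8 E4M3 (fn variant)

def quant_rowwise(t):
    """Scale each row of t into the FP8 range; returns (scaled, scales).

    Applied to activations [tokens, hidden] this is per-token quantization;
    applied to weights [out_channels, hidden] it is channel-wise.
    Note: no rounding to FP8 is simulated here, only scaling and clipping.
    """
    amax = np.abs(t).max(axis=-1, keepdims=True)
    scale = np.maximum(amax, 1e-12) / FP8_E4M3_MAX
    q = np.clip(t / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def scaled_matmul(xq, x_scale, wq, w_scale):
    """Matmul on scaled operands, then undo both scales on the output."""
    # x_scale is [tokens, 1]; w_scale.T is [1, out_channels].
    return (xq @ wq.T) * x_scale * w_scale.T

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # activations: 4 tokens, hidden size 8
w = rng.standard_normal((3, 8))   # weights: 3 output channels
xq, xs = quant_rowwise(x)
wq, ws = quant_rowwise(w)
y = scaled_matmul(xq, xs, wq, ws)
```

Because both scales factor out of the dot product, they can be applied once to the output tile rather than to each operand element, which is what makes this layout convenient for a fused kernel.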

paddle-bot · 2025-09-15T14:09:00Z

Thanks for your contribution!

qingqing01

In follow-up work, please strengthen unit-test coverage of the cases where the bug occurred.

improve per_token_quant_fp8 performance

99d54bf

ckl117 force-pushed the fp8_quant_2.2 branch from 18af3ec to 99d54bf September 15, 2025 14:39

ckl117 added 2 commits September 18, 2025 11:24

support moe wfp8apf8

131c3e1

check glm test

a708767

ckl117 changed the title ~~Optimize per_token_quant_fp8 op~~ Optimize per_token_quant_fp8 OP and Support Wfp8Afp8MoEMethod Sep 18, 2025

ckl117 added 2 commits September 18, 2025 21:00

fix noaux_tc op in cudagraph, support noaux_tc return the correct

aa42bd7

Merge branch 'fp8_quant_2.2' of https://github.com/ckl117/FastDeploy …

171096c

…into fp8_quant_2.2

ckl117 changed the title ~~Optimize per_token_quant_fp8 OP and Support Wfp8Afp8MoEMethod~~ Fix group_topk cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method Sep 19, 2025

check

1b14584

ckl117 changed the title ~~Fix group_topk cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method~~ Fix noaux_tc cuda Error 700 in CUDAGraph and Add wfp8apf8 moe quant method Sep 19, 2025

Jiang-Jia-Jun and others added 4 commits September 19, 2025 12:55

Merge branch 'release/2.2' into fp8_quant_2.2

98a6d24

check inf and overwrite score in noaux_tc

31d77f6

Merge branch 'release/2.2' of https://github.com/PaddlePaddle/FastDeploy

44bd298

into fp8_quant_2.2

Merge branch 'fp8_quant_2.2' of https://github.com/ckl117/FastDeploy …

33fc2ab

…into fp8_quant_2.2

zhoutianzi666 approved these changes Sep 22, 2025

qingqing01 approved these changes Sep 22, 2025

Jiang-Jia-Jun merged commit f38b174 into PaddlePaddle:release/2.2 Sep 22, 2025
23 of 27 checks passed

ckl117 mentioned this pull request Sep 24, 2025

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 #4238

Merged
