@Sunny-bot1 (Collaborator) commented Aug 22, 2025

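Description

Machete is a high-performance, CUTLASS-based GEMM operator developed for vLLM. Its optimized kernels are derived from Marlin and tuned for the Hopper architecture.

This PR adapts the machete wint4 kernel to the Paddle interface to improve wint4 dense GEMM performance.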

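Usage

So far only the machete wint4 kernel has been adapted and verified, so this applies only to --quantization wint4.

The environment variable FD_USE_MACHETE controls whether the machete kernel is used. It is disabled by default (value 0).

The machete kernel runs only when all of the following conditions hold:

  1. sm_version == 90
  2. --quantization wint4
  3. B.shape()[1] % 128 == 0
  4. export FD_USE_MACHETE=1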
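
The four conditions above can be sketched as a single predicate. This is a minimal illustration, not the code in this PR: the function name `should_use_machete`, its parameters, and the `env` argument are hypothetical, and the weight matrix B is assumed to be passed as a shape tuple whose second dimension is the one checked against 128.

```python
import os


def should_use_machete(sm_version, quantization, weight_shape, env=None):
    """Return True only if every condition listed in the PR description holds.

    Hypothetical helper for illustration; names are not taken from the PR.
    """
    # Read FD_USE_MACHETE; an unset variable is treated as the default "0".
    env = os.environ if env is None else env
    enabled = env.get("FD_USE_MACHETE", "0") == "1"

    return (
        enabled                            # 4. export FD_USE_MACHETE=1
        and sm_version == 90               # 1. Hopper (sm_90)
        and quantization == "wint4"        # 2. --quantization wint4
        and weight_shape[1] % 128 == 0     # 3. B.shape()[1] % 128 == 0
    )


if __name__ == "__main__":
    # Shape (7168, 1536) mirrors the first benchmark case (k, n).
    print(should_use_machete(90, "wint4", (7168, 1536), env={"FD_USE_MACHETE": "1"}))
```

Passing `env` explicitly keeps the check easy to exercise without touching the real process environment; if any one condition fails, the caller would fall back to the existing path.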

Performance tests

k, n = 7168, 1536

m       paddle     machete
32      22.598     19.052
64      22.691     23.040
128     22.889     31.066
256     29.305     25.369
512     49.616     27.129
1024    72.947     46.157
2048    138.637    73.427
4096    213.015    136.138

k, n = 2048, 5120

m       paddle     machete
32      8.511      10.950
64      10.614     16.208
128     16.912     13.465
256     23.571     16.374
512     37.011     24.953
1024    65.542     38.365
2048    106.605    66.925
4096    205.872    116.838
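
To make the two tables easier to compare, the ratio paddle / machete can be computed per value of m. The sketch below uses only the numbers reported above. The PR does not state the unit, so the script assumes the values are timings where lower is better, which makes a ratio above 1 a win for machete; the helper name `speedups` is hypothetical.

```python
# Measurements copied from the two tables above, keyed by m.
CASES = {
    "k, n = 7168, 1536": {
        "paddle": {32: 22.598, 64: 22.691, 128: 22.889, 256: 29.305,
                   512: 49.616, 1024: 72.947, 2048: 138.637, 4096: 213.015},
        "machete": {32: 19.052, 64: 23.040, 128: 31.066, 256: 25.369,
                    512: 27.129, 1024: 46.157, 2048: 73.427, 4096: 136.138},
    },
    "k, n = 2048, 5120": {
        "paddle": {32: 8.511, 64: 10.614, 128: 16.912, 256: 23.571,
                   512: 37.011, 1024: 65.542, 2048: 106.605, 4096: 205.872},
        "machete": {32: 10.950, 64: 16.208, 128: 13.465, 256: 16.374,
                    512: 24.953, 1024: 38.365, 2048: 66.925, 4096: 116.838},
    },
}


def speedups(baseline, candidate):
    """Ratio baseline / candidate for every m (assumes lower values are better)."""
    return {m: baseline[m] / candidate[m] for m in baseline}


if __name__ == "__main__":
    for label, data in CASES.items():
        print(label)
        for m, ratio in speedups(data["paddle"], data["machete"]).items():
            print(f"  m={m:<5d} paddle/machete = {ratio:.2f}")
```

Under that assumption, the ratios fall below 1 for a few small values of m (m = 64 and 128 in the first case, m = 32 and 64 in the second), and stay above 1 from m = 256 upward in both cases.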


paddle-bot bot commented Aug 22, 2025

Thanks for your contribution!


codecov-commenter commented Aug 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@c694fa2).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #3561   +/-   ##
==========================================
  Coverage           ?   59.40%           
==========================================
  Files              ?        3           
  Lines              ?      101           
  Branches           ?       10           
==========================================
  Hits               ?       60           
  Misses             ?       33           
  Partials           ?        8           
Flag Coverage Δ
diff 59.40% <ø> (?)


@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 479c8b8 into PaddlePaddle:develop Aug 28, 2025
14 of 17 checks passed