Cherry-pick: Reduce Python and Nuget GPU package size (#26002) #26087

snnn · 2025-09-18T22:19:45Z

Description

The package size limits for PyPI and NuGet are:

- Python package size: under 300 MB
- NuGet package size: under 250 MB

To meet these limits, this PR first removes support for some older GPU
architectures from CMAKE_CUDA_ARCHITECTURES.
Second, it removes FPA_INTB_GEMM support from the Linux Python wheel.
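
The size figures reported in the Python wheel and Nuget tables of this description can be checked against the two limits with a few lines of arithmetic. The sketch below is only an illustration: the sizes and architecture lists are copied from those tables, while `Build`, `fits`, and `saving_mb` are hypothetical helper names that are not part of the PR or the ONNX Runtime build scripts.

```python
from typing import NamedTuple


class Build(NamedTuple):
    package: str         # "python" or "nuget"
    os: str
    archs: str           # architecture list as reported in the tables
    removed_kernel: str  # CUDA kernel removed from the build, if any
    size_mb: int


# Limits stated in the PR description, in MB.
LIMITS_MB = {"python": 300, "nuget": 250}

LINUX_WHEEL = "60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual"
LINUX_NUGET = "60-real;70-real;75-real;80-real;90a-real;90a-virtual"
WINDOWS = "52-real;61-real;75-real;86-real;89-real;90a-virtual"

# Rows copied from the two tables in this description.
BUILDS = [
    Build("python", "Linux", LINUX_WHEEL, "", 341),
    Build("python", "Linux", "70-real;75-real;80-real;86-real;90a-real;90a-virtual", "", 329),
    Build("python", "Linux", "75-real;80-real;86-real;90a-real;90a-virtual", "", 319),
    Build("python", "Linux", "80-real;86-real;90a-real;90a-virtual", "", 304),
    Build("python", "Linux", LINUX_WHEEL, "FPA_INTB_GEMM", 287),
    Build("python", "Windows", WINDOWS, "", 272),
    Build("nuget", "Linux", LINUX_NUGET, "", 276),
    Build("nuget", "Linux", "75-real;80-real;90a-real;90a-virtual", "", 253),
    Build("nuget", "Linux", LINUX_NUGET, "FPA_INTB_GEMM", 230),
    Build("nuget", "Windows", WINDOWS, "", 264),
    Build("nuget", "Windows", "61-real;75-real;86-real;89-real;90a-virtual", "", 254),
    Build("nuget", "Windows", "75-real;86-real;89-real;90a-virtual", "", 242),
]


def fits(build: Build) -> bool:
    """True if the package is under the limit for its package type."""
    return build.size_mb < LIMITS_MB[build.package]


def saving_mb(build: Build, baseline: Build) -> int:
    """Size reduction of `build` relative to `baseline`, in MB."""
    return baseline.size_mb - build.size_mb


for b in BUILDS:
    status = "under" if fits(b) else "over"
    print(f"{b.package:6} {b.os:7} {b.size_mb} MB ({status} {LIMITS_MB[b.package]} MB)")
```

With these numbers, dropping three architectures from the Linux wheel saves 37 MB (341 MB to 304 MB) and still misses the 300 MB limit, whereas removing FPA_INTB_GEMM alone saves 54 MB (341 MB to 287 MB).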

Python wheel

| OS | cmake_cuda_architecture | CUDA kernel removal | Package size | Under 300MB |
|---------|--------------------------------------------------------|-|-------------|---|
| Linux | 60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual | | 341 MB | No (original) |
| Linux | 70-real;75-real;80-real;86-real;90a-real;90a-virtual | | 329 MB | No |
| Linux | 75-real;80-real;86-real;90a-real;90a-virtual | | 319 MB | No |
| Linux | 80-real;86-real;90a-real;90a-virtual | | 304 MB | No |
| Linux | 60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual | FPA_INTB_GEMM | 287 MB | Yes |
| Windows | 52-real;61-real;75-real;86-real;89-real;90a-virtual | | 272 MB | Yes (original) |

Nuget

| OS | cmake_cuda_architecture | CUDA kernel removal | Package size | Under 250MB |
|---------|--------------------------------------------------------|---|--------------|---|
| Linux | 60-real;70-real;75-real;80-real;90a-real;90a-virtual | | 276 MB | No (original) |
| Linux | 75-real;80-real;90a-real;90a-virtual | | 253 MB | No |
| Linux | 60-real;70-real;75-real;80-real;90a-real;90a-virtual | FPA_INTB_GEMM | 230 MB | Yes |
| Windows | 52-real;61-real;75-real;86-real;89-real;90a-virtual | | 264 MB | No (original) |
| Windows | 61-real;75-real;86-real;89-real;90a-virtual | | 254 MB | No |
| Windows | 75-real;86-real;89-real;90a-virtual | | 242 MB | Yes |

Motivation and Context

chilo-ms · 2025-09-19T00:04:10Z

We need to cherry-pick this as well in order to disable the FPA_INTB_GEMM kernel:
#25802

### Description

Add a build flag to enable or disable the mixed GEMM CUTLASS kernel. To disable the kernel, append the following to the end of the build command line: `--cmake_extra_defines onnxruntime_USE_FPA_INTB_GEMM=OFF`

### Motivation and Context

The FpA IntB GEMM kernel takes a long time to compile. With this option, developers can speed up the build, especially on build machines with limited memory.
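
As a rough illustration of how the flag combines with an architecture list, the sketch below assembles the `--cmake_extra_defines` portion of a build command line. The flag name and the `onnxruntime_USE_FPA_INTB_GEMM` define come from #25802. Passing the architecture list through the same mechanism as `CMAKE_CUDA_ARCHITECTURES` is an assumption here, and `cmake_extra_defines` is a hypothetical helper, not part of the ONNX Runtime build scripts.

```python
import shlex
from typing import List


def cmake_extra_defines(cuda_archs: List[str], use_fpa_intb_gemm: bool) -> List[str]:
    """Return the argv fragment to append to the build command line.

    Hypothetical helper for illustration only.
    """
    defines = [
        # Assumption: the architecture list is set via the standard CMake variable.
        "CMAKE_CUDA_ARCHITECTURES=" + ";".join(cuda_archs),
        # Build flag added in #25802.
        "onnxruntime_USE_FPA_INTB_GEMM=" + ("ON" if use_fpa_intb_gemm else "OFF"),
    ]
    return ["--cmake_extra_defines", *defines]


# Linux Python wheel configuration from the table: full architecture list,
# FPA_INTB_GEMM kernel disabled.
linux_wheel_archs = [
    "60-real", "70-real", "75-real", "80-real", "86-real", "90a-real", "90a-virtual",
]
args = cmake_extra_defines(linux_wheel_archs, use_fpa_intb_gemm=False)

# shlex.join quotes the semicolon-separated list so it is safe to paste into a shell.
print(shlex.join(args))
```

Returning a list of arguments rather than a single string keeps the semicolons in the architecture list from being interpreted by a shell when the list is passed directly to `subprocess.run`.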

snnn changed the title ~~Reduce Python and Nuget GPU package size (#26002)~~ Cherry-pick: Reduce Python and Nuget GPU package size (#26002) Sep 18, 2025

snnn requested review from chilo-ms and tianleiwu September 18, 2025 22:21

chilo-ms approved these changes Sep 19, 2025


tianleiwu approved these changes Sep 19, 2025


snnn merged commit 2a034d5 into rel-1.23.0 Sep 19, 2025
73 of 78 checks passed

snnn deleted the snnn/p9 branch September 19, 2025 19:20

This was referenced Sep 19, 2025

[CUDA] Add build flag onnxruntime_USE_FPA_INTB_GEMM #25802

Merged

Reduce Python and Nuget GPU package size #26002

Merged
