@snnn
Contributor

@snnn snnn commented Sep 18, 2025

Description

The package size limits for PyPI and NuGet are:

  • Python package size under 300 MB
  • NuGet package size under 250 MB

To meet these limits, this PR first removes support for some older GPU
architectures from CMAKE_CUDA_ARCHITECTURES.
Second, it removes FPA_INTB_GEMM support from the Linux Python wheel.
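
A check along the following lines could catch size regressions before a package is uploaded. This is only a sketch and not part of this PR: the names `is_under_limit` and `check_package_file`, the `SIZE_LIMITS_MB` table, and the reading of 1 MB as 10**6 bytes are all assumptions made for illustration.

```python
import os

# Limits quoted in the description; treating 1 MB as 10**6 bytes is an assumption.
SIZE_LIMITS_MB = {"pypi": 300, "nuget": 250}


def is_under_limit(size_bytes: int, feed: str) -> bool:
    """Return True if size_bytes is strictly under the limit for the given feed."""
    return size_bytes / 10**6 < SIZE_LIMITS_MB[feed]


def check_package_file(path: str, feed: str) -> bool:
    """Look up the size of a built package on disk and compare it with the limit."""
    size_bytes = os.path.getsize(path)
    print(f"{os.path.basename(path)}: {size_bytes / 10**6:.1f} MB "
          f"(limit {SIZE_LIMITS_MB[feed]} MB)")
    return is_under_limit(size_bytes, feed)
```

For example, `is_under_limit(341 * 10**6, "pypi")` is false for the original Linux wheel size reported in this PR, while `is_under_limit(287 * 10**6, "pypi")` is true.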

Python wheel

| OS | cmake_cuda_architecture | CUDA kernel removal | Package size | Under 300 MB |
|---------|--------------------------------------------------------|---|-------------|---|
| Linux | 60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual | | 341 MB | No (original) |
| Linux | 70-real;75-real;80-real;86-real;90a-real;90a-virtual | | 329 MB | No |
| Linux | 75-real;80-real;86-real;90a-real;90a-virtual | | 319 MB | No |
| Linux | 80-real;86-real;90a-real;90a-virtual | | 304 MB | No |
| Linux | 60-real;70-real;75-real;80-real;86-real;90a-real;90a-virtual | FPA_INTB_GEMM | 287 MB | Yes |
| Windows | 52-real;61-real;75-real;86-real;89-real;90a-virtual | | 272 MB | Yes (original) |
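
The savings implied by the Linux rows can be worked out with a little arithmetic. The sketch below hard-codes the sizes from the Python wheel table; the variable names are illustrative only.

```python
# Linux Python wheel sizes in MB, keyed by the oldest architecture still built.
linux_wheel_mb = [("60-real", 341), ("70-real", 329), ("75-real", 319), ("80-real", 304)]
PYPI_LIMIT_MB = 300

# Size saved by dropping each architecture in turn (each row versus the next one).
savings = []
for (dropped, before), (_, after) in zip(linux_wheel_mb, linux_wheel_mb[1:]):
    savings.append(before - after)
    print(f"dropping {dropped}: {before} -> {after} MB (saves {before - after} MB)")

# Removing the FPA_INTB_GEMM kernels while keeping every architecture.
fpa_removed_mb = 287
fpa_saving = linux_wheel_mb[0][1] - fpa_removed_mb
print(f"removing FPA_INTB_GEMM: saves {fpa_saving} MB")

print("arch pruning alone under limit:", linux_wheel_mb[-1][1] < PYPI_LIMIT_MB)
print("kernel removal under limit:", fpa_removed_mb < PYPI_LIMIT_MB)
```

Dropping the three oldest architectures saves 37 MB in total and still leaves the wheel at 304 MB, whereas removing FPA_INTB_GEMM alone saves 54 MB and brings the wheel to 287 MB.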

NuGet

| OS | cmake_cuda_architecture | CUDA kernel removal | Package size | Under 250 MB |
|---------|--------------------------------------------------------|---|--------------|---|
| Linux | 60-real;70-real;75-real;80-real;90a-real;90a-virtual | | 276 MB | No (original) |
| Linux | 75-real;80-real;90a-real;90a-virtual | | 253 MB | No |
| Linux | 60-real;70-real;75-real;80-real;90a-real;90a-virtual | FPA_INTB_GEMM | 230 MB | Yes |
| Windows | 52-real;61-real;75-real;86-real;89-real;90a-virtual | | 264 MB | No (original) |
| Windows | 61-real;75-real;86-real;89-real;90a-virtual | | 254 MB | No |
| Windows | 75-real;86-real;89-real;90a-virtual | | 242 MB | Yes |

Motivation and Context

@snnn snnn changed the title from "Reduce Python and Nuget GPU package size (#26002)" to "Cherry-pick: Reduce Python and Nuget GPU package size (#26002)" Sep 18, 2025
@snnn snnn requested review from chilo-ms and tianleiwu September 18, 2025 22:21
@chilo-ms
Contributor

We also need to cherry-pick #25802 in order to disable the FPA_INTB_GEMM kernel.

### Description

Add a build flag to enable or disable the mixed GEMM CUTLASS kernel.

To disable the kernel, append the following to the end of the build
command line:
`--cmake_extra_defines onnxruntime_USE_FPA_INTB_GEMM=OFF`
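
A build script could assemble these extra arguments programmatically, as in the rough sketch below. The helper name `build_args` is hypothetical, and passing `CMAKE_CUDA_ARCHITECTURES` through the same `--cmake_extra_defines` flag as a second KEY=VALUE pair is an assumption rather than something stated in this PR; the architecture list is copied from the last Windows row of the NuGet table.

```python
import shlex


def build_args(cuda_archs: list[str], use_fpa_intb_gemm: bool) -> list[str]:
    """Assemble extra build arguments (hypothetical helper, for illustration only)."""
    fpa = "ON" if use_fpa_intb_gemm else "OFF"
    return [
        "--cmake_extra_defines",
        # Assumption: the arch list is passed as a semicolon-separated CMake define.
        f"CMAKE_CUDA_ARCHITECTURES={';'.join(cuda_archs)}",
        # Flag name taken from the comment above.
        f"onnxruntime_USE_FPA_INTB_GEMM={fpa}",
    ]


args = build_args(["75-real", "86-real", "89-real", "90a-virtual"], use_fpa_intb_gemm=False)
# shlex.join quotes the define containing semicolons so it is safe to paste into a shell.
print(shlex.join(args))
```

Building the argument list rather than a single string avoids shell-quoting mistakes with the semicolons when the list is handed to `subprocess.run`.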

### Motivation and Context

The FpA IntB GEMM kernels take a long time to compile. With this option,
developers can speed up the build, especially on build machines with limited memory.
@snnn snnn merged commit 2a034d5 into rel-1.23.0 Sep 19, 2025
73 of 78 checks passed
@snnn snnn deleted the snnn/p9 branch September 19, 2025 19:20