@hariharans29 commented Sep 20, 2025

Description

This is an internal-branch duplicate of #25255, plus some minor cosmetic changes to address Copilot feedback.

Motivation and Context

Improve the performance of NCHW Conv. Both grouped convolutions and batched inputs should benefit from this change. For detailed performance numbers, refer to #25255.

Credit to @zoeczy and team for this improvement and code change.

@hariharans29 changed the title from "[DO NOT REVIEW OR MERGE] Test PR" to "Internal Dupe of https://github.com/microsoft/onnxruntime/pull/25255/" Oct 13, 2025
@hariharans29 changed the title from "Internal Dupe of https://github.com/microsoft/onnxruntime/pull/25255/" to "Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt" Oct 13, 2025
@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@edgchen1 previously approved these changes Oct 15, 2025
@github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@hariharans29 merged commit 992c598 into main Oct 16, 2025
92 checks passed
@hariharans29 deleted the hari/mlas_conv_enhancement branch October 16, 2025 17:06
apsonawane pushed a commit that referenced this pull request Oct 17, 2025
…tion opt (#26103)

### Description
This is an internal branch dupe of
#25255 + some minor
cosmetic changes to account for Copilot feedback

### Motivation and Context
Improve performance of NCHW Conv - Both grouped convolutions and batched
inputs should benefit from this change. For a detailed understanding of
perf improvement, please refer to the numbers in
#25255.

Credit to @zoeczy and team for this improvement and code change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
apsonawane pushed a commit that referenced this pull request Oct 20, 2025
…tion opt (#26103)

apsonawane added a commit that referenced this pull request Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible 
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [[Build] Lock torch, onnxscript and onnx-ir versions to latest]
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Yateng Hong <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: quic-calvnguy <[email protected]>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <[email protected]>
Co-authored-by: yifei <[email protected]>
@apsonawane

Cherry-picked for 1.23.2. Removing the release tag and adding the cherry-pick tag.

@apsonawane added the cherry-picked label (Cherry-picked for a cherrypicks branch) and removed the release:1.23.2 label Oct 21, 2025
JonathanC-ARM pushed a commit to JonathanC-ARM/onnxruntime that referenced this pull request Oct 24, 2025
…ead partition opt (microsoft#26103)

fs-eire pushed a commit that referenced this pull request Oct 24, 2025
…tion opt (#26103)

naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025
…ead partition opt (microsoft#26103)


Labels

cherry-picked Cherry-picked for a cherrypicks branch

5 participants