Prefer DirectML for Windows ONNX transcription models #985
ferologics wants to merge 2 commits into cjpais:main
Conversation
Force-pushed from 3b27bd4 to 32b0150
🧪 Test Build Ready: build artifacts for PR #985 are available for testing. Download artifacts from the workflow run. Artifacts expire after 30 days.
@ferologics can you see if this still helps the inference speed for you? I'd be quite curious whether it just falls back to CPU or works out of the box. Since it's DirectML I kind of think it might just work on Win 11; would be curious about Win 10 too.
Tested the CI-built Windows artifact locally and it looks good. What I checked:
Result:
So on my Windows 11 machine the CI-built artifact is still taking the intended GPU path, not silently dropping back to CPU.
Solid. This is amazing news. I will test on my Windows machine when I can and see how it goes as well. I'm curious how this will play with integrated GPUs; I'm slightly wondering if we will need to provide an option to disable this, just in case CPU is faster for someone. I know another PR in transcribe-rs had something like this. Might be worth considering.
Good callout. I agree an opt-out could be useful just in case CPU ends up better for some setups. I'm not sure it needs to be tackled in this PR unless you think it's important for landing it; happy to add it if you feel it's essential, otherwise we can keep this one focused and follow up separately.
Let me think about it. I want to give this a test myself on my machine and go from there. I will probably be able to test tomorrow.
Okay, I gave this a quick run. We definitely need a toggle before shipping this, and the default should be off. It probably belongs in experimental settings. Possibly it should be a dropdown, since we may add CUDA etc. to it in the future. Not sure exactly how we are going to handle this generically, but we will cross that bridge when we get there.

The reason: DirectML is 4x slower than CPU on my test machine with an integrated GPU (testing with Parakeet v3). I suspect a lot of users have integrated GPUs, and we cannot impact their performance.

Maybe #958 is relevant here and worth combining efforts. Pinging @andrewleech for thoughts and opinions. I know there is also #1023, which we need to do. Also cc @intech: I am not ready to move to 0.3.0 of transcribe-rs quite yet, mostly because we need a solid design for supporting acceleration in the app. Basically all these PRs are interrelated, so I would love any help thinking about this. Opinions and thoughts welcome. I will likely be making some changes to
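The toggle/dropdown idea could be sketched as a plain settings enum that defaults to CPU (i.e. acceleration off), parsed from whatever string the dropdown stores. `AccelBackend` and the variant names below are hypothetical, not anything that exists in Handy or transcribe-rs today:

```rust
use std::str::FromStr;

/// Hypothetical acceleration-backend setting for the proposed experimental
/// dropdown. The names are illustrative assumptions, not the app's real API.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AccelBackend {
    /// Default off, per the review discussion: CPU unless the user opts in.
    #[default]
    Cpu,
    DirectMl,
    Cuda,
}

impl FromStr for AccelBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive match against the persisted settings value.
        match s.to_ascii_lowercase().as_str() {
            "cpu" => Ok(Self::Cpu),
            "directml" => Ok(Self::DirectMl),
            "cuda" => Ok(Self::Cuda),
            other => Err(format!("unknown acceleration backend: {other}")),
        }
    }
}
```

A dropdown backed by an enum like this stays extensible if WebGPU or other providers get added later, and unknown values from an old config fail loudly instead of silently picking a backend.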
In #958 I was testing DirectML as well as WebGPU, and generally found that on my iGPU they gave worse performance on most models, depending on model architecture and format/quantization. I didn't keep DirectML as the default because it's in maintenance mode / effectively deprecated; as far as I could tell, WebGPU is the best-supported framework that covers both Nvidia and AMD. However, WebGPU was slightly slower for me than DirectML in my tests, iirc, though there are some WebGPU settings needed to ensure it's not using the "default browser settings" restrictions. CUDA would likely give better performance on compatible hardware, but I think ORT then bundles the ~100M binaries, so you probably don't want it included/enabled by default. My PR adds compile flags to select which GPU frameworks to include, along with a drop-down setting to choose what's enabled.
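The compile-flag idea could look roughly like the Cargo feature layout below. This is a sketch: the feature names, and the assumption that they map onto `ort` crate features, are mine, not the actual flags from #958:

```toml
# Hypothetical Cargo feature layout: each GPU backend is opt-in at build
# time, so the default binary ships with no GPU execution provider.
[features]
default = []
directml = ["ort/directml"]
webgpu = ["ort/webgpu"]
cuda = ["ort/cuda"]
```

Keeping `default` empty matches the discussion above: CPU-only unless a backend is explicitly compiled in, and the heavy CUDA binaries never ship by default.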
Closing because I will be submitting a PR for this and pulling it in |
Summary
- Pins the `transcribe-rs` dependency to a forked git revision with Windows DirectML support for ONNX models
- Registers `DirectMLExecutionProvider` on Windows, with explicit CPU fallback if provider registration fails

Validation
- `cargo check`
- `cargo check --release`
- `handy.log` shows successful DirectML registration for the Parakeet ONNX sessions
- 244.38s audio: 35.368s before vs 6.99s after (~5.1x faster, ~35x realtime)

Dependency patch
- Pins `transcribe-rs` to this git revision: ferologics/transcribe-rs@c56480687127070f456ae462d73c5defe964d807
- `transcribe-rs` mainline:
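In `Cargo.toml`, pinning to that revision might look like the fragment below; the repository URL is inferred from the `ferologics/transcribe-rs` reference and is an assumption:

```toml
# Temporary pin to the fork carrying Windows DirectML support,
# pending the change landing in transcribe-rs mainline.
[dependencies]
transcribe-rs = { git = "https://github.com/ferologics/transcribe-rs", rev = "c56480687127070f456ae462d73c5defe964d807" }
```

Pinning a full `rev` (rather than a branch) keeps builds reproducible while the fork is in flux.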