
gpu: Add runtime GPU execution provider selection. #958

Closed
andrewleech wants to merge 2 commits into cjpais:main from andrewleech:feat/gpu-providers-standalone

Conversation

@andrewleech

Summary

ORT-based transcription engines (Parakeet, Moonshine, SenseVoice) can use GPU acceleration via ONNX Runtime execution providers, but until now the build had no way to select one. This adds compile-time Cargo feature flags (gpu-directml, gpu-cuda, gpu-coreml, webgpu) that gate the available providers, plus a runtime settings dropdown that lets the user switch between them without restarting.
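The feature gating might look roughly like the following Cargo fragment. This is a hedged sketch: the feature names (gpu-directml, gpu-cuda, gpu-coreml, webgpu) come from the PR summary, but the downstream transcribe-rs feature names they forward to are assumptions, not the actual diff.

```toml
# Hypothetical sketch of the compile-time feature wiring; the real
# mapping to transcribe-rs / ONNX Runtime features lives in the PR diff.
[features]
default = []
gpu-directml = ["transcribe-rs/directml"]
gpu-cuda     = ["transcribe-rs/cuda"]
gpu-coreml   = ["transcribe-rs/coreml"]
webgpu       = ["transcribe-rs/webgpu"]
```

With no GPU feature enabled, only the auto and cpu providers exist, which is why default builds show no new UI.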

Backed by transcribe-rs PR #49 which adds GpuProvider enum, set_gpu_provider(), available_providers().
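The upstream surface, per that PR description, is roughly the following. Treat this as a mock: the names (GpuProvider, available_providers) are from the PR summary, but the exact signatures, variants, and the parsing helper below are assumptions, not the real transcribe-rs API.

```rust
// Mock of the transcribe-rs additions named above; shapes are
// assumptions from the PR summary, not the actual crate API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuProvider {
    Auto,
    Cpu,
    DirectMl,
    Cuda,
    CoreMl,
    WebGpu,
}

impl GpuProvider {
    /// Parse a persisted settings value back into a provider
    /// (hypothetical helper for illustration).
    pub fn from_setting(s: &str) -> Option<Self> {
        match s {
            "auto" => Some(Self::Auto),
            "cpu" => Some(Self::Cpu),
            "directml" => Some(Self::DirectMl),
            "cuda" => Some(Self::Cuda),
            "coreml" => Some(Self::CoreMl),
            "webgpu" => Some(Self::WebGpu),
            _ => None,
        }
    }
}

/// Providers compiled into this build; `auto` and `cpu` are always
/// present, GPU providers would be appended behind feature gates.
pub fn available_providers() -> Vec<GpuProvider> {
    vec![GpuProvider::Auto, GpuProvider::Cpu]
}
```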

```mermaid
flowchart LR
    A[Startup] --> B{Persisted provider<br/>available in build?}
    B -->|yes| C[set_gpu_provider]
    B -->|no| D[Reset to auto<br/>+ warn]
    D --> C
    E[User changes setting] --> F{Model loaded?}
    F -->|ORT engine| G[Unload + reload<br/>with new EP]
    F -->|Whisper| H[Skip reload<br/>whisper.cpp ignores EP]
    F -->|Transcription in flight| I[Reject + revert]
```
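The startup branch of the flowchart amounts to a validate-or-reset step. A minimal sketch, with names and signature assumed for illustration rather than taken from the diff:

```rust
// Hypothetical sketch of the persisted-provider validation from the
// flowchart: keep the saved value if this build compiled it in,
// otherwise fall back to "auto" and flag a warning.
fn resolve_startup_provider(persisted: &str, available: &[&str]) -> (String, bool) {
    if available.contains(&persisted) {
        (persisted.to_string(), false) // use the saved provider as-is
    } else {
        eprintln!("provider '{}' not in this build; resetting to auto", persisted);
        ("auto".to_string(), true) // reset + warn
    }
}
```

For example, a build without gpu-directml that finds "directml" persisted from a previous build would resolve to "auto" with the warning flag set.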

The dropdown only appears when a GPU EP beyond auto+cpu is compiled in, so default builds see no UI change.

The second commit (chore: temporarily pin transcribe-rs to git branch) should be dropped once the upstream transcribe-rs PR merges and a crates.io release includes GPU provider support.

Benchmarks

Tested on AMD Ryzen AI 9 365 (Zen 5) with integrated AMD Radeon 860M iGPU. The autoregressive decoding loop in current ORT-based engines doesn't benefit from GPU parallelism — iGPU memory bandwidth is shared with CPU and per-token overhead dominates.

DirectML (Qwen3-ASR decoder only — encoder hits a DML Conv2d NaN bug and must stay on CPU):

| Metric | Result |
| --- | --- |
| 1.7B decoder | ~7% speedup |
| 0.6B decoder | ~9% slowdown (GPU dispatch overhead > benefit) |

WebGPU (Qwen3-ASR encoder + decoder, native Windows, D3D12 backend):

| Model | Audio | CPU RTF | CPU time | WebGPU RTF | WebGPU time | Delta |
| --- | --- | --- | --- | --- | --- | --- |
| 0.6B | 11s | 2.54x | 4.33s | 2.42x | 4.55s | -5% |
| 0.6B | 30s | 1.45x | 20.37s | 1.38x | 21.45s | -5% |
| 1.7B | 11s | 1.31x | 8.38s | 1.16x | 9.47s | -12% |
| 1.7B | 30s | 0.98x | 30.18s | 0.66x | 45.16s | -50% |
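For readers unfamiliar with the metrics: the table appears to use the usual definitions, RTF as audio length divided by wall-clock time (higher is faster) and delta as relative change in time versus the CPU baseline. A small sketch of that arithmetic, assuming those definitions hold here:

```rust
// Assumed metric definitions for the benchmark table; illustrative
// only, not the actual benchmark harness.

/// Real-time factor: how many seconds of audio are processed per
/// second of wall-clock time.
fn rtf(audio_secs: f64, wall_secs: f64) -> f64 {
    audio_secs / wall_secs
}

/// Relative change versus the CPU baseline, in percent
/// (negative = slower than CPU).
fn delta_pct(cpu_secs: f64, gpu_secs: f64) -> f64 {
    (cpu_secs - gpu_secs) / cpu_secs * 100.0
}
```

For instance, the 0.6B/11s row: 11s of audio in 4.33s gives an RTF of about 2.54x, and 4.55s versus 4.33s is roughly a -5% delta, matching the table.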

On this hardware, GPU acceleration doesn't help. The infrastructure is in place for users with discrete GPUs where the cost/benefit may be different, and the setting defaults to auto so there's no regression for everyone else.

Testing

  • cargo check clean, 33 unit tests pass
  • Frontend lint clean
  • Runtime-tested on Windows with --features gpu-directml (setting persists, model reloads, stale provider resets correctly on next launch)

Trade-offs and Alternatives

Whisper models (whisper.cpp backend) don't use ORT, so GPU provider changes are intentionally skipped for them — the reload would be a no-op. An alternative would be hiding the setting entirely when only Whisper models are available, but the current approach is simpler and the dropdown already hides itself when no GPU EP is compiled in.
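The provider-change handling described above and in the flowchart reduces to a three-way decision. A hedged sketch with hypothetical type names (the real engine enum and reload path are in the PR, not reproduced here):

```rust
// Sketch of the reload gating: reject while transcribing, reload
// ORT engines, skip whisper.cpp (which ignores ONNX Runtime EPs).

#[derive(Debug, PartialEq)]
enum EngineKind {
    Ort,        // Parakeet, Moonshine, SenseVoice
    WhisperCpp, // whisper.cpp backend, no ORT execution providers
}

#[derive(Debug, PartialEq)]
enum ProviderChange {
    Reload, // unload + reload the model with the new EP
    Skip,   // no-op reload avoided
    Reject, // revert the setting; a transcription is in flight
}

fn on_provider_change(engine: EngineKind, transcribing: bool) -> ProviderChange {
    if transcribing {
        ProviderChange::Reject
    } else {
        match engine {
            EngineKind::Ort => ProviderChange::Reload,
            EngineKind::WhisperCpp => ProviderChange::Skip,
        }
    }
}
```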

@cjpais
Owner

cjpais commented Mar 4, 2026

Okay, I'm open to this, but I really need you to do some work on the CI/CD too. I get that it works for you at build time, but this cannot be pulled in if it cannot be distributed across platforms properly.

@andrewleech
Author

andrewleech commented Mar 4, 2026

To be fair, it doesn't work for me either, but that's an AMD laptop issue (even though they brand it as an AI laptop) ;-)

Good point regardless. Sorry for throwing this up without tests, my bad. All my testing scripts were too local to include in the commit, and I overlooked that they were all I had.

I'll look into what can be covered on a GitHub runner; if nothing else, the program flow/logic.

pi-anl added 2 commits March 5, 2026 15:31
1. Add compile-time feature flags (gpu-directml, gpu-cuda, gpu-coreml, webgpu) that enable GPU acceleration for ORT-based engines (Parakeet, Moonshine, SenseVoice). A runtime GPU provider setting in Advanced Settings lets users switch between available providers; changing the provider reloads any loaded ORT model. Stale provider values from a previous build are reset to "auto" at startup.

2. Points at andrewleech/transcribe-rs feat/gpu-providers (PR cjpais/transcribe-rs#49). Drop this commit once GPU provider support is published to crates.io.
@cjpais
Owner

cjpais commented Mar 16, 2026

I'm going to close this, but we will add the option. I just want to start it from scratch.

@cjpais cjpais closed this Mar 16, 2026