gpu: Add runtime GPU execution provider selection. #958
Closed
andrewleech wants to merge 2 commits into cjpais:main from
Conversation
Owner

Okay, I'm open to this, but I really need you to do some work on the CI/CD too. I get that it works for you at build time, but this cannot be pulled in if it cannot be distributed across platforms properly.
Author

To be fair it doesn't work for me either, but that's an AMD laptop issue (even though they brand it as an AI laptop) ;-) Good point regardless. I'm sorry for throwing this up without tests, my bad. All my testing scripts looked too local to include in the commit, and I overlooked that they were all I had. I'll look into what can be covered in a GitHub runner; if nothing else, the program flow/logic.
Add compile-time feature flags (gpu-directml, gpu-cuda, gpu-coreml, webgpu) that enable GPU acceleration for ORT-based engines (Parakeet, Moonshine, SenseVoice). A runtime GPU provider setting in Advanced Settings lets users switch between available providers; changing the provider reloads any loaded ORT model. Stale provider values from a previous build are reset to "auto" at startup.
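A minimal sketch of the startup validation described above: a provider persisted by a previous build is reset to "auto" when the current build doesn't include that execution provider. The function names and string representation here are illustrative assumptions, not the PR's actual API.

```rust
/// Providers compiled into this build. In the real code this would be
/// derived from the gpu-* Cargo feature flags; the cfg attributes below
/// sketch that gating.
fn available_providers() -> Vec<&'static str> {
    let mut providers = vec!["auto", "cpu"];
    #[cfg(feature = "gpu-directml")]
    providers.push("directml");
    #[cfg(feature = "gpu-cuda")]
    providers.push("cuda");
    #[cfg(feature = "gpu-coreml")]
    providers.push("coreml");
    #[cfg(feature = "webgpu")]
    providers.push("webgpu");
    providers
}

/// Validate a persisted provider against the current build; stale values
/// from a previous build fall back to "auto" with a warning.
fn validate_provider(persisted: &str, available: &[&str]) -> String {
    if available.contains(&persisted) {
        persisted.to_string()
    } else {
        eprintln!("provider '{persisted}' not available in this build, resetting to auto");
        "auto".to_string()
    }
}
```

For example, a settings file carrying `"directml"` from a DirectML-enabled build would resolve to `"auto"` on a default build, since only `auto` and `cpu` are compiled in.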
Points at andrewleech/transcribe-rs feat/gpu-providers (PR cjpais/transcribe-rs#49). Drop this commit once GPU provider support is published to crates.io.
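The temporary pin in the second commit would look roughly like this in `Cargo.toml` (a sketch; the exact dependency spec isn't shown in the PR):

```toml
# Temporary: track the fork branch until transcribe-rs PR #49 lands
# and a crates.io release includes GPU provider support.
[dependencies]
transcribe-rs = { git = "https://github.com/andrewleech/transcribe-rs", branch = "feat/gpu-providers" }
```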
Force-pushed from bb34e86 to 9575e6f
Owner

I'm going to close this, but we will add the option; I just want to start it from scratch.
Summary

ORT-based transcription engines (Parakeet, Moonshine, SenseVoice) can use GPU acceleration via ONNX Runtime execution providers, but until now the build had no way to select one. This adds compile-time Cargo feature flags (`gpu-directml`, `gpu-cuda`, `gpu-coreml`, `webgpu`) that gate the available providers, plus a runtime settings dropdown that lets the user switch between them without restarting. Backed by transcribe-rs PR #49, which adds a `GpuProvider` enum, `set_gpu_provider()`, and `available_providers()`.

```mermaid
flowchart LR
    A[Startup] --> B{Persisted provider<br/>available in build?}
    B -->|yes| C[set_gpu_provider]
    B -->|no| D[Reset to auto<br/>+ warn]
    D --> C
    E[User changes setting] --> F{Model loaded?}
    F -->|ORT engine| G[Unload + reload<br/>with new EP]
    F -->|Whisper| H[Skip reload<br/>whisper.cpp ignores EP]
    F -->|Transcription in flight| I[Reject + revert]
```

The dropdown only appears when a GPU EP beyond `auto` + `cpu` is compiled in, so default builds see no UI change.

The second commit (`chore: temporarily pin transcribe-rs to git branch`) should be dropped once the upstream transcribe-rs PR merges and a crates.io release includes GPU provider support.

Benchmarks
Tested on AMD Ryzen AI 9 365 (Zen 5) with integrated AMD Radeon 860M iGPU. The autoregressive decoding loop in current ORT-based engines doesn't benefit from GPU parallelism — iGPU memory bandwidth is shared with CPU and per-token overhead dominates.
DirectML (Qwen3-ASR decoder only — encoder hits a DML Conv2d NaN bug and must stay on CPU):
WebGPU (Qwen3-ASR encoder + decoder, native Windows, D3D12 backend):
On this hardware, GPU acceleration doesn't help. The infrastructure is in place for users with discrete GPUs, where the cost/benefit may be different, and the setting defaults to `auto`, so there's no regression for everyone else.

Testing
- `cargo check` clean, 33 unit tests pass
- Built with `--features gpu-directml` (setting persists, model reloads, stale provider resets correctly on next launch)

Trade-offs and Alternatives
Whisper models (whisper.cpp backend) don't use ORT, so GPU provider changes are intentionally skipped for them — the reload would be a no-op. An alternative would be hiding the setting entirely when only Whisper models are available, but the current approach is simpler and the dropdown already hides itself when no GPU EP is compiled in.
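The decision flow described above (and in the flowchart) can be sketched as follows. The enum and function names are illustrative assumptions, not the PR's actual types.

```rust
// Which backend, if any, currently has a model loaded.
#[derive(Debug, PartialEq)]
enum Engine {
    Ort,     // Parakeet, Moonshine, SenseVoice: EP change requires a reload
    Whisper, // whisper.cpp ignores ORT execution providers
}

// What to do when the user changes the GPU provider setting.
#[derive(Debug, PartialEq)]
enum Action {
    Reload, // unload and reload the model with the new EP
    Skip,   // persist the setting; a reload would be a no-op
    Reject, // revert the setting change
}

fn on_provider_change(loaded: Option<Engine>, transcription_in_flight: bool) -> Action {
    if transcription_in_flight {
        // Don't swap execution providers under an active transcription.
        return Action::Reject;
    }
    match loaded {
        Some(Engine::Ort) => Action::Reload,
        // Whisper (or nothing loaded): the new provider takes effect on the
        // next ORT model load, so no reload is needed now.
        Some(Engine::Whisper) | None => Action::Skip,
    }
}
```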