feat(audio): use device default sample rate and always downsample #1084

Merged
cjpais merged 2 commits into cjpais:main from VirenMohindra:vm/default-sample-rate on Mar 19, 2026

VirenMohindra (Contributor) commented Mar 17, 2026

Before Submitting This PR

Human Written Description

currently we force the mic to open at 16kHz whenever the hardware advertises support for it. this can produce suboptimal audio on devices where 16kHz is technically in the supported range but isn't the native rate, e.g. bluetooth codecs, certain ALSA drivers on linux, and some USB mics. cj suggested using the device's default sample rate and always downsampling instead, since the resampling pipeline already exists and handles it well.

this PR changes get_preferred_config() to query the device's default rate and pick the best sample format at that rate, letting the existing FrameResampler (rubato FFT) downsample to 16kHz. devices that already default to 16kHz are unaffected (resampler short-circuits).
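
a rough sketch of the selection described above, not the actual diff. `Format`, `ConfigRange`, `format_rank` and `pick_config` are hypothetical stand-ins (std only) for the audio library's types, and the format preference order is an assumption made for illustration:

```rust
/// Hypothetical stand-in for the audio library's sample format enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    F32,
    I16,
    I32,
    U8,
}

/// Hypothetical stand-in for one supported config range reported by a device.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ConfigRange {
    pub min_hz: u32,
    pub max_hz: u32,
    pub format: Format,
}

/// Lower rank means more preferred. This ordering is an assumption.
fn format_rank(format: Format) -> u8 {
    match format {
        Format::F32 => 0,
        Format::I16 => 1,
        Format::I32 => 2,
        Format::U8 => 3,
    }
}

/// Keep only the ranges that contain the device's default rate, then pick
/// the best-ranked sample format among them. Returns `None` when no range
/// covers the default rate.
pub fn pick_config(default_hz: u32, ranges: &[ConfigRange]) -> Option<(u32, Format)> {
    ranges
        .iter()
        .filter(|r| r.min_hz <= default_hz && default_hz <= r.max_hz)
        .min_by_key(|r| format_rank(r.format))
        .map(|r| (default_hz, r.format))
}
```

the point of the design is that the rate is never chosen by the app: it always comes from the device default, and only the sample format is selected.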

Related Issues/Discussions

per cj's suggestion in #747 (comment)

Testing

  • built-in mic (typically 48kHz native): transcription works correctly
  • bluetooth headset: transcription works, audio quality not degraded
  • back-to-back recordings: no latency regression
  • device that defaults to 16kHz: no regression (resampler short-circuits)

AI Assistance

  • AI was used (please describe below)

If AI was used:

  • Tools used: claude code
  • How extensively: researched the audio pipeline to understand sample rate flow, wrote the implementation

instead of forcing the microphone to open at 16kHz (which can cause
issues with bluetooth codecs, some ALSA drivers, and other devices
that advertise 16kHz support but produce suboptimal audio), use the
device's native/default sample rate and let the existing FrameResampler
downsample to 16kHz for the whisper pipeline.

the resampling infrastructure (rubato FftFixedIn) already exists in
run_consumer() and short-circuits when in_hz == out_hz, so devices
that natively default to 16kHz are unaffected.
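
the short-circuit can be illustrated roughly as follows. this is a sketch under assumptions, not the project's FrameResampler: `PassThroughOrResample` is a hypothetical name, and the non-matching path uses plain linear interpolation as a stand-in for the rubato FFT resampler (no anti-aliasing filter, so it is not suitable for real audio):

```rust
/// Hypothetical wrapper: returns the input unchanged when the rates match,
/// otherwise resamples it.
pub struct PassThroughOrResample {
    in_hz: u32,
    out_hz: u32,
}

impl PassThroughOrResample {
    pub fn new(in_hz: u32, out_hz: u32) -> Self {
        Self { in_hz, out_hz }
    }

    /// True when no resampling work is needed.
    pub fn is_passthrough(&self) -> bool {
        self.in_hz == self.out_hz
    }

    pub fn process(&self, input: &[f32]) -> Vec<f32> {
        // Short-circuit: a device that already runs at the target rate
        // gets its samples back untouched.
        if self.is_passthrough() || input.is_empty() {
            return input.to_vec();
        }
        // Stand-in resampler: linear interpolation between neighbouring
        // input samples. NOT the FFT-based method the PR relies on.
        let out_len =
            (input.len() as u64 * self.out_hz as u64 / self.in_hz as u64) as usize;
        let step = self.in_hz as f64 / self.out_hz as f64;
        (0..out_len)
            .map(|i| {
                let pos = i as f64 * step;
                let idx = pos.floor() as usize;
                let frac = (pos - idx as f64) as f32;
                let a = input[idx];
                let b = input[(idx + 1).min(input.len() - 1)];
                a + (b - a) * frac
            })
            .collect()
    }
}
```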
some ALSA/PipeWire drivers support default_input_config but have
a broken supported_input_configs implementation. fall back to the
default config instead of propagating the error. also add a warn
log when no config matches the device's default rate.
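
a minimal sketch of that fallback, with hypothetical names throughout: `StreamConfig` and `choose_with_fallback` are illustrative, the `Result` argument stands in for the outcome of enumerating supported configs, and the warning is returned as a string rather than logged so the sketch stays self-contained. falling back to the default when no config matches is an assumption here; the commit message only states that a warning is logged in that case:

```rust
/// Hypothetical stand-in for a concrete stream configuration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StreamConfig {
    pub sample_rate_hz: u32,
    pub channels: u16,
}

/// Choose a config at the device's default rate. If enumeration fails, or
/// nothing matches the default rate, use the default config and return a
/// warning message for the caller to log.
pub fn choose_with_fallback(
    default: StreamConfig,
    supported: Result<Vec<StreamConfig>, String>,
) -> (StreamConfig, Option<String>) {
    match supported {
        // Enumeration itself is broken on this driver: do not propagate
        // the error, just use the default config.
        Err(err) => (
            default,
            Some(format!(
                "could not enumerate input configs ({}); using device default",
                err
            )),
        ),
        Ok(list) => {
            let hit = list
                .into_iter()
                .find(|c| c.sample_rate_hz == default.sample_rate_hz);
            match hit {
                Some(cfg) => (cfg, None),
                None => (
                    default,
                    Some(format!(
                        "no supported config at {} Hz; using device default",
                        default.sample_rate_hz
                    )),
                ),
            }
        }
    }
}
```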

cjpais (Owner) commented Mar 18, 2026

Thank you, I think this is the right fix. I see #1083 but prefer this I think if it works universally

@github-actions

🧪 Test Build Ready

Build artifacts for PR #1084 are available for testing.

Download artifacts from workflow run

Artifacts expire after 30 days.

cjpais (Owner) commented Mar 19, 2026

I know we don't have a lot of testing on this, and I'm a bit concerned it could break something, but I think it's a good change for now. Hopefully it will fix some outstanding issues.

I think we can pull it in and hit issues in prod if any.

@cjpais cjpais merged commit 0b3322f into cjpais:main Mar 19, 2026
5 checks passed
@VirenMohindra VirenMohindra deleted the vm/default-sample-rate branch March 19, 2026 05:27