feat(audio): lazy stream close for bluetooth mic latency #747
cjpais merged 8 commits into cjpais:main
9a0344c to 07fe20c
Does this cause audio issues with other applications? That's my main concern. We're holding a resource that we're not actively using, and I suspect it could conflict with other applications that try to acquire it, for example during video calls. I know the audio stack on macOS is pretty weird. We might ultimately need to write a better set of drivers for macOS to solve this problem at the root, rather than patching the application.
…LLM env URL
- PR cjpais#477: Graceful error handling in audio recorder worker thread instead of panicking when no microphone config is available or stream fails to build
- PR cjpais#747: Lazy stream close for bluetooth mic latency - keeps mic stream open for 30s after recording stops in OnDemand mode, eliminating BT activation delay
- PR cjpais#633: Support PHONARA_CUSTOM_LLM_BASE_URL env var to override LLM base URL at runtime (useful for local LLM services on dynamic ports)
- PR cjpais#872: Bump macOS minimum system version from 10.13 to 10.15
- PR cjpais#976: Add tests confirming long repeating word stutter collapse works
Co-Authored-By: Claude Opus 4.6 <[email protected]>
07fe20c to 2448b87
but it's more nuanced than just holding a resource, because during the 30s idle window the mic isn't sitting there dormant. the cpal AudioUnit callback keeps firing every ~30ms, the hardware is actively capturing audio, and run_consumer receives every chunk and just discards it.

for built-in and wired mics, this is mostly fine. for bluetooth though (airpods, etc.), it's a bigger deal. an active input stream forces macOS into the HFP / SCO profile, which is low-quality 16kHz mono for BOTH input and output on that device. so for the full 30 seconds, the user's music or podcast audio quality tanks even though handy isn't actually recording, which isn't ideal.

so the tradeoff is 30 seconds of active mic capture + degraded bluetooth audio vs ~1-2 seconds of bluetooth reconnection latency on back-to-back transcriptions. worth it for power users who transcribe rapidly, not worth it for everyone else.

some ideas to make this smarter:
i'm leaning toward option 2, since the current behavior (immediate close) is the safe default for all hardware. what do you think?
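To make the cost of the idle window concrete, here is a minimal sketch of a consumer that keeps receiving chunks while no recording is active and throws them away. The `Consumer` type, its fields, and `idle_callbacks` are hypothetical names for illustration only; this is not the project's actual `run_consumer`, and the ~30ms callback period is taken from the comment above.

```rust
/// Hypothetical consumer state, for illustration only.
pub struct Consumer {
    /// True while the user is actively recording.
    pub recording: bool,
    /// Samples kept for transcription.
    pub captured: Vec<f32>,
    /// Number of chunks received while idle and thrown away.
    pub discarded_chunks: usize,
}

impl Consumer {
    pub fn new() -> Self {
        Self { recording: false, captured: Vec::new(), discarded_chunks: 0 }
    }

    /// Called once per audio callback. The hardware keeps delivering
    /// chunks as long as the stream is open, recording or not.
    pub fn handle_chunk(&mut self, chunk: &[f32]) {
        if self.recording {
            self.captured.extend_from_slice(chunk);
        } else {
            // Idle window: the chunk was still captured by the device,
            // we just do not keep it.
            self.discarded_chunks += 1;
        }
    }
}

/// Approximate number of callbacks fired during an idle window.
pub fn idle_callbacks(window_ms: u64, callback_period_ms: u64) -> u64 {
    window_ms / callback_period_ms
}
```

Under these assumptions, a 30s window with a 30ms callback period works out to roughly 1000 captured-and-discarded chunks per idle period.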
2 is okay. I think we can put this under experimental for now. I'm also wondering whether, to potentially fix other issues, we should just use the device's default sample rate and always downsample. I believe our pipeline already handles this well, and it may improve things more broadly. That change could be done in another PR.
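As a rough illustration of "capture at the device's default rate, then always downsample", here is a minimal linear-interpolation resampler. It is a sketch only: `resample_linear` is a hypothetical helper, not the project's pipeline, the 16 kHz target in the usage note is an assumption, and a production resampler would apply a low-pass filter before decimating to avoid aliasing.

```rust
/// Resample mono f32 audio from `from_hz` to `to_hz` by linear interpolation.
/// Illustrative only: no anti-aliasing filter is applied.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    if from_hz == to_hz {
        return input.to_vec();
    }
    // Number of output samples covering the same duration as the input.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos as usize).min(last);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(last)];
            // Blend the two neighbouring input samples.
            a + (b - a) * frac
        })
        .collect()
}
```

For example, 480 samples captured at 48 kHz (10ms of audio) would come out as 160 samples at an assumed 16 kHz target.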
2448b87 to 31e19d4
keep the microphone stream open for 30s after recording stops to reduce latency on back-to-back transcriptions. gated behind an experimental setting (off by default) since it keeps the mic actively capturing while idle — degrading bluetooth audio quality on macOS. adds lazy_stream_close setting, tauri command, toggle in experimental settings section, and i18n strings.
31e19d4 to 0572c89
cancel_recording was unconditionally calling schedule_lazy_close, keeping the mic open for 30s even when the setting was disabled. also bump close_generation on AlwaysOn→OnDemand switch to cancel any stale lazy close timers.
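The generation-counter idea behind this fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's code: `StreamState`, `bump_generation`, `close_if_current`, and `on_recording_stopped` are hypothetical names (only `close_generation` and the `lazy_stream_close` setting come from the commit message), and a real implementation would likely guard the check-and-close with a lock rather than two separate atomics.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the recorder's stream state.
pub struct StreamState {
    /// Bumped whenever a pending lazy close must be invalidated.
    close_generation: AtomicU64,
    /// True while the microphone stream is open.
    open: AtomicBool,
}

impl StreamState {
    pub fn new() -> Self {
        Self { close_generation: AtomicU64::new(0), open: AtomicBool::new(true) }
    }

    /// Invalidate any pending timer and return the new generation.
    pub fn bump_generation(&self) -> u64 {
        self.close_generation.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Close the stream only if `token` is still the current generation.
    /// Note: check-then-store is not atomic here; acceptable for a sketch.
    pub fn close_if_current(&self, token: u64) -> bool {
        if self.close_generation.load(Ordering::SeqCst) == token {
            self.open.store(false, Ordering::SeqCst);
            true
        } else {
            false
        }
    }

    pub fn is_open(&self) -> bool {
        self.open.load(Ordering::SeqCst)
    }
}

/// Stop/cancel handler: close immediately unless lazy close is enabled.
/// Returns the timer thread's handle when a lazy close was scheduled.
pub fn on_recording_stopped(
    state: &Arc<StreamState>,
    lazy_stream_close: bool,
    delay: Duration,
) -> Option<thread::JoinHandle<bool>> {
    let token = state.bump_generation();
    if !lazy_stream_close {
        // Setting disabled: never keep the mic open.
        state.close_if_current(token);
        return None;
    }
    let state = Arc::clone(state);
    Some(thread::spawn(move || {
        thread::sleep(delay);
        // A newer generation means this timer is stale and must not close.
        state.close_if_current(token)
    }))
}
```

In this sketch, starting a new recording or switching modes would call `bump_generation()`, so any timer scheduled earlier wakes up, sees a mismatched token, and leaves the stream alone.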
sounds good, reworked this to be opt-in under experimental (off by default). added a "keep mic open between transcriptions" toggle that only shows when experimental features are enabled. default behavior (immediate close) is unchanged
for the default sample rate idea, i opened a separate PR: #1084
c279b3e to cfb9e1e
… into vm/lazy-stream-close
Summary
adds a lazy_stream_close setting (off by default, under experimental) that keeps the mic stream open for 30s after recording stops, reducing latency for back-to-back transcriptions
Context
per discussion in PR comments: the original unconditional 30s lazy close degraded bluetooth audio quality (macOS forces HFP/SCO profile while input stream is active). making it opt-in under experimental lets power users who transcribe rapidly enable it while keeping the safe default for everyone else.
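A minimal sketch of the opt-in default described above, assuming a plain settings struct. `AudioSettings`, `LAZY_CLOSE_WINDOW`, and `close_delay` are hypothetical names for illustration; only the `lazy_stream_close` flag, its off-by-default behavior, and the 30s window come from this PR.

```rust
use std::time::Duration;

/// Hypothetical subset of the app settings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AudioSettings {
    /// Keep the mic stream open briefly after recording stops.
    pub lazy_stream_close: bool,
}

impl Default for AudioSettings {
    fn default() -> Self {
        // Off by default: immediate close stays the safe behavior.
        Self { lazy_stream_close: false }
    }
}

/// How long the stream may stay open once recording stops (assumed constant).
pub const LAZY_CLOSE_WINDOW: Duration = Duration::from_secs(30);

/// Delay before closing the stream, derived from the setting.
pub fn close_delay(settings: &AudioSettings) -> Duration {
    if settings.lazy_stream_close {
        LAZY_CLOSE_WINDOW
    } else {
        Duration::ZERO
    }
}
```

Keeping the decision in one function like this means both the stop and cancel paths would see the same answer when the setting is disabled.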
Test plan
AI Assistance
If AI was used: