fix: auto-unload model after idle timeout to reduce memory#1051

Merged
cjpais merged 2 commits intocjpais:mainfrom
VirenMohindra:vm/memory-optimization
Mar 17, 2026

Conversation

@VirenMohindra
Contributor

@VirenMohindra VirenMohindra commented Mar 16, 2026

summary

handy holds the transcription model in memory indefinitely (default: Never unload). on a 16 GB machine, a parakeet model uses ~1 GB, even when the user hasn't transcribed in hours

this PR adds automatic model unloading after a configurable timeout, reducing idle memory from ~1.1 GB to ~80 MB

[screenshots: before / after]

changes

  • default timeout: Never -> Min5: model auto-unloads after 5 minutes of inactivity
  • fix: watcher thread killed by Drop on clones: initiate_model_load() clones TranscriptionManager, and when the clone drops, Drop::drop sets shutdown_signal = true, killing the watcher. fixed with an Arc::strong_count guard so only the last clone shuts down the watcher
  • fix: reset last_activity on model load: without this, switching models after timeout elapsed would immediately unload the just-loaded model
  • unload logging upgraded to info!() level: was debug!(), invisible at default log level
  • removed duplicate "unloaded" event: unload_model() already emits it
  • error handling on unload: match instead of if let Ok so failures are logged

test results (macOS, parakeet v2, physical footprint = activity monitor)

| state | memory |
| --- | --- |
| baseline (no model) | 46 MB |
| model loaded | 1.1 GB |
| after auto-unload (5s timeout) | 80-88 MB |
| unload time | 33-44 ms |

actual time to reach 88 MB is ~30 seconds

full lifecycle:

46 MB -> 1.1 GB -> 660 MB -> 330 MB -> 88 MB
(idle)  (loaded)  (unloading... OS reclaiming pages)

test plan

  • start app, timeout=sec5, transcribe -> model loads, unloads after ~12s
  • transcribe again -> model reloads, unloads after ~7s
  • watcher survives both initiate_model_load clone/drop cycles
  • activity monitor confirms 80-88 MB after unload (not 700+ MB)
  • unload duration logged at info level

@cjpais
Owner

cjpais commented Mar 16, 2026

I was noticing on the latest build the timeout was actually not working at all. Have you experienced this?

At least in the case of changing models from the menubar.

@VirenMohindra
Contributor Author

VirenMohindra commented Mar 16, 2026

> I was noticing on the latest build the timeout was actually not working at all. Have you experienced this?
> At least in the case of changing models from the menubar.

think i found the bug. load_model() wasn't updating last_activity, so when you switch models from the menubar, here's what happens:

  1. you transcribe at T=0 -> last_activity = T=0
  2. some time passes (longer than the timeout)
  3. you switch models from the tray and load_model() loads the new model
  4. but last_activity is still T=0
  5. idle watcher checks on its next 10s tick: now - last_activity > timeout -> true -> immediately unloads the model you just loaded

so the model loads, then gets unloaded on the very next watcher cycle. it looks like the timeout "doesn't work" but really the timer was just starting from the wrong point

pushed a fix, load_model() now resets last_activity after successfully loading, so the idle timer starts fresh from the moment the model is ready. this applies to both tray model switches and the background initiate_model_load() path
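The timing bug described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `IdleTracker`, its fields, and the `reset_timer` flag are hypothetical names used only to contrast the behavior before and after the fix. Time is passed in explicitly so the check is deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the manager's idle-tracking state.
struct IdleTracker {
    last_activity: Instant,
    timeout: Option<Duration>, // None models "never unload"
    model_loaded: bool,
}

impl IdleTracker {
    fn new(now: Instant, timeout: Option<Duration>) -> Self {
        Self { last_activity: now, timeout, model_loaded: false }
    }

    /// A transcription counts as activity.
    fn transcribe(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// Loads a model. `reset_timer = false` mimics the old behavior,
    /// `reset_timer = true` mimics the fix described in the comment above.
    fn load_model(&mut self, now: Instant, reset_timer: bool) {
        self.model_loaded = true;
        if reset_timer {
            self.last_activity = now;
        }
    }

    /// The condition a watcher would evaluate on each tick.
    fn should_unload(&self, now: Instant) -> bool {
        match self.timeout {
            Some(limit) => {
                self.model_loaded && now.duration_since(self.last_activity) > limit
            }
            None => false,
        }
    }
}
```

With `reset_timer = false`, a model loaded long after the last transcription is already past the timeout on the next tick. With `reset_timer = true`, the idle period is measured from the moment the load finished.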

@cjpais
Owner

cjpais commented Mar 17, 2026

the changes as they are don't work for me

I don't see prints in the console that model unloading is happening. I am also not seeing memory being freed, nor is the UI updating to show the model is unloaded.

I think the most critical paths to verify are:

  1. start the app
  2. set to never unload
  3. trigger transcription start/stop
  4. model doesn't unload
  5. set to 5 sec
     • after 5-10 sec should see print in console, and indicated in the UI that the model is unloaded (including the status bar)
  6. set to never (model does not load and loads next time)
  7. set to 2 minutes
  8. trigger transcription start/stop
  9. set to 5 seconds
  10. model should unload

I think realistically we might need a small state machine or something... Or at least very clear logic what happens when transitioning between states. It may be worth making a diagram at least... Posting that here for verification and then having an LLM go implement and verify it ideally. would be great to have it in a testable unit.
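One possible shape for the "small state machine in a testable unit" suggested above is a pure transition function. This is a sketch under assumptions, not what the PR implements: `ModelState`, `Event`, and `step` are hypothetical names, and the timeout setting is passed in on every step so that changing it between ticks is just a different argument.

```rust
use std::time::Duration;

/// Hypothetical model lifecycle states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelState {
    Unloaded,
    /// Loaded, with the time elapsed since the last activity.
    Loaded { idle: Duration },
}

/// Hypothetical inputs to the state machine.
#[derive(Debug, Clone, Copy)]
enum Event {
    /// A transcription: loads the model if needed and resets the idle time.
    Transcribe,
    /// A watcher tick carrying the time elapsed since the previous tick.
    Tick(Duration),
}

/// Pure transition function. `timeout = None` models "never unload".
fn step(state: ModelState, event: Event, timeout: Option<Duration>) -> ModelState {
    match (state, event) {
        (_, Event::Transcribe) => ModelState::Loaded { idle: Duration::ZERO },
        (ModelState::Unloaded, Event::Tick(_)) => ModelState::Unloaded,
        (ModelState::Loaded { idle }, Event::Tick(elapsed)) => {
            let idle = idle + elapsed;
            match timeout {
                Some(limit) if idle > limit => ModelState::Unloaded,
                _ => ModelState::Loaded { idle },
            }
        }
    }
}
```

Because `step` has no threads, clocks, or I/O, the verification steps listed above can be replayed as a sequence of calls and asserted on directly.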

@VirenMohindra VirenMohindra force-pushed the vm/memory-optimization branch 2 times, most recently from 3a2639e to 7006902 Compare March 17, 2026 03:30
@VirenMohindra VirenMohindra changed the title fix: reduce idle memory by auto-unloading model after timeout fix: auto-unload model after idle timeout to reduce memory Mar 17, 2026
@VirenMohindra VirenMohindra force-pushed the vm/memory-optimization branch from 7006902 to 838857a Compare March 17, 2026 03:34
@VirenMohindra
Contributor Author

VirenMohindra commented Mar 17, 2026

tested all 10 steps - everything works now. the main issue was a pre-existing bug in Drop that was killing the idle watcher thread before it could ever fire.

script here
test-model-unload.sh

root cause: initiate_model_load() clones TranscriptionManager and moves it into a spawned thread. when that thread finishes loading, the clone is dropped -> Drop::drop fires -> sets shutdown_signal = true -> watcher dies. so the watcher was alive for ~60s after startup and then permanently dead. no auto-unload ever happened

fix: added an Arc::strong_count guard in Drop - clones skip shutdown since other owners (tauri state, watcher thread) still exist. only the very last clone shuts down the watcher
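The root cause and the guard can be illustrated with a stripped-down type. This is a sketch only: `Manager` and its constructor are hypothetical, the `engine` payload is a placeholder, and only the `engine` and `shutdown_signal` field names and the `strong_count` check come from the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical, stripped-down stand-in for TranscriptionManager.
#[derive(Clone)]
struct Manager {
    engine: Arc<Mutex<Option<String>>>, // placeholder for the loaded model
    shutdown_signal: Arc<AtomicBool>,   // read by the watcher thread
}

impl Manager {
    fn new() -> Self {
        Self {
            engine: Arc::new(Mutex::new(None)),
            shutdown_signal: Arc::new(AtomicBool::new(false)),
        }
    }
}

impl Drop for Manager {
    fn drop(&mut self) {
        // Fields are still alive while drop() runs, so this count includes
        // `self`. A value above 1 means another clone still owns the engine,
        // and the watcher must keep running.
        if Arc::strong_count(&self.engine) > 1 {
            return;
        }
        self.shutdown_signal.store(true, Ordering::SeqCst);
    }
}
```

Without the early return, dropping the clone that was moved into the loader thread would set `shutdown_signal` while the original manager is still in use.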

test results (physical footprint, matches activity monitor):

| state | memory |
| --- | --- |
| baseline (no model) | 46 MB |
| model loaded | 1.1 GB |
| after auto-unload (sec5) | 80-88 MB |
| unload time | 33-44 ms |

full lifecycle:

46 MB -> 1.1 GB -> 660 MB -> 330 MB -> 88 MB
(idle)  (loaded)  (unloading... OS reclaiming pages)

two back-to-back transcriptions, both auto-unloaded, watcher survived the entire session:

04:17:23  Transcription #1
04:17:35  Model idle 12s > 5s -> unloaded (44ms)
04:18:58  Transcription #2
04:19:05  Model idle 7s > 5s -> unloaded (33ms)

also upgraded unload logging from debug!() to info!() so it's visible at default log level, and replaced if let Ok with match to log unload errors
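The difference between `if let Ok` and `match` for the unload path can be sketched as follows. The function names, the `fail` parameter, and the in-memory log are hypothetical and exist only to make the contrast observable; the real `unload_model()` signature and log messages are not shown in this PR.

```rust
/// Hypothetical unload function that can be made to fail on demand.
fn unload_model(fail: bool) -> Result<(), String> {
    if fail {
        Err("engine busy".to_string())
    } else {
        Ok(())
    }
}

/// Before: a failed unload leaves no trace.
fn unload_with_if_let(fail: bool, log: &mut Vec<String>) {
    if let Ok(()) = unload_model(fail) {
        log.push("model unloaded".to_string());
    }
}

/// After: both outcomes are recorded.
fn unload_with_match(fail: bool, log: &mut Vec<String>) {
    match unload_model(fail) {
        Ok(()) => log.push("model unloaded".to_string()),
        Err(e) => log.push(format!("failed to unload model: {}", e)),
    }
}
```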

@VirenMohindra VirenMohindra force-pushed the vm/memory-optimization branch 7 times, most recently from 8e845e2 to 62a3153 Compare March 17, 2026 04:13
- Default model_unload_timeout from Never to Min5
- Fix Drop impl: use take() on watcher handle so clones from
  initiate_model_load() don't kill the watcher thread
- Reset last_activity on model load to prevent immediate unload
- Upgrade watcher logging from debug to info level
- Remove duplicate "unloaded" event (unload_model already emits it)
@VirenMohindra VirenMohindra force-pushed the vm/memory-optimization branch from 62a3153 to 6523980 Compare March 17, 2026 04:14
Comment on lines +791 to +793
    if Arc::strong_count(&self.engine) > 1 {
        return;
    }
Contributor Author

Arc::strong_count docs warn it shouldn't be used for synchronization and that the count can change between the check and the shutdown_signal.store. in practice this only runs during app teardown so it's fine today, but adding this comment so future clone paths don't break this silently. alternatively, an explicit AtomicUsize refcount we control would be more robust, but YAGNI for now since i doubt we will have more clone paths
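The explicit refcount alternative mentioned in this comment could look roughly like the sketch below. It is not part of the PR: `Handle` and the `owners` field are hypothetical names. The point is that `fetch_sub` makes the decrement and the "am I last?" decision a single atomic step, so exactly one drop can observe the final owner.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical handle that tracks its own owner count.
struct Handle {
    owners: Arc<AtomicUsize>,
    shutdown_signal: Arc<AtomicBool>,
}

impl Handle {
    fn new() -> Self {
        Self {
            owners: Arc::new(AtomicUsize::new(1)),
            shutdown_signal: Arc::new(AtomicBool::new(false)),
        }
    }
}

impl Clone for Handle {
    fn clone(&self) -> Self {
        // Register the new owner before handing it out.
        self.owners.fetch_add(1, Ordering::SeqCst);
        Self {
            owners: Arc::clone(&self.owners),
            shutdown_signal: Arc::clone(&self.shutdown_signal),
        }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // fetch_sub returns the previous value, so 1 means this was the
        // last owner and the watcher can be told to stop.
        if self.owners.fetch_sub(1, Ordering::SeqCst) == 1 {
            self.shutdown_signal.store(true, Ordering::SeqCst);
        }
    }
}
```

The trade-off is a hand-written `Clone` impl and one more shared counter, which is the extra complexity the comment defers for now.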

  impl Default for ModelUnloadTimeout {
      fn default() -> Self {
-         ModelUnloadTimeout::Never
+         ModelUnloadTimeout::Min5
Contributor Author

this only affects fresh installs right? prob worth confirming existing users who never touched this setting won't suddenly get auto-unload behavior after upgrading

@cjpais cjpais merged commit d1da935 into cjpais:main Mar 17, 2026
5 checks passed
mussonking pushed a commit to mussonking/MadWhisp that referenced this pull request Mar 18, 2026
- Default model_unload_timeout from Never to Min5
- Fix Drop impl: use take() on watcher handle so clones from
  initiate_model_load() don't kill the watcher thread
- Reset last_activity on model load to prevent immediate unload
- Upgrade watcher logging from debug to info level
- Remove duplicate "unloaded" event (unload_model already emits it)

Co-authored-by: CJ Pais <[email protected]>
@VirenMohindra VirenMohindra deleted the vm/memory-optimization branch March 19, 2026 05:28