perf: cache live model lookups briefly #1378

Closed
NocGeek wants to merge 1 commit into nesquena:master from NocGeek:perf/live-models-ttl-cache
@NocGeek (Contributor) commented May 1, 2026

Summary

  • add a 60-second in-memory TTL cache for /api/models/live
  • scope cache entries by active WebUI profile and provider
  • return deep copies so callers cannot mutate cached payloads
  • add regression tests for cache hits, expiry, profile scoping, and mutation isolation

Tests

  • python3 -m py_compile api/routes.py tests/test_live_models_ttl_cache.py
  • python -m pytest tests/test_live_models_ttl_cache.py tests/test_opencode_providers.py::test_live_models_handler_delegates_to_provider_model_ids tests/test_opencode_providers.py::test_live_models_ui_no_longer_skips_any_provider
  • python -m pytest tests/test_byok_model_dropdown.py tests/test_credential_pool_providers.py tests/test_custom_providers_in_panel.py tests/test_issue1094_provider_bugs.py tests/test_provider_management.py tests/test_issues_373_374_375.py tests/test_live_models_ttl_cache.py tests/test_opencode_providers.py

Local focused result: 6 passed
Local broader provider/model result: 120 passed

@nesquena-hermes (Collaborator)

Nice — this directly closes the TODO already sitting in api/routes.py (lines ~146–148):

# TODO: Add TTL-based caching (e.g. 60s) so repeated model-list requests
# don't hit provider APIs.  The frontend already caches via _liveModelCache
# but the backend re-fetches on every /api/models/live call.

A few thoughts from skimming the diff:

  1. Profile-scoping the cache key is the right call. /api/models/live results depend on the active profile's credentials, so a global cache could serve one profile's model list to another profile whose provider keys differ. Good that the regression test asserts this explicitly.

  2. Deep-copying on read is the right call too. Returning a shared dict and letting a handler mutate it before serialization would be a great way to introduce a subtle "first request after restart looks fine, second request is missing fields" bug. Worth making sure the deep-copy path is exercised before serialization in the actual handler, not only in the test (copy.deepcopy on a 200-model list is cheap but not free — a quick time.perf_counter log would tell us if it shows up at scale).

  3. 60s TTL matches the comment hint. One thing to consider: when a user just added a new BYOK key and hits "refresh models" in the picker, they may now wait up to 60s before the new provider's models appear. If that becomes a complaint, the TTL cache could grow a "bypass on explicit refresh" query param (?nocache=1) — not blocking on this PR, just flagging.

  4. Cache invalidation on profile switch / credential rotation — worth confirming that operations that mutate provider credentials (adding/removing a BYOK key, switching a profile's credential_pool) don't leave stale entries pinned for up to a minute. If invalidation does need to be added, exposing a small _clear_live_models_cache(profile=None) and calling it from the credential-update path would be the cleanest hook.

Tests cover hit/expiry/profile/mutation, which is the right set of cases. Will defer merge to maintainer review.
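For the deep-copy cost raised in point 2, a quick measurement along these lines would show whether it matters. The payload is synthetic: 200 entries with made-up fields standing in for a real provider response, so the number it prints is only indicative.

```python
import copy
import time

# Synthetic stand-in for a provider response; field names are invented.
models = [
    {"id": f"model-{i}", "label": f"Model {i}", "context": 128000}
    for i in range(200)
]

start = time.perf_counter()
snapshot = copy.deepcopy(models)
elapsed_ms = (time.perf_counter() - start) * 1000.0

print(f"deepcopy of {len(models)} models took {elapsed_ms:.3f} ms")
```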

@nesquena-hermes (Collaborator)

Released as part of v0.50.252 — thanks @NocGeek!

This PR was merged into the v0.50.252 release batch via #1387 alongside 5 other contributor fixes. The full CHANGELOG entry is at https://github.com/nesquena/hermes-webui/blob/master/CHANGELOG.md.

Pre-release verification: 3507 pytest tests pass, full QA harness pass (20 structural + 11 browser API + 23 Agent Browser CDP), Opus mentor APPROVED with two non-blocking follow-ups applied during the release batch (force=True on agent redactor, debug-log on profile fallback).

Closing this PR — the change is live on master.

pull Bot pushed a commit to JamesWilliam1977/hermes-webui that referenced this pull request May 1, 2026
GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026