
fix(models): preserve /api/models cache metadata#1239

Closed
franksong2702 wants to merge 2 commits into nesquena:master from franksong2702:franksong2702/model-cache-metadata

Conversation

@franksong2702
Contributor

Thinking Path

  • Hermes WebUI uses /api/models as more than a display list: the frontend reads it for provider identity, the configured default model, live-model enrichment, and model mismatch checks.
  • The cold model catalog build returns the full shape: active_provider, default_model, and groups.
  • The disk cache only persisted groups, and the load path accepted any cache with groups as valid.
  • That meant a disk-cache hit could serve an incomplete /api/models payload even though config.yaml, /api/settings, and /api/providers still had the correct provider/default-model state.
  • This PR fixes the cache schema at the backend source-of-truth layer so cached model catalogs keep the metadata the frontend depends on.
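For concreteness, the difference between the full catalog payload and the legacy cached shape can be sketched as follows. Field values and group structure here are illustrative, not the actual Hermes payload contents:

```python
# Hypothetical full /api/models payload, as produced by a cold catalog build.
full_payload = {
    "active_provider": "openrouter",       # illustrative provider name
    "default_model": "example-default",    # illustrative model id
    "groups": [{"name": "Chat", "models": ["model-a", "model-b"]}],
}

# The legacy disk cache persisted only the display groups, silently dropping
# the metadata the frontend relies on for provider identity and default model.
legacy_payload = {"groups": full_payload["groups"]}

# Which metadata a legacy cache hit loses:
missing = sorted(set(full_payload) - set(legacy_payload))
print(missing)
```

A legacy cache hit therefore serves a payload missing `active_provider` and `default_model`, even though `config.yaml` and the other endpoints still hold the correct state.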

What Changed

  • Added validation for the /api/models disk-cache payload shape.
  • Disk cache is now valid only when it includes:
    • active_provider
    • default_model
    • groups
  • Legacy groups-only cache files are rejected and rebuilt instead of being served.
  • The cache write path now preserves the full metadata payload instead of writing only groups.
  • Added focused regression coverage for:
    • preserving metadata when saving cache
    • rejecting legacy groups-only cache files
    • rebuilding through get_available_models() instead of returning the legacy cache
  • Added an Unreleased changelog entry.

Why It Matters

Without this metadata, the frontend can lose the configured model/provider source of truth and fall back to weaker state such as the current dropdown value, stale localStorage, or static picker ordering.

This is intentionally a small reliability fix. It does not redesign the model picker or provider settings; it makes the existing /api/models contract stable across restarts and disk-cache hits.

Verification

  • python -m pytest tests/test_model_cache_metadata.py
  • python -m pytest tests/test_sprint11.py tests/test_ttl_cache.py tests/test_profile_switch_1200.py
  • python -m py_compile api/config.py
  • git diff --check

Risks / Follow-ups

Model Used

OpenAI Codex, GPT-5. AI-assisted implementation with a delegated worker sub-agent, followed by coordinator review and verification.

@nesquena-hermes
Collaborator

Thanks for this targeted reliability fix, @franksong2702! The cache metadata preservation problem is real — a groups-only cache hit silently breaking active_provider and default_model for the frontend is exactly the kind of subtle state drift that causes hard-to-reproduce model picker issues.

The approach looks correct:

Cache schema validation
Requiring all three keys (active_provider, default_model, groups) before a cache file is considered valid, and rejecting legacy groups-only files for rebuild, is the right call. Better to do a cold rebuild than serve a partially valid payload.

A few things to confirm before merge:

TTL cache interaction
get_available_models() uses a TTL in-memory cache alongside the disk cache. When a legacy disk cache is rejected and a rebuild is triggered, does the TTL in-memory cache also get invalidated/refreshed? Or could the rebuilt disk cache be correct while the in-memory TTL cache still holds the old groups-only shape for the duration of the TTL window?

Cache invalidation on active_provider change
If the user switches providers (updating config.yaml), the disk cache now stores active_provider. When they switch back or change provider, is the old cached active_provider value detected as stale, or does the cache only get invalidated based on TTL/file age? A cache hit with a stale active_provider would re-introduce the same class of bug this PR is fixing.

Test coverage for partial metadata
The PR covers the groups-only legacy case. Is there test coverage for a disk cache that has active_provider and groups but is missing default_model (or vice versa)? The validation should reject any partial combination.

Changelog entry
The Unreleased entry is appreciated. Confirming it's under the correct section and doesn't duplicate an existing entry.

The test file looks focused and the change to api/config.py is appropriately scoped. This looks close to ready — the TTL cache interaction question is the main thing worth confirming before merge. Thanks for the detailed Thinking Path!

🤖 Automated triage via nesquena-hermes

@franksong2702
Contributor Author

Addressed the cache-shape and invalidation edge cases from review in f3babd1.

What changed:

  • TTL memory cache now goes through the same shape validation before being served. If a legacy groups-only payload or partial metadata payload is still in memory, WebUI discards it and rebuilds instead of serving it for the 24h TTL window.
  • Config mtime changes now also discard any disk payload that was preloaded before the config reload path, so stale active_provider / default_model metadata cannot be served after config.yaml changes.
  • Added regression coverage for partial metadata payloads: missing active_provider, missing default_model, missing groups, and invalid field types.
  • Left the changelog entry under [Unreleased] / Fixed, which matches this patch as a user-visible bug fix.
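The first two items above can be sketched together as an in-memory cache that re-validates both payload shape and config freshness on every read. Class and method names are hypothetical, not the actual implementation in `api/config.py`:

```python
import os
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24h TTL window mentioned above

class ModelCatalogCache:
    """Illustrative in-memory TTL cache for the /api/models payload."""

    def __init__(self, config_path):
        self.config_path = config_path
        self._payload = None
        self._stored_at = 0.0
        self._config_mtime = 0.0

    def put(self, payload):
        self._payload = payload
        self._stored_at = time.monotonic()
        self._config_mtime = os.path.getmtime(self.config_path)

    def get(self, validate):
        if self._payload is None:
            return None
        if time.monotonic() - self._stored_at > CACHE_TTL_SECONDS:
            return None  # TTL expired -> rebuild
        if os.path.getmtime(self.config_path) != self._config_mtime:
            return None  # config.yaml changed -> metadata may be stale
        if not validate(self._payload):
            return None  # legacy/partial shape -> discard and rebuild
        return self._payload
```

The key property is that a cache hit is only served after passing the same shape validation as the disk path, so a stale groups-only payload cannot survive in memory for the TTL window.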

Verification:

  • uv run --with pytest --with pyyaml python -m pytest tests/test_model_cache_metadata.py -> 6 passed
  • uv run --with pytest --with pyyaml python -m pytest tests/test_model_cache_metadata.py tests/test_sprint11.py tests/test_ttl_cache.py tests/test_profile_switch_1200.py -> 28 passed
  • python3 -m py_compile api/config.py
  • git diff --check

@nesquena-hermes
Collaborator

Thanks for the follow-up commit f3babd1, @franksong2702 — this directly addresses all three open questions from the previous review.

Confirming the resolutions:

  • TTL memory cache shape validation — in-memory cache now validates the full three-key shape before serving. A groups-only payload in memory is discarded and rebuilt rather than served for the TTL window. This closes the race between disk cache rebuild and in-memory TTL.
  • Config mtime invalidation — config.yaml changes now also discard disk-preloaded payloads with stale active_provider / default_model. The class of bug this PR is fixing cannot re-enter through a provider switch.
  • Partial metadata regression coverage — four new cases: missing active_provider, missing default_model, missing groups, invalid field types. The validation rejects all partial combinations correctly.
  • Changelog — [Unreleased] / Fixed is the correct section for a user-visible bug fix.

The 28-test pass across test_model_cache_metadata.py, test_sprint11.py, test_ttl_cache.py, and test_profile_switch_1200.py is a solid cross-suite confirmation that the fix doesn't regress the surrounding model cache behavior.

This looks ready for merge. The implementation is clean and the test coverage is thorough.

🤖 Automated triage via nesquena-hermes

@nesquena-hermes
Collaborator

Merged in v0.50.237 via #1243. Thank you @franksong2702! 🎉
