Skip to content

fix(profile): preserve context when starting chats#1728

Closed
starship-s wants to merge 1 commit intonesquena:masterfrom
starship-s:fix/profile-context-chat-start
Closed

fix(profile): preserve context when starting chats#1728
starship-s wants to merge 1 commit intonesquena:masterfrom
starship-s:fix/profile-context-chat-start

Conversation

@starship-s
Copy link
Copy Markdown
Contributor

@starship-s starship-s commented May 6, 2026

Thinking Path

  • Profile switching has two separate state boundaries: the selected profile in the browser/API request, and the profile/config path used by backend helpers while starting a stream.
  • The config cache was keyed only by config.yaml mtime, so a request for one profile could keep using the config loaded from another profile path when mtimes did not force a reload.
  • Empty sessions are placeholders until the first message is sent; if a user switches profiles before that first turn, the placeholder can still carry the profile stamped when it was created.
  • The narrow fix is to make config reads path-aware and to carry the active profile into chat start, retagging only empty placeholder sessions before the stream begins.
  • This avoids process-wide environment mutation and avoids changing already-populated sessions, so existing session/profile isolation remains intact.

What Changed

  • Track the active config.yaml path alongside the cached config and reload when a request resolves to a different profile path.
  • Preserve explicit in-memory cfg overrides while still reloading unchanged disk-loaded config on profile-path changes.
  • Apply the same path-aware check before rebuilding available model metadata.
  • Include the active profile in the /api/chat/start payload.
  • Validate the optional chat-start profile payload and retag only empty placeholder sessions before launching the first stream.
  • Add regression tests for profile-path config reloads, empty placeholder retagging, non-empty session preservation, and invalid profile rejection.

Why It Matters

  • Prevents profile switches from resolving providers or credentials against the previously selected profile.
  • Avoids misleading missing-key errors when different profiles use different providers or credential sets.
  • Keeps already-populated sessions anchored to their original profile while still allowing unsent placeholders to follow the user's current selection.
  • Preserves the existing per-request/per-session profile model instead of broadening the fix into process-global environment changes.

Verification

Targeted profile-switch coverage:

env -u HERMES_CONFIG_PATH -u HERMES_HOME -u HERMES_BASE_HOME -u HERMES_WEBUI_HOST -u HERMES_WEBUI_PORT -u HERMES_WEBUI_STATE_DIR python -m pytest tests/test_profile_switch_1200.py tests/test_bugbatch_apr2026.py -q

Result: 28 passed

Provider/model profile coverage:

env -u HERMES_CONFIG_PATH -u HERMES_HOME -u HERMES_BASE_HOME -u HERMES_WEBUI_HOST -u HERMES_WEBUI_PORT -u HERMES_WEBUI_STATE_DIR python -m pytest tests/test_issue604_all_providers_model_picker.py tests/test_issue1106_custom_providers_models.py tests/test_issue1611_session_profile_filtering.py -q

Result: 32 passed

CI failure reproduction subset:

env -u HERMES_CONFIG_PATH -u HERMES_HOME -u HERMES_BASE_HOME -u HERMES_WEBUI_HOST -u HERMES_WEBUI_PORT -u HERMES_WEBUI_STATE_DIR python -m pytest tests/test_issue1094_provider_bugs.py::TestBug1094HasKeyFalsePositive::test_model_api_key_only_marks_active_provider tests/test_issue1499_keyless_onboarding.py::TestKeylessChatReady::test_lmstudio_keyless_chat_ready_via_full_status tests/test_minimax_provider.py::test_minimax_cn_empty_config_provider_gets_static_models tests/test_provider_quota_status.py::test_openrouter_quota_fetches_key_endpoint_and_sanitizes_response -q

Result: 4 passed

Syntax / hygiene checks:

python -m py_compile api/config.py api/routes.py

node --check static/messages.js

git diff --check

Result: all passed.

Risks / Follow-ups

  • This touches api/config.py near the existing model-list cache invalidation path, so concurrent model-cache work may need a straightforward rebase..

Models Used

  • GPT-5.5 XHIGH for implementation
  • Claude Opus 4.7 for review

@starship-s starship-s marked this pull request as draft May 6, 2026 01:55
@starship-s starship-s force-pushed the fix/profile-context-chat-start branch from 5222ec3 to 8a462a8 Compare May 6, 2026 02:07
@starship-s starship-s marked this pull request as ready for review May 6, 2026 02:18
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Summary

Pulled the branch and walked all four files. Two distinct fixes in here, both addressing the same symptom (profile-switch context loss on first turn) but at different layers: a path-aware reload in api/config.py, and a placeholder-retag at chat-start in api/routes.py. Both are scoped narrowly and the test coverage is solid. One question and one nit below.

Code reference

The get_config() change is the heart of this — previously it cached forever once _cfg_cache was non-empty:

def get_config() -> dict:
    """Return the cached config dict, loading from disk if needed."""
    if not _cfg_cache:
        reload_config()
    return _cfg_cache

Now it watches both the path and the mtime, and reloads when either changes — but only when there are no in-memory overrides:

config_path = _get_config_path()
try:
    current_mtime = config_path.stat().st_mtime
except OSError:
    current_mtime = 0.0
cache_stale = current_mtime != _cfg_mtime or _cfg_path != config_path
if not _cfg_cache or (cache_stale and not _cfg_has_in_memory_overrides()):
    reload_config()

(api/config.py:228-237.) _cfg_has_in_memory_overrides() compares a JSON fingerprint of the cache against the snapshot taken at the last successful disk load. That neatly preserves test-time monkeypatches and any legitimate runtime mutations of cfg (which is aliased to _cfg_cache at api/config.py:297).

Diagnosis

Reading the matching agent contract — _get_config_path() resolves to get_active_hermes_home() / "config.yaml" (api/config.py:189-199), and get_active_hermes_home() already reads _tls.profile per-thread (api/profiles.py:185-202, populated by server.py:127 and :145 from the cookie on every request). So the path was already correct per-request; the bug was only that the cached config was keyed solely on mtime and so two profiles with identical config.yaml mtimes could leak into each other. The path key in the cache invalidation closes that hole.

The chat-start retag on api/routes.py:5943-5961:

requested_profile = str(body.get("profile") or "").strip()
if requested_profile:
    try:
        from api.profiles import _PROFILE_ID_RE
        if requested_profile != "default" and not _PROFILE_ID_RE.fullmatch(requested_profile):
            return bad(handler, "invalid profile", 400)
    except ImportError:
        requested_profile = ""
if requested_profile and not _profiles_match(getattr(s, "profile", None), requested_profile):
    has_persisted_turns = bool(
        getattr(s, "messages", None)
        or getattr(s, "context_messages", None)
        or getattr(s, "pending_user_message", None)
    )
    if not has_persisted_turns:
        s.profile = requested_profile

The "empty session" check uses three predicates. messages/context_messages/pending_user_message are exactly the populated-on-first-turn fields; pending_user_message in particular is set by _prepare_chat_start_session_for_stream() at api/routes.py:5921, which runs after this retag. So a placeholder created before any chat_start has all three falsy and the retag fires; once chat_start lands once, pending_user_message (and post-stream messages) prevents a stale tag from being silently overwritten. Good.

One question

The _PROFILE_ID_RE import is wrapped in try/except ImportError, which silently disables validation when api.profiles fails to import. That is unreachable in practice (the same module is imported above by _profiles_match), so the fallback requested_profile = "" branch is dead code. Not blocking — just noting it would be a tiny bit cleaner to rely on from api.profiles import _PROFILE_ID_RE at the top of the function, given everything else here already depends on api.profiles being loadable.

Nit

static/messages.js:228 adds profile:S.activeProfile||S.session.profile||'default', but static/sessions.js:1875 (the optimistic-row builder) uses S.session.profile||S.activeProfile||'default' — the opposite priority. For a placeholder session created under profile A, S.session.profile is 'A' and S.activeProfile (the current cookie/header selection) is 'B'. The send path correctly prefers B so the retag will fire; the sidebar row will optimistically render as 'A' and only correct itself after the next session-list sync. This is a cosmetic blip, not a correctness issue, but worth a follow-up if it shows up in QA.

Verification

tests/test_profile_switch_1200.py:392-614 covers the four key paths: get_config reload on path change with identical mtime, retag of empty session, no-retag of non-empty session, and rejection of ../etc-style profile values. The os.utime setup pinning both files to the same mtime is exactly the failure mode the body describes. All three CI checks are green. Looks ready to merge.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Thanks @starship-s — this shipped in v0.51.8 (commit 85d0279) as part of a 7-PR full-sweep batch release. Stage rebased your branch onto current master, ran the full pre-release gate (4584 pytest, browser tests, Opus advisor verdict SHIP), and merged via release PR #1737.

GitHub didn't auto-close because the merge commit only references the squash-merged stage branch, not your fork's commit directly — closing manually for hygiene.

Live now on https://get-hermes.ai/ and on existing installs after git pull + restart.

Release notes: https://github.com/nesquena/hermes-webui/releases/tag/v0.51.8

pull Bot pushed a commit to soitun/hermes-webui that referenced this pull request May 6, 2026
pull Bot pushed a commit to soitun/hermes-webui that referenced this pull request May 6, 2026
PR nesquena#1728's path/mtime-aware get_config() reload broke the common test
idiom monkeypatch.setattr(config, 'cfg', {...}). The cfg = _cfg_cache
alias bound at import time means the rebinding only changes the module
attribute; _cfg_cache stays unchanged, so _cfg_has_in_memory_overrides()
returned False and the path-aware reload silently overwrote the test's
override. test_issue1426_openrouter_* and test_issue1680_codex_* failed
in the full suite while passing standalone — exact polluter signature.

Fix:
- _cfg_has_in_memory_overrides() now also detects cfg-rebind via
  cfg is not _cfg_cache.
- get_config() returns cfg (the override) when it differs from
  _cfg_cache, so callers see the test's intended override.
- 4 new regression tests pin both prongs in
  test_stage302_config_override_regression.py.

Defense-in-depth (prong 2 of test-isolation-flake-recipe):
- test_sprint3.py::test_skills_list and test_skills_list_has_required_fields
  now skip on empty skills list rather than asserting > 0 / IndexError, so
  future profile-switch / SKILLS_DIR repointing pollutions don't break
  the build. The contract under test is 'API returns a non-empty list
  when there are entries' — empty list signals a polluter elsewhere.

Pre-existing wall-clock flake fix (absorb-in-release):
- test_issue1144_session_time_sync.py::test_relative_time_uses_server_clock
  now pins Date.now() to a fixed instant. Without pinning, when CI runs
  near 08:00 UTC the projected server time crosses midnight and '5 minutes
  ago' silently becomes '1d'. Same time-of-day-pin pattern as the sibling
  test_session_bucket_uses_server_clock used.

Test count: 4580 → 4584 (+4 regression tests). 0 failures, stably green
across multiple runs.
pull Bot pushed a commit to soitun/hermes-webui that referenced this pull request May 6, 2026
…p + test-isolation fix

Constituent PRs:
- nesquena#1725 (@Michaelyklam) — simplify compact Activity row summary
- nesquena#1726 (@Michaelyklam) — delegate generic provider catalogs to Hermes CLI (slice of nesquena#1240)
- nesquena#1727 (@Michaelyklam) — link Claude Code OAuth in onboarding (closes nesquena#1362)
- nesquena#1728 (@starship-s) — preserve profile context when starting chats
- nesquena#1729 (@Michaelyklam) — persist compact Activity disclosure state
- nesquena#1730 (@Michaelyklam) — prevent sticky sidebar hover drag state
- nesquena#1732 (@Sanjays2402) — unpin scroll on small upward motion during streaming (closes nesquena#1731)

Plus 2 in-stage absorbed fixes:
- test-isolation fix: monkeypatch.setattr(config, 'cfg', X) survives PR nesquena#1728's
  path/mtime-aware get_config() reload. Mandatory before tag (Opus stage-302).
- Opus SHOULD-FIX #1: _lastScrollTop reset on session switch (nesquena#1732 follow-up).

Tests: 4537 → 4584 passing (+47). 0 regressions. Full suite ~128s. Stably green.

Pre-release verification:
- All 7 PRs CI-green individually + rebased onto master
- pytest 4584 passed, 0 failed (multiple runs)
- node -c clean on all 4 modified .js files
- 11/11 browser API endpoints PASS on isolated port 8789
- 20 QA tests via webui_qa_agent.sh PASS
- Opus advisor: SHIP, 5/5 verification clean, 0 MUST-FIX, 1 SHOULD-FIX absorbed
  (_lastScrollTop reset), 1 SHOULD-FIX deferred (nesquena#1736 — _clear_anthropic_env_values
  race, onboarding-time-only)

Closes nesquena#1362, nesquena#1731.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants