fix(profiles): profile switch now shows correct workspace, model, and chip label immediately#1203
fix(profiles): profile switch now shows correct workspace, model, and chip label immediately#1203nesquena-hermes wants to merge 2 commits intomasterfrom
Conversation
… chip label immediately (#1200) Three separate bugs caused profile switching to feel broken after each change: Bug 1 — wrong workspace (api/profiles.py): switch_profile(process_wide=False) called get_last_workspace() to populate the default_workspace return field. That function routes through get_active_profile_name() which reads thread-local context — still pointing at the OLD profile during the switch request (the Set-Cookie hasn't been processed yet). Fix: read directly from home/webui_state/last_workspace.txt where home is the already-resolved target profile directory, with fallbacks to config.yaml terminal.cwd and DEFAULT_WORKSPACE. Also: expanduser() on file content for ~/... paths; exception fallback now uses DEFAULT_WORKSPACE instead of re-calling the buggy get_last_workspace(). Bug 2 — wrong model dropdown (api/routes.py): The in-memory models cache (_available_models_cache) was not cleared after a switch. A populated cache from the old profile was served unchanged to the first /api/models request after switching. Fix: call invalidate_models_cache() in the /api/profile/switch route handler immediately after a successful switch. This forces a cold rebuild from the new profile's config on the next request. Full disk+memory invalidation is necessary because the disk cache is a single per-server file, not per-profile. Bug 3 — profile chip label stale (static/ui.js): syncTopbar() returns early when S.session is null, which is the state during a profile switch with no active conversation. The profileChipLabel update was only in the main path after that early return, so it never ran. Fix: added the chip update to the early-return branch before return;. Bug 4+4b — flaky timing tests (tests/test_issue1144_session_time_sync.py): Both skew-compensation tests used exact Date.now() equality across two consecutive calls with an internal call in between. Under CPU load, 1ms jitter caused intermittent failure. Fixed both: midpoint averaging of t0..t1 with ±5ms tolerance window. Browser verified: default→camanji→webui→default all show correct profile chip, model, and workspace immediately after switch with no reload needed. 9 new tests in tests/test_profile_switch_1200.py, including 4 named regression guards. 2773 tests pass, 0 failures.
…ace lookup api/profiles.py:17 already imports Path from pathlib at module level, so re-importing it as _Path inside switch_profile's try-block was redundant. The aliasing also created a latent NameError path: if both inner "from api.config import DEFAULT_WORKSPACE" imports failed, the final fallback str(_Path.home()) would NameError because _Path was never bound (only conditionally assigned inside the outer try). Practically unreachable (api.config has been imported successfully at module load time already, so a second import attempt won't suddenly fail), but unnecessary code surface. Switch to the module-level Path everywhere in this block. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (approved with small cleanup pushed)
What this ships
Three coordinated bugs fixed in profile switching (#1200), plus a flaky-test stabilisation:
| Bug | File | Problem | Fix |
|---|---|---|---|
| 1 | api/profiles.py:309-351 | switch_profile(process_wide=False) called get_last_workspace() which routed through thread-local profile context — still pointing at the OLD profile during the switch. Returned old workspace. |
Read directly from home / 'webui_state' / 'last_workspace.txt' with config.yaml fallbacks. |
| 2 | api/routes.py:1556-1560 | In-memory _available_models_cache not invalidated after switch → next /api/models returned old profile's models. |
Call invalidate_models_cache() after switch_profile() succeeds. |
| 3 | static/ui.js:1859-1861 | syncTopbar() early-returns when S.session is null (the state during a switch with no active conversation). The profileChipLabel update only ran in the main path AFTER the early return. |
Update the chip inside the early-return branch before return;. |
| 4 | tests/test_issue1144_session_time_sync.py | Two skew-compensation tests asserted exact equality across two Date.now() calls; intermittent 1-2ms jitter caused flakes. |
Midpoint averaging with ±5ms tolerance. |
Plus 9 new tests in tests/test_profile_switch_1200.py (5 direct verification + 4 named regression guards).
Traced against upstream hermes-agent
Profile system is shared between WebUI and hermes-agent CLI. Pulled fresh nousresearch/hermes-agent tarball.
The agent reads profile state from ~/.hermes/profiles/<name>/ directly — no dependence on WebUI's request-scoped thread-local profile. The PR's fix reads from the target profile's resolved home directory, which is the same directory the agent uses. Cross-tool consistency: a workspace persisted by the agent CLI under <profile>/webui_state/last_workspace.txt would also be picked up by the WebUI's switch-profile flow. ✅
The webui_state/ subdirectory under each profile home is WebUI-specific (set up at api/workspace.py:46). The agent CLI writes its own state under different paths. No conflict.
End-to-end trace
Bug 1 — thread-local mismatch confirmation
api/profiles.py:94-104 — get_active_profile_name() returns _tls.profile (set per-request from cookie) or falls back to _active_profile (process-global). During a process_wide=False switch:
- Request arrives with the OLD profile cookie →
set_request_profile(OLD)was called by the routing layer - Handler calls
switch_profile(NEW, process_wide=False) - Inside the function,
_tls.profileis still OLD (we haven't received a new request yet) - The PR's pre-fix code called
get_last_workspace()→get_active_profile_name()→ returns OLD → reads OLD's last_workspace.txt → wrong workspace - Set-Cookie response is sent. NEXT request will have NEW cookie set in TLS, but THIS response already has the wrong default_workspace baked in.
The new code bypasses the thread-local by reading from home / 'webui_state' / 'last_workspace.txt' where home was already resolved at line 261-263 from the user-supplied name parameter. Direct, no thread-local indirection. ✅
Fallback chain order is sensible:
last_workspace.txt— most recent user-chosen for this profilecfg.workspace/cfg.default_workspace— explicit profile config keyscfg.terminal.cwd— agent CLI's working dir (most common shared key)DEFAULT_WORKSPACE— boot-time constant
Each step verifies is_dir() so a stale path doesn't propagate.
Bug 2 — cache invalidation timing
result = switch_profile(name, process_wide=False)
from api.config import invalidate_models_cache
invalidate_models_cache()
return j(handler, result, extra_headers={'Set-Cookie': build_profile_cookie(name)})invalidate_models_cache() clears BOTH the in-memory cache and the on-disk cache (api/config.py:1148-1172). The PR's claim about disk being single-per-server (not per-profile) is correct — verified by reading the cache implementation; full invalidation is necessary.
Order: switch happens, cache cleared, response sent with cookie. The next request (with NEW cookie) hitting /api/models cold-builds from the new profile's config. ✅
Bug 3 — early-return chip update
if(!S.session){
...
if(typeof syncAppTitlebar==='function') syncAppTitlebar();
// Update profile chip even when no session is active (e.g. right after profile switch)
const _profileLabel=$('profileChipLabel');
if(_profileLabel) _profileLabel.textContent=S.activeProfile||'default';
return;
}Verified the main-path chip update at the end of the function reads the same S.activeProfile value, so the new branch matches semantics. ✅
Bug 4 — flaky test stabilisation
The original tests asserted clientNow - serverNow === 3_600_000 (exact ms equality). Both clientNow and serverNow call Date.now(), but at different physical times — so a 1ms increment between the two calls produces a diff of 3,600,001 ms, failing the equality check. The fix samples t0 before and t1 after, takes the midpoint, asserts within ±5ms. Sound stabilisation pattern.
What I caught — latent NameError in the workspace fallback
The original PR's switch_profile code did a redundant local import:
from pathlib import Path as _Path # but Path is already imported at line 17In the exception handler:
except Exception:
try:
from api.config import DEFAULT_WORKSPACE as _DW2
default_workspace = str(_DW2)
except Exception:
default_workspace = str(_Path.home()) # _Path may be unboundIf the OUTER try fails before line from pathlib import Path as _Path runs (e.g., the from api.config import DEFAULT_WORKSPACE as _DW import on the line above raises), then _Path is never bound. The double-fallback at str(_Path.home()) would raise NameError rather than gracefully producing a path.
In practice this is unreachable — api.config is already imported at module load time (line 17), so a second import attempt won't suddenly fail. But the code surface is unnecessary.
What I pushed — 5634f21
Removed the redundant Path as _Path aliasing; uses the module-level Path everywhere in the workspace-derivation block. Same behaviour, no latent NameError, cleaner. CI re-ran green on 5634f21.
Edge-case trace
| Scenario | Behaviour |
|---|---|
Switch to profile with last_workspace.txt |
Returns that path ✅ |
Switch to profile with stale last_workspace.txt (deleted dir) |
is_dir() check skips, falls through to cfg keys ✅ |
Switch to profile with no webui_state/ yet |
Falls through to cfg keys → DEFAULT_WORKSPACE ✅ |
Switch to profile with config.yaml terminal.cwd only |
Returns that path ✅ |
| Switch to profile with all-bad workspace candidates | Returns DEFAULT_WORKSPACE ✅ |
| Switch with home directory unreadable | Falls through outer except → DEFAULT_WORKSPACE ✅ |
| Switch + immediate /api/models call | Models cache rebuilt from new profile ✅ |
| Switch with no active session (composer-only) | Profile chip updates ✅ |
| Switch with active session | Chip updates via main path (existing behaviour) ✅ |
Tests
- PR's 9 new profile-switch tests + my pushed cleanup: pass.
- #1144 stabilised tests: 23/23 in
test_issue1144_session_time_sync.py, including my prior fractional-offset additions. - Local full suite: 2728 passed, 47 skipped, 0 failures (the macOS
test_sprint3quirk now fixed by my #1186 in v0.50.231). - CI on PR after my push: ✅ test (3.11), ✅ test (3.12), ✅ test (3.13).
Other audit — confirmed correct
- JS syntax:
node --checkpasses. - No agent coupling: profile system is shared but the read pattern matches what the agent CLI does.
- Profile lock not held during workspace lookup: the fallback chain runs after
with _profile_lockexits, which is correct (the lookup is read-only against a stablehomepath). - Browser QA in PR description: 3 manual switch sequences (default→camanji, camanji→webui, webui→default) all show correct chip/model/workspace immediately. Matches the static-test contract.
Minor observations (non-blocking)
- The
process_wide=Falsebranch reads cfg via directyaml.safe_loadrather thanget_config()— that's intentional (the PR comment explains:_cfg_cacheis process-global and would still hold the OLD profile's config). The slight code duplication is a worthwhile trade-off for correctness. _DW/_DW2aliasing in the import statements (from api.config import DEFAULT_WORKSPACE as _DW) is unusual but harmless — protects against accidental shadowing within the function. Could be tightened to a single_DWoutside the try block.- The
cookie + cache-clear + responseordering means the cache is invalidated even if cookie-setting fails. Not a real concern since cookie-setting is just a header append.
Recommendation
Approved (with cleanup pushed). Three real, distinct bugs in profile switching all addressed at the right layer: backend reads from the target profile's home directly (closing the thread-local race), routes call cache-invalidate after the switch (closing the model-list race), frontend chip updates from the no-session path. The flaky-test fix using midpoint averaging is the right pattern. My cleanup removes a latent NameError + redundant import without changing behaviour. Comprehensive 9-test coverage with explicit named regression guards. CI green; full suite clean. Parked at approval — ready for the release agent's merge/tag pipeline.
|
Thanks for the thorough write-up — the root cause analysis for all three bugs is clear and the fixes look correct. Bug 1 (workspace): Reading directly from the target profile's Bug 2 (model dropdown): Calling Bug 3 (profile chip label): The early-return in Timing tests: The midpoint-averaging approach with ±5ms tolerance for the skew-compensation tests is a reasonable fix for CI flakiness under CPU load — cleaner than adding arbitrary sleeps. Test coverage: 9 targeted tests covering the bug paths plus regression guards looks solid. The 2773-passed / 0-failed / 2-skipped run is a good signal. Noted that the 2 skipped are intentional macOS-only tests. One thing worth confirming before merge: does Overall this looks mergeable — clean fixes, well-tested, browser-verified across four real profiles. |
…le switch fixes (#1206) fix: batch v0.50.234-235 — XSS hardening, workspace validation, profile switch fixes v0.50.235 (#1203 — profile switch workspace/model/chip, 3 bugs + flaky test): - switch_profile now reads target profile's workspace directly (thread-local bypass) - invalidate_models_cache() after profile switch (model dropdown staleness) - syncTopbar() updates chip before early-return (no-session path) v0.50.234 (#1201/#1205 — XSS hardening + workspace security): - renderMd() full HTML attribute sanitizer replacing tag-name-only allowlist - Delegated image lightbox (removes all inline onclick) - macOS /etc → /private/etc symlink bypass fixed - /System /Library added to blocked workspace roots - Legacy /api/chat workspace trust gap closed Both PRs independently reviewed. 2787/2787 tests. QA harness 20/20 + 11/11 API checks. Co-authored-by: Brendan Schmid <[email protected]> Co-authored-by: Nathan Esquenazi <[email protected]>
…le switch fixes (nesquena#1206) fix: batch v0.50.234-235 — XSS hardening, workspace validation, profile switch fixes v0.50.235 (nesquena#1203 — profile switch workspace/model/chip, 3 bugs + flaky test): - switch_profile now reads target profile's workspace directly (thread-local bypass) - invalidate_models_cache() after profile switch (model dropdown staleness) - syncTopbar() updates chip before early-return (no-session path) v0.50.234 (nesquena#1201/nesquena#1205 — XSS hardening + workspace security): - renderMd() full HTML attribute sanitizer replacing tag-name-only allowlist - Delegated image lightbox (removes all inline onclick) - macOS /etc → /private/etc symlink bypass fixed - /System /Library added to blocked workspace roots - Legacy /api/chat workspace trust gap closed Both PRs independently reviewed. 2787/2787 tests. QA harness 20/20 + 11/11 API checks. Co-authored-by: Brendan Schmid <[email protected]> Co-authored-by: Nathan Esquenazi <[email protected]>
Summary
Profile switching was broken in three distinct ways. After clicking a different profile in the composer chip dropdown, the workspace showed the wrong directory, the model dropdown kept the old profile's models, and the profile chip label stayed on the old name. After a page refresh the profile name corrected but workspace and model were still wrong.
Root causes were three separate bugs across backend and frontend. All fixed in this PR.
Bugs fixed
Bug 1 — wrong workspace immediately after switch
File:
api/profiles.py→switch_profile(process_wide=False)switch_profile()calledget_last_workspace()to populate thedefault_workspacefield in its return value.get_last_workspace()routes throughget_active_profile_name(), which reads the thread-local profile context. During the switch request, thread-local is still set to the old profile (theSet-Cookieresponse hasn't been processed by a new request yet). So it returned the old profile's workspace.Fix: Read directly from
home / 'webui_state' / 'last_workspace.txt'wherehomeis the already-resolved target profile directory. Falls back toconfig.yamlkeys (workspace,default_workspace,terminal.cwd), thenDEFAULT_WORKSPACE. Also addsexpanduser()on the file content (for~/...paths) and fixes the exception fallback to useDEFAULT_WORKSPACEinstead of re-callingget_last_workspace().Bug 2 — wrong model dropdown immediately after switch
File:
api/routes.py→/api/profile/switchhandlerThe in-memory models cache (
_available_models_cache) was not cleared after a profile switch. The client immediately calls/api/modelsafter switching, but a fresh cache from the old profile was returned unchanged.Fix: Call
invalidate_models_cache()in the route handler immediately afterswitch_profile()succeeds. Full invalidation (in-memory + disk) is required because the disk cache is a single per-server file, not per-profile.Bug 3 — profile chip label stays stale
File:
static/ui.js→syncTopbar()syncTopbar()has an early-return path whenS.sessionis null — which it is during a profile switch with no active conversation. TheprofileChipLabelupdate was only in the main path, after that early return, so it never ran when there was no session.Fix: Added the chip update inside the
!S.sessionearly-return branch, beforereturn;.Bug 4 + 4b — flaky timing tests
File:
tests/test_issue1144_session_time_sync.pyBoth skew-compensation tests (
positive_skew,negative_skew) calledDate.now()twice with an internal call in between and asserted exact equality. Under CPU load, 1ms jitter caused intermittent failure.Fix: Both tests now use midpoint averaging of
t0..t1with ±5ms tolerance.Tests
tests/test_profile_switch_1200.py— 9 tests total:Results: 2773 passed, 0 failed, 2 skipped (macOS-only tests, intentional)
Browser verification
Tested against a running server with four real profiles (default, camanji, webui, codepath):
camanji✅gpt-5.5✅Camanji✅webui✅default✅claude-sonnet-4.6✅Home✅All updates appear immediately after switch, no reload needed.
Closes #1204