Skip to content

fix: validate WebUI launcher can import agent#1315

Closed
ccqqlo wants to merge 3 commits intonesquena:masterfrom
ccqqlo:fix/webui-agent-python-launcher
Closed

fix: validate WebUI launcher can import agent#1315
ccqqlo wants to merge 3 commits intonesquena:masterfrom
ccqqlo:fix/webui-agent-python-launcher

Conversation

@ccqqlo
Copy link
Copy Markdown
Contributor

@ccqqlo ccqqlo commented Apr 30, 2026

Summary

  • Validate the bootstrap Python can import both WebUI dependencies and run_agent.AIAgent
  • Fall back to the Hermes Agent venv when the launcher Python is only the WebUI venv
  • Fail loudly during bootstrap instead of starting healthy and later returning AIAgent not available
  • Keep pytest server auth disabled even when HERMES_WEBUI_PASSWORD is set in the shell

Test plan

  • /root/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_bootstrap_python_selection.py tests/test_regressions.py::test_chat_start_returns_stream_id tests/test_regressions.py::test_chat_stream_opens_successfully tests/test_regressions.py::test_aiagent_imported_in_streaming -q

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Thanks for catching this — the underlying problem is real and the fix is well-scoped.

Confirmed against master

bootstrap.py currently has a one-line readiness check (import yaml) that only validates the WebUI's own dependencies. If discover_launcher_python() happens to find a Python that has pyyaml but not the Hermes Agent on its sys.path, bootstrap returns happy and server.py starts cleanly. The failure only surfaces later, at the point a user actually starts a chat:

api/streaming.py → AIAgent not available

That's exactly the "starts healthy then chat is broken" failure mode you describe, and it has been a recurring source of confusion (people see green /health, assume the server is fine, then can't figure out why streams 500). Validating both yaml and run_agent.AIAgent at bootstrap so the failure is loud and immediate is the right call.

Fix approach is reasonable

Three observations on the implementation:

  1. PYTHONPATH injection. Adding agent_dir to the subprocess PYTHONPATH so from run_agent import AIAgent resolves matches how server.py actually wires up imports at runtime, so the probe accurately reflects what the launched process will see. Good.
  2. Agent-venv fallback before creating a local .venv. Preferring an existing agent venv that already has both sets of deps is the right priority order — creating a fresh .venv and pip-installing requirements should be the last resort, not the second one.
  3. Loud failure with a useful message. The RuntimeError mentioning HERMES_WEBUI_PYTHON gives the user a concrete next step. That's much better than the silent "AIAgent not available" failure mode.

A couple of small notes

  • HERMES_WEBUI_PASSWORD in tests/conftest.py. Setting it to "" is the right call (you noted the rationale). Worth a comment near the line so future maintainers don't think it's a stray test-env tweak — something like # Force-disable auth even when the developer has password set in their shell.
  • Subprocess timeout. subprocess.run(...) with no timeout could in theory hang during bootstrap if Python startup is slow (corrupted venv, network-mounted home, etc.). A timeout=30 would make this fail-fast.
  • Test on Windows path separators. The new tests use tmp_path which is fine, but the agent_candidates list explicitly enumerates Scripts/python.exe paths — worth a parametrized test that covers the Windows case so the cross-platform branch doesn't bit-rot.

Recommendation

Solid bug-prevention PR. Tests are good (the AST-style probe replacement via monkeypatch.setattr is a clean way to drive ensure_python_has_webui_deps without touching the real filesystem). Once the small comment + optional timeout are addressed, this looks ready.

🤖 Automated triage via nesquena-hermes

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Thanks @ccqqlo! Putting this on hold pending CI review and a smaller scope split.

Why it's on hold

CI on this branch is failing (FAILURE on the main pytest run, two CANCELLED jobs). Before we can merge, the bootstrap-validation tests need to pass on a clean run.

Scope feedback

The PR mixes three orthogonal concerns:

  1. Bootstrap Python validation — verifying the launcher Python can import run_agent.AIAgent, falling back to the Hermes Agent venv when only the WebUI venv is on PATH. This is the headline change and the most useful one.
  2. bootstrap.py fail-loud-on-bad-import — fail at bootstrap rather than starting healthy and returning AIAgent not available later. Defensible, but it's a behaviour change for environments that previously worked degraded.
  3. tests/conftest.py — keep pytest server auth disabled when HERMES_WEBUI_PASSWORD is set in the shell. This is a useful test-isolation fix but unrelated to the bootstrap concern.

What I'd like to see

  • Land the conftest.py auth-isolation fix on its own as a tiny PR (it's a clear test-infrastructure win — should land independently and quickly).
  • Split the bootstrap validation + Hermes Agent venv fallback into a focused PR with the failing CI run debugged. If AIAgent can't be imported, what does the bootstrap do — abort with a clear message, or silently fall back? The current behaviour of "starting healthy then returning AIAgent not available" is the bug, but the fix needs to cleanly distinguish "no agent available, abort" from "agent at the alt path, use it".
  • Once we see what specifically failed in CI, we can iterate.

Happy to review the splits when you're ready. Let me know if you want me to extract just the conftest.py change into a quick first PR while you debug the bootstrap CI failures.

@nesquena-hermes nesquena-hermes force-pushed the fix/webui-agent-python-launcher branch from ea175e9 to 23880e1 Compare May 2, 2026 19:35
…nto v0.50.269

The PR added an `agent_dir` parameter to ensure_python_has_webui_deps. The
test_bootstrap_foreground.py tests (added in nesquena#1478) had `lambda p: p` stubs
that were 1-arg only. Widened to `lambda *a, **kw: a[0]` so the stubs
accept the new signature on the rebased base.

Co-authored-by: ccqqlo <[email protected]>
@nesquena-hermes nesquena-hermes force-pushed the fix/webui-agent-python-launcher branch from 23880e1 to 0076f3d Compare May 2, 2026 19:35
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Rebased onto current master and resolved the test compatibility issue.

What we kept: your full original change to bootstrap.py — the agent-import validation in _python_can_run_webui_and_agent, the agent-venv preference in ensure_python_has_webui_deps, and the loud failure when no interpreter can import both WebUI deps and run_agent.AIAgent. Plus the HERMES_WEBUI_PASSWORD="" fixture cleanup in tests/conftest.py.

What we changed: 3 lambda stubs in tests/test_bootstrap_foreground.py (added in #1478 / v0.50.269). Your PR added a second positional argument (agent_dir) to ensure_python_has_webui_deps, but those stubs were lambda p: p (1-arg). Widened to lambda *a, **kw: a[0] so they accept the new signature.

Why this PR is still worth shipping: it solves a different production failure mode than #1478. #1478 fixed the supervisor-respawn loop (bootstrap exits, supervisor respawns, port 8787 still bound by orphan). Your PR fixes the "starts healthy, chat fails later with AIAgent not available" failure mode — the discovered launcher Python has yaml but doesn't have the agent module on its import path. Today the server reports a successful boot then 500s on the first chat with a cryptic error; with your change, bootstrap fails loud at startup with a clear message pointing at HERMES_WEBUI_PYTHON.

Test results after rebase: 3849 passed, 0 failures (was 3847 before). Your two test_bootstrap_python_selection.py tests pass + the 44 test_bootstrap_foreground.py tests from #1478 pass with the widened stubs.

Force-pushed back to your branch with --force-with-lease. Pull before any further changes:

git fetch origin
git reset --hard origin/fix/webui-agent-python-launcher

Routing this into the v0.50.270 batch release. Thanks for the patient diagnosis on this one — the "fail loudly during bootstrap instead of starting healthy and later returning AIAgent not available" framing in your PR description is exactly the right shape.

@nesquena nesquena removed the hold label May 2, 2026
The test_ensure_python_fails_loudly_when_no_interpreter_can_import_agent
test was passing locally but failing on CI runners because:

1. CI runners don't have REPO_ROOT/.venv/bin/python on the filesystem
2. The function path on missing venv calls venv.EnvBuilder(with_pip=True).create()
3. That internally calls subprocess.check_output() — a different code path
   than the monkey-patched bootstrap.subprocess.run, which only stubs run().
4. CI fails with: AttributeError: NoneType has no attribute stdout

The behavior under test is "what happens when no interpreter can import
both WebUI deps and the agent" — NOT the venv-creation path. So we sidestep
EnvBuilder by setting REPO_ROOT to tmp_path with a pre-existing
.venv/bin/python file. The venv-existence check passes, EnvBuilder is
skipped, the stubbed _python_can_run_webui_and_agent returns False on the
final check, and the expected RuntimeError fires.

Co-authored-by: ccqqlo <[email protected]>
nesquena-hermes added a commit that referenced this pull request May 2, 2026
v0.50.270 — Bootstrap launcher import validation (#1315) + Opus follow-up
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Shipped in v0.50.270 via release PR #1487. The work landed in master via the stage-270 batch — GitHub didn't auto-mark this PR as "merged" through the no-ff merge, so closing manually. Tag v0.50.270 is live, the actual merge commit is 58571c9 fix(bootstrap): validate WebUI launcher can import agent (#1315) on master.

Together with #1478 (v0.50.269), this completes the Bug #1 family of bootstrap.py failure modes from #1458. Both modes are now caught at boot time with clear errors instead of leaking into runtime. Thanks for the focused diagnosis — the "fail loudly during bootstrap instead of starting healthy and later returning AIAgent not available" framing was exactly the right shape.

Bugs #2 (state.db FD leak) and #3 (HTTP-unhealthy wedge) under #1458 remain open awaiting diagnostic data.

pull Bot pushed a commit to soitun/hermes-webui that referenced this pull request May 2, 2026
pull Bot pushed a commit to soitun/hermes-webui that referenced this pull request May 2, 2026
nesquena#1315)

- CHANGELOG.md: v0.50.270 entry detailing nesquena#1315 + maintainer follow-ups
- ROADMAP.md: bump to v0.50.270, 3849 tests collected
- TESTING.md: bump header + total to 3849
- bootstrap.py: Opus advisor optional-followup — PYTHONPATH prepend comment

nesquena#1315 by @ccqqlo (113 LOC): bootstrap.py validates launcher Python can
import both yaml and run_agent.AIAgent. Companion fix to v0.50.269's nesquena#1478
— addresses the start-healthy-then-cryptic-fail mode (different from nesquena#1478's
supervisor-respawn loop).

3849 tests pass. Opus advisor verdict: ship as-is. CI green on contributor
branch + on local stage. QA harness all green.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants