fix(bootstrap): add --foreground mode for process supervisors (closes #1458 Bug #1)#1478
fix(bootstrap): add --foreground mode for process supervisors (closes #1458 Bug #1)#1478nesquena-hermes wants to merge 2 commits intomasterfrom
Conversation
…Bug #1) Issue #1458 reports persistent-host crashes (≥1/day) when running the WebUI under launchd KeepAlive on macOS. Root cause: `bootstrap.py` calls `subprocess.Popen([python, "server.py"], start_new_session=True)`, probes /health, then exits 0. Under any process supervisor (launchd, systemd, supervisord, runit, s6), the supervisor sees its tracked PID exit, marks the program as "completed," and respawns it. The new bootstrap fails to bind port 8787 (orphaned server still has it), exits non-zero, supervisor respawns again — loop until the orphan crashes for some other reason and the next respawn finds the port free. This PR addresses Bug #1 of the three failure modes tracked in #1458: the `bootstrap.py` double-fork breaking process supervisors. Bug #2 (state.db FD leak) and Bug #3 (HTTP-unhealthy wedge) remain open under the same issue — they need diagnosis data before a fix can land. Changes ------- 1. `bootstrap.py`: - New `--foreground` argparse flag with help text mentioning launchd / systemd / supervisord. - New `_detect_supervisor()` that returns the env var name for any supervisor it detects: `INVOCATION_ID` / `JOURNAL_STREAM` / `NOTIFY_SOCKET` (systemd, s6), `XPC_SERVICE_NAME` (launchd), `SUPERVISOR_ENABLED` (supervisord), or `HERMES_WEBUI_FOREGROUND` for the explicit user opt-in. Truthy values for the explicit opt-in: `1` / `true` / `yes` / `on` (case-insensitive). - `main()` branches on `args.foreground or _detect_supervisor()`: - **Foreground path:** chdir to `agent_dir or REPO_ROOT`, then `os.execv(python, [python, server_path])` to replace the bootstrap process image with the server. The supervisor sees the long-lived server as the original child. No `wait_for_health` probe — the supervisor's KeepAlive / Restart=on-failure handles liveness. - **Default path:** unchanged. Spawn server as detached child via `Popen + start_new_session=True`, probe /health, return 0. This still works for interactive `bash start.sh` invocations. - Resolved env vars (HOST/PORT/STATE_DIR/AGENT_DIR) are now mutated on `os.environ` directly instead of into a local `env` copy so they are inherited across `os.execv`. 2. `docs/supervisor.md` (new): runnable launchd plist, systemd .service, and supervisord conf examples + a diagnostic recipe (`lsof` + ppid chain) for catching the orphan-loop in production. 3. `.gitignore`: allowlist `docs/supervisor.md` (the directory uses an opt-in pattern; matches the existing `!docs/docker.md` precedent). 4. `tests/test_bootstrap_foreground.py` (new): 35 regression tests covering the argparse flag, `_detect_supervisor()` behavior across all five supervisor env vars, the explicit opt-in's truthy/falsy values, and `main()`'s execv-vs-Popen routing decision under each input combination. `os.execv` is monkeypatched in the routing tests — we pin the structural choice (which call is made, with which args, in which cwd, with which env) not the post-exec behavior. Why this scope and no more -------------------------- Bug #2 (state.db FD leak) lists 5 candidate paths and asks the reporter for `lsof -p <pid> | sort | uniq -c | sort -rn | head -20` output to disambiguate. Until that data lands, any "fix" would be speculative — explicitly out of scope per the contributor-pickup comment on the issue. Bug #3 (launchd-running, port-listening, HTTP-unhealthy) was added in @stefanpieter's reply comment. Diagnosis is in flight; no concrete fix shape yet. Also out of scope. Running locally end-to-end verifies the behavior: ``` [bootstrap] Starting Hermes Web UI on http://127.0.0.1:8789 (foreground mode: --foreground) $ pgrep -af 'server.py' 2997632 /home/.../python /tmp/wt-fix-1458/server.py $ ps -o ppid -p 2997632 2997581 ← bash that ran bootstrap.py — same PID as the original bootstrap $ ps -p 2997581 -o cmd ... bootstrap.py ... ← but exec'd into server.py ``` The same PID that bash forked for `bootstrap.py` is now `server.py`. A supervisor watching that PID would correctly observe the long-lived server. No double-fork. Verification ------------ - 3811 tests pass (`pytest tests/` — full suite, +51 from this PR plus master-merge-in) - All 35 new bootstrap-foreground tests pass - `bash scripts/run-browser-tests.sh` PASS (HTTP API checks against worktree) - `bash scripts/webui_qa_agent.sh 8789` PASS (23/23 visual QA) - Live verified: server starts cleanly under both `--foreground` and `HERMES_WEBUI_FOREGROUND=1`; PID lineage confirms no double-fork Closes #1458 (Bug #1 only). Bugs #2 and #3 remain tracked under the issue.
Three changes from the pre-merge Opus review: **MUST-FIX** — XPC_SERVICE_NAME false-positive on macOS Terminal macOS launchd sets `XPC_SERVICE_NAME` in EVERY Terminal-spawned shell, not just real services. Typical noise values: `"0"` (truthy in Python!) and `"application.com.apple.Terminal.<UUID>"`. A bare `os.environ.get(name)` existence check would auto-promote interactive `./start.sh` runs to foreground mode on every Mac dev machine — silently breaking the most common installation path (no /health probe, no browser open, no log file, hanging shell). Fix: new `_is_real_supervisor_value()` helper that filters noise. For `XPC_SERVICE_NAME` specifically, reject `"0"` and any `"application.*"` prefix. Real launchd plists use reverse-DNS Label form (`com.<rdns>.<svc>`) which still triggers correctly. 7 new tests in `TestXPCServiceNameNoiseFilter`: - 4 noise values (`0`, Terminal.app, iTerm2, VSCode) → no detection - 3 real Label forms → correct detection - Mixed env with XPC noise + real INVOCATION_ID → falls through to systemd **SHOULD-FIX 1** — Test env leakage The original `clean_env` fixture stripped supervisor-detection env vars but not the resolved bootstrap vars (HERMES_WEBUI_HOST/PORT/AGENT_DIR) that `main()` mutates onto `os.environ`. After `test_foreground_exports_resolved_env_vars` ran, later tests would import bootstrap with polluted defaults (DEFAULT_HOST="0.0.0.0" instead of "127.0.0.1"). Existing assertions still passed (tautological vs DEFAULT_*), but it was a footgun for future tests. Fix: extend `clean_env` to also `delenv` the three resolved vars before each test. **SHOULD-FIX 2** — Pre-execv executability guard If `discover_launcher_python` returns a path that doesn't exist or isn't executable, `os.execv` raises OSError → wrapper catches → SystemExit(1) → supervisor restarts → loop forever. That's exactly the failure mode this PR is supposed to eliminate. Fix: `os.access(python_exe, os.X_OK)` check before execv. Converts infinite supervisor loop into a single visible RuntimeError. 1 new test in `TestForegroundExecutabilityGuard` pinning that the guard fires before execv when the python path is non-executable. **Docs** — supervisor.md updates - New section explaining the XPC_SERVICE_NAME noise filter and what values trigger / don't trigger detection - New section listing supervisors that are NOT auto-detected (runit, daemontools, PM2, Foreman/Honcho, custom shell-script supervisors) with explicit recommendation to set HERMES_WEBUI_FOREGROUND=1 Verification - 3820 tests pass (+9 from this commit's new tests vs the original PR push of 3811) - Filter manually verified end-to-end with the live os.environ: XPC=0 → None, XPC=application.* → None, XPC=com.example.foo → triggers - run-browser-tests.sh ALL CHECKS PASSED on the worktree Items deferred from the Opus review - #4 chdir target may not exist: REPO_ROOT comes from __file__.resolve() so it's stable; not a real concern in practice - #6 two startup messages in foreground mode: cosmetic, useful for diagnostics - #7 stricter explicit-only mode: leaves user the override of just not passing --foreground (current behavior) - #8 test stub return value: trivial, can fix later if regression surface - #9 argparse positional-after-option ordering: test reads fine These can be follow-up issues if anyone hits them.
|
Pre-merge Opus review pass complete. Three changes folded in (commit 6a26e82): MUST-FIX — XPC_SERVICE_NAME false-positive on macOS TerminalOpus caught that macOS launchd sets Fix: new 7 new tests in
SHOULD-FIX 1 — Test env leakageThe original SHOULD-FIX 2 — Pre-execv executability guardIf Docs updates
Items Opus flagged but deferred
These can be follow-up issues if anyone hits them. Final verification
Ready for independent review. |
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve, behavioural harness validates noise filter and routing)
P1 fix for issue #1458 (Bug #1 only). Adds --foreground flag and supervisor-env auto-detection so bootstrap.py replaces its process image with server.py via os.execv under launchd/systemd/supervisord, instead of double-forking and exiting (which causes supervisor KeepAlive/Restart=always to respawn bootstrap.py while the orphaned server still holds port 8787 — the loop the issue describes).
Bugs #2 (state.db FD leak) and #3 (HTTP-unhealthy wedge) explicitly remain open under #1458 awaiting diagnostic data — correct scope discipline per the maintainer's pickup-comment.
What this ships
- bootstrap.py (+121/-9):
--foregroundargparse flag,_SUPERVISOR_ENV_VARStuple,_is_real_supervisor_value()XPC noise filter,_detect_supervisor(), and a two-pathmain()that exec's vs. Popens. - docs/supervisor.md (new, +237 LOC): runnable launchd plist + systemd
.service+ supervisord conf + diagnosticlsof/ppidrecipe. - .gitignore: allowlist
docs/supervisor.md(matches!docs/docker.mdprecedent). - tests/test_bootstrap_foreground.py (new, +455 LOC): 44 regression tests.
Traced against upstream hermes-agent
bootstrap.py is webui-only — verified by grepping the fresh tarball: agent-side bootstrap references are unrelated (launchctl bootstrap subcommand calls in hermes_cli/gateway.py). The CLI does not import or invoke webui's bootstrap.py. No cross-tool regression surface.
End-to-end trace
Entry points:
bash start.sh --foreground→exec "${PYTHON}" bootstrap.py --no-browser --foreground(start.sh:25) —execreplaces the shell, then bootstrap exec's into server.py. Result: same PID lineage from shell → server.- launchd plist with
KeepAlive=true→ spawnsbash start.sh --foreground→ same chain. - systemd
Type=simplewithExecStart=...start.sh --foreground→ bash exec'd by systemd, then bash exec's python, then bootstrap exec's server.py. Same lineage.
Routing decision at bootstrap.py:321:
foreground_reason = "--foreground" if args.foreground else _detect_supervisor()
if foreground_reason:
# chdir → executability guard → os.execv (never returns)
else:
# legacy: Popen + wait_for_health → return 0Foreground path (bootstrap.py:322-347):
- Log starting message naming the trigger reason
os.chdir(server_cwd)withOSError → RuntimeErrortranslationos.access(python_exe, os.X_OK)guard — without this, a missing python would loop the supervisor (the exact failure mode the PR fixes)os.execv(python_exe, [python_exe, server_path])— theargv[0] == argv[1]repetition is correct CPython convention- Unreachable
RuntimeErrorafter execv
Env propagation (bootstrap.py:307-312): os.environ is mutated directly (instead of a local env dict copy). This is required for os.execv since execv inherits os.environ — a local copy would be lost. The legacy Popen path now uses env=os.environ.copy() (bootstrap.py:358) which captures the same final state. Net effect for legacy path: identical to before.
Security audit
- ✅ No XSS, no SQL, no path traversal — bootstrap is local-only, no untrusted input.
- ✅
os.execv(python_exe, ...)argument comes fromdiscover_launcher_python()which respectsHERMES_WEBUI_PYTHONenv, then probes forvenv/bin/pythonunder known dirs. No user-supplied input reaches execv beyond local config. - ✅ The pre-execv
os.accesscheck is informational/defensive — there's no security boundary; this is just to convert silent loop into visible error. - ✅ No new endpoints, no auth changes, no
config.yamlwrites.
Race / state analysis
- ✅ Pure synchronous code, no shared dicts, no locks.
os.environmutation is single-threaded at startup (before any subprocess or thread spawns). - ✅ Idempotent:
os.environ.setdefault("HERMES_WEBUI_STATE_DIR", ...)doesn't override an externally provided value, butstate_dirwas computed from that same env var, so the value is consistent either way.
Behavioural harness — XPC noise filter
✅ XPC_SERVICE_NAME='0' → False (Mac launchd descendant noise)
✅ XPC_SERVICE_NAME='application.com.apple.Terminal.UUID' → False (Terminal.app)
✅ XPC_SERVICE_NAME='' → False (empty)
✅ XPC_SERVICE_NAME='com.example.foo' → True (real Label)
✅ XPC_SERVICE_NAME='application' → True (no dot prefix — passes; theoretical edge)
✅ XPC_SERVICE_NAME='applicationCo.foo' → True (no dot — passes; theoretical edge)
✅ INVOCATION_ID='0' → True (no narrowing for non-XPC vars)
✅ INVOCATION_ID='' → False (empty)
✅ NOTIFY_SOCKET='/run/systemd/notify' → True
The filter correctly rejects Apple's known Terminal/IDE noise values. The "exactly application (no dot)" and "applicationCo.foo" cases are theoretical — Apple's actual noise always uses application.<bundle-id> form. Acceptable.
Edge-case matrix
| Scenario | Pre-PR | Post-PR |
|---|---|---|
bash start.sh (interactive Mac, XPC_SERVICE_NAME=0) |
Popen, healthy | Popen (XPC=0 rejected) ✅ |
bash start.sh (interactive Mac, XPC=application.com.apple.Terminal) |
Popen, healthy | Popen (rejected) ✅ |
launchd plist with Label com.example.foo (KeepAlive=true) |
Loops forever (orphan + respawn) | execv → supervisor sees long-lived server ✅ |
systemd Type=simple with Restart=always |
Loops forever | execv (auto-detected via INVOCATION_ID) ✅ |
supervisord with autorestart=true |
Loops forever | execv (auto-detected via SUPERVISOR_ENABLED) ✅ |
| runit / daemontools (no env var set) | Loops forever | Loops forever unless user sets HERMES_WEBUI_FOREGROUND=1 |
HERMES_WEBUI_FOREGROUND=1 (no flag) |
Popen | execv ✅ |
HERMES_WEBUI_FOREGROUND=0 (no flag, no supervisor env) |
Popen | Popen (falsy falls through) ✅ |
Both --foreground and INVOCATION_ID set |
n/a | execv, reason='--foreground' (flag wins) ✅ |
python_exe missing or non-executable in foreground mode |
execv raises → wrapper SystemExit(1) → supervisor restart loop | RuntimeError before execv with actionable message ✅ |
--foreground + positional port: bootstrap.py --foreground 9999 |
n/a | argparse handles store_true + nargs='?' interleaving ✅ |
| Legacy Popen path env content | local env copy with HOST/PORT mutated |
os.environ.copy() after os.environ mutation — identical net contents ✅ |
Tests
tests/test_bootstrap_foreground.py— 44/44 pass:TestForegroundFlag(3): argparse recognition + default + help text mentions all 3 supervisorsTestDetectSupervisor(15): clean env, each of 5 supervisor vars, 7 truthy + 7 falsy explicit-opt-in values, precedenceTestXPCServiceNameNoiseFilter(8): 4 noise values rejected, 3 real Labels accepted, mixed-env fallthroughTestMainForegroundRouting(8): default→Popen, --foreground→execv, each supervisor var auto-promotes, explicit-opt-in auto-promotesTestForegroundEnvAndCwd(3): chdir to agent_dir, exports resolved env to os.environ, skips wait_for_healthTestForegroundExecutabilityGuard(1): non-executable python raises RuntimeError before execv
- Existing
test_bootstrap_dotenv.pycontinues to pass. - Full suite: 3767 passed, 55 skipped, 3 xpassed, 0 failed in 18.84s on the PR branch (PR description claims 3811/3820; counting drift is consistent with prior batches and not blocking).
- CI: 3.11 / 3.12 / 3.13 all green.
Other audit — things that are correct already
- ✅
argv[0] == argv[1]foros.execv([python, [python, server.py]])is correct CPython convention. - ✅
start.shalready usesexec "${PYTHON}" ... "$@"so--foregroundpasses through with the bash process replaced (clean PID lineage). - ✅ Legacy Popen path
env=os.environ.copy()is a snapshot taken AFTER the os.environ mutations, so the resolved values still flow to the child. - ✅
_detect_supervisor()short-circuits on--foregroundflag — no double-evaluation cost or precedence ambiguity. - ✅
_load_repo_dotenv()runs at import time beforeDEFAULT_HOST/DEFAULT_PORTare read; tests freshly re-import bootstrap to avoid module-level state bleed. - ✅ The
clean_envfixture stripsHERMES_WEBUI_HOST/PORT/AGENT_DIRafter the agent-flagged leak — verified at tests/test_bootstrap_foreground.py:73-83.
Minor observations (non-blocking)
-
Executability guard converts loop to error, but supervisor still restarts on RuntimeError → SystemExit(1). The improvement is observability (the stderr message points at
HERMES_WEBUI_PYTHON), not loop prevention — the supervisor will still respawn until the user fixes the python path. A one-time loud failure is much better than silent flapping, but the comment "Convert to a single visible error" slightly oversells: it's a single clearer error per restart. Worth a tiny doc tweak in supervisor.md ("if you see this error in journald, fixHERMES_WEBUI_PYTHON— the supervisor will keep restarting until you do") but not blocking. -
XPC_SERVICE_NAMEfilter acceptsapplication(no dot) andapplicationCo.foo. Thevalue.startswith("application.")check requires a literal dot, so the bare word "application" or words starting with "applicationXxx" would pass through as if they were real Labels. In practice Apple always usesapplication.<bundle-id>form — no observed false negative. Could tighten tovalue == "application" or value.startswith("application.")but the value is bounded by what launchd actually emits. -
No equivalent of
HERMES_WEBUI_FOREGROUNDfor forcing OFF. If a user is in a weird env that auto-detects (say, a Docker container that happened to inheritINVOCATION_ID) and wants to force the legacy Popen path, the only escape isunset INVOCATION_ID && bash start.sh. AHERMES_WEBUI_FOREGROUND=0doesn't currently force-disable detection (the falsy values fall through to env probing — by design per the test suite). Consider adding aHERMES_WEBUI_NO_FOREGROUND=1override later if anyone hits this. Documented in supervisor.md anti-recipe section is sufficient for now. -
PR description test count off by ~50 (3811/3820 claimed vs 3767 local). Same drift pattern as prior batches.
-
docs/supervisor.mdrunit/daemontools recommendation is sound — those supervisors don't set any of our detected env vars, so the explicitHERMES_WEBUI_FOREGROUND=1instruction is correct.
Recommendation
✅ Approved. The fix correctly converts the supervisor double-fork loop into a clean os.execv replacement so KeepAlive/Restart=always work as intended. The XPC noise filter prevents auto-promotion on every Mac dev machine — a real regression risk caught in pre-merge review. The executability guard converts a silent restart loop into a visible error, which is the right tradeoff. Behavioural harness on the noise filter and 44/44 regression tests confirm structural correctness; live PID-lineage verification in the PR description proves end-to-end behavior.
Bugs #2 and #3 explicitly out of scope per maintainer guidance — correct discipline.
Parked at approval — ready for the release agent's merge/tag pipeline.
|
Shipped in v0.50.269 via release PR #1483. The work landed in master via the stage-269 batch — GitHub didn't auto-mark this PR as "merged" through the no-ff merge, so closing manually. The actual merge commit is Closes #1458 Bug #1. Bugs #2 (state.db FD leak) and #3 (HTTP-unhealthy wedge) remain open under #1458 awaiting diagnostic data per the original PR's scope discipline. |
…267 follow-ups - CHANGELOG.md: v0.50.269 entry detailing nesquena#1478 nesquena#1479 nesquena#1480 - ROADMAP.md: bump to v0.50.269, 3847 tests collected - TESTING.md: bump header + total to 3847 nesquena#1478: nesquena APPROVED self-built bootstrap.py --foreground mode (closes nesquena#1458 Bug nesquena#1, +Opus follow-ups: XPC noise filter, executability guard) nesquena#1479: surgical follow-up to nesquena#1473 — Session.compact() now includes pending_user_message nesquena#1480: bfcache pageshow restores active session via loadSession + checkInflightOnBoot 3847 tests pass (+47 net). Opus advisor on stage diff: no blockers.
…nto v0.50.269 The PR added an `agent_dir` parameter to ensure_python_has_webui_deps. The test_bootstrap_foreground.py tests (added in nesquena#1478) had `lambda p: p` stubs that were 1-arg only. Widened to `lambda *a, **kw: a[0]` so the stubs accept the new signature on the rebased base. Co-authored-by: ccqqlo <[email protected]>
…nto v0.50.269 The PR added an `agent_dir` parameter to ensure_python_has_webui_deps. The test_bootstrap_foreground.py tests (added in nesquena#1478) had `lambda p: p` stubs that were 1-arg only. Widened to `lambda *a, **kw: a[0]` so the stubs accept the new signature on the rebased base. Co-authored-by: ccqqlo <[email protected]>
nesquena#1315) - CHANGELOG.md: v0.50.270 entry detailing nesquena#1315 + maintainer follow-ups - ROADMAP.md: bump to v0.50.270, 3849 tests collected - TESTING.md: bump header + total to 3849 - bootstrap.py: Opus advisor optional-followup — PYTHONPATH prepend comment nesquena#1315 by @ccqqlo (113 LOC): bootstrap.py validates launcher Python can import both yaml and run_agent.AIAgent. Companion fix to v0.50.269's nesquena#1478 — addresses the start-healthy-then-cryptic-fail mode (different from nesquena#1478's supervisor-respawn loop). 3849 tests pass. Opus advisor verdict: ship as-is. CI green on contributor branch + on local stage. QA harness all green.
Closes #1458 (Bug #1 only)
Self-built fix for the
bootstrap.pydouble-fork that breaks process supervisors. Bugs #2 (state.db FD leak) and #3 (HTTP-unhealthy wedge) remain open under #1458 — they need diagnosis data before a non-speculative fix can land.Problem
Reporter: at least one crash per day running a persistent WebUI on a Mac mini under launchd KeepAlive.
Root cause (verified against master):
bootstrap.py:243-268callssubprocess.Popen([python_exe, "server.py"], start_new_session=True), probes/health, then exits 0. Under any process supervisor (launchd, systemd, supervisord, runit, s6), the supervisor sees the parent exit, marks the program as "completed," and respawns it. The new bootstrap fails to bind 8787 (orphaned server still has it), exits non-zero, supervisor respawns again — loop until something else crashes the orphan and the next respawn finds the port free. Reporter described this as "the agent fixes it eventually" — that's the loop intermittently succeeding.Fix
Add
--foregroundflag (and supervisor-environment auto-detection). In foreground mode, replace the bootstrap process image withserver.pyviaos.execvso the supervisor sees the long-lived server as the original child. KeepAlive / Restart=always now work correctly.Auto-detected env vars
INVOCATION_IDJOURNAL_STREAMNOTIFY_SOCKETType=notify, s6 sd_notifyXPC_SERVICE_NAMESUPERVISOR_ENABLEDHERMES_WEBUI_FOREGROUND1/true/yes/on)Routing decision
Resolved env vars (HOST/PORT/STATE_DIR/AGENT_DIR) are mutated on
os.environdirectly so they surviveos.execv.Files changed
bootstrap.py(+82/-9): argparse flag,_detect_supervisor(), two-pathmain()docs/supervisor.md(new, +185 LOC): runnable launchd plist + systemd .service + supervisord conf + diagnostic recipe.gitignore(+1): allowlistdocs/supervisor.md(matches!docs/docker.mdprecedent)tests/test_bootstrap_foreground.py(new, +380 LOC): 35 regression testsTests
35 new tests in
tests/test_bootstrap_foreground.py:--foregrounduses execv not Popen; each of 5 supervisor env vars auto-promotes to execv;HERMES_WEBUI_FOREGROUND=1auto-promotes to execvagent_dir or REPO_ROOTbefore exec; exports HOST/PORT/AGENT_DIR/STATE_DIR toos.environ; skipswait_for_healthin foreground modeos.execvis monkeypatched in the routing tests — we pin the structural choice (which path, which cwd, which env) not the post-exec behavior. Existingtest_bootstrap_dotenv.py(16 tests) continues to pass.Verification
pytest tests/ -q→ 3811 passed, 2 skipped, 3 xpassed (was 3760 before this PR + master-merge-in)bash scripts/run-browser-tests.sh→ ALL CHECKS PASSED (HTTP API contract checks against the worktree)bash scripts/webui_qa_agent.sh 8789→ 23/23 visual QA passed against a live server started withHERMES_WEBUI_FOREGROUND=1HERMES_WEBUI_FOREGROUND=1, the python process serving on 8789 is the same PID that bash originally forked forbootstrap.py. No double-fork.Backward compatibility
Default behavior (interactive
bash start.sh) is unchanged. The new path is opt-in via flag or supervisor-env detection. No breaking changes.Out of scope (explicitly)
lsof -p <pid> | sort | uniq -c | sort -rn | head -20) requested from reporter. Don't speculate-fix.docs/supervisor.md. A first-class plist asset is a separate decision.Reviewer notes
os.environmutation (instead of localenvcopy) is a behavioral change in the legacyPopenpath too: previously thePopengot a copy with HOST/PORT etc. baked in; nowos.environitself has them set. This is actually a bug fix — if someone's chained scripts inspectedos.environ['HERMES_WEBUI_PORT']after bootstrap returned, they'd previously have seen the raw user value, not the resolved port. Verified the legacy path's behavior is otherwise identical via the existingtest_bootstrap_dotenv.pysuite.os.execvargv convention: argv[0] is the program name, argv[1..] are the actual script and args. We pass[python_exe, server_path]where argv[0] == argv[1] (path repeated) — this is correct and matches CPython's convention foros.execv.