Skip to content

fix: use spawn for manual cron subprocesses#1767

Closed
Michaelyklam wants to merge 1 commit intonesquena:masterfrom
Michaelyklam:fix/issue-1754-cron-spawn
Closed

fix: use spawn for manual cron subprocesses#1767
Michaelyklam wants to merge 1 commit intonesquena:masterfrom
Michaelyklam:fix/issue-1754-cron-spawn

Conversation

@Michaelyklam
Copy link
Copy Markdown
Contributor

@Michaelyklam Michaelyklam commented May 7, 2026

Thinking Path

  • Manual WebUI cron runs execute from a multi-threaded server process.
  • The previous subprocess isolation fix kept long cron jobs out of the parent cron/profile lock, but still used fork.
  • Forking from a threaded server can inherit held import/logging/FD state into the child with no owning thread left to release it.
  • The child entry point is already top-level and picklable, so spawn is the narrow safer process boundary.
  • The queue drain-before-join behavior from PR fix: shorten cron profile lock for manual runs #1746 must stay intact because large child results can otherwise deadlock the parent.

What Changed

  • Switched _run_cron_job_in_profile_subprocess() from multiprocessing.get_context("fork") to multiprocessing.get_context("spawn").
  • Added regression coverage that contrasts a held module-level lock under fork vs spawn, demonstrating the ownership model that makes spawn safe.
  • Updated cron subprocess tests so spawn children import an isolated fake Hermes agent/cron package instead of relying on parent-process monkeypatched modules.
  • Kept the >100 KB queue drain-before-join regression and adjusted manual persistence tests to mock the subprocess boundary directly.

Fixes #1754

Why It Matters

  • Avoids probabilistic deadlocks and inherited-state hazards when manual cron runs are launched while sibling WebUI threads hold import/logging locks or other process resources.
  • Preserves the prior large-output subprocess fix while moving to a safer child-process startup model.

Verification

  • env -u HERMES_CONFIG_PATH -u HERMES_WEBUI_HOST /home/michael/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_issue1574_cron_profile_lock.py tests/test_scheduled_jobs_profile_isolation.py tests/test_issue617_cron_profile_selector.py tests/test_cron_manual_run_persistence.py tests/test_cron_run_job_import.py -q25 passed in 3.63s
  • git diff --check → passed
  • env -u HERMES_CONFIG_PATH -u HERMES_WEBUI_HOST /home/michael/.hermes/hermes-agent/venv/bin/python -m pytest tests/ -q4659 passed, 2 skipped, 3 xpassed, 1 warning, 8 subtests passed in 382.22s

Risks / Follow-ups

  • spawn has a small startup overhead because it starts a fresh Python interpreter; manual cron jobs are long-running enough that this should be negligible.
  • The tests now use an isolated fake agent package for spawn child imports, which is intentionally closer to the real import boundary than parent-process sys.modules monkeypatching.

Model Used

  • OpenAI Codex gpt-5.5 via Hermes Agent CLI.
  • Tool use: Kanban task context, shell/git/pytest, file patching, and GitHub CLI.

@Michaelyklam Michaelyklam force-pushed the fix/issue-1754-cron-spawn branch from a8d26b6 to 5ec6888 Compare May 7, 2026 00:45
@Michaelyklam Michaelyklam changed the title fix: spawn manual cron subprocesses fix: use spawn for manual cron subprocesses May 7, 2026
nesquena-hermes added a commit that referenced this pull request May 7, 2026
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Thanks @Michaelyklam — this shipped in v0.51.15 (commit 9cc1062) as part of a 4-PR full-sweep batch. Stage rebased your branch onto current master, ran the full pre-release gate (4687 pytest, browser tests, Opus advisor verdict SHIP all 4), and merged via release PR #1777.

A small auto-fix was applied at stage time: 2 of the 5 new tests were failing on dev machines that have an editable hermes_agent install — the spawn child resolves the real cron.scheduler first instead of the fake one written under HERMES_WEBUI_AGENT_DIR. Added _real_hermes_agent_editable_install_present() detector + pytest.skip guard so they skip on dev (where they cannot work as designed) and run cleanly on CI. Per Opus advisor SHOULD-FIX, the detector uses importlib.util.find_spec origin check instead of name-pattern matching against site-packages entries.

GitHub didn't auto-close because the merge commit only references the squash-merged stage branch, not your fork's commit directly — closing manually for hygiene.

Live now on existing installs after git pull + restart.

Release notes: https://github.com/nesquena/hermes-webui/releases/tag/v0.51.15

zhuyeqi pushed a commit to zhuyeqi/hermes-webui that referenced this pull request May 7, 2026
zhuyeqi pushed a commit to zhuyeqi/hermes-webui that referenced this pull request May 7, 2026
…1767, nesquena#1769, nesquena#1770)

Constituent PRs:
- nesquena#1762 (@bergeouss) openrouter/ prefix for tencent/hy3-preview:free. Closes nesquena#1744.
- nesquena#1767 (@Michaelyklam) use spawn for manual cron subprocesses. Closes nesquena#1754.
  AUTO-FIX applied: 2 tests skip on dev machines with editable hermes_agent
  install (the spawn child resolves the real cron.scheduler first instead of
  the fake one). Tightened detector to use importlib.util.find_spec origin
  check per Opus stage-309 SHOULD-FIX.
- nesquena#1769 (@nesquena-hermes, APPROVED by @nesquena) three context-menu
  essentials from nesquena#1764: Reveal-in-finder, Copy-path, Open-with-system.
- nesquena#1770 (@Michaelyklam) surface Codex usage exhaustion errors. Closes nesquena#1765.

Tests: 4662 → 4694 collected (+32). 4687 passed, 4 skipped (2 dev-only +
2 prong-2 noise), 3 xpassed, 0 failed in 135s.

Pre-release verification:
- All 4 PRs CI-green individually.
- node -c clean on all 4 changed JS files.
- 11/11 browser API endpoints PASS.
- Pre-stamp re-fetch: all PR heads match local rebases.
- Opus advisor: SHIP, all 5 verification questions clean, 0 MUST-FIX,
  2 SHOULD-FIX (one absorbed: detector tightening; one filed as nesquena#1776
  follow-up: custom provider + :free suffix edge case in nesquena#1762).

Closes nesquena#1744, nesquena#1754, nesquena#1764, nesquena#1765.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate cron subprocess from fork to spawn (PR #1746 follow-up)

2 participants