Recurring cron jobs are marked completed/off when croniter is missing #16265

@franksong2702

Summary

Recurring cron jobs can be permanently marked completed/off if the gateway process starts in a Python environment where croniter is not importable.

This is a different failure mode from a paused job: the job still has a recurring cron schedule, but after a run the persisted state becomes:

{
  "enabled": false,
  "state": "completed",
  "next_run_at": null
}

From the user's point of view, the job silently turns off.

Root cause

In cron/jobs.py:

  • croniter import failure sets HAS_CRONITER = False.
  • compute_next_run() returns None for schedule["kind"] == "cron" when HAS_CRONITER is false.
  • mark_job_run() treats next_run_at is None as terminal completion and writes enabled=false, state=completed.

That is safe for one-shot jobs, but unsafe for recurring cron schedules. A missing runtime dependency should be surfaced as a scheduler/runtime error, not persisted as successful job completion.
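
The chain above can be modelled in a few lines. This is a simplified stand-in, not the actual code in cron/jobs.py: the has_croniter parameter replaces the module-level HAS_CRONITER flag, the placeholder timestamp and the initial "scheduled" state are assumptions, and persistence is left out.

```python
def compute_next_run(schedule: dict, has_croniter: bool):
    """Model of the reported behaviour for cron schedules."""
    if schedule["kind"] == "cron" and not has_croniter:
        # Same value a finished one-shot job would produce.
        return None
    # Placeholder for the real croniter-based computation.
    return "2030-01-01T07:00:00+00:00"


def mark_job_run(job: dict, has_croniter: bool) -> dict:
    """Model of the reported behaviour: None is treated as terminal."""
    job = dict(job)
    job["last_status"] = "ok"
    job["next_run_at"] = compute_next_run(job["schedule"], has_croniter)
    if job["next_run_at"] is None:
        job["enabled"] = False
        job["state"] = "completed"
    return job


job = {
    "schedule": {"kind": "cron", "expr": "0 7,15,23 * * *"},
    "enabled": True,
    "state": "scheduled",  # assumed initial state name
}
broken = mark_job_run(job, has_croniter=False)
print(broken["enabled"], broken["state"], broken["next_run_at"])
# prints: False completed None
```

The point of the model is that None carries two meanings here, "nothing left to schedule" and "could not compute", and the caller cannot tell them apart.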

Reproduction

  1. Create a recurring cron job with a cron expression, for example:

    0 7,15,23 * * *
    
  2. Run the gateway from a Python environment where croniter is not installed/importable.

  3. Let the job run once.

  4. Inspect ~/.hermes/cron/jobs.json.

Observed result:

{
  "schedule": {"kind": "cron", "expr": "0 7,15,23 * * *"},
  "repeat": {"times": null, "completed": 17},
  "enabled": false,
  "state": "completed",
  "next_run_at": null,
  "last_status": "ok"
}
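
To check whether other jobs were hit, a small predicate over a single job record may help. This is a sketch: looks_wrongly_completed is a hypothetical helper, it assumes that "times": null means an unlimited repeat, and it deliberately does not assume how jobs are laid out inside jobs.json.

```python
import json


def looks_wrongly_completed(job: dict) -> bool:
    """True if a recurring cron job carries the terminal state from this report."""
    schedule = job.get("schedule") or {}
    repeat = job.get("repeat") or {}
    return (
        schedule.get("kind") == "cron"
        and repeat.get("times") is None  # assumed to mean "repeat forever"
        and job.get("enabled") is False
        and job.get("state") == "completed"
        and job.get("next_run_at") is None
    )


# The record from the observed result above.
observed = json.loads("""
{
  "schedule": {"kind": "cron", "expr": "0 7,15,23 * * *"},
  "repeat": {"times": null, "completed": 17},
  "enabled": false,
  "state": "completed",
  "next_run_at": null,
  "last_status": "ok"
}
""")
print(looks_wrongly_completed(observed))
# prints: True
```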

Expected behavior

For recurring cron jobs, failure to compute the next run due to missing croniter should not silently disable the job.

Possible expected handling:

  • keep enabled=true and set state=error, or
  • preserve the previous next_run_at and set last_error, or
  • make compute_next_run() raise a clear scheduler dependency error for cron schedules when croniter is missing.

The important invariant is: a recurring cron job should not be marked completed unless it actually reached a terminal repeat/one-shot condition.
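
One way to combine these options, as a rough sketch rather than a patch against the real module: SchedulerDependencyError is a hypothetical exception name, has_croniter stands in for the HAS_CRONITER flag, the timestamp is a placeholder, and any non-cron kind is treated as one-shot for brevity.

```python
class SchedulerDependencyError(RuntimeError):
    """Hypothetical error: the schedule cannot be evaluated in this runtime."""


def compute_next_run(schedule: dict, has_croniter: bool):
    if schedule["kind"] == "cron":
        if not has_croniter:
            raise SchedulerDependencyError(
                "croniter is required for cron schedules but is not importable"
            )
        return "2030-01-01T07:00:00+00:00"  # placeholder for the real computation
    return None  # one-shot: nothing left to schedule


def mark_job_run(job: dict, has_croniter: bool) -> dict:
    job = dict(job)
    try:
        next_run = compute_next_run(job["schedule"], has_croniter)
    except SchedulerDependencyError as exc:
        # Surface the problem; leave enabled and next_run_at untouched.
        job["state"] = "error"
        job["last_error"] = str(exc)
        return job
    job["next_run_at"] = next_run
    if next_run is None:
        # Only a genuine terminal condition reaches this branch.
        job["enabled"] = False
        job["state"] = "completed"
    return job


cron_job = {
    "schedule": {"kind": "cron", "expr": "0 7,15,23 * * *"},
    "enabled": True,
    "state": "scheduled",  # assumed state name
    "next_run_at": "2029-12-31T23:00:00+00:00",
}
result = mark_job_run(cron_job, has_croniter=False)
print(result["enabled"], result["state"])
# prints: True error
```

Raising instead of returning None keeps the two meanings apart at the source, so mark_job_run does not have to guess why no next run was produced.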

Related issues / PRs

Environment

  • macOS
  • Hermes gateway managed as a long-running daemon
  • Gateway Python env without croniter
  • Another local Hermes/WebUI Python env had croniter, which is why this only showed up in the gateway runtime

Suggested fix

Narrow fix in cron/jobs.py:

  • distinguish terminal one-shot completion from failed next-run computation;
  • for schedule.kind == "cron" with HAS_CRONITER == False, return/raise a clear error instead of None;
  • update mark_job_run() so next_run_at is None only disables one-shot/repeat-completed jobs, not recurring cron jobs;
  • add a regression test that simulates HAS_CRONITER=False for a recurring cron schedule and asserts the job is not persisted as enabled=false, state=completed.
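
The regression test in the last bullet could look roughly like this. It is a sketch under assumptions: fake_jobs is a stand-in for the real cron/jobs.py module, and the real mark_job_run() signature and persistence layer may differ. Against the real code the same assertion would be made after patching HAS_CRONITER on the imported module.

```python
import types
from unittest import mock

# Stand-in for the real module; only the flag and one function are modelled.
fake_jobs = types.SimpleNamespace(HAS_CRONITER=True)


def _mark_job_run(job: dict) -> dict:
    """Stand-in with the proposed behaviour (hypothetical signature)."""
    job = dict(job)
    if job["schedule"]["kind"] == "cron" and not fake_jobs.HAS_CRONITER:
        job["state"] = "error"
        job["last_error"] = "croniter is not importable"
        return job
    job["next_run_at"] = "2030-01-01T07:00:00+00:00"  # placeholder
    return job


fake_jobs.mark_job_run = _mark_job_run


def test_recurring_cron_job_not_completed_without_croniter():
    job = {
        "schedule": {"kind": "cron", "expr": "0 7,15,23 * * *"},
        "repeat": {"times": None, "completed": 17},
        "enabled": True,
        "state": "scheduled",  # assumed state name
    }
    # Simulate the gateway environment where croniter cannot be imported.
    with mock.patch.object(fake_jobs, "HAS_CRONITER", False):
        updated = fake_jobs.mark_job_run(job)
    # The job must not end up in the terminal state from this report.
    assert not (updated["enabled"] is False and updated["state"] == "completed")
    return updated


test_recurring_cron_job_not_completed_without_croniter()
```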

Labels

  • P1: High, major feature broken, no workaround
  • comp/cron: Cron scheduler and job management
  • type/bug: Something isn't working