Summary
Vigilante can leave a session marked running in local state even after the implementation actually completed and the corresponding GitHub issue/PR moved on. Add stale-session recovery that re-checks GitHub issue and pull request status so the daemon can reconcile bad local state instead of continuing to report false stale-running sessions.
Problem
- Vigilante can persist a session as running in ~/.vigilante/sessions.json even though the implementation finished successfully.
- vigilante status then reports a stale session based on old last_heartbeat_at / updated_at data, even when GitHub already shows the issue and PR in a terminal or clearly advanced state.
- This creates misleading operational output and can block or confuse later automation because local state is treated as the source of truth when it is actually stale.
- This matters now because the failure mode is not self-healing: an operator has to manually run vigilante cleanup to clear state that Vigilante should be able to reconcile safely on its own.
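The staleness check described above can be sketched as follows. This is only an illustration of why the check misfires: it reads local timestamps and never consults GitHub. The `STALE_AFTER` threshold, the `is_stale` helper, and the dictionary shape of a session are assumptions made for the example, not Vigilante's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; the real value used by Vigilante is not stated in this issue.
STALE_AFTER = timedelta(minutes=15)


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2026-03-26T17:11:08Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_stale(session: dict, now: datetime) -> bool:
    """A running session is stale when its newest local timestamp is too old."""
    if session.get("status") != "running":
        return False
    stamps = [session[k] for k in ("last_heartbeat_at", "updated_at") if session.get(k)]
    if not stamps:
        return True  # running with no timestamps at all is treated as stale
    newest = max(parse_ts(s) for s in stamps)
    return now - newest > STALE_AFTER
```

Because nothing here looks at the issue or PR, a session that finished without updating local state stays stale-running until someone intervenes.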
Context
- Observed bad scenario for aliengiraffe/vigilante issue #299:
  - vigilante logs --repo aliengiraffe/vigilante --issue 299 shows the session succeeded at 2026-03-26 10:39:35 AM PDT and opened PR #319.
  - ~/.vigilante/sessions.json still showed issue #299 as status: "running" with updated_at and last_heartbeat_at stuck at 2026-03-26T17:11:08Z.
  - vigilante status reported Stale sessions (1) and listed Issue #299 in aliengiraffe/vigilante: running.
- GitHub state for the same issue had already moved on:
  - issue #299 was closed
  - the issue carried vigilante:done
  - the implementation log had already recorded successful completion and PR creation
- The likely trigger was a later scheduler wedge on another issue, which prevented the daemon from reconciling already-finished sessions before the process got stuck.
Desired Outcome
- Vigilante should detect when a supposedly running or stale session no longer matches GitHub reality and recover it automatically.
- Recovery should reconcile session state by re-checking the GitHub issue and any associated PR before continuing to present the session as running.
- vigilante status should stop reporting false stale-running sessions when the remote issue/PR state clearly indicates the work finished, moved to PR maintenance, was closed, or otherwise no longer belongs in running.
- Manual cleanup should remain available, but it should no longer be required for this recoverable state-drift scenario.
- Do not broaden this issue into redesigning the whole scheduler or provider lifecycle; keep the fix focused on stale session reconciliation.
Implementation Notes
- Treat this as a bug in stale-session recovery, not as a documentation-only problem.
- When a session is considered stale, reconcile it against GitHub before reporting or persisting it as still running:
  - fetch the issue state and labels
  - fetch the PR state if the session has a PR number or if one can be resolved from the session branch
  - use that remote state to determine whether the session should transition to success, closed, PR-maintenance tracking, or another non-running state
- Use the existing session evidence when available, including known PR number, branch, issue labels, and any per-issue session log signals, but GitHub issue/PR state should be the deciding factor when local state is obviously stale.
- Required: the stale-session path must re-check GitHub issue and PR status before continuing to report the session as running.
- Flexible: whether reconciliation happens during daemon scans, vigilante status, stale-session detection, daemon startup recovery, or a shared recovery helper used by all of those paths.
- Preserve safe behavior when GitHub is temporarily unavailable: do not silently invent terminal states if remote reconciliation failed.
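One possible shape for a shared recovery helper is sketched below. Everything named here is hypothetical: the `IssueState` and `PRState` types, the `RemoteLookupError` exception, the injected `fetch_issue` / `fetch_pr` callables, the session keys, and the `pr_maintenance` status string. The precedence between rules (merged PR, then closed issue, then open PR) is also an assumption; the required behavior is only that GitHub is re-checked and that a failed lookup never produces a terminal state. Resolving a PR from the session branch is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class IssueState:
    state: str          # "open" or "closed"
    labels: frozenset   # label names on the issue


@dataclass
class PRState:
    state: str          # "open" or "closed"
    merged: bool = False


class RemoteLookupError(Exception):
    """Raised by the (hypothetical) GitHub client when a lookup fails."""


def reconcile(session: dict,
              fetch_issue: Callable[[str, int], IssueState],
              fetch_pr: Callable[[str, int], PRState]) -> Tuple[str, str]:
    """Return (status, reason) for a stale session after re-checking GitHub."""
    local = session["status"]
    try:
        issue = fetch_issue(session["repo"], session["issue"])
        pr: Optional[PRState] = None
        if session.get("pr") is not None:
            pr = fetch_pr(session["repo"], session["pr"])
    except RemoteLookupError as err:
        # Safe default: keep local state rather than inventing a terminal one.
        return local, f"remote lookup failed: {err}"

    if pr is not None and pr.merged:
        return "success", "pull request merged"
    if issue.state == "closed":
        if "vigilante:done" in issue.labels:
            return "success", "issue closed with vigilante:done"
        return "closed", "issue closed"
    if pr is not None and pr.state == "open":
        return "pr_maintenance", "pull request open"
    return local, "no remote evidence of completion"
```

Passing the fetchers in as arguments keeps the decision logic free of network calls, so the same function could be used from daemon scans, vigilante status, or startup recovery and exercised in tests with fakes.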
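Acceptance Criteria
- When a session is marked running locally but the associated GitHub issue/PR state shows the work has already completed or transitioned, Vigilante reconciles the session out of running automatically.
- vigilante status does not continue to report a stale-running session for the reproduced #299-style scenario once GitHub reconciliation succeeds.
- [...] running.
- The reconciled state is persisted to sessions.json so the same stale warning does not reappear on the next command or scan.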
Testing Expectations
- Add or update tests around stale session detection and recovery in the app/status/daemon paths that read sessions.json.
- Include a regression test that reproduces the observed #299 scenario: local session remains running, per-issue log indicates success, and GitHub issue/PR state indicates completion.
- Add coverage for the GitHub reconciliation branches, including issue closed with vigilante:done, open PR maintenance, merged PR, and GitHub lookup failure.
- Validate that recovered state is persisted and that vigilante status output changes from stale-running to the reconciled state.
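A regression test along these lines might look like the sketch below. The `recover` function is a stand-in for whatever code path ends up owning reconciliation, and the sessions.json layout used here (a list of objects with `repo`, `issue`, `pr`, `status`) is assumed for illustration only; the real schema may differ. The fake `lookup` callables replace GitHub so the test runs offline.

```python
import json
import tempfile
import unittest
from pathlib import Path


def recover(path: Path, lookup) -> None:
    """Stand-in for the code under test: reconcile running sessions in place.

    `lookup(session)` returns the reconciled status, or raises OSError when
    GitHub cannot be reached.
    """
    sessions = json.loads(path.read_text())
    for session in sessions:
        if session["status"] != "running":
            continue
        try:
            session["status"] = lookup(session)
        except OSError:
            pass  # keep local state; never invent a terminal state
    path.write_text(json.dumps(sessions))


class StaleSessionRecoveryTest(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.path = Path(tmp.name) / "sessions.json"
        # Mirrors the observed #299 state: finished remotely, running locally.
        self.path.write_text(json.dumps([{
            "repo": "aliengiraffe/vigilante", "issue": 299, "pr": 319,
            "status": "running",
            "last_heartbeat_at": "2026-03-26T17:11:08Z",
        }]))

    def persisted_status(self) -> str:
        return json.loads(self.path.read_text())[0]["status"]

    def test_completed_issue_is_recovered_and_persisted(self):
        recover(self.path, lambda session: "success")
        self.assertEqual(self.persisted_status(), "success")

    def test_lookup_failure_keeps_running(self):
        def unreachable(session):
            raise OSError("GitHub unavailable")
        recover(self.path, unreachable)
        self.assertEqual(self.persisted_status(), "running")
```

Saved as a test module, this can be run with `python -m unittest`. The remaining branches (open PR maintenance, merged PR) would follow the same pattern with different fakes.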
Operational / UX Considerations
- Prefer self-healing behavior over operator-only cleanup when the remote GitHub state makes the correct outcome clear.
- Keep status output trustworthy: if Vigilante says a session is still running, that should mean there is real evidence of active work rather than just stale local JSON.
- If reconciliation changes a session state automatically, log that transition clearly so operators can understand why the stale warning disappeared.
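As a small illustration of the logging point above, a transition could be recorded with the old status, the new status, and the reason in one line. The logger name, the `log_transition` helper, and the message format are hypothetical choices for this sketch, not existing Vigilante output.

```python
import logging

logger = logging.getLogger("vigilante.recovery")  # hypothetical logger name


def log_transition(repo: str, issue: int, old: str, new: str, reason: str) -> str:
    """Log a reconciliation result and return the message that was emitted."""
    if old == new:
        message = f"session {repo}#{issue} left as {old}: {reason}"
        logger.debug(message)
    else:
        message = f"session {repo}#{issue} reconciled {old} -> {new}: {reason}"
        logger.info(message)
    return message
```

Logging unchanged sessions at a lower level keeps the normal log quiet while still leaving a trail when a lookup failed and the session was deliberately left alone.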