[Feature]: Persist agent session IDs on quit and resume them on next launch #237

@forketyfork

Status Quo

When Architect quits, session/state.zig sends SIGTERM to the shell pid. If an AI agent (Claude Code, Codex, Gemini) is running as the foreground process in the PTY, the signal goes only to the shell — the agent doesn't receive a clean shutdown signal and has no opportunity to print its exit message with the resume command. There is no per-terminal agent_type or agent_session_id in persistence.toml, and on next launch Architect simply restores blank shells. Users must manually re-run agents with resume flags.

All three agents print a UUID to the terminal when they exit cleanly:

  • Claude: claude --resume <uuid>
  • Codex: codex resume <uuid>
  • Gemini: gemini --resume <uuid>
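
A minimal sketch, in C, of how the scan for one of these hints could look. It assumes the UUID is in canonical 8-4-4-4-12 hex form and follows the marker text after one or more spaces; the exact wording of each agent's exit message is not confirmed here. The names `is_uuid_at` and `find_resume_uuid` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if s starts with a canonical 8-4-4-4-12 hex UUID (36 chars). */
static int is_uuid_at(const char *s) {
    for (size_t i = 0; i < 36; i++) {
        if (s[i] == '\0') return 0;               /* string ended too early */
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-') return 0;
        } else if (!isxdigit((unsigned char)s[i])) {
            return 0;
        }
    }
    return 1;
}

/* Finds the LAST "<marker> <uuid>" in text and copies the UUID into out.
 * Hypothetical helper: marker is e.g. "claude --resume" or "codex resume".
 * Returns 1 if a UUID was found, 0 otherwise. */
int find_resume_uuid(const char *text, const char *marker, char out[37]) {
    assert(text != NULL && marker != NULL && out != NULL);
    size_t mlen = strlen(marker);
    if (mlen == 0) return 0;

    int found = 0;
    for (const char *p = strstr(text, marker); p != NULL;
         p = strstr(p + mlen, marker)) {
        const char *q = p + mlen;
        if (*q != ' ') continue;                  /* marker must be followed by a space */
        while (*q == ' ') q++;
        if (is_uuid_at(q)) {
            memcpy(out, q, 36);
            out[36] = '\0';
            found = 1;                            /* keep going: later matches win */
        }
    }
    return found;
}
```

Taking the last match is a design choice in this sketch: scrollback may still contain a resume hint from an earlier run in the same terminal, and the most recent one is the one worth persisting.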

Objectives

When a user quits Architect while an AI agent is running, Architect should cleanly terminate the agent (giving it time to print its exit message), extract the session UUID from the terminal output, persist it, and automatically resume the agent on next launch — without requiring the user to copy-paste anything.

User Flow

Trigger: User presses Cmd+Q (or closes the window) while an agent is running in one or more terminals.

  1. Architect detects that a terminal has a running agent by inspecting the foreground process group of the PTY.
  2. Architect sends SIGTERM to the foreground process group of the PTY (via tcgetpgrp + killpg), not just the shell pid.
  3. Architect waits briefly (up to ~1–2s) for the agent to flush its exit output into the PTY buffer.
  4. Architect scans the terminal buffer (ghostty-vt scrollback) for a UUID in the agent's exit message.
  5. The UUID and agent type are saved to persistence.toml alongside the terminal's cwd.
  6. On next launch, for terminals with a saved agent session, Architect spawns the shell and writes the resume command to the PTY after startup.

Result: The user reopens Architect and sees their agents resuming in the same slots.

Resume Command Formats

| Agent | Resume command |
| --- | --- |
| Claude Code | claude --resume <session_id> |
| Codex | codex resume <session_id> |
| Gemini | gemini --resume <session_id> |
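
As an illustration, building the command string from a persisted entry might look like the C sketch below. The agent_type values "claude", "codex", and "gemini" and the function name `build_resume_command` are assumptions; the issue does not fix how the agent type is stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<resume prefix> <session_id>\n" into buf.
 * The trailing newline makes the shell run the command once it is
 * written to the PTY.
 * Returns the command length, or -1 for an unknown agent type or a
 * buffer that is too small. */
int build_resume_command(const char *agent_type, const char *session_id,
                         char *buf, size_t buf_len) {
    assert(agent_type != NULL && session_id != NULL && buf != NULL);

    const char *prefix;
    if (strcmp(agent_type, "claude") == 0) {
        prefix = "claude --resume";
    } else if (strcmp(agent_type, "codex") == 0) {
        prefix = "codex resume";
    } else if (strcmp(agent_type, "gemini") == 0) {
        prefix = "gemini --resume";
    } else {
        return -1;                          /* unknown agent type */
    }

    int n = snprintf(buf, buf_len, "%s %s\n", prefix, session_id);
    if (n < 0 || (size_t)n >= buf_len) return -1;   /* error or truncated */
    return n;
}
```

Because the result is typed into a shell, it seems prudent to check that session_id is a well-formed UUID before calling this, so that a hand-edited persistence.toml cannot inject arbitrary shell input.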

Agent Detection Strategy

At quit time, for each session, Architect calls tcgetpgrp(pty.master) to get the foreground process group leader pid, then identifies the agent using a two-step check:

  1. Process image name first (proc_name(pgid) via macOS libproc.h): if the name is claude, codex, or gemini, it's a match. This handles compiled binaries directly (Claude is already compiled; Codex and Gemini may follow).
  2. Node.js fallback: if the image name is node, read the full argv via KERN_PROCARGS2 sysctl and check whether argv[1] contains claude, codex, or gemini. This handles the current Node.js-wrapped distributions of Codex and Gemini.

Verified against live processes (all three agents running simultaneously in Architect):

  • Claude: process group leader is claude (compiled binary) — matched by step 1
  • Codex: process group leader is node /opt/homebrew/bin/codex — matched by step 2
  • Gemini: process group leader is node /opt/homebrew/bin/gemini — matched by step 2

This approach also handles the "user already quit the agent" case: if the agent has exited before Architect quits, tcgetpgrp returns the shell's pgid, no agent is detected, and nothing is persisted.
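
The two-step check can be expressed as a pure function over the image name and argv[1], which keeps it testable without live processes. The C sketch below is hypothetical (`agent_kind`, `classify_agent`); it also matches only the final path component of argv[1], a slightly stricter variant of the "contains" check described above, so that a directory such as /Users/claude/ does not produce a false match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { AGENT_NONE, AGENT_CLAUDE, AGENT_CODEX, AGENT_GEMINI } agent_kind;

/* Returns the part of path after the last '/', or path itself. */
static const char *basename_of(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

/* Step 1: exact match on the process image name.
 * Step 2: if the image is "node", look at the script name in argv[1].
 * argv1 may be NULL when the argument vector could not be read. */
agent_kind classify_agent(const char *image_name, const char *argv1) {
    assert(image_name != NULL);

    if (strcmp(image_name, "claude") == 0) return AGENT_CLAUDE;
    if (strcmp(image_name, "codex") == 0)  return AGENT_CODEX;
    if (strcmp(image_name, "gemini") == 0) return AGENT_GEMINI;

    if (strcmp(image_name, "node") == 0 && argv1 != NULL) {
        const char *script = basename_of(argv1);
        if (strstr(script, "claude") != NULL) return AGENT_CLAUDE;
        if (strstr(script, "codex") != NULL)  return AGENT_CODEX;
        if (strstr(script, "gemini") != NULL) return AGENT_GEMINI;
    }
    return AGENT_NONE;
}
```

With the live processes listed above, `classify_agent("claude", NULL)` takes the step 1 path, while `classify_agent("node", "/opt/homebrew/bin/codex")` falls through to step 2.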

Scope

In scope:

  • Send SIGTERM to the foreground process group when an agent is active, and drain PTY output for up to ~1–2s before deinit
  • Detect agent type from the foreground process group leader using the two-step strategy above (process image name, then KERN_PROCARGS2 argv for Node.js wrappers)
  • Scan terminal buffer (ghostty-vt) for a UUID in Claude/Codex/Gemini exit messages
  • Extend persistence.toml with agent_type and agent_session_id per terminal entry
  • On restore, write the resume command to the PTY after the shell starts
  • macOS only (consistent with existing cwd persistence limitation)
  • Add an ADR documenting the agent detection and resume strategy

Out of scope:

  • Agents that don't print a UUID on SIGTERM
  • Any UI for managing or browsing past agent sessions
  • Non-macOS platforms

Implementation Plan

Affected Modules

  • src/session/state.zig: send SIGTERM to foreground process group; add agent_type/agent_session_id fields; implement agent detection (process image name + KERN_PROCARGS2 fallback); drain PTY and scan for UUID
  • src/config.zig: extend per-terminal persistence entry with agent_type and agent_session_id
  • src/app/runtime.zig: on quit, invoke agent teardown + UUID extraction before deinit; on restore, write resume command to PTY; save/load new persistence fields
  • docs/ARCHITECTURE.md: add ADR for agent detection and resume strategy
  • docs/configuration.md, README.md: document new behavior and fields

Tasks

  1. Add agent_type: ?[]const u8 and agent_session_id: ?[]const u8 to SessionState; implement foreground process detection: tcgetpgrp → proc_name (compiled binary check) → KERN_PROCARGS2 argv[1] (Node.js wrapper fallback). Note: KERN_PROCARGS2 returns a binary blob with layout [int32 argc][exec_path\0][alignment padding][argv[0]\0][argv[1]\0]..., so parsing requires careful pointer arithmetic and boundary checks (src/session/state.zig)
  2. Implement graceful agent teardown: send SIGTERM to the foreground process group via killpg, then run a synchronous blocking drain loop (read PTY → feed into vt_stream.processBytes → repeat until agent exits or ~1.5s timeout). This is intentional blocking I/O during shutdown, justified by the same rationale as ADR-013. After draining, use app/terminal_history.extractSessionText to extract terminal text and scan for a UUID matching known exit-message patterns (claude --resume, codex resume, gemini --resume) (src/session/state.zig, src/app/terminal_history.zig)
  3. Add agent_type and agent_session_id fields to the per-terminal persistence entry; update load/save/migrate logic (src/config.zig)
  4. On quit in runtime.zig: for sessions where an agent is detected, call agent teardown before deinit and persist the extracted UUID + agent type (src/app/runtime.zig)
  5. On restore in runtime.zig: if a terminal has a saved agent session, write the resume command (e.g. claude --resume <uuid>\n) to pending_write after the shell spawns. Note: writing immediately should work since PTY input queues until the shell reads it after startup; verify this assumption and add a note if a delay proves necessary (src/app/runtime.zig)
  6. Write tests: UUID extraction from sample exit-message strings for each agent; persistence round-trip for new fields; resume command construction per agent type; agent name detection logic (both compiled binary and Node.js wrapper paths); KERN_PROCARGS2 blob parsing
  7. Add ADR to docs/ARCHITECTURE.md documenting the agent detection strategy (process image name + KERN_PROCARGS2 fallback) and the quit/resume data flow
  8. Update docs/configuration.md for new persistence fields
  9. Update README.md with user-facing description
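
For the KERN_PROCARGS2 parsing in task 1, a bounds-checked walk over the blob layout given there might look like this C sketch. It operates on an in-memory buffer only (the sysctl call itself is left out), the name `procargs2_argv` is hypothetical, and it assumes argv[0] is non-empty, since an empty argv[0] cannot be told apart from the NUL padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns a pointer to the NUL-terminated argv[index] inside blob, or
 * NULL if the blob is too short, malformed, or has fewer arguments.
 * Layout: [int32 argc][exec_path\0][NUL padding][argv[0]\0][argv[1]\0]... */
const char *procargs2_argv(const char *blob, size_t len, int index) {
    assert(blob != NULL);

    int32_t argc;
    if (len < sizeof argc || index < 0) return NULL;
    memcpy(&argc, blob, sizeof argc);       /* avoid an unaligned int read */
    if (index >= argc) return NULL;

    size_t pos = sizeof argc;
    while (pos < len && blob[pos] != '\0') pos++;   /* skip exec_path */
    while (pos < len && blob[pos] == '\0') pos++;   /* skip NUL + padding */

    for (int i = 0; ; i++) {
        if (pos >= len) return NULL;
        const char *end = memchr(blob + pos, '\0', len - pos);
        if (end == NULL) return NULL;       /* unterminated string */
        if (i == index) return blob + pos;
        pos = (size_t)(end - blob) + 1;     /* start of the next argument */
    }
}
```

Every advance is checked against len before the next read, so a truncated blob yields NULL rather than a read past the end of the buffer.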

New Dependencies

None — tcgetpgrp, killpg, and KERN_PROCARGS2 are POSIX/macOS APIs already available via @cImport; proc_name is available via macOS libproc.h.

Acceptance Criteria

  • All tasks completed
  • Running claude in a terminal, quitting Architect, and relaunching resumes the session automatically
  • Same for Codex and Gemini
  • Quitting the agent manually before quitting Architect does not trigger a resume on next launch
  • Tests cover UUID extraction from exit messages for all three agents, persistence round-trip, agent name detection, and resume command construction
  • zig build, zig build test, just lint all pass
  • ADR added to docs/ARCHITECTURE.md
  • docs/configuration.md and README.md updated
