ci(apm-sync): verify .claude/ without wiping it #469
Hickey Analysis
Concerns: (1) verify
Fragmentation: the scratch setup encodes one implicit invariant: every local input referenced by
Concept multiplication: none.
Structural patterns (magic incantations): three lines in the recipe encode external knowledge that will rot without comments:
Mitigation: all three get inline comments in the recipe explaining the external behavior they depend on.
Entanglement: single recipe, no closures, no mutable state. One temporal coupling (
Severity: tiny blast radius, low change friction. Reasoning load moderate without comments, low with them.
Simplifications considered and rejected:
Fact-check: re-read for phrase shapes. "Legitimate separation" verified at domain level (automated background vs. human-triggered). No "in practice doesn't", no "convention not constraint". No findings dismissed.
ci::apm-sync previously depended on ai::apm, which did a destructive `find .claude ... -exec rm -rf` before reinstalling. That opened a window where hooks, settings, commands, rules, and skills did not exist on disk: any concurrent Claude Code session's stop hook firing during that window crashed with "No such file or directory".
Replace apm-sync's verification path with a scratch-dir stage: symlink apm.yml + agents/, cp apm.lock.yaml (apm writes `generated_at` on every install, so a symlink would mutate the real lockfile), mkdir .claude/ to trigger apm's folder-based target auto-detection, run apm install in scratch, then `diff -r -x launch.json` against the live tree. The real .claude/ is never touched during CI.
The ai::apm dev recipe keeps its destructive wipe; that's an explicit developer action, not a background CI step.
Also refresh the stale apm#561 comment: upstream `apm prune` now exists but only handles orphan packages, not orphan files from upgraded packages, so the dev-recipe wipe is still justified.
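The scratch-dir stage described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual recipe: the helper name `verify_in_scratch`, its arguments, and the overridable install command are assumptions added so the sketch can be tried without apm installed. Only the symlink, copy, mkdir, install, and diff steps come from the commit message.

```shell
# Hypothetical helper (not the actual recipe): verify_in_scratch <repo> [install cmd...]
# The body runs in a subshell, so variables and `cd` do not leak to the caller.
verify_in_scratch() (
  repo="$(cd "$1" && pwd)" || exit 1   # absolute path so the symlinks resolve
  shift
  # Default to `apm install`; the override is an assumption added for dry runs.
  [ "$#" -gt 0 ] || set -- apm install

  scratch="$(mktemp -d)" || exit 1

  # Inputs that are only read can be symlinked.
  ln -s "$repo/apm.yml" "$scratch/apm.yml" || exit 1
  ln -s "$repo/agents" "$scratch/agents" || exit 1
  # The lockfile is copied: apm writes generated_at on every install, and a
  # symlink would pass that write through to the real lockfile.
  cp "$repo/apm.lock.yaml" "$scratch/apm.lock.yaml" || exit 1
  # An existing .claude/ folder triggers apm's folder-based target auto-detection.
  mkdir "$scratch/.claude" || exit 1

  status=0
  (cd "$scratch" && "$@") >&2 || status=$?
  if [ "$status" -eq 0 ]; then
    # Read-only comparison: a nonzero exit means the live tree has drifted.
    diff -r -x launch.json "$scratch/.claude" "$repo/.claude" >&2 || status=$?
  fi

  rm -rf "$scratch"
  exit "$status"
)
```

Called as `verify_in_scratch "$PWD"` from the repository root, the helper returns nonzero on install failure or drift and never writes to the live `.claude/`.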
The initial apm-sync rewrite failed CI because `.claude/scheduled_tasks.lock` (written at runtime by Claude Code's schedule wakeup machinery) exists in the live tree but not in the scratch install. The old git-status approach implicitly filtered runtime files via .gitignore; the new diff approach must do so explicitly. Exclude the three `.claude/*` runtime paths already listed in .gitignore: worktrees/, settings.local.json, scheduled_tasks.lock.
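To see how the explicit exclusions behave, here is a small self-contained sketch using throwaway directories. The directory names `live` and `scratch` and the file contents are made up for illustration, and the exact flags in the real recipe may differ. Note that `diff -x` matches on file base names, so a pattern such as `worktrees` excludes any entry with that name at any depth.

```shell
# Throwaway stand-ins for the live tree and the scratch install (illustrative only).
demo="$(mktemp -d)"
mkdir -p "$demo/live/worktrees" "$demo/scratch"
echo '{}' > "$demo/live/settings.json"
echo '{}' > "$demo/scratch/settings.json"

# Runtime files that exist only in the live tree.
echo 'lock' > "$demo/live/scheduled_tasks.lock"
echo '{}' > "$demo/live/settings.local.json"
echo 'wt' > "$demo/live/worktrees/some-branch"

# Without exclusions, diff reports the runtime files as drift (exit status 1).
plain_status=0
diff -r "$demo/scratch" "$demo/live" > /dev/null || plain_status=$?

# With the explicit exclusions, the two trees compare equal (exit status 0).
filtered_status=0
diff -r -x launch.json -x worktrees -x settings.local.json -x scheduled_tasks.lock \
  "$demo/scratch" "$demo/live" > /dev/null || filtered_status=$?

echo "plain=$plain_status filtered=$filtered_status"   # prints: plain=1 filtered=0
rm -rf "$demo"
```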
| Step | Status | Duration | Verification |
|---|---|---|---|
| sync | ✓ | 3s | fetch ok; forge=github |
| research | ✓ | 1m 5s | problem/fix pinned in talk phase |
| hickey | ✓ | 1m 31s | 3 magic incantations → inline comments |
| branch | ✓ | 59s | fix/apm-sync-no-wipe pushed |
| implement | ✓ | 1m 1s | happy-path + drift-detection verified locally |
| check | ✓ | 23s | pnpm typecheck clean |
| docs | ✓ | 11s | no doc changes needed |
| police | ✓ | 1m 16s | 1 elegance fix (echo→stderr) |
| fmt | ✓ | 12s | 0/5 reformatted |
| commit | ✓ | 16s | 52f2e3c pushed |
| test | ⊘ | 6s | skipped: recipe-only change |
| ci | ✓ | 5m 3s | 9/9 green after 1 retry |
| update-pr | ✓ | 25s | body refreshed; marked ready |
| done | ✓ | — | — |
| Total | | ~12m 38s | |
Slowest step: ci at ~40% of total, including one retry. First run caught a real edge case — .claude/scheduled_tasks.lock (runtime lockfile written by Claude Code's ScheduleWakeup tool during this very workflow) existed in the live tree but not in the scratch install. The fix: exclude the three .claude/* runtime paths already listed in .gitignore from the diff.
Optimization suggestions
- Test apm-sync against a dirty `.claude/`: enumerate runtime/gitignored files before declaring local verification complete. Would have caught the `scheduled_tasks.lock` edge case pre-commit.
- Tighter talk→implement handoff: when talk ends with a concrete recipe, paste it into the hickey prompt so implement is pure mechanical application.
- `just ci::apm-sync` for recipe-only iterations: ~4s single-step retry beats running the full matrix (5m) when iterating on the recipe itself.
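One way to enumerate the runtime/gitignored files mentioned in the first suggestion is to ask git directly. This is a minimal sketch, assuming the runtime files are untracked and covered by `.gitignore`; the helper name `list_ignored_claude_files` is hypothetical.

```shell
# Hypothetical helper: list untracked files under .claude/ that git ignores,
# i.e. runtime files that a scratch install will not contain.
# Usage: list_ignored_claude_files <repo_root>
list_ignored_claude_files() {
  git -C "$1" ls-files --others --ignored --exclude-standard -- .claude
}
```

Running `list_ignored_claude_files .` in a worktree with an active session shows which paths the diff needs to exclude before the recipe is pushed.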
Workflow completed at 2026-04-11T23:13:18Z.
`ci::apm-sync` no longer mutates the live `.claude/` tree. Previously it depended on `ai::apm`, which did `find .claude -mindepth 1 ... -exec rm -rf` and then ran `apm install`, opening a window where `.claude/hooks/`, `settings.json`, commands, rules, and skills did not exist on disk. Any concurrent Claude Code session in the same worktree whose stop hook fired during that window crashed with `No such file or directory`.
The fix performs the verification in a scratch directory: symlink `apm.yml` and `agents/`, `cp apm.lock.yaml`, `mkdir .claude/` (this is how apm's target auto-detection picks claude mode; see `target_detection.py:82,102`), run `apm install` inside scratch, then `diff -r` against the live tree. The real `.claude/` is never touched during CI.
The lockfile is copied rather than symlinked because `apm install` writes `generated_at` through on every run, so a symlink would mutate the real lockfile. Runtime files that apm doesn't manage (`launch.json`, plus the three `.claude/*` entries already listed in `.gitignore`: `worktrees/`, `settings.local.json`, `scheduled_tasks.lock`) are excluded from the diff. The old `git status` approach filtered these implicitly via gitignore; the new diff approach has to do it explicitly. The dev `ai::apm` recipe keeps its destructive wipe; that's an explicit developer action, not a background CI step.
Closes #468.
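The copy-versus-symlink choice for the lockfile can be illustrated without apm. In this sketch a plain redirect stands in for the installer rewriting the lockfile in place, which is an assumption about how the write happens; the directory names and file contents are made up.

```shell
demo="$(mktemp -d)"
mkdir "$demo/linked" "$demo/copied"
echo 'generated_at: original' > "$demo/apm.lock.yaml"

# Symlinked lockfile: the scratch dir points at the real file.
ln -s "$demo/apm.lock.yaml" "$demo/linked/apm.lock.yaml"
# Copied lockfile: the scratch dir gets its own independent file.
cp "$demo/apm.lock.yaml" "$demo/copied/apm.lock.yaml"

# Stand-in for an installer rewriting the lockfile in place (assumed behavior).
echo 'generated_at: copy-run' > "$demo/copied/apm.lock.yaml"
after_copy="$(cat "$demo/apm.lock.yaml")"

echo 'generated_at: link-run' > "$demo/linked/apm.lock.yaml"
after_link="$(cat "$demo/apm.lock.yaml")"

echo "real lockfile after copy write: $after_copy"   # generated_at: original
echo "real lockfile after link write: $after_link"   # generated_at: link-run
rm -rf "$demo"
```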