ci(apm-sync): verify .claude/ without wiping it#469

Merged
srid merged 5 commits into master from fix/apm-sync-no-wipe
Apr 12, 2026

Member

@srid srid commented Apr 11, 2026

ci::apm-sync no longer mutates the live .claude/ tree. Previously it depended on ai::apm, which did find .claude -mindepth 1 ... -exec rm -rf and then ran apm install — opening a window where .claude/hooks/, settings.json, commands, rules, and skills did not exist on disk. Any concurrent Claude Code session in the same worktree whose stop hook fired during that window crashed with No such file or directory.

The fix performs the verification in a scratch directory: symlink apm.yml and agents/, cp apm.lock.yaml, mkdir .claude/ (this is how apm's target auto-detection picks claude mode — see target_detection.py:82,102), run apm install inside scratch, then diff -r against the live tree. The real .claude/ is never touched during CI.
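
The scratch-directory flow described above can be sketched as a shell function. This is a minimal sketch under stated assumptions, not the recipe from this PR: the function name `verify_apm_sync` and its return codes are hypothetical, and it assumes `apm install` runs with the scratch directory as its working directory. The diff exclusions are left out to keep the sketch short.

```shell
# Hypothetical sketch: verify .claude/ in a scratch directory without
# touching the live tree. Returns 0 if in sync, 1 on drift, 2 on setup
# or install failure. Assumes `apm` is on PATH.
verify_apm_sync() {
  local repo scratch status=0
  repo=$(cd "${1:-.}" && pwd) || return 2
  scratch=$(mktemp -d) || return 2

  # Inputs that are only read can be symlinked.
  ln -s "$repo/apm.yml" "$scratch/apm.yml"
  ln -s "$repo/agents" "$scratch/agents"
  # The lockfile is copied because the install rewrites generated_at.
  cp "$repo/apm.lock.yaml" "$scratch/apm.lock.yaml"
  # An existing .claude/ directory is what selects the claude target.
  mkdir "$scratch/.claude"

  # Install into scratch; send tool output to stderr.
  (cd "$scratch" && apm install) >&2 || status=2

  # Compare the generated tree with the live one (read-only on both sides).
  if [ "$status" -eq 0 ]; then
    diff -r "$scratch/.claude" "$repo/.claude" >&2 || status=1
  fi

  rm -rf "$scratch"
  return "$status"
}
```

Calling `verify_apm_sync /path/to/repo` only reads from the live `.claude/`; everything the install writes lands in the temporary directory, which is removed before the function returns.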

The lockfile is copied rather than symlinked because apm install rewrites generated_at on every run, and that write would pass through a symlink and mutate the real lockfile. Runtime files that apm doesn't manage (launch.json, plus the three .claude/* entries already listed in .gitignore: worktrees/, settings.local.json, scheduled_tasks.lock) are excluded from the diff. The old git status approach filtered these implicitly via .gitignore; the new diff approach has to do it explicitly. The dev ai::apm recipe keeps its destructive wipe, since that is an explicit developer action, not a background CI step.
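
To make the explicit exclusions concrete, here is a small sketch of a recursive diff that skips the four names listed above. The helper name `diff_claude_trees` is hypothetical; the `-x` option is the exclude flag supported by GNU and BSD `diff`.

```shell
# Hypothetical helper: compare a generated tree ($1) with the live tree ($2),
# skipping runtime files that apm does not manage.
diff_claude_trees() {
  diff -r \
    -x launch.json \
    -x worktrees \
    -x settings.local.json \
    -x scheduled_tasks.lock \
    "$1" "$2"
}
```

One caveat of this approach: `-x` matches base names at any depth, so a file named `launch.json` nested inside a managed directory would also be skipped.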

Also refreshes the stale apm#561 comment — upstream apm prune now exists, but it only handles orphan packages (not orphan files from upgraded packages), so the wipe in the dev recipe is still justified.

Closes #468.

Member Author

srid commented Apr 11, 2026

Hickey Analysis

Concerns: (1) verify .claude/ matches sources, (2) security audit, (3) don't mutate the live tree during CI. The fix cleanly separates #3 from the dev action — apm-sync no longer shares a destructive hot path with the apm dev recipe.

Fragmentation: the scratch setup encodes one implicit invariant — every local input referenced by apm.yml must be symlinked into scratch. Today that's just ./agents. If a future apm.yml adds a second local dep, the recipe will break loudly (apm install fails fast on missing path). Self-enforcing, not silent drift. No other invariants found.

Concept multiplication: none. apm (dev, destructive) and apm-sync (CI, verify) diverge by purpose, blast radius, and trigger — legitimate separation.

Structural patterns — magic incantations: three lines in the recipe encode external knowledge that will rot without comments:

  1. mkdir "$scratch/.claude" — triggers apm's folder-based target auto-detection. Empirically -t claude does NOT produce an equivalent result; local packages silently no-op.
  2. cp apm.lock.yaml vs ln -s for everything else: apm rewrites generated_at in the lockfile on every install, so a symlink would mutate the real file.
  3. diff -r -x launch.json — user-owned file outside apm's management.

Mitigation: all three get inline comments in the recipe explaining the external behavior they depend on.
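
The second incantation (copy the lockfile, symlink the rest) rests on how writes behave through a symlink. The sketch below illustrates that with a stand-in: `stamp_lockfile` is hypothetical and simply overwrites the path it is given, which is an assumption about the write pattern, since apm's actual write strategy is not shown in this PR.

```shell
# Hypothetical stand-in for a tool that rewrites a lockfile in place.
# Assumption: it opens the existing path and writes to it, as a shell
# redirection does.
stamp_lockfile() {
  printf 'generated_at: %s\n' "$2" > "$1"
}

demo_dir=$(mktemp -d)
printf 'generated_at: original\n' > "$demo_dir/real.lock"

ln -s "$demo_dir/real.lock" "$demo_dir/linked.lock"   # symlinked input
cp "$demo_dir/real.lock" "$demo_dir/copied.lock"      # copied input

# Writing to the copy leaves the real file alone.
stamp_lockfile "$demo_dir/copied.lock" "run-1"
after_copy=$(cat "$demo_dir/real.lock")

# Writing to the symlink follows it and overwrites the real file.
stamp_lockfile "$demo_dir/linked.lock" "run-2"
after_link=$(cat "$demo_dir/real.lock")

printf 'after copy write: %s\nafter link write: %s\n' "$after_copy" "$after_link"
```

A tool that writes a temporary file and renames it over the path would replace the symlink instead of following it, so the copy is the safer choice either way.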

Entanglement: single recipe, no closures, no mutable state. One temporal coupling (mkdir .claude → apm install: order matters, dependency invisible) addressed by comment.

Severity: tiny blast radius, low change friction. Reasoning load moderate without comments, low with them.

Simplifications considered and rejected:

Fact-check: re-read for phrase shapes. "Legitimate separation" verified at domain level (automated background vs. human-triggered). No "in practice doesn't", no "convention not constraint". No findings dismissed.

srid added 2 commits April 11, 2026 19:07
ci::apm-sync previously depended on ai::apm, which did a destructive
`find .claude ... -exec rm -rf` before reinstalling. That opened a
window where hooks, settings, commands, rules, and skills did not
exist on disk — any concurrent Claude Code session's stop hook
firing during that window crashed with "No such file or directory".

Replace apm-sync's verification path with a scratch-dir stage: symlink
apm.yml + agents/, cp apm.lock.yaml (apm writes `generated_at` on
every install, so a symlink would mutate the real lockfile), mkdir
.claude/ to trigger apm's folder-based target auto-detection, run
apm install in scratch, then `diff -r -x launch.json` against the
live tree. The real .claude/ is never touched during CI.

The ai::apm dev recipe keeps its destructive wipe — that's an
explicit developer action, not a background CI step. Also refresh
the stale apm#561 comment: upstream `apm prune` now exists but
only handles orphan packages, not orphan files from upgraded
packages, so the dev-recipe wipe is still justified.

The initial apm-sync rewrite failed CI because `.claude/scheduled_tasks.lock`
(written at runtime by Claude Code's schedule wakeup machinery) exists in
the live tree but not in the scratch install. The old git-status approach
implicitly filtered runtime files via .gitignore; the new diff approach
must do so explicitly.

Exclude the three `.claude/*` runtime paths already listed in .gitignore:
worktrees/, settings.local.json, scheduled_tasks.lock.

@srid srid marked this pull request as ready for review April 11, 2026 23:13
Member Author

srid commented Apr 11, 2026

/do results

| Step | Status | Duration | Verification |
|---|---|---|---|
| sync | | 3s | fetch ok; forge=github |
| research | | 1m 5s | problem/fix pinned in talk phase |
| hickey | | 1m 31s | 3 magic incantations → inline comments |
| branch | | 59s | fix/apm-sync-no-wipe pushed |
| implement | | 1m 1s | happy-path + drift-detection verified locally |
| check | | 23s | pnpm typecheck clean |
| docs | | 11s | no doc changes needed |
| police | | 1m 16s | 1 elegance fix (echo→stderr) |
| fmt | | 12s | 0/5 reformatted |
| commit | | 16s | 52f2e3c pushed |
| test | | 6s | skipped: recipe-only change |
| ci | | 5m 3s | 9/9 green after 1 retry |
| update-pr | | 25s | body refreshed; marked ready |
| done | | | |
| Total | | ~12m 38s | |

Slowest step: ci at ~40% of total, including one retry. First run caught a real edge case — .claude/scheduled_tasks.lock (runtime lockfile written by Claude Code's ScheduleWakeup tool during this very workflow) existed in the live tree but not in the scratch install. The fix: exclude the three .claude/* runtime paths already listed in .gitignore from the diff.

Optimization suggestions

  • Test apm-sync against a dirty .claude/ — enumerate runtime/gitignored files before declaring local verification complete. Would have caught the scheduled_tasks.lock edge case pre-commit.
  • Tighter talk→implement handoff — when talk ends with a concrete recipe, paste it into the hickey prompt so implement is pure mechanical application.
  • just ci::apm-sync for recipe-only iterations — ~4s single-step retry beats running the full matrix (5m) when iterating on the recipe itself.
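
The first suggestion, enumerating runtime and gitignored files before trusting a local run, can be approximated with stock git. This is a sketch only: the helper name `list_ignored_files` is hypothetical, and it reports only files that match the repository's ignore rules, so an unignored runtime file such as launch.json would not appear.

```shell
# Hypothetical helper: list untracked files under a path that match the
# repository's ignore rules. $1 = repo root, $2 = path (default .claude).
list_ignored_files() {
  git -C "$1" ls-files --others --ignored --exclude-standard -- "${2:-.claude}"
}
```

Running `list_ignored_files .` in a worktree with an active session would surface entries such as `.claude/scheduled_tasks.lock`, each a candidate for the diff's exclusion list.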

Workflow completed at 2026-04-11T23:13:18Z.

@srid srid mentioned this pull request Apr 12, 2026
@srid srid merged commit 4cdb81b into master Apr 12, 2026
10 of 11 checks passed
@srid srid deleted the fix/apm-sync-no-wipe branch April 12, 2026 13:50

Development

Successfully merging this pull request may close these issues.

ci::apm-sync wipes .claude/ mid-run, breaking running Claude Code stop hooks