
shared/apm.md restore step crashes on PRs that modify tracked files (e.g. apm.lock.yaml) #28256

@danielmeppiel

Description


Summary

.github/workflows/shared/apm.md (the recommended APM integration template documented at https://githubnext.github.io/gh-aw/reference/dependencies/) crashes any pull_request_target agentic workflow whose triggering PR modifies apm.lock.yaml (or any other tracked file that overlaps the bundle's contents). The failure surfaces in gh-aw's own checkout_pr_branch.cjs step with no mention of APM, so adopters debug the wrong layer or abandon the integration.

This affects every adopter who uses APM dependencies with gh-aw, and it fires on the canonical first-run PR ("I added a dependency, please review.").

Originating failure

error: Your local changes to the following files would be overwritten by checkout:
        apm.lock.yaml
ERR_API: Failed to checkout PR branch: The process '/usr/bin/git' failed with exit code 1

Root cause

shared/apm.md has an asymmetric isolation contract between the Pack job and the Restore pre-agent-step:

  Step      isolated   working-directory
  Pack      'true'     /tmp/gh-aw/apm-workspace
  Restore   unset      '.' (= ${{ github.workspace }})

Restore extracts the bundle (which contains .github/{skills,agents,instructions,prompts}/, apm.lock.yaml, apm.yml, apm_modules/) directly into ${{ github.workspace }}. Verified in microsoft/apm-action's src/runner.ts:

  • actionOwnsDir = isolated || packInput || !!bundleInput — bundle mode already implies the action creates the working directory if missing, so symmetric isolation is a one-line change with no side-effects.
  • The unpacker contract (apm unpack, unpacker.py:37-39) is documented as: "If a local file has the same name as a bundle file, the bundle file wins (overwrite)" — so any tracked file in the consumer repo whose path collides with the bundle gets dirtied.

In pull_request_target flows, gh-aw's checkout_pr_branch.cjs then runs git checkout -B <branch> origin/pr-head, which aborts on dirty tracked files. The agent never starts.
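The failure can be reproduced locally without gh-aw or APM. The sketch below is a hypothetical reproduction, not gh-aw's checkout_pr_branch.cjs: a throwaway repo, a local branch named pr-head standing in for origin/pr-head, and a plain echo standing in for the Restore step's overwrite. Only the final git checkout -B mirrors the real operation.

```shell
#!/usr/bin/env bash
# Hypothetical local reproduction of the checkout abort; no gh-aw or APM involved.
set -euo pipefail
export LC_ALL=C   # keep git's messages in English so they can be matched

repo=$(mktemp -d)
cd "$repo"
git init -q
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

# Base branch: the consumer repo tracks apm.lock.yaml.
echo "version: 1" > apm.lock.yaml
git add apm.lock.yaml
commit -m "base"

# "pr-head" stands in for origin/pr-head: the PR bumps the lockfile.
git checkout -q -b pr-head
echo "version: 2" > apm.lock.yaml
commit -am "PR: add a dependency"
git checkout -q -   # back to the base branch with a clean tree

# Stand-in for Restore: the bundle's lockfile overwrites the tracked file in place.
echo "version: bundle" > apm.lock.yaml

# Same git operation as the PR checkout step; git refuses to clobber the dirty file.
checkout_status=0
err=$(git checkout -B pr-branch pr-head 2>&1) || checkout_status=$?
echo "exit status: $checkout_status"
echo "$err"
```

The printed error matches the "Originating failure" log above: the checkout aborts and apm.lock.yaml keeps the bundle's content.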

Why this matters beyond one PR

Recommended fix

Restore extracts into a temp directory (symmetric with Pack), and a small bridge step stages only what the agent needs into ${{ github.workspace }} using no-clobber copy semantics so the consumer's tracked files always win. This mirrors APM's own discovery priority (discover_primitives_with_dependencies — local primitives have highest priority).

Exact YAML diff

In .github/workflows/shared/apm.md, change ONLY the pre-agent-steps: block. jobs:, import-schema:, the Pack step, and everything else are untouched:

 pre-agent-steps:
   - name: Download APM bundle artifact
     uses: actions/[email protected]
     with:
       name: ${{ needs.activation.outputs.artifact_prefix }}apm
       path: /tmp/gh-aw/apm-bundle
   - name: Find APM bundle path
     id: apm_bundle
     run: echo "path=$(find /tmp/gh-aw/apm-bundle -name '*.tar.gz' | head -1)" >> "$GITHUB_OUTPUT"
   - name: Restore APM packages
     uses: microsoft/[email protected]
     with:
       bundle: ${{ steps.apm_bundle.outputs.path }}
+      working-directory: /tmp/gh-aw/apm-restore
+  - name: Stage APM primitives for agent discovery
+    run: |
+      src=/tmp/gh-aw/apm-restore
+      # apm_modules/ is gitignored and fully package-owned; safe to copy wholesale.
+      [ -d "$src/apm_modules" ] && cp -a "$src/apm_modules" .
+      # Primitives: add only new files. No-clobber ensures repo-tracked files
+      # are never overwritten -- adopter's primitives always win over bundle contents.
+      for d in skills agents instructions prompts; do
+        [ -d "$src/.github/$d" ] || continue
+        mkdir -p ".github/$d"
+        cp -rn "$src/.github/$d/." ".github/$d/"
+      done

That is the entire change. No new import-schema inputs. No adopter-facing API surface change. Adopters keep importing exactly as before:

imports:
  - uses: shared/apm.md
    with:
      packages:
        - microsoft/apm-sample-package

Why this shape

Each concern and how this fix addresses it:

  • apm.lock.yaml collision with PR checkout: eliminated at the root. The lockfile stays in /tmp/gh-aw/apm-restore and never touches the workspace.
  • Tracked primitives in the consumer repo (e.g. .github/agents/reviewer.agent.md): cp -rn skips them, so the repo's version wins and the PR checkout succeeds.
  • Agent primitive discovery (${{ github.workspace }}/.github/{skills,agents,instructions,prompts}/): the bridge stages files exactly there, so the discovery path is unchanged.
  • Empty workspace (no actions/checkout): the bridge copies everything; there is nothing to clobber.
  • push / schedule / workflow_dispatch flows (no subsequent PR checkout): no regression. The bridge runs unconditionally, has no git dependency, and swallows no errors.
  • apm_modules/ (gitignored, package-owned): a wholesale cp -a is safe.
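The bridge semantics above can be exercised locally. This is a sketch under stated assumptions: temporary directories stand in for ${{ github.workspace }} and /tmp/gh-aw/apm-restore, the file and package names (reviewer.agent.md, helper.agent.md, some-package) are hypothetical, and the no-clobber copy is written as an explicit loop (a hypothetical helper, copy_no_clobber, equivalent in intent to cp -rn) so the sketch does not depend on how a particular cp reports skipped files.

```shell
#!/usr/bin/env bash
# Sketch: simulate Restore-into-temp plus bridge staging with hypothetical files.
set -euo pipefail

workspace=$(mktemp -d)   # stands in for ${{ github.workspace }}
src=$(mktemp -d)         # stands in for /tmp/gh-aw/apm-restore

# Consumer repo already tracks its own reviewer agent.
mkdir -p "$workspace/.github/agents"
echo "repo version" > "$workspace/.github/agents/reviewer.agent.md"

# Extracted bundle: same-named agent, a new agent, a lockfile, and apm_modules/.
mkdir -p "$src/.github/agents" "$src/apm_modules/some-package"
echo "bundle version" > "$src/.github/agents/reviewer.agent.md"
echo "bundle helper" > "$src/.github/agents/helper.agent.md"
echo "lock" > "$src/apm.lock.yaml"
echo "pkg" > "$src/apm_modules/some-package/README.md"

# Hypothetical helper: copy every file under $1 into $2, skipping paths that exist in $2.
copy_no_clobber() {
  local from=$1 to=$2
  (cd "$from" && find . -type f) | while IFS= read -r rel; do
    rel=${rel#./}
    if [ ! -e "$to/$rel" ]; then
      mkdir -p "$to/$(dirname "$rel")"
      cp -p "$from/$rel" "$to/$rel"
    fi
  done
}

cd "$workspace"
# Bridge: apm_modules/ wholesale, primitives no-clobber; apm.lock.yaml is never copied.
if [ -d "$src/apm_modules" ]; then cp -a "$src/apm_modules" .; fi
for d in skills agents instructions prompts; do
  [ -d "$src/.github/$d" ] || continue
  mkdir -p ".github/$d"
  copy_no_clobber "$src/.github/$d" ".github/$d"
done

find . -type f | sort
```

Running it leaves reviewer.agent.md at "repo version", adds helper.agent.md and apm_modules/, and never creates apm.lock.yaml in the workspace, which is the property the concerns above rely on.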

Why not the simpler alternatives

We considered and rejected:

  1. git checkout -- . 2>/dev/null || true after Restore (the workaround we shipped in microsoft/apm's own copy of shared/apm.md; see microsoft/apm#901, "apm unpack writes apm.lock.yaml / apm.yml to output dir, violating documented metadata-only contract"): it bluntly reverts all tracked-file modifications (not just APM's), it depends on git (and silently swallows the error in no-checkout flows), and it reads as a workaround in a public template. That is fine as a single-repo internal mitigation, but it is the wrong shape for a shared template that every adopter imports.

  2. isolated: 'true' on the Restore step with working-directory: '.': catastrophic. Isolated mode in apm-action first clears existing primitive dirs under .github/, which would WIPE the consumer's existing primitives. (Verified in microsoft/apm-action src/runner.ts:316: clearExistingPrimitives is invoked in isolated mode.)

  3. isolated: 'true' + working-directory: /tmp/...: redundant. bundleInput already sets actionOwnsDir = true, so the action creates the temp dir without isolated. Adding the flag only adds cognitive load for readers, with no behavior change.
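Why alternative 1 is too blunt can be seen in a few lines. In this hypothetical sketch, README.md stands in for any tracked file that an unrelated earlier step legitimately modified: the workaround reverts it along with the lockfile, and outside a git checkout git fails while the "|| true" hides the failure.

```shell
#!/usr/bin/env bash
# Sketch: effect of `git checkout -- . 2>/dev/null || true` (alternative 1).
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "version: 1" > apm.lock.yaml
echo "original" > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm base

echo "version: bundle" > apm.lock.yaml          # dirtied by Restore
echo "edited by an earlier step" > README.md     # hypothetical unrelated edit

git checkout -- . 2>/dev/null || true
echo "README.md after workaround: $(cat README.md)"   # the unrelated edit is gone too

# Outside any git checkout (e.g. a flow without actions/checkout), git itself fails...
cd "$(mktemp -d)"
git_status=0
git checkout -- . 2>/dev/null || git_status=$?
echo "git exit status without a repo: $git_status"
# ...but the workaround's `|| true` turns that failure into success.
git checkout -- . 2>/dev/null || true
echo "workaround exit status: $?"
```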

Edge cases / hardening notes

  1. cp -rn portability: -n (no-clobber) is supported by GNU coreutils (all GitHub-hosted Ubuntu runners) and BSD cp (macOS runners). For self-hosted runners on exotic distros, a safe fallback is rsync -a --ignore-existing (also available on all GitHub-hosted runners). Not a blocker for hosted runners.

  2. Package intentionally replaces a repo-tracked primitive: If an APM package ships, e.g., .github/agents/foo.agent.md and the consumer has the same path tracked, cp -rn keeps the consumer's version. This is the correct semantic (repo authority > package defaults) and matches APM's local discovery priority. If a consumer ever needs the package version to win, the right place to add a --force-primitives flag is microsoft/apm-action, not the shared template.

  3. Pre-existing apm_modules/ in workspace: in normal CI, apm_modules/ is gitignored and absent. Hardening option (paranoia-level, not required): replace cp -a "$src/apm_modules" . with rm -rf apm_modules; cp -a "$src/apm_modules" . (removing first means stale entries from an earlier copy cannot survive the merge).

  4. Symmetry with Pack: Pack uses working-directory: /tmp/gh-aw/apm-workspace; this fix uses /tmp/gh-aw/apm-restore. Both stay under gh-aw's /tmp/gh-aw/* convention. Mental model: "APM operates in temp space; only primitives bridge to the workspace."

  5. Future apm-action v2: a native restore-to mode that does the isolated extract + selective stage in one step would let the bridge step disappear. Out of scope for this issue; track separately against microsoft/apm-action.
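Edge case 3 comes down to how cp -a behaves when the destination directory already exists: it merges into it, so entries absent from the bundle survive. A hypothetical sketch (package names are made up) comparing the plain copy with the rm -rf hardening:

```shell
#!/usr/bin/env bash
# Sketch: plain `cp -a` vs. the `rm -rf` hardening when apm_modules/ already exists.
set -euo pipefail

src=$(mktemp -d)                 # stands in for /tmp/gh-aw/apm-restore
mkdir -p "$src/apm_modules/current-pkg"
echo "new" > "$src/apm_modules/current-pkg/file"

# Creates a workspace containing a leftover (hypothetical) stale package; prints its path.
setup_workspace() {
  local ws
  ws=$(mktemp -d)
  mkdir -p "$ws/apm_modules/stale-pkg"
  echo "old" > "$ws/apm_modules/stale-pkg/file"
  echo "$ws"
}

# Plain copy: merges into the existing directory, so stale-pkg/ survives.
plain=$(setup_workspace)
(cd "$plain" && cp -a "$src/apm_modules" .)

# Hardened copy: remove first, so apm_modules/ matches the bundle exactly.
hardened=$(setup_workspace)
(cd "$hardened" && rm -rf apm_modules && cp -a "$src/apm_modules" .)

echo "plain:    $(ls "$plain/apm_modules" | tr '\n' ' ')"
echo "hardened: $(ls "$hardened/apm_modules" | tr '\n' ' ')"
```

In normal CI the leftover case does not arise, since apm_modules/ is gitignored and absent, which is why the hardening stays optional.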

Acceptance criteria

  • A pull_request_target agentic workflow built on shared/apm.md succeeds on a PR that modifies apm.lock.yaml. Verify against the failing job above (https://github.com/microsoft/apm/actions/runs/24883083247/job/72856352586?pr=889) once microsoft/apm pulls the new shared/apm.md.
  • A workflow built on shared/apm.md whose triggering PR modifies a tracked .github/agents/*.agent.md succeeds, and the PR's version of that primitive is the one the agent discovers (not the bundle's).
  • A push-triggered workflow built on shared/apm.md discovers all bundle-installed primitives at ${{ github.workspace }}/.github/{skills,agents,instructions,prompts}/.
  • A workflow_dispatch-triggered workflow with no actions/checkout at all discovers all bundle-installed primitives.
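For the first two criteria, a cheap invariant to assert right after the bridge step is that no tracked file in the workspace differs from HEAD, since that is what lets the later git checkout -B succeed. The sketch below assumes a test workflow could run such a check as an extra step; the function name assert_tracked_files_clean is hypothetical, and the demo around it uses a throwaway repo.

```shell
#!/usr/bin/env bash
# Sketch: invariant check for a test workflow; assert_tracked_files_clean is hypothetical.
set -euo pipefail

# Returns non-zero if any tracked file differs from HEAD; untracked additions are fine.
assert_tracked_files_clean() {
  if ! git diff --quiet HEAD --; then
    echo "tracked files modified before PR checkout:" >&2
    git diff --name-only HEAD -- >&2
    return 1
  fi
}

# Demo in a throwaway repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p .github/agents
echo "repo version" > .github/agents/reviewer.agent.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm base

# Bridge-style staging: only a new, untracked primitive is added.
echo "bundle helper" > .github/agents/helper.agent.md
clean_status=0
assert_tracked_files_clean || clean_status=$?

# Restore-into-workspace style: a tracked primitive gets overwritten.
echo "bundle version" > .github/agents/reviewer.agent.md
dirty_status=0
assert_tracked_files_clean 2>/dev/null || dirty_status=$?

echo "after bridge: $clean_status, after overwrite: $dirty_status"
```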

References
