Skip to content

v1.5: restore mode must install apm CLI so bundles are unpacked via 'apm unpack' (not raw tar fallback) #26

@danielmeppiel

Description

@danielmeppiel

Summary

In restore mode (bundle: input set), microsoft/apm-action deliberately skips installing the apm CLI (runner.ts:90// RESTORE MODE: extract bundle, skip APM installation entirely.). As a result, extractBundle (bundler.ts:62-89) always falls through to its tar xzf --strip-components=1 fallback, which dumps the full bundle contents — including apm.lock.yaml and apm.yml — into working-directory.

When working-directory is a git checkout (the default '.' = ${{ github.workspace }}), those tracked files become dirty, and any subsequent git checkout collides:

error: Your local changes to the following files would be overwritten by checkout:
        apm.lock.yaml
ERR_API: Failed to checkout PR branch: The process '/usr/bin/git' failed with exit code 1

This breaks every pull_request_target agentic workflow whose triggering PR modifies apm.lock.yaml -- i.e., the canonical first-run PR for a new APM adopter.

Root cause is isolated to this repo

The apm unpack CLI is correct. It explicitly writes only deployed_files from the lockfile and never copies apm.lock.yaml / apm.yml to the output dir (verified in microsoft/apm src/apm_cli/bundle/unpacker.py). The CLI's documented contract holds.

The bug is the tar fallback in this action: raw tar xzf does not know about APM's "lockfile is metadata, not output" contract, so it extracts everything. The fallback exists for the case "apm not installed" -- but in restore mode the action chooses to never install apm (for speed), so the fallback path is taken on every restore. Result: in 100% of restore invocations, the workspace is dirtied.

The action itself already knows this is the inferior code path -- runner.ts:96 logs "unverified -- install APM for integrity checks" whenever the fallback runs. We've been shipping that warning to every consumer for the entire lifetime of restore mode.

Originating failure

Proposed fix

Always install the apm CLI in restore mode, so extractBundle takes the apm unpack path (which honors the CLI's documented contract). Drop the "skip APM installation entirely" optimization at runner.ts:90.

Behavior change (vs. current)

Current Proposed
apm CLI installed in restore mode? No Yes (consistent with install/pack modes)
Extraction path used in restore mode tar xzf (always; fallback) apm unpack (always; primary path)
apm.lock.yaml written to working-directory Yes (tar dumps everything) No (apm unpack only writes deployed_files)
apm.yml written to working-directory Yes (tar dumps everything) No (same)
.github/{skills,agents,instructions,prompts}/ written to working-directory Yes Yes
apm_modules/ written to working-directory Yes Yes
Bundle integrity verification No (unverified warning logged) Yes (CLI verifies file presence + scans for hidden Unicode before extraction)
New inputs -- None

Implementation sketch

In src/runner.ts:

   // RESTORE MODE: extract bundle, skip APM installation entirely.
   // Directory was already created above (actionOwnsDir = true for bundle mode).
   if (bundleInput) {
+    // Install apm so extractBundle takes the verified `apm unpack` path
+    // instead of the raw `tar xzf` fallback (which dumps the full bundle
+    // including apm.lock.yaml/apm.yml into working-directory).
+    await ensureApmInstalled();
+
     const bundlePath = await resolveLocalBundle(bundleInput, resolvedDir);
     core.info(`Restoring bundle: ${bundlePath}`);
     const result = await extractBundle(bundlePath, resolvedDir);
-    const verifiedMsg = result.verified ? ' (verified)' : ' (unverified -- install APM for integrity checks)';
-    core.info(`Restored ${result.files} file(s)${verifiedMsg}`);
+    core.info(`Restored ${result.files} file(s) (verified)`);
     ...

Optional belt-and-braces (in src/bundler.ts): if the tar xzf fallback ever runs again in the future (e.g., apm install fails), narrow it so it cannot dirty the workspace:

-  const rc = await exec.exec('tar', ['xzf', resolvedBundle, '-C', resolvedOutput, '--strip-components=1'], {
+  const rc = await exec.exec('tar', [
+    'xzf', resolvedBundle, '-C', resolvedOutput,
+    '--strip-components=1',
+    '--exclude=apm.lock.yaml',
+    '--exclude=apm.lock',     // legacy bundle naming
+    '--exclude=apm.yml',
+  ], {
     ignoreReturnCode: true,
   });

Performance note

Installing apm adds a one-time download per job (~single-digit MB tarball, no compilation). On hosted runners that is sub-second to a few seconds, negligible vs. the cost of running an agent job. The "skip apm install" optimization is not worth a correctness bug that crashes every dependency-PR.

Versioning

Ship as v1.5 with a prominent Changed entry calling out: "Restore mode now installs the apm CLI and verifies bundles via apm unpack. As a side-effect, apm.lock.yaml and apm.yml are no longer written to working-directory -- they were never supposed to be (per apm unpack's documented contract); the tar fallback was leaking them. If you depended on those files appearing post-restore, retrieve them from the bundle artifact directly via tar xzf <bundle> --include='*/apm.lock.yaml' -O."

Not a major (v2): the visible inputs/outputs are unchanged, and the dropped behavior was a contract-violating side-effect of a fallback path. Per semver-pragmatic convention, this is a bug fix.

Acceptance criteria

  • A pull_request_target workflow that calls microsoft/[email protected] with bundle: set, on a PR that modifies apm.lock.yaml, succeeds -- no git checkout collision.
  • The action log shows "Using apm unpack (with verification)..." (not the unverified tar fallback) on every restore invocation.
  • ${{ github.workspace }}/apm.lock.yaml and ${{ github.workspace }}/apm.yml are not modified by the restore step (verify with git status immediately after).
  • All bundle primitives are present at ${{ github.workspace }}/.github/{skills,agents,instructions,prompts}/; ${{ github.workspace }}/apm_modules/ is materialized.

Migration impact

  • github/gh-aw shared/apm.md: bump pin from microsoft/[email protected] to @v1.5. No template change beyond the version bump. (The bridge step the gh-aw team would otherwise have had to add -- see shared/apm.md restore step crashes on PRs that modify tracked files (e.g. apm.lock.yaml) github/gh-aw#28256, since closed -- becomes unnecessary.)
  • microsoft/apm own copy of shared/apm.md: drop the temporary post-restore git checkout -- . 2>/dev/null || true mitigation once @v1.5 is pinned. Re-import gh-aw's shared/apm.md upstream.
  • External adopters using [email protected] directly without gh-aw: workspace stops getting dirtied on restore; restore now also gets bundle integrity verification (previously logged as "unverified"). Consumers who depended on apm.lock.yaml / apm.yml appearing in working-directory post-restore must retrieve them from the bundle artifact via raw tar (documented in the v1.5 release notes).

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions