feat(sandbox): docker-agent kit, gateway allowlist, and assorted --sandbox fixes #2844
Merged
docker-agent
left a comment
Assessment: 🟡 NEEDS ATTENTION
One LIKELY medium-severity finding in the new sandbox kit code.
trungutt
approved these changes
May 21, 2026
6062f01 to
4508d54
Compare
Add support for staging a docker-agent kit (skills and prompt files) in the sandbox before launch. The kit is built by the host, redacted with portcullis, and mounted read-only at /agent-kit inside the sandbox. This allows skills and prompt files to travel with the agent even though the sandbox's $HOME differs from the host's.
- New pkg/promptfiles: centralizes prompt-file lookup logic (used by both the add_prompt_files hook and the kit builder).
- New pkg/sandbox/kit: builds the kit by collecting skills (via skills.Load) and per-agent prompt files, storing them in <cache>/sandbox-kits/<hash>/ with redaction applied to all text files via portcullis.Redact. Skips files already covered by the live workspace mount.
- pkg/skills/local.go: when DOCKER_AGENT_KIT_DIR is set, skill discovery is rooted only at the kit's skills directory; host paths are skipped since they don't exist inside the sandbox.
- pkg/hooks/builtins/add_prompt_files.go: delegates to promptfiles and prefers the kit over $HOME inside the sandbox.
- pkg/sandbox.Backend.Ensure: changed to accept extras []string instead of a single extra path, enabling multiple mounts for the kit and config dir.
- cmd/root: builds the kit before sandbox.Ensure, mounts it at /agent-kit, forwards the DOCKER_AGENT_KIT_DIR env var to the sandbox, and cleans up on exit. Added --no-kit and (hidden) --kit-keep flags.
- Tests: added coverage for kit build (redaction, workspace scoping, rebuild semantics, helpers) and kit-aware skill/prompt-file resolution. Moved existing add_prompt_files tests to pkg/promptfiles.
Security: symlink escape prevention, concurrency-safe atomic builds, symlink-resolved path comparisons, file ref canonicalisation, permission preservation, host path redaction from on-disk manifest. Quality: path normalisation in extras dedup, skip map in flag handling, test isolation, new test coverage for edge cases.
When --sandbox stages a kit, list every file shipped to the sandbox so
the user can see what's being mounted, where each file came from, and
which ones were scrubbed by portcullis before they reached the sandbox.
Example output:
Preparing docker-agent kit at <cache>/sandbox-kits/<hash>
skills:
plain (from ~/.agents/skills/plain)
SKILL.md
with-secret (from ~/.agents/skills/with-secret)
SKILL.md (redacted)
helper.sh
prompt files:
AGENTS.md (from ~/AGENTS.md, redacted)
summary: 2 skills, 1 prompt file, 2 secrets redacted
Implementation notes:
* New (*kit.Result).PrintSummary(io.Writer). Walks the staged tree
rather than reconstructing paths from the manifest so the listing
reflects exactly what the sandbox will see (symlink-followed,
escape-rejected, etc.).
* Redaction.Target is now recorded relative to the kit root; it was
storing the absolute host destination, which is inconsistent with
Entry.Target and would leak the kit's host path into the on-disk
manifest. copyFile/copyTree thread the kit root through.
* host paths displayed via PrintSummary collapse $HOME to '~' so the
output stays readable and doesn't bleed the local username when
shared in screenshots.
* promote() now tolerates concurrent winners: when the rename loses
to another goroutine that has already published a complete kit at
the same final dir, we accept the winner's tree instead of erroring
out (the loser's staging is dropped by the deferred rollback).
* Tests cover the rendered output (header + per-file redaction marker
+ summary line + ~ collapsing), the empty/nil-receiver edge cases,
and the new kit-relative Redaction.Target invariant.
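The "tolerate a concurrent winner" behaviour of promote() can be illustrated with a small sketch. This is not the PR's code: the function bodies, the use of manifest.json as a completeness marker, and the helper names buildAndPromote and demo are assumptions made for illustration. It relies on os.Rename failing when the destination is an existing non-empty directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// promote publishes a fully built staging directory at finalDir.
// If the rename loses to a builder that already published a kit there,
// the winner's tree is kept and the loser's staging copy is dropped.
func promote(stagingDir, finalDir string) error {
	err := os.Rename(stagingDir, finalDir)
	if err == nil {
		return nil
	}
	// Assumption: a manifest.json at the kit root marks a complete kit.
	if _, statErr := os.Stat(filepath.Join(finalDir, "manifest.json")); statErr == nil {
		return os.RemoveAll(stagingDir)
	}
	return fmt.Errorf("promote %s: %w", finalDir, err)
}

// buildAndPromote stages a one-file kit and promotes it (illustration only).
func buildAndPromote(root, finalDir, content string) error {
	staging, err := os.MkdirTemp(root, "staging-")
	if err != nil {
		return err
	}
	manifest := filepath.Join(staging, "manifest.json")
	if err := os.WriteFile(manifest, []byte(content), 0o644); err != nil {
		return err
	}
	return promote(staging, finalDir)
}

// demo runs two builders against the same final dir and reports whose
// manifest ended up published.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "kit-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	finalDir := filepath.Join(root, "kit")
	for _, content := range []string{"first", "second"} {
		if err := buildAndPromote(root, finalDir, content); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(filepath.Join(finalDir, "manifest.json"))
	return string(data), err
}

func main() {
	winner, err := demo()
	fmt.Println(winner, err)
}
```

In this sketch the second builder returns no error even though its rename fails, which mirrors the "accept the winner's tree" rule described above.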
"docker agent run --sandbox <alias>" launched a sandbox without the alias's destination YAML mounted, so the in-sandbox agent could not read the file.
ExtraWorkspace was deciding what to mount with a hand-rolled check that called filepath.Abs() on the raw arg and then matched on the extension. For an alias name like "gopher" that produced "/wd/gopher" (no extension, no file on disk) and the function returned "", meaning no extra mount.
Delegate to config.Resolve instead. It is the same code that the runtime uses to dispatch the ref (alias -> file/oci/url/builtin), so ExtraWorkspace now picks the right host directory regardless of the input form. The Source.ParentDir contract already returns "" for non-file sources (built-ins, OCI, URLs), so those continue to need no mount.
Drop the dead sandbox.ResolveAlias, which is unreferenced after the rewrite.
Tests: regression for the alias case (yaml gets mounted) plus an OCI-backed alias case (still no mount), alongside the existing in-workspace / outside-workspace / built-in / OCI scenarios.
Docker Desktop's "docker sandbox ls --json" no longer wraps the
list under "vms"; it now returns {"sandboxes": [...]} the same
way "sbx ls --json" does. With the old key, ForWorkspace silently
returned nil for every lookup, so:
* Ensure could never reuse an existing sandbox (always created a fresh
one), and "docker sandbox create" then suffixed the name with "-1"
/ "-2" / ... because it had its own (working) registry of in-use
names, producing the "sandbox … already exists for this workspace.
Creating … -1 instead" notice on every run.
* Worse, after the suffixed create, the post-create ForWorkspace lookup
also returned nil, so Ensure failed with "sandbox was created but
could not be found" before the inner agent ever ran.
Both backends are aligned now; vmListKey stays as a struct field so
either backend can drift again without ripple changes.
While here, surface the rm failure that hides behind "_ = rmCmd.Run()":
log the error and the rm command's combined output at debug level so
the next time docker sandbox refuses to delete a stale entry we can
see why instead of silently leaking name suffixes.
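To show why a renamed top-level key turns every lookup into "not found" without any error, here is a minimal sketch of decoding the list under a configurable key. The sandboxInfo type, its name field, and parseList are hypothetical; the real ls --json schema and the real ForWorkspace code may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sandboxInfo holds only what this sketch needs; the field name is an
// assumption, not the real CLI schema.
type sandboxInfo struct {
	Name string `json:"name"`
}

// parseList decodes the array stored under listKey in the ls --json output.
// A missing key yields a nil slice and no error, which is how a stale key
// silently makes every lookup come back empty.
func parseList(raw []byte, listKey string) ([]sandboxInfo, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(raw, &top); err != nil {
		return nil, err
	}
	entries, ok := top[listKey]
	if !ok {
		return nil, nil
	}
	var out []sandboxInfo
	if err := json.Unmarshal(entries, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	raw := []byte(`{"sandboxes":[{"name":"docker-agent-demo"}]}`)
	oldKey, _ := parseList(raw, "vms")
	newKey, _ := parseList(raw, "sandboxes")
	fmt.Println(len(oldKey), len(newKey))
}
```

Keeping the key as a parameter (the vmListKey field in the commit) means a future drift in either backend only needs a one-value change.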
…rovider
addGatewayFlags wraps the command's PersistentPreRunE so it can populate runConfig.ModelsGateway from the env / user config. The old wrapper called runConfig.EnvProvider() *first* and only invoked the parent's PersistentPreRunE at the end, but EnvProvider() builds and caches the full provider chain on the first call, so any state the parent installs afterwards is invisible to that cached chain.
In particular the root PersistentPreRunE applies --config-dir / --cache-dir / --data-dir overrides via paths.SetConfigDir et al. When --sandbox forwards --config-dir <host-path> to the inner docker-agent inside the container, the inner's gateway pre-run ran before the override landed, so:
* environment.NewDefaultProvider() captured paths.GetConfigDir() at its default value (~/.config/cagent inside the sandbox image, not the host config dir bind-mounted from --config-dir).
* The cached SandboxTokenProvider then read from the wrong path, always missed the host-written sandbox-tokens.json, and the inner failed config.CheckRequiredEnvVars with "sorry, you first need to sign in Docker Desktop to use the Docker AI Gateway", even though the user was signed in and the host token writer was working.
Run the parent first; everything else stays the same. The walk-up search for an ancestor PersistentPreRunE that handles deeply nested commands (root → serve → api) moves into a small runParentPreRun helper.
Regression test asserts the new ordering by recording the first time the env provider is consulted and checking the parent has already run by then.
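The ordering problem can be reproduced without cobra. The sketch below is a stand-in, not the code in cmd/root/flags.go: configDir, cachedProvider, and run are hypothetical names that model a provider caching a path on first use and a parent hook that overrides that path.

```go
package main

import (
	"fmt"
	"sync"
)

// configDir stands in for the process-wide path that --config-dir overrides.
var configDir = "default-config-dir"

// cachedProvider captures configDir the first time get is called, mimicking
// a provider chain that is built once and then cached.
type cachedProvider struct {
	once sync.Once
	dir  string
}

func (p *cachedProvider) get() string {
	p.once.Do(func() { p.dir = configDir })
	return p.dir
}

// run consults the provider either after or before the parent hook and
// returns the directory the provider ends up using.
func run(parentFirst bool) string {
	configDir = "default-config-dir"
	p := &cachedProvider{}
	parent := func() { configDir = "host-config-dir" }

	if parentFirst {
		parent()
		return p.get()
	}
	p.get() // provider built too early: it caches the default
	parent()
	return p.get()
}

func main() {
	fmt.Println(run(false), run(true))
}
```

With the provider consulted first, the override applied by the parent never reaches the cached value, which is the failure mode this commit describes.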
Two related fixes for "docker agent run --sandbox" with a models gateway configured:
1. Forward the host's Docker Desktop token directly as -e DOCKER_TOKEN=<value> when the gateway is set. The host-side SandboxTokenWriter already drops a refreshed JSON file under the mounted config dir for long-running sessions, but the inner's startup check (config.CheckRequiredEnvVars) runs *before* the in-sandbox config-dir override applies on existing v1.59.x sandbox images, so the SandboxTokenProvider starts out reading from the wrong path and the very first request fails with the gateway sign-in error. Seeding DOCKER_TOKEN through the OsEnvProvider lets that initial check pass; the file path becomes the source of truth once the inner has fully started, which keeps token rotation working for sessions that outlive the JWT lifetime.
2. Stop deleting the kit directory at the end of every run. The sandbox is reused across runs (deterministic name + extras keyed by content hash), and the bind-mount holds a hard reference to the kit's host path; deleting that dir leaves the next run unable to start the sandbox. The kit lives in the cache dir keyed on a content hash, so subsequent runs for the same agent overwrite it in place; total disk use is bounded by the number of distinct agents the user has run.
The hidden --kit-keep flag becomes pointless once the kit is always kept; remove it.
When an agent's add_prompt_files entry resolves to a file inside the
live workspace mount (e.g. AGENTS.md sitting next to the agent YAML),
the kit builder correctly skips staging a redacted copy (the live
mount surfaces it inside the sandbox), but the file was missing from
Manifest.PromptFiles entirely. Users who looked at the printed kit
summary saw "0 prompt files" and concluded the kit was broken even
though the agent would receive AGENTS.md via the workspace mount.
Track these files in the manifest with Target == "" (introduced as
Entry.IsStaged()) so they show up in the printed summary, tagged as
"workspace mount" instead of redacted/staged. The runtime behaviour
is unchanged: the in-sandbox cwd-walk still finds the file directly,
and the kit's prompt_files dir continues to ship only host-only
copies (e.g. ~/AGENTS.md when there is no project-local one).
The on-disk manifest.json gets json:",omitempty" on Target so
non-staged entries don't leak an empty target string when serialised.
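A minimal sketch of the manifest entry shape this implies, assuming a hypothetical Source field and lower-case JSON key names (only Target, IsStaged, and the omitempty tag come from the commit message):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry models one manifest record. Source and the JSON key names are
// assumptions for this sketch.
type Entry struct {
	Source string `json:"source"`
	Target string `json:"target,omitempty"`
}

// IsStaged reports whether a copy of the file was placed inside the kit.
// Entries served by the live workspace mount have an empty Target.
func (e Entry) IsStaged() bool { return e.Target != "" }

// encode renders an entry the way it would appear in manifest.json.
func encode(e Entry) string {
	b, err := json.Marshal(e)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	workspace := Entry{Source: "AGENTS.md"}
	staged := Entry{Source: "AGENTS.md", Target: "prompt_files/AGENTS.md"}
	fmt.Println(workspace.IsStaged(), encode(workspace))
	fmt.Println(staged.IsStaged(), encode(staged))
}
```

The omitempty tag is what keeps the workspace-mount entry free of an empty "target" key when serialised.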
Tests:
* TestBuild_PromptFileInWorkspaceIsRecordedButNotStaged: the user's
scenario: AGENTS.md only inside the workspace; entry is recorded
but no copy lands under <kit>/prompt_files.
* TestBuild_PromptFilesCollectedAndScopedOutsideWorkspace updated:
asserts both the workspace and $HOME copies appear in the manifest
and that the staged copy carries the $HOME content.
* TestPrintSummary_WorkspacePromptFile: locks in the user-visible
output ("AGENTS.md (from /…/AGENTS.md, workspace mount)").
Regression coverage for a question that came up while reviewing the "workspace prompt file is just listed, not staged" change: what if add_prompt_files resolves to AGENTS.md sitting in the workspace's *parent* directory (e.g. a monorepo / dotfiles layout)?
I checked behaviour inside an actual running sandbox and confirmed that the parent of the workspace mount is synthesised: the host file at the same path is invisible inside the sandbox. The kit already handles this correctly because isUnder(parent, workspace) returns false, so the parent's AGENTS.md falls through to the staging branch and lands in <kit>/prompt_files/AGENTS.md.
Lock that behaviour in with a test mirroring the layout: parent dir holds the AGENTS.md, child dir is the workspace; assert the parent file is staged (Target non-empty, content preserved) rather than recorded as a workspace mount.
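For readers unfamiliar with this kind of check, a containment test like isUnder can be sketched as below. The body is an assumption (the real helper may also resolve symlinks, which this lexical version does not); only the name and the "parent is not under workspace" outcome come from the commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isUnder reports whether path is dir itself or lies somewhere below it.
// The comparison is purely lexical: symlinks are not resolved here.
func isUnder(path, dir string) bool {
	rel, err := filepath.Rel(dir, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	workspace := filepath.FromSlash("/repo/service")
	inside := filepath.FromSlash("/repo/service/AGENTS.md")
	parent := filepath.FromSlash("/repo/AGENTS.md")

	// A file inside the workspace is served by the live mount.
	fmt.Println(isUnder(inside, workspace))
	// A file in the parent directory is not, so it has to be staged.
	fmt.Println(isUnder(parent, workspace))
}
```

Using filepath.Rel instead of a plain string prefix avoids treating a sibling such as /repo/service-old as being under /repo/service.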
When forwarding DOCKER_TOKEN into the sandbox we were calling envProvider.Get(ctx, "DOCKER_TOKEN") on the host. That chain consults OsEnvProvider first, so any pre-existing DOCKER_TOKEN value in the user's shell environment shadowed the live Docker Desktop backend. Docker Desktop's gateway JWTs expire in ~15 min, so a stale exported token effectively never works.
Bypass the chain and call desktop.GetToken(ctx) directly, the same source [sandbox.StartTokenWriterIfNeeded] uses for the file-based refresher. The forwarded value is now guaranteed to be the fresh JWT Docker Desktop currently considers valid; for sessions that outlive the token's lifetime, the file writer keeps rotating it on sandbox images that have the persistent-pre-run fix landed in cmd/root/flags.go.
The sandbox template ships with a default-deny network proxy that
allows direct hosts of the major model providers (api.anthropic.com,
api.openai.com, ...) but blocks every *.docker.com endpoint. When
the agent is configured to talk to the Docker AI Gateway, that
default-deny rule turns every request into a 403 from the proxy with
the message:
Blocked by network policy: domain ai-backend-service.docker.com:443
detail: no matching allow rule — blocked by default deny policy
… which the inner agent surfaces as 'HTTP 403' from the gateway,
indistinguishable from a real auth failure. (I confirmed this from
inside a running sandbox by curling the URL directly with a fresh
JWT: same 403 from the proxy, and the request never reaches the gateway.)
Make this allowance part of the auto-kit pipeline:
* pkg/sandbox.Backend gains an AllowHosts(ctx, name, hosts) method
that wraps the per-backend spelling: 'sbx policy allow network
SANDBOX hosts,...' for the sbx backend, 'docker sandbox network
proxy SANDBOX --allow-host ...' for the docker backend. Both
apply dynamically post-create and survive restarts of the
sandbox.
* runInSandbox parses the host out of runConfig.ModelsGateway and
calls AllowHosts after Ensure. Empty / malformed gateway URLs are
logged at debug level and ignored: if the user isn't routing
through the gateway there's nothing to allowlist.
* gatewayHostPort handles both fully formed URLs
(https://example.com:443/proxy) and bare authorities
(example.com:443) so the existing free-form ModelsGateway value
passes through unchanged.
Verified end-to-end: 'docker agent run --sandbox gopher' against
the staging gateway, which used to return the 403 above, now
prints 'Hi! How can I help you with your Go code today?'. Debug
log shows: Allowed sandbox network access ... Rule added to policy
local (scope: sandbox:...).
Test: TestGatewayHostPort covers empty / bare / URL / port / path
/ query forms.
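A simplified sketch of host extraction for the two input shapes named in this commit (fully formed URLs and bare authorities). The function name hostPort and its body are illustrative assumptions, not the PR's gatewayHostPort, and the later, broader set of shapes is deliberately not covered.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostPort extracts "host[:port]" from a fully formed URL or from a bare
// authority. It returns "" when nothing usable is found.
func hostPort(gateway string) string {
	gateway = strings.TrimSpace(gateway)
	if gateway == "" {
		return ""
	}
	if strings.Contains(gateway, "://") {
		u, err := url.Parse(gateway)
		if err != nil {
			return ""
		}
		return u.Host
	}
	// Bare authority: drop any trailing path, query, or fragment.
	if i := strings.IndexAny(gateway, "/?#"); i >= 0 {
		gateway = gateway[:i]
	}
	return gateway
}

func main() {
	for _, in := range []string{
		"https://example.com:443/proxy",
		"example.com:443",
		"example.com:443/proxy?x=1",
		"",
	} {
		fmt.Printf("%q -> %q\n", in, hostPort(in))
	}
}
```

An empty result is the signal to skip allowlisting, matching the "logged at debug level and ignored" behaviour described for empty or malformed values.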
Surface the gateway choice between the kit summary and the sandbox
creation step so the user can see at a glance whether the inner agent
will route through Docker's AI gateway or hit the providers directly.
Distinguishing those two paths up-front turns later HTTP 403s into
'oh, the gateway host got allowlisted in the proxy' instead of 'auth
broken somewhere'.
Output examples:
Preparing docker-agent kit at …
skills:
prompt files:
summary: 6 skills, 1 prompt file
Models gateway: https://ai-backend-service-stage.docker.com/proxy (allowlisting ai-backend-service-stage.docker.com in the sandbox proxy)
✓ Created sandbox …
Preparing docker-agent kit at …
…
Models gateway: none (talking to providers directly)
✓ Created sandbox …
Test: TestPrintModelsGateway covers the no-gateway case, the URL
case (shows the host that will be allow-listed) and the bare-
authority case (no separate allow-list note since host == gateway).
A second review pass over the recent sandbox commits surfaced a few real issues. None of them break the user-visible behaviour but each either leaks something it shouldn't, papers over a future bug, or trusts user input it should be validating.
* cmd/root/sandbox.go: stop forwarding DOCKER_TOKEN as '-e DOCKER_TOKEN=<jwt>' inline. The full argv is logged by slog at debug level, so a freshly issued Docker Desktop bearer token was ending up in cagent.debug.log every run. Pass it by name only and set the value via cmd.Env (the EnvForAgent pattern), so the token reaches the inner without ever appearing in argv. Same treatment for DOCKER_AGENT_MODELS_GATEWAY for consistency.
* cmd/root/sandbox.go: redact userinfo before printing the gateway URL. A configured gateway like 'https://user:[email protected]/proxy' used to print verbatim to stdout. displayGatewayURL now masks the user/password as '***@host'. We rebuild the string by hand because url.User on the parsed value URL-escapes asterisks into %2A%2A%2A, which is technically correct but unreadable. Also reword the unset case to 'Models gateway: none configured' instead of the previous 'talking to providers directly', which was misleading when the inner falls back to DOCKER_TOKEN / DMR.
* cmd/root/sandbox.go: gatewayHostPort now handles every realistic shape: fully formed URLs, scheme-relative '//host', bare authorities with optional :port and trailing path/query/fragment, IPv6 hosts, scheme-without-host, opaque schemes ('mailto:...'), and bogus 'foo:bar://x'. New unit tests cover all of those.
* pkg/sandbox/sandbox.go: re-add the legacy 'vms' JSON key fallback in ForWorkspace. Older Docker Desktop / sbx versions still wrap the list under that key; without the fallback those users would silently lose sandbox reuse and accumulate suffixed duplicates on every run. Log a warning so they know to upgrade. Test updated to assert the legacy form resolves a match.
* pkg/sandbox/backend.go: AllowHosts now filters empty entries (silently) and rejects entries containing commas or whitespace (loudly). The sbx backend joins the host list with commas before forwarding it to the policy engine; an unescaped embedded comma would let a single value smuggle two distinct rules into the engine. Whitespace gets the same treatment for defence in depth.
* pkg/sandbox/extras_test.go: extract Ensure's extras-cleaning loop into a small package-private cleanExtras helper and add direct tests for the canonical-path collapse case ('foo' vs 'foo/.' vs 'foo/sub/..'), the workspace-self-match drop, and order preservation across duplicates.
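The AllowHosts input rule from this review pass (drop empties silently, reject commas and whitespace loudly) can be sketched as follows. cleanHosts is a hypothetical helper name and the error text is made up; only the rule itself is taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// cleanHosts drops empty entries and rejects any entry that contains a
// comma or whitespace, since a comma-joined list would otherwise let one
// value smuggle in a second rule.
func cleanHosts(hosts []string) ([]string, error) {
	var out []string
	for _, h := range hosts {
		if h == "" {
			continue
		}
		if strings.ContainsRune(h, ',') || strings.IndexFunc(h, unicode.IsSpace) >= 0 {
			return nil, fmt.Errorf("invalid host %q: commas and whitespace are not allowed", h)
		}
		out = append(out, h)
	}
	return out, nil
}

func main() {
	hosts, err := cleanHosts([]string{"proxy.golang.org", "", "sum.golang.org"})
	fmt.Println(strings.Join(hosts, ","), err)

	_, err = cleanHosts([]string{"a.example,b.example"})
	fmt.Println(err)
}
```

In this sketch an entry made only of whitespace is rejected rather than treated as empty; whether the real code does the same is not stated in the source.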
Running 'docker agent run --sandbox <agent>' for any agent that uses
auto-install (e.g. the gopher agent that wants gopls) failed silently:
'go install' returned an opaque "403 Blocked by network policy" from
proxy.golang.org because the sandbox proxy denies every host that
isn't explicitly allowed. Same root cause as the models-gateway 403
fixed earlier in this branch — just for a different host set.
Generalise allowGatewayHost into allowSandboxHosts. It opens the
minimum: the configured Docker AI gateway when set, plus the
well-known package hosts the toolinstall package reaches at runtime,
gated on whether the agent actually has a toolset that can
auto-install.
Detection lives in the kit builder, which already loads the agent
config:
Result.NeedsToolInstall is true iff cfg has at least one toolset
where:
- type is "mcp" or "lsp" (top-level cfg.MCPs entries are
implicitly mcp);
- Command is set (no command means nothing to look up);
- Version is not "false"/"off" (the per-toolset opt-out
[toolinstall.EnsureCommand] honours).
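The gate above can be sketched in a few lines. The Toolset struct and the function names are assumptions for illustration (the real config types are richer); the three conditions and the case-insensitive opt-out follow the description and the listed tests.

```go
package main

import (
	"fmt"
	"strings"
)

// Toolset holds only the fields the gate inspects; this struct shape is an
// assumption, not the project's config type.
type Toolset struct {
	Type    string
	Command string
	Version string
}

// canAutoInstall applies the three conditions to a single toolset.
func canAutoInstall(ts Toolset) bool {
	if ts.Type != "mcp" && ts.Type != "lsp" {
		return false
	}
	if ts.Command == "" {
		return false // nothing to look up
	}
	switch strings.ToLower(ts.Version) {
	case "false", "off":
		return false // per-toolset opt-out
	}
	return true
}

// needsToolInstall is true when at least one toolset passes the gate.
func needsToolInstall(toolsets []Toolset) bool {
	for _, ts := range toolsets {
		if canAutoInstall(ts) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsToolInstall([]Toolset{{Type: "lsp", Command: "gopls"}}))
	fmt.Println(needsToolInstall([]Toolset{{Type: "lsp", Command: "gopls", Version: "OFF"}}))
	fmt.Println(needsToolInstall([]Toolset{{Type: "shell", Command: "bash"}}))
}
```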
Package-host set when the gate fires:
github.com (release downloads via aqua)
api.github.com (latest-release lookup)
raw.githubusercontent.com (aqua registry data)
objects.githubusercontent.com (release-asset redirect target)
codeload.github.com (source-zip endpoint)
proxy.golang.org (Go module proxy)
sum.golang.org (Go checksum DB)
storage.googleapis.com (Go toolchain blob storage —
needed when a go.mod pins a
newer Go than the sandbox image)
When neither the gateway nor a tool-install toolset applies, the
function calls the backend with no hosts and short-circuits, leaving
the sandbox's strict default-deny intact.
Surface the decision so the user sees what holes were punched:
between the kit summary / models-gateway line and the sandbox-create
line we now print
Tool install: agent has at least one MCP/LSP toolset, allowlisting 8 package hosts in the sandbox proxy
— only when the gate fires.
Tests:
* TestNeedsAutoInstall covers nil cfg, empty cfg, agent with an
installable lsp, top-level mcps entry, per-toolset disable
("off"/"false", case-insensitive), no-command, and shell-type
toolsets.
* TestAutoInstallHosts spot-checks the required entries and asserts
none of them carry the comma / whitespace AllowHosts rejects.
Verified end-to-end: 'go install golang.org/x/tools/gopls@latest'
inside the sandbox now succeeds and writes a working binary, where
before it failed at the proxy.golang.org CONNECT. An agent without
any MCP/LSP toolset gets neither the install hosts nor the printed
"Tool install" line, only the gateway when it's configured.
Running '--sandbox' twice for the same workspace with different
mounts (e.g. with vs without --no-kit, or after the kit feature was
added on top of an existing setup) was leaving the user with
suffixed sandbox names and confusing 'Note: sandbox already exists,
creating ...-2 instead' warnings, ultimately ending in:
Error: opening filesystem /Users/dgageot/.agents:
open /Users/dgageot/.agents: no such file or directory
Two compounding bugs:
1. 'sbx rm <name>' prompts for confirmation when stdin isn't a TTY
('ERROR: stdin is not a terminal; use --force to skip
confirmation'). Our previous _ = rmCmd.Run() / CombinedOutput()
call was running it without --force and without stdin attached,
so every rm silently failed and the stale sandbox lived on. Then
docker / sbx create detected the name as taken and suffixed the
new sandbox with -1, -2, ... which never went away.
2. ForWorkspace returned only the FIRST match, but the same primary
workspace can end up bound to several sandboxes once suffixing
has happened. We were checking the canonical name's mounts (no
kit, no .agents), finding it stale, trying to rm it (silent
failure per #1), and creating yet another suffix on top of the
pile.
Fix:
* Add an rmExtraArgs field on the Backend struct. The sbx backend
fills it with ["--force"]; the docker backend leaves it nil
because its rm has no confirmation prompt and rejects --force.
Backend.rm wraps the per-backend invocation so callers don't have
to remember.
* New Backend.allForWorkspace returns every sandbox whose primary
workspace matches wd. Ensure walks the whole list, reuses the
first one whose mounts already cover the requested set, and
otherwise removes every match before creating a fresh sandbox.
ForWorkspace stays as the 'first match' convenience.
* The cleanup loop logs each rm at WARN when it fails so the next
diagnosis isn't silent. After a successful run there is exactly
one sandbox per workspace with the canonical name.
Verified by reproducing the user's broken state (two stale
sandboxes for the same workspace, one with the wrong mounts, one
with no useful mounts), then running 'docker agent run --sandbox
gopher': both stale sandboxes are removed and a single canonical
'docker-agent-<workspace-hash>' is created in their place.
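The reuse-or-replace decision that Ensure now makes over every matching sandbox can be sketched as a pure function. The sandbox struct, covers, and plan are hypothetical names; mounts are compared as plain strings here, whereas the real code works with canonicalised paths.

```go
package main

import "fmt"

// sandbox is a minimal stand-in for an entry returned by allForWorkspace.
type sandbox struct {
	Name   string
	Mounts []string
}

// covers reports whether have contains every path in want.
// Extra mounts in have are harmless.
func covers(have, want []string) bool {
	set := make(map[string]bool, len(have))
	for _, m := range have {
		set[m] = true
	}
	for _, m := range want {
		if !set[m] {
			return false
		}
	}
	return true
}

// plan returns the sandbox to reuse, or "" plus every matching name that
// has to be removed before a fresh sandbox is created.
func plan(matches []sandbox, want []string) (reuse string, remove []string) {
	for _, sb := range matches {
		if covers(sb.Mounts, want) {
			return sb.Name, nil
		}
	}
	for _, sb := range matches {
		remove = append(remove, sb.Name)
	}
	return "", remove
}

func main() {
	stale := []sandbox{
		{Name: "docker-agent-x", Mounts: []string{"/wd"}},
		{Name: "docker-agent-x-1", Mounts: []string{"/wd"}},
	}
	reuse, remove := plan(stale, []string{"/wd", "/cache/kit"})
	fmt.Printf("reuse=%q remove=%v\n", reuse, remove)
}
```

Returning the full removal list, rather than only the first match, is what prevents a leftover suffixed sandbox from forcing yet another suffix on the next create.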
Problem
Running an agent inside --sandbox is the right safety story, but in practice it broke as soon as you tried any non-trivial agent. The list of issues this PR fixes (in the order I hit them while debugging):
* ~/.codex/skills, ~/.claude/skills, ~/.agents/skills, and add_prompt_files entries (AGENTS.md and friends) walked up from cwd and $HOME. $HOME inside the sandbox is unrelated to the host's $HOME.
* .agents/skills discovered above the workspace) is unreachable.
* docker agent run --sandbox <alias> ignored the alias and launched without the alias's destination YAML mounted.
* docker agent run --sandbox failed at startup with 'you first need to sign in Docker Desktop to use the Docker AI Gateway' even though the user was signed in.
* *.docker.com host (the gateway is one).
* gopls produced opaque '403 Blocked by network policy' errors from proxy.golang.org.
* -1, -2, …) until the daemon eventually refused to start a new one with 'Error: opening filesystem /Users/.../.agents: no such file or directory'. Two compounding bugs: sbx rm requires --force when stdin isn't a TTY (we were calling it without), and ForWorkspace only inspected the first match when several existed for the same workspace.
* pkg/sandbox made the symptom space confusing while debugging the above.
Solution
1. A docker-agent kit
Before launching the sandbox, the host stages a self-contained directory under <cache>/sandbox-kits/<hash>/:
It's bind-mounted read-only into the sandbox at /agent-kit, with DOCKER_AGENT_KIT_DIR=/agent-kit forwarded so the in-sandbox resolvers find it. Every text file is run through portcullis.Redact during staging, so secrets that survive in ~/.agents/skills/foo/SKILL.md don't reach the sandbox.
The runtime resolvers consult DOCKER_AGENT_KIT_DIR and behave as no-ops when it isn't set:
* pkg/skills/local.go: when set, the search is rooted only at <kit>/skills. Host paths are skipped because they don't exist inside the sandbox.
* pkg/hooks/builtins/add_prompt_files.go: when set, prompt-file lookups prefer the kit over $HOME (the workspace cwd-walk still wins for files served live by the workspace mount).
The kit lifetime is now bound to the cache: it's keyed by content hash and rebuilt atomically (mkdtemp → os.Rename → reap), and we never delete it after a run because the docker sandbox we reuse holds a hard reference to its bind-mount path.
2. User-visible kit summary
Between the kit prep and the sandbox-create line, the CLI now prints what the agent will see:
'workspace mount' marks files reachable through the live workspace mount (no kit copy needed); 'redacted' marks files where portcullis.Redact scrubbed at least one secret. The gateway URL is rendered through a credential-safe formatter (https://user:pw@gw/... → https://***@gw/...) so it never leaks userinfo to stdout / logs.
3. Network policy: allowlist the gateway and (when needed) tool-install hosts
The sandbox template's HTTP CONNECT proxy at gateway.docker.internal:3128 enforces a default-deny policy that allows the major model providers (api.anthropic.com, api.openai.com, ...) but blocks every *.docker.com host and the package-registry / source hosts auto-install reaches at runtime. The kit pipeline now opens the minimum:
* runConfig.ModelsGateway is set;
* github.com, api.github.com, *.githubusercontent.com, proxy.golang.org, sum.golang.org, storage.googleapis.com, …) only when the agent has at least one MCP / LSP toolset that may auto-install.
Detection is done at kit build time by walking cfg.MCPs and cfg.Agents[*].Toolsets for entries with type mcp / lsp, a Command set, and Version not "false" / "off" (the per-toolset opt-out toolinstall.EnsureCommand already honours). When neither feature is in play the strict default-deny is preserved.
Per-backend command spelling is hidden in Backend.allowHostsArgs: 'sbx policy allow network <name> <hosts>' for sbx, 'docker sandbox network proxy <name> --allow-host <host>' for docker.
gatewayHostPort handles every realistic shape: full URLs, scheme-relative //host, bare authorities with optional :port and trailing path/query/fragment, IPv6 hosts, scheme-without-host, opaque schemes (mailto:), and bogus foo:bar://x.
4. Token forwarding without leakage
For the inner agent's startup gateway check we forward the live Docker Desktop JWT. Two important details, both surfaced through review:
* OsEnvProvider first and would shadow Docker Desktop with a stale exported DOCKER_TOKEN) and call desktop.GetToken(ctx) directly, the same source as the file-based refresher.
* -e DOCKER_TOKEN name-only in argv and inject the value through cmd.Env. The previous -e DOCKER_TOKEN=<jwt> form was leaking the live bearer token into the slog'd 'Executing in sandbox' debug log every run.
The same name-only pattern is used for DOCKER_AGENT_MODELS_GATEWAY, both for argv-leak hygiene and consistency.
5. Sandbox lifecycle: stop accumulating suffixed sandboxes
Two compounding bugs were causing 'Note: sandbox already exists, creating ...-2 instead' warnings to pile up across runs, eventually leading to 'Error: opening filesystem /Users/.../.agents':
* sbx rm <name> prompts for confirmation when stdin isn't a TTY ('ERROR: stdin is not a terminal; use --force to skip confirmation'). Our previous rm was running without --force and without an attached stdin, so every rm silently failed and the stale sandbox lived on. Then sbx create saw the name as taken and suffixed the new one with -1, -2, …
* ForWorkspace returned only the first match. Once suffixing had happened, the canonical name (<workspace> with no suffix) was the oldest entry; we'd inspect its mounts (stale), try to rm it (silent failure per above), and create yet another suffix on top.
Backend now carries an rmExtraArgs field: ["--force"] for sbx, nil for docker (whose rm has no prompt and rejects the flag). New Backend.allForWorkspace returns every sandbox whose primary workspace matches; Ensure walks the whole list, reuses the first one whose mounts already cover the requested set (extra read-only mounts are harmless), and otherwise removes every match before creating fresh. After a run there is exactly one sandbox per workspace with the canonical name.
6. Latent regression fixes (drove half of the debugging)
* pkg/sandbox/sandbox.go: docker sandbox ls --json and sbx ls --json both return {"sandboxes": [...]} now; the code was still reading vms. Switched to the new key, kept a fallback for older CLIs with a warn-level log.
* cmd/root/flags.go: addGatewayFlags materialised runConfig.EnvProvider() before invoking the parent PersistentPreRunE. The env-provider chain caches its result, so any paths.SetConfigDir from --config-dir (set by the parent) was invisible, and the in-sandbox SandboxTokenProvider then read from the wrong path on existing v1.59.x sandbox images. Fixed the ordering; regression-tested.
* pkg/sandbox/args.go: ExtraWorkspace did its own filepath.Abs + extension check, which produced "" for alias names like gopher. Delegated to config.Resolve + Source.ParentDir so the alias's destination YAML actually gets mounted.
Code organisation
* pkg/promptfiles: add_prompt_files lookup; consumed by both the runtime hook and the kit builder
* pkg/sandbox/kit: portcullis.Redact, writes a manifest, atomic-promotes the staging dir, computes NeedsToolInstall
* pkg/sandbox: Backend.Ensure(extras []string), Backend.AllowHosts(name, hosts), Backend.allForWorkspace, Backend.rm (with per-backend rmExtraArgs), cleanExtras (path-canonical dedup), the legacy-key fallback in ForWorkspace
* cmd/root/sandbox.go
Tests
* ghp_…-shaped tokens, workspace-scoped exclusion vs parent-dir staging, symlink-escape rejection, intra-root symlinks, executable-bit preservation, on-disk manifest leakage check, concurrent Build safety, hashKey canonicalisation across path forms
* NeedsToolInstall: mcps entry, per-toolset disable ("off"/"false", case-insensitive), no-command, shell-type toolsets
* pkg/skills and pkg/promptfiles: kit-aware fallbacks, ~/$HOME collapsing
* ~ collapsing / nil-receiver / empty-kit cases
* gatewayHostPort across every URL shape; displayGatewayURL userinfo masking; printModelsGateway cases including credentials-bearing URLs; autoInstallHosts spot-check
* cleanExtras canonical-path collapse, workspace-self-match drop, order preservation; AllowHosts rejects comma/whitespace, drops empties; ForWorkspace legacy vms fallback
* addGatewayFlags parent ordering regression test
* ExtraWorkspace regression test for the --sandbox <alias> case
task build, task test, task lint all green.
Verified end-to-end
* docker agent run --sandbox gopher against the Docker AI Gateway: kit prepared, gateway allowlisted, token forwarded, agent answers. The original '403 Blocked by network policy' and 'you first need to sign in Docker Desktop' errors are gone.
* go install golang.org/x/tools/gopls@latest inside the sandbox now succeeds and writes a working binary.
* Tool install: line; the strict default-deny stays.
Commits (15)