cli: Add explicit daemon lifecycle commands #376

Merged
peterj merged 12 commits into agentregistry-dev:main from timflannagan:refactor/explicit-daemon-lifecycle on Mar 20, 2026

Conversation

@timflannagan (Collaborator) commented Mar 18, 2026

Description

The root command's PersistentPreRunE hook previously auto-started
Docker containers (postgres + registry server) whenever the
registry URL targeted localhost:12121. This was a footgun for
users with existing registries, e.g. port-forwarding a Kubernetes
registry to localhost:12121 would risk silently spinning up a
separate local instance if the port-forward dropped.

We now follow the Docker CLI/daemon model where the CLI never
auto-starts infrastructure and fails fast if the registry is
unreachable. Daemon lifecycle is managed explicitly through
arctl daemon start, stop, and status subcommands. The stop
command accepts a --purge flag to also remove data volumes.

The client connectivity check is also simplified from 3 retries
with exponential backoff to a single ping, since the retries
only existed to compensate for the race between auto-starting
containers and the client connecting.

Fixes #307.

Change Type

/kind fix

Changelog

NONE

Additional Notes

Copilot AI (Contributor) left a comment

Pull request overview

This PR removes the CLI’s implicit “auto-start local docker-compose daemon when targeting localhost” behavior and replaces it with explicit daemon lifecycle commands (arctl daemon start|stop|status), so the CLI fails fast when the registry is unreachable rather than silently spinning up local infrastructure.

Changes:

  • Add explicit daemon command group with start, stop (with --purge), and status.
  • Extend the docker-compose daemon manager with Stop() / Purge() and refactor compose command construction.
  • Simplify client connectivity verification from retry/backoff to a single Ping().
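
As a rough illustration of the command group summarized above, the sketch below wires `start`, `stop` (with `--purge`), and `status` to a manager interface. It is a sketch under stated assumptions: the method names come from the PR text, but their signatures, the `runDaemon` dispatcher, the output messages, and the use of the standard `flag` package (instead of the CLI's real command framework) are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// DaemonManager uses the method names mentioned in the PR;
// the exact signatures are assumptions.
type DaemonManager interface {
	Start() error
	Stop() error
	Purge() error
	IsRunning() bool
}

// runDaemon dispatches "start", "stop [--purge]", and "status".
func runDaemon(m DaemonManager, args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("usage: daemon start|stop|status")
	}
	switch args[0] {
	case "start":
		if err := m.Start(); err != nil {
			return "", err
		}
		return "daemon started", nil
	case "stop":
		fs := flag.NewFlagSet("stop", flag.ContinueOnError)
		purge := fs.Bool("purge", false, "also remove data volumes")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		if *purge {
			if err := m.Purge(); err != nil {
				return "", err
			}
			return "daemon stopped and data purged", nil
		}
		if err := m.Stop(); err != nil {
			return "", err
		}
		return "daemon stopped", nil
	case "status":
		if m.IsRunning() {
			return "daemon is running", nil
		}
		return "daemon is not running", nil
	default:
		return "", fmt.Errorf("unknown daemon subcommand %q", args[0])
	}
}

// fakeManager is an in-memory manager used only for this example.
type fakeManager struct{ running, purged bool }

func (f *fakeManager) Start() error    { f.running = true; return nil }
func (f *fakeManager) Stop() error     { f.running = false; return nil }
func (f *fakeManager) Purge() error    { f.running = false; f.purged = true; return nil }
func (f *fakeManager) IsRunning() bool { return f.running }

func main() {
	m := &fakeManager{}
	for _, args := range [][]string{{"start"}, {"status"}, {"stop", "--purge"}, {"status"}} {
		msg, err := runDaemon(m, args)
		fmt.Println(args, "->", msg, err)
	}
}
```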

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/types/types.go | Extends DaemonManager interface with stop/purge lifecycle methods. |
| pkg/daemon/dockercompose/manager.go | Implements Stop()/Purge() via docker compose down, refactors command construction. |
| pkg/cli/root.go | Removes daemon auto-start logic; adds daemon command to root; simplifies client creation and error message. |
| pkg/cli/root_test.go | Updates tests to reflect new pre-run behavior and error messaging. |
| pkg/cli/commands_test.go | Updates command tree expectations to include daemon. |
| internal/client/client.go | Removes retry-based ping logic; uses single Ping() for connectivity check. |
| internal/cli/daemon/daemon.go | Introduces daemon command tree wiring to the daemon manager. |
| internal/cli/daemon/daemon_test.go | Adds unit tests for daemon start/stop/status commands. |

Comments suppressed due to low confidence (1)

pkg/cli/root.go:41

  • Removing CLIOptions.DaemonManager is a compile-time breaking change for any external code configuring the CLI via cli.Configure(). If pkg/cli is intended to be used as a library, consider keeping the field (even if unused by default) or introducing a new options struct/versioned API to avoid breaking downstream builds.
// CLIOptions configures the CLI behavior.
// Can be extended for more options (e.g. client factory).
type CLIOptions struct {
	// AuthnProviderFactory provides CLI-specific authentication.
	AuthnProviderFactory types.CLIAuthnProviderFactory

	// OnTokenResolved is called when a token is resolved.
	// This allows extensions to perform additional actions when a token is resolved (e.g. storing locally).
	OnTokenResolved func(token string) error

	// ClientFactory creates the API client. If nil, uses client.NewClientWithConfig (requires network).
	ClientFactory ClientFactory
}

@timflannagan added the "work in progress" label on Mar 18, 2026
@timflannagan (Collaborator, Author)

Need to dig into the e2e failures. The tests were relying on the implicit auto-start behavior before. There are a couple of options here, but I need to explore locally to find the right approach.

@timflannagan force-pushed the refactor/explicit-daemon-lifecycle branch from 999c614 to 6ab748a on March 19, 2026 18:46
@timflannagan removed the "work in progress" label on Mar 19, 2026

Follows Go naming conventions where the package name already
provides context, so daemon.New reads better than
daemon.NewDaemonCmd.

Extract a composeCmd helper in the docker compose manager to
eliminate duplicated exec.Cmd construction across Start, down,
and isContainerRunning. This also fixes a subtle inconsistency
where down and isContainerRunning were using the raw ComposeYAML
instead of the patched output from getComposeYAML.

Remove the IsRunning guards from the start and stop subcommands
since docker compose up and down are idempotent operations. The
guards introduced a TOCTOU race without adding value.
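
The race can be illustrated with a small in-memory model. Everything here is hypothetical (`fakeCompose`, `startGuarded`, `startDirect`, and the callback that stands in for a concurrent stop); it only shows why a check-then-act guard adds nothing when the underlying operation is already idempotent.

```go
package main

import "fmt"

// fakeCompose models an idempotent `docker compose up`.
type fakeCompose struct {
	running bool
	upCalls int
}

// Up succeeds whether or not the daemon is already running.
func (f *fakeCompose) Up() error {
	f.upCalls++
	f.running = true
	return nil
}

func (f *fakeCompose) IsRunning() bool { return f.running }

// startGuarded is the check-then-act shape that was removed. The callback
// stands in for another process acting between the check and the act.
func startGuarded(c *fakeCompose, betweenCheckAndAct func()) error {
	running := c.IsRunning()
	betweenCheckAndAct()
	if running {
		return nil // stale answer: the daemon may no longer be running
	}
	return c.Up()
}

// startDirect relies on Up being idempotent, so there is no window to race in.
func startDirect(c *fakeCompose) error { return c.Up() }

func main() {
	c := &fakeCompose{running: true}
	// Simulate another process stopping the daemon between check and act.
	err := startGuarded(c, func() { c.running = false })
	fmt.Println("guarded:", err, "running:", c.running)
	err = startDirect(c)
	fmt.Println("direct:", err, "running:", c.running)
}
```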

We now use a GOTESTSUM variable across unit, integration, and e2e test targets so
we can override the command path in different environments without editing the
Makefile each time.

This brings the e2e harness in line with the explicit daemon lifecycle behavior
and removes backend-specific assumptions from shared tests.

Previously, docker e2e targets depended on compose setup while k8s tests still
assumed localhost URLs. That caused drift and failures when running against
Kind load balancer endpoints.

Now, docker e2e starts and purges through arctl daemon commands, k8s discovers
the registry LoadBalancer URL at runtime, and daemon startup always pulls the
latest configured image tag.

This prevents local-only CLI commands from failing due to registry connectivity checks
and keeps command behavior aligned with explicit daemon control.

Previously, root pre-run setup still ran for local commands and completion
flows, so commands like completion, build, and version could fail with
connection errors when the daemon was not running.

Now, those command paths are explicitly skipped in pre-run setup, tests cover
the skip matrix, and version falls back to a direct client so --json still
works when pre-run initialization is bypassed.
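
The skip matrix described in this commit message could be expressed along the following lines. This is a sketch, not the merged code: the `skipPreRun` set and `shouldSkipPreRun` helper are hypothetical names, `registry-only` and `__complete` are illustrative entries, and treating the bare root command as not skipped is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// skipPreRun lists top-level commands that never need a registry connection.
// "completion", "build", and "version" come from the commit message;
// "__complete" (Cobra's hidden completion entry point) is an assumption.
var skipPreRun = map[string]bool{
	"completion": true,
	"build":      true,
	"version":    true,
	"__complete": true,
}

// shouldSkipPreRun takes a full command path such as "arctl completion bash"
// and reports whether client setup and the connectivity check are skipped.
func shouldSkipPreRun(commandPath string) bool {
	parts := strings.Fields(commandPath)
	if len(parts) < 2 {
		return false // bare root command: not skipped in this sketch
	}
	return skipPreRun[parts[1]]
}

func main() {
	for _, path := range []string{
		"arctl version",
		"arctl completion bash",
		"arctl build",
		"arctl registry-only", // hypothetical command that needs the registry
	} {
		fmt.Printf("%-24s skip=%v\n", path, shouldSkipPreRun(path))
	}
}
```

Keeping the matrix in one table makes it straightforward to cover with a table-driven test, which matches the commit's note that tests cover the skip matrix.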

This updates user and governance docs to reflect the explicit daemon controls.

The docs now direct users to run arctl daemon start before registry commands,
and they call out rerunning daemon start after upgrading the CLI.

This keeps the e2e target readable by moving daemon lifecycle actions into
explicit helper targets.

Now test-e2e-docker depends on daemon-start and uses daemon-stop-purge in its
EXIT trap, so the command flow is clearer and easier to maintain.

This consolidates local lifecycle management around daemon commands and
keeps the CLI path configurable.

run-docker and down now call daemon targets, docker-compose helper targets are
removed, and daemon helper commands use ARCTL with a default of ./bin/arctl.

This keeps the quick reference in sync with the Makefile lifecycle changes
by pointing to daemon-start and daemon-stop instead of removed compose targets.

This keeps the fallback note but shortens the wording around the version
client initialization path when root pre-run is skipped.

Signed-off-by: timflannagan <[email protected]>

This makes backend intent explicit in deploy e2e tests and removes repeated
string checks spread across multiple cases.

The local-only lifecycle test is now named accordingly, unsupported targets are
skipped through a shared helper, and target constants make local versus
kubernetes behavior easier to read and maintain.
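
One way to picture the target constants and shared skip helper is the sketch below. The constant values, the `requireTarget` name, and the `skipper` interface are assumptions; `skipper` is simply the subset of `*testing.T` the helper needs, which lets the example run outside `go test`.

```go
package main

import "fmt"

// target identifies the deploy backend an e2e test runs against.
// The string values are assumptions for this example.
type target string

const (
	targetLocal      target = "local"
	targetKubernetes target = "kubernetes"
)

// skipper is the subset of *testing.T used by the helper.
type skipper interface {
	Helper()
	Skipf(format string, args ...any)
}

// requireTarget skips the test unless the current target is supported.
// It replaces per-test string comparisons with one shared check.
func requireTarget(t skipper, current target, supported ...target) {
	t.Helper()
	for _, s := range supported {
		if s == current {
			return
		}
	}
	t.Skipf("test does not support target %q", current)
}

// fakeT records skips. A real *testing.T would also stop the test here.
type fakeT struct{ skipped string }

func (f *fakeT) Helper() {}
func (f *fakeT) Skipf(format string, args ...any) {
	f.skipped = fmt.Sprintf(format, args...)
}

func main() {
	localOnly := &fakeT{}
	requireTarget(localOnly, targetKubernetes, targetLocal)
	fmt.Printf("local-only test on kubernetes: skipped=%q\n", localOnly.skipped)

	shared := &fakeT{}
	requireTarget(shared, targetLocal, targetLocal, targetKubernetes)
	fmt.Printf("shared test on local: skipped=%q\n", shared.skipped)
}
```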
@timflannagan force-pushed the refactor/explicit-daemon-lifecycle branch from 3f40d46 to 2a8cc8c on March 20, 2026 21:48
@peterj enabled auto-merge on March 20, 2026 21:51
@peterj added this pull request to the merge queue on Mar 20, 2026
Merged via the queue into agentregistry-dev:main with commit 7f03a94 Mar 20, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

Improve daemon lifecycle management