Skip to content

feat(acp): add sessions_spawn streamTo parent relay for ACP spawns#34055

Closed
vincentkoc wants to merge 46 commits intomainfrom
vincentkoc-code/cli-acpx-reliability-20260304
Closed

feat(acp): add sessions_spawn streamTo parent relay for ACP spawns#34055
vincentkoc wants to merge 46 commits intomainfrom
vincentkoc-code/cli-acpx-reliability-20260304

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • Problem: ACP sessions_spawn runs are non-blocking but gave poor parent-session visibility during long or blocked initial turns.
  • Why it matters: Orchestrator sessions could not see child progress/no-output state without external process inspection.
  • What changed:
    • Added sessions_spawn.streamTo: "parent" support for runtime: "acp".
    • Added ACP parent relay that forwards initial child-run progress/no-output/completion/failure updates into requester session system events.
    • When streamTo: "parent" is used, direct child delivery for that initial run is disabled.
    • Added tests for relay behavior, ACP spawn behavior, and schema/tool forwarding.
    • Updated docs and changelog.
  • What did NOT change (scope boundary):
    • No persistent tail-log file or full WebSocket parity implementation yet; this addresses parent visibility for the initial run path.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • sessions_spawn now accepts streamTo: "parent" for ACP runtime.
  • With runtime: "acp" + streamTo: "parent", requester session receives progress/no-output/completion/failure system-event updates for the initial child run.
  • Initial child run uses deliver: false in that mode to avoid direct child-delivery duplication.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22+
  • Model/provider: ACP harness session path
  • Integration/channel (if any): sessions/tool runtime path
  • Relevant config (redacted): ACP enabled

Steps

  1. Run sessions_spawn with { runtime: "acp", streamTo: "parent", task: "..." } from an active requester session.
  2. Observe system-event updates in requester session while child run executes.
  3. Verify direct child delivery is disabled for that initial run.

Expected

  • Parent session receives progress/no-output/completion updates for initial ACP run.

Actual

  • Covered by new tests and implementation.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • relay progress + completion updates
    • no-output + resumed update behavior
    • ACP spawn streamTo parent dispatch behavior (deliver=false)
    • sessions tool schema/forwarding updates
  • Edge cases checked:
    • streamTo: "parent" rejected when requester session context is missing
  • What you did not verify:
    • end-to-end channel UX on live gateway/channel environment

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
    • stop using streamTo: "parent" (default behavior unchanged)
    • revert relay + ACP spawn stream target commits
  • Files/config to restore:
    • src/agents/acp-spawn-parent-stream.ts
    • src/agents/acp-spawn.ts
    • src/agents/tools/sessions-spawn-tool.ts
  • Known bad symptoms reviewers should watch for:
    • missing or noisy requester system-event updates
    • unexpected delivery suppression when not using streamTo: "parent"

Risks and Mitigations

  • Risk:
    • Parent relay could produce noisy updates on very chatty child output.
    • Mitigation:
      • Buffered/compacted forwarding with periodic flush and bounded snippets.
  • Risk:
    • Missing requester session key with streamTo: "parent" could silently fail.
    • Mitigation:
      • Explicit validation error when requester session context is unavailable.

@vincentkoc vincentkoc marked this pull request as ready for review March 4, 2026 06:30
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR adds streamTo: "parent" support for runtime: "acp" sessions spawned via sessions_spawn, enabling the orchestrator session to receive child-run progress, stall notices, and completion/failure updates as system events instead of relying on direct child delivery.

Changes:

  • New acp-spawn-parent-stream.ts relay: buffers assistant deltas with periodic flush, detects no-output stalls, and self-cleans on lifecycle end/error events.
  • acp-spawn.ts: validates streamTo: "parent" requires an active requester session key, sets deliver: false for the initial child run when streaming to parent, and starts the relay after the gateway call succeeds.
  • sessions-spawn-tool.ts: adds streamTo to the schema and forwards it to spawnAcpDirect for ACP runs.
  • Test coverage added for relay behavior, ACP spawn dispatch, and schema forwarding.

Issues found:

  • The dispose handle returned by startAcpSpawnParentStreamRelay is discarded in acp-spawn.ts. The relay self-cleans when lifecycle events fire, but if neither end nor error ever fires (child crash, hung run), the setInterval watcher and onAgentEvent subscription will leak with no external way to force cleanup.
  • streamTo: "parent" is silently ignored when paired with runtime: "subagent" in sessions-spawn-tool.ts; callers will get no error or warning.
  • Minor: the stopNoOutputWatcher variable in acp-spawn-parent-stream.ts holds a NodeJS.Timeout (timer ID), not a stop function — the name is misleading relative to the flushTimer convention used in the same file.

Confidence Score: 3/5

  • Safe to merge with the relay resource-leak addressed; default behavior is unchanged.
  • The core feature logic is correct and backward-compatible — deliver: false is only set when streamTo: "parent" is explicitly requested, validation for a missing session context is in place, and test coverage is solid. The score is reduced because the discarded dispose handle introduces a real (if low-frequency) resource leak: any child run that never emits a lifecycle event will leave a live setInterval and event subscription with no external cleanup path. The silent ignore of streamTo for subagent runtime is a DX gap that could cause silent misconfiguration.
  • src/agents/acp-spawn.ts (discarded relay dispose handle) and src/agents/tools/sessions-spawn-tool.ts (silent streamTo drop for subagent).

Last reviewed commit: 9726ec5

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9726ec5238

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 763045210a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +118 to +123
fs.mkdirSync(path.dirname(logPath), {
recursive: true,
});
const line = `${JSON.stringify(entry)}\n`;
fs.appendFileSync(logPath, line, {
encoding: "utf-8",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid blocking sync I/O in relay delta hot path

The relay logs every assistant delta via writeLogLine, which currently does fs.mkdirSync and fs.appendFileSync for each event. In streamTo: "parent" runs with long outputs, this introduces synchronous disk I/O on the event loop for nearly every token and can slow both gateway responsiveness and relay updates. Because logPath is populated for normal ACP sessions, this path is exercised in production; use async/buffered writes (and one-time directory creation) to prevent per-delta blocking.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3243053. Relay logging no longer uses sync fs calls in the assistant-delta path. It now buffers JSONL lines in-memory and flushes with async mkdir/appendFile via a serialized write chain (with one-time dir creation), so token streaming does not block the event loop with per-delta sync I/O.

Comment on lines +232 to +235
emit(
`Started ${relayLabel} session ${params.childSessionKey}. Streaming progress updates to parent session.`,
`${contextPrefix}:start`,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Delay start notice until child dispatch is accepted

The relay emits Started ... Streaming progress updates immediately on registration, but registration happens before spawnAcpDirect calls callGateway("agent"). If dispatch then fails (for example gateway timeout/unavailable), spawnAcpDirect returns an error and disposes the relay, yet this start event is still queued and later injected into the parent prompt as trusted runtime metadata even though no child run started. Emitting the start notice only after dispatch success (or adding a compensating failure event) avoids this false state.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3243053. We now register the relay with emitStartNotice: false before dispatch, and call notifyStarted() only after callGateway("agent") succeeds. On dispatch failure, we dispose the relay without emitting the start notice. See src/agents/acp-spawn.ts and src/agents/acp-spawn-parent-stream.ts.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61ec2233f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +245 to +246
toTrimmedString((data as { delta?: unknown } | undefined)?.delta) ??
toTrimmedString((data as { text?: unknown } | undefined)?.text);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve whitespace in relayed assistant deltas

In startAcpSpawnParentStreamRelay, assistant chunks are normalized with toTrimmedString before buffering, which strips leading/trailing whitespace from every delta. ACP streaming deltas commonly include leading spaces/newlines (for example tokenized chunks like " world"), so this causes words and paragraph boundaries to collapse in parent progress updates (streamTo: "parent"), making the relay summaries inaccurate exactly when direct child delivery is disabled. Keep raw delta text for buffering and only trim for emptiness checks.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 96771db. Assistant delta buffering now preserves raw chunk whitespace and only trims for emptiness checks, so token boundaries like leading spaces/newlines are not dropped before relay buffering. Added regression coverage in acp-spawn-parent-stream.test.ts (preserves delta whitespace boundaries in progress relays).

@osolmaz osolmaz closed this Mar 4, 2026
@osolmaz osolmaz deleted the vincentkoc-code/cli-acpx-reliability-20260304 branch March 4, 2026 10:49
@osolmaz
Copy link
Copy Markdown
Contributor

osolmaz commented Mar 4, 2026

This is merged in #34310, needed to address codex review issues on my own branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling docs Improvements or additions to documentation maintainer Maintainer-authored PR size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants