fix(orchestrator): preserve completed responses on client disconnect#1059

Merged
looplj merged 1 commit into looplj:release/v0.9.x from Chengxiwei:pr/outbound-client-disconnect
Mar 14, 2026

Conversation

Contributor

@Chengxiwei Chengxiwei commented Mar 14, 2026

What this fixes

When the client disconnects near the end of a streamed response, AxonHub may observe ctx cancellation before the local stream consumer sees the terminal event.

In the Claude Code -> AxonHub -> Codex chain, this can misclassify a fully completed response as canceled / failed even though the full response has already been streamed from the upstream provider.

Approach

This update keeps the completion proof generic instead of special-casing OpenAI Responses status values.

The outbound finalization logic now treats a response as completed when either:

  • an explicit terminal event was observed
  • or aggregation succeeds and meta.Usage != nil

At the same time:

  • explicit upstream stream errors remain failures
  • ctx cancellation alone is no longer treated as automatic failure if completion has already been proven
  • missing usage is not treated as proof of failure; it is just absence of one completion signal
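Stated as code, the rule above might look like the following minimal Go sketch. The type and function names (`FinalMeta`, `Usage`, `finalStatus`) are illustrative assumptions, not AxonHub's actual identifiers:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative types; not the actual AxonHub definitions.
type Usage struct{ TotalTokens int }

type FinalMeta struct {
	Usage *Usage // non-nil when aggregation recovered provider-reported usage
}

var errUpstream = errors.New("upstream stream error")

// finalStatus sketches the decision rule: an explicit error always fails,
// and either completion proof (terminal event or aggregated usage) wins
// over a bare ctx cancellation.
func finalStatus(terminalSeen bool, meta *FinalMeta, streamErr, ctxErr error) string {
	switch {
	case streamErr != nil:
		return "failed" // explicit upstream errors remain failures
	case terminalSeen:
		return "completed" // terminal event observed
	case meta != nil && meta.Usage != nil:
		return "completed" // aggregation succeeded with usage populated
	case ctxErr != nil:
		return "canceled" // cancellation with no completion proof
	default:
		return "failed" // no completion proof at all: treat as failed in this sketch
	}
}

func main() {
	// Client canceled after the response was effectively complete:
	fmt.Println(finalStatus(false, &FinalMeta{Usage: &Usage{TotalTokens: 42}}, nil, errUpstream)) // completed
}
```

The key point is ordering: an explicit stream error always wins, and missing usage merely removes one completion signal rather than proving failure.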

Why this shape

The maintainer feedback was to avoid protocol-specific completion handling in the orchestrator.

Using aggregated usage as an additional positive completion signal makes the rule work across providers / formats that already populate usage on complete responses, while still preserving terminal-event-based completion for streams that do not rely on usage.

Tests

This PR includes:

  • targeted orchestrator regression coverage in internal/server/orchestrator/outbound_test.go
  • an integration-style client-disconnect regression in integration_test/openai/responses/streaming/client_disconnect_test.go

Covered cases include:

  • missing usage does not count as aggregated completion
  • aggregated usage counts as completion
  • client cancel after the response is effectively complete is still persisted as completed
  • explicit stream errors remain failures

Verification

  • go test ./internal/server/orchestrator/...

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical edge case in the orchestrator's outbound streaming logic where client disconnections near the completion of a stream could lead to incorrectly marking a fully completed response as 'canceled' or 'failed'. The changes introduce a more robust finalization mechanism that aggregates partial stream data to determine the true completion status, ensuring that valid responses are always persisted correctly, regardless of client-side timing issues. This improves the reliability of stream processing, particularly in complex multi-hop proxy environments.

Highlights

  • Outbound Stream Finalization Logic: The Close method in OutboundPersistentStream was refactored to intelligently determine the final status of a streaming response, especially when a client disconnects prematurely. It now aggregates collected chunks to check for completion even if a terminal event was not explicitly received.
  • Protocol-Aware Completion Detection: Introduced logic to detect completion based on the API format. For OpenAI Responses, a status: 'completed' field is required in the aggregated payload. For chat completions, completion can be inferred even without an explicit status field.
  • Preservation of Completed Responses: If the aggregated response is determined to be complete, the execution status is now correctly persisted as completed, even if the downstream client disconnected or the context was canceled before a terminal event was processed.
  • Enhanced Error Handling: Explicit upstream stream errors (not just context cancellations) are now treated as failures, ensuring that genuinely incomplete or corrupted responses are correctly marked.
  • Detailed Finalization Diagnostics: Added comprehensive logging to trace the decision-making process during stream finalization, providing better insights into why a stream was marked as completed, canceled, or failed.
  • New Integration Test Case: A new integration test (TestResponsesStreamingClientDisconnectAfterOutputTextDone) was added to specifically simulate a client disconnecting immediately after receiving the final text payload but before the explicit response.completed event, verifying the fix in a realistic scenario.
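The protocol-aware detection described above (later generalized away in favor of the usage-based signal) boils down to reading a top-level status field from the aggregated body. A hypothetical sketch — `aggregatedStatus` is illustrative and may differ from the real `aggregatedOutboundResponseStatus` helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aggregatedStatus pulls the top-level "status" field out of an
// aggregated response body; unparseable bodies or bodies without the
// field yield "".
func aggregatedStatus(body []byte) string {
	var payload struct {
		Status string `json:"status"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return ""
	}
	return payload.Status
}

func main() {
	fmt.Println(aggregatedStatus([]byte(`{"status":"completed","output":[]}`))) // completed
	// Chat-completion payloads carry no status field, so the check
	// falls through to other completion signals:
	fmt.Println(aggregatedStatus([]byte(`{"choices":[]}`)) == "") // true
}
```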
Changelog
  • integration_test/openai/responses/streaming/client_disconnect_test.go
    • Added TestResponsesStreamingClientDisconnectAfterOutputTextDone to simulate client disconnects after final text output but before explicit completion events.
  • internal/server/orchestrator/outbound.go
    • Imported encoding/json for JSON unmarshalling.
    • Refactored Close method to incorporate aggregation logic for determining stream completion status.
    • Introduced logFinalizationDecision function for detailed logging of stream finalization outcomes.
    • Added aggregatedOutboundResponseStatus function to extract status from aggregated JSON response bodies.
    • Implemented persistAggregatedResponse function to handle persistence of aggregated responses, including usage logs and metrics.
    • Created isCompletedAggregatedOutboundResponse function to perform protocol-aware completion detection based on API format and response status.
  • internal/server/orchestrator/outbound_test.go
    • Added new imports for authz, enttest, request, and requestexecution.
    • Modified mockTransformer to include fields for aggregatedResponse, aggregatedMeta, aggregatedErr, and apiFormat to support new aggregation tests.
    • Updated mockTransformer.AggregateStreamChunks and mockTransformer.APIFormat to use the new fields.
    • Added sliceEventStream helper struct for mocking stream events in tests.
    • Introduced TestIsCompletedAggregatedOutboundResponse to test the new completion detection logic.
    • Added TestOutboundPersistentStream_Close_AggregatedResponsesCompletionHandling with sub-tests covering various scenarios: in-progress responses, aggregated completed responses, and client-canceled streams with completed aggregated responses.
Activity
  • The author, Chengxiwei, has implemented new logic to preserve completed responses even if the client disconnects prematurely.
  • New unit tests were added in internal/server/orchestrator/outbound_test.go to cover various scenarios of aggregated response completion and client disconnection.
  • An integration test was added in integration_test/openai/responses/streaming/client_disconnect_test.go to validate the fix in a real-world client disconnect scenario.
  • Verification steps using go test and specific integration test commands were provided by the author.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an important edge case in stream handling where a completed stream could be incorrectly marked as canceled if the client disconnects near completion. The fix introduces logic to aggregate response chunks and determine completion status from the aggregated payload, even without a terminal stream event. The changes are well-tested with new integration and unit tests. My review focuses on a couple of areas for simplification and improved maintainability in the new logic.

@looplj
Owner

looplj commented Mar 14, 2026

This should be made more generic; there's no need to special-case Responses.

After aggregation, if usage has data, the response should already be complete.

@Chengxiwei
Contributor Author

Chengxiwei commented Mar 14, 2026

Updated this based on your suggestion.

The completion check is now generic instead of special-casing Responses:

  • explicit terminal events still count as completion
  • after aggregation, meta.Usage != nil is treated as an additional positive completion signal
  • explicit upstream stream errors remain failures

So ctx cancellation by itself no longer forces a failed / canceled outcome if completion has already been proven.

I also cleaned up the earlier review leftovers around the removed status-based path and kept the regression coverage for the client-disconnect case.

@Chengxiwei Chengxiwei force-pushed the pr/outbound-client-disconnect branch 2 times, most recently from 5c88768 to f913e45 on March 14, 2026 13:18
@Chengxiwei Chengxiwei force-pushed the pr/outbound-client-disconnect branch from f913e45 to fed294c on March 14, 2026 13:20
@looplj looplj merged commit 6b11671 into looplj:release/v0.9.x Mar 14, 2026
2 checks passed