refactor: Avoid synthetic prompt for approval continuations #2598
daryllimyt merged 17 commits into main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be1d4381c1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
✅ No security or compliance issues detected. Reviewed everything up to 70c4cfe.
1 issue found across 14 files
Confidence score: 5/5
- Low-risk change overall: the only reported issue is low severity (3/10) and appears confined to test coverage rather than runtime behavior.
- The key concern is in `tests/temporal/test_durable_agent_workflow.py`, where dropping the failure payload assertion weakens retry validation and could let an error-result write regression slip through.
- Pay close attention to `tests/temporal/test_durable_agent_workflow.py`: restore the failure payload assertion so the retry path still verifies the error result is persisted.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tests/temporal/test_durable_agent_workflow.py">
<violation number="1" location="tests/temporal/test_durable_agent_workflow.py:816">
P3: Restore the failure payload assertion here; otherwise the retry test no longer verifies the error result was written.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
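For context, here is a hedged sketch of the kind of failure payload assertion the review asks to restore. The helper name and row shapes are illustrative only; the actual test at `tests/temporal/test_durable_agent_workflow.py:816` uses its own fixtures:

```python
# Illustrative only: a minimal stand-in for the failure payload assertion
# the review suggests restoring. The real test's fixtures and history row
# shapes will differ; this just shows the intent of the check.
def assert_error_result_persisted(history: list[dict]) -> None:
    """Verify the retry path persisted an error tool_result row."""
    error_rows = [
        row for row in history
        if row.get("type") == "tool_result" and row.get("is_error")
    ]
    assert error_rows, "expected a persisted error tool_result after retry"
    assert error_rows[-1].get("content"), "error payload should not be empty"
```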
Force-pushed 380c6dd to d255055
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d255055467
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1994872c0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44b691d305
This stack of pull requests is managed by Graphite.
Found 7 test failures on Blacksmith runners.

Summary

- The approval continuation is a hidden meta continuation tick rather than a synthetic user prompt.
- Real `user`/`tool_result` history rows are inserted so the model sees the real approved or denied tool result.

Existing Pending Approvals
This should be safe for workflows that are currently waiting on approval. Those workflows are paused before post-approval execution at `await self.approvals.wait()`. After the approval signal arrives, the new code runs the post-approval path: execute approved/denied tools, reconcile the session history by inserting the real `user`/`tool_result` row, reload the session history, then resume Claude Code with the hidden meta continuation tick.

Existing pending approvals should already have the assistant `tool_use` row and Claude Code interrupt artifacts in session history. The reconciliation step is designed for that state: find the matching assistant `tool_use`, remove interrupt artifacts including legacy "interrupted" placeholders, and insert the real tool result immediately after the tool use.

Caveats:

- If session history has no matching assistant `tool_use`, reconciliation logs and returns without inserting a `tool_result`.

Attempted Alternative and Limitation
We also tried a cleaner continuation shape: keep the session service as the canonical owner of the approved `user`/`tool_result` row, seed Claude Code with history only through the original assistant `tool_use`, then send that final `tool_result` as the first resumed SDK stdin message instead of using a hidden `Continue.` tick.

That does not work correctly with the current Claude Code SDK/CLI behavior. In live cluster testing, the DB showed the original approved tool result persisted correctly, but Claude Code inserted a synthetic assistant row (`<synthetic>` / "No response requested.") before the replayed stdin `tool_result`. That broke the clean `assistant tool_use -> user tool_result` adjacency, and the model requested the same tool again, creating a second pending approval for the same user request.

TODO: revisit this if Claude Code exposes a supported "resume and consume pending tool_result" API, or if the SDK/CLI allows passing a tool result without inserting the synthetic assistant placeholder. Until then, this PR keeps the more reliable design: preseed the reconciled `tool_result` into session history and use a hidden metadata continuation tick, while filtering those control-plane rows from normal chat history.

Tests
```shell
uv run pytest tests/unit/test_agent_runtime.py tests/unit/test_agent_session_activities.py \
  tests/unit/test_agent_session_approval_sink.py::test_replace_interrupt_with_tool_results_replaces_legacy_interrupted_row \
  tests/temporal/test_durable_agent_workflow.py::test_agent_workflow_routes_approved_tools_to_executor_and_reconciles_history \
  tests/temporal/test_durable_agent_workflow.py::test_agent_workflow_does_not_retry_approved_tool_failures -q
uv run ruff check packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py tests/integration/test_agent_worker.py \
  tests/temporal/test_durable_agent_workflow.py tests/unit/test_agent_runtime.py tests/unit/test_agent_session_activities.py \
  tests/unit/test_agent_session_approval_sink.py tests/unit/test_agent_session_messages.py tests/unit/test_agent_socket_io.py \
  tests/unit/test_vercel_adapter.py tracecat/agent/adapter/vercel.py tracecat/agent/common/protocol.py \
  tracecat/agent/executor/activity.py tracecat/agent/executor/schemas.py tracecat/agent/runtime/claude_code/runtime.py \
  tracecat/agent/session/activities.py tracecat/agent/session/service.py
uv run ruff format --check packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py tests/integration/test_agent_worker.py \
  tests/temporal/test_durable_agent_workflow.py tests/unit/test_agent_runtime.py tests/unit/test_agent_session_activities.py \
  tests/unit/test_agent_session_approval_sink.py tests/unit/test_agent_session_messages.py tests/unit/test_agent_socket_io.py \
  tests/unit/test_vercel_adapter.py tracecat/agent/adapter/vercel.py tracecat/agent/common/protocol.py \
  tracecat/agent/executor/activity.py tracecat/agent/executor/schemas.py tracecat/agent/runtime/claude_code/runtime.py \
  tracecat/agent/session/activities.py tracecat/agent/session/service.py
uv run basedpyright --warnings packages/tracecat-ee/tracecat_ee/agent/workflows/durable.py tests/integration/test_agent_worker.py \
  tests/temporal/test_durable_agent_workflow.py tests/unit/test_agent_runtime.py tests/unit/test_agent_session_activities.py \
  tests/unit/test_agent_session_approval_sink.py tests/unit/test_agent_session_messages.py tests/unit/test_agent_socket_io.py \
  tests/unit/test_vercel_adapter.py tracecat/agent/adapter/vercel.py tracecat/agent/common/protocol.py \
  tracecat/agent/executor/activity.py tracecat/agent/executor/schemas.py tracecat/agent/runtime/claude_code/runtime.py \
  tracecat/agent/session/activities.py tracecat/agent/session/service.py
```
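The reconciliation step described under Existing Pending Approvals can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name, flat row shapes, and the `"interrupt"`/`"interrupted"` type tags are assumptions for the example; the real logic lives in the session service and operates on its own history schema:

```python
# Hypothetical sketch of history reconciliation after an approval decision:
# find the matching assistant tool_use, drop interrupt artifacts (including
# legacy "interrupted" placeholders), and insert the real tool_result
# immediately after the tool_use so adjacency is preserved.
def reconcile_history(
    history: list[dict], tool_use_id: str, tool_result_row: dict
) -> list[dict]:
    # Find the matching assistant tool_use row.
    idx = next(
        (
            i
            for i, row in enumerate(history)
            if row.get("type") == "tool_use" and row.get("id") == tool_use_id
        ),
        None,
    )
    if idx is None:
        # Caveat from the PR: with no matching tool_use, log and return
        # without inserting a tool_result.
        print(f"no matching tool_use for {tool_use_id}; skipping reconciliation")
        return history

    # Remove interrupt artifacts, including legacy "interrupted" placeholders.
    cleaned = [
        row for row in history if row.get("type") not in {"interrupt", "interrupted"}
    ]

    # Recompute the index in the cleaned list, then insert the real
    # tool_result directly after the tool_use.
    idx = next(
        i
        for i, row in enumerate(cleaned)
        if row.get("type") == "tool_use" and row.get("id") == tool_use_id
    )
    cleaned.insert(idx + 1, tool_result_row)
    return cleaned
```

The key design point this illustrates is the `tool_use -> tool_result` adjacency: the result row must land immediately after its tool use, which is exactly what the synthetic assistant row in the attempted alternative broke.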