Commit c3c7a99

stainlu and obviyus authored
fix: repair sanitized replay tool results before send (#67620) (thanks @stainlu)
* fix(agents): preserve native Anthropic tool IDs for hybrid providers

  Fixes #66892

  MiniMax and other hybrid providers use api.minimaxi.com/anthropic (modelApi: anthropic-messages), which generates and expects native Anthropic tool_call_ids in toolu_* format. The hybrid replay policy (buildHybridAnthropicOrOpenAIReplayPolicy) applied strict sanitization that stripped underscores from these IDs, causing MiniMax to reject them with error 2013.

  The native Anthropic provider already preserved these IDs via preserveNativeAnthropicToolUseIds (added in 4613f12). This commit enables the same flag for the hybrid anthropic-messages branch, so toolu_* IDs pass through unsanitized while other synthetic IDs still get strict cleanup.

* fix(agents): repair sanitized replay tool results before send

* fix: repair sanitized replay tool results before send (#67620) (thanks @stainlu)

* fix: preserve aborted-span tool results during replay sanitize (#67620) (thanks @stainlu)

---------

Co-authored-by: Ayaan Zaidi <[email protected]>
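The ID handling described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in tool-call-id.js: the helper names `looksLikeNativeAnthropicToolUseId` and `sanitizeToolCallIdSketch` are hypothetical, the `toolu_` pattern is a guess at what counts as a native ID, and the rule that strict mode keeps only ASCII letters and digits is inferred from the test expectation in this commit, where `call_function_av7cbkigmk7x1` becomes `callfunctionav7cbkigmk7x1`.

```typescript
// Hypothetical helper: treats any ID of the form toolu_<letters/digits/underscores>
// as a native Anthropic tool-use ID. The real check may be stricter.
function looksLikeNativeAnthropicToolUseId(id: string): boolean {
  return /^toolu_[A-Za-z0-9_]+$/.test(id);
}

// Hypothetical sketch of strict sanitization with an opt-in escape hatch.
// Assumption: "strict" means dropping every character that is not an
// ASCII letter or digit (which is what strips the underscores).
function sanitizeToolCallIdSketch(
  id: string,
  opts: { preserveNativeAnthropicToolUseIds?: boolean } = {},
): string {
  if (opts.preserveNativeAnthropicToolUseIds && looksLikeNativeAnthropicToolUseId(id)) {
    // Native IDs pass through untouched so the provider recognizes them.
    return id;
  }
  return id.replace(/[^A-Za-z0-9]/g, "");
}
```

Without the flag, a `toolu_*` ID loses its underscore like any other ID, which is the behavior the commit message says MiniMax rejected. With the flag, only non-native IDs are rewritten.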
1 parent de129a6 commit c3c7a99

5 files changed

Lines changed: 145 additions & 18 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ Docs: https://docs.openclaw.ai
 - Agents/tool-loop: enable the unknown-tool stream guard by default. Previously `resolveUnknownToolGuardThreshold` returned `undefined` unless `tools.loopDetection.enabled` was explicitly set to `true`, which left the protection off in the default configuration. A hallucinated or removed tool (for example `himalaya` after it was dropped from `skills.allowBundled`) would then loop "Tool X not found" attempts until the full embedded-run timeout. The guard has no false-positive surface because it only triggers on tools that are objectively not registered in the run, so it now stays on regardless of `tools.loopDetection.enabled` and still accepts `tools.loopDetection.unknownToolThreshold` as a per-run override (default 10). (#67401) Thanks @xantorres.
 - TUI/streaming: add a client-side streaming watchdog to `tui-event-handlers` so the `streaming · Xm Ys` activity indicator resets to `idle` after 30s of delta silence on the active run. Guards against lost or late `state: "final"` chat events (WS reconnects, gateway restarts, etc.) leaving the TUI stuck on `streaming` indefinitely; a new system log line surfaces the reset so users know to send a new message to resync. The window is configurable via the new `streamingWatchdogMs` context option (set to `0` to disable), and the handler now exposes a `dispose()` that clears the pending timer on shutdown. (#67401) Thanks @xantorres.
 - Extensions/lmstudio: add exponential backoff to the inference-preload wrapper so an LM Studio model-load failure (for example the built-in memory guardrail rejecting a load because the swap is saturated) no longer produces a WARN line every ~2s for every chat request. The wrapper now records consecutive preload failures per `(baseUrl, modelKey, contextLength)` tuple with a 5s → 10s → 20s → … → 5min cooldown and skips the preload step entirely while a cooldown is active, letting chat requests proceed directly to the stream (the model is often already loaded via the LM Studio UI). The combined `preload failed` log line now reports consecutive-failure count and remaining cooldown so operators can act on the real issue instead of drowning in repeated warnings. (#67401) Thanks @xantorres.
+- Agents/replay: re-run tool/result pairing after strict replay tool-call ID sanitization on outbound requests so Anthropic-compatible providers like MiniMax no longer receive malformed orphan tool-result IDs such as `...toolresult1` during compaction and retry flows. (#67620) Thanks @stainlu.

 ## 2026.4.15-beta.1

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+import type { AgentMessage } from "@mariozechner/pi-agent-core";
+import { describe, expect, it } from "vitest";
+import { sanitizeReplayToolCallIdsForStream } from "./attempt.tool-call-normalization.js";
+
+describe("sanitizeReplayToolCallIdsForStream", () => {
+  it("drops orphaned tool results after strict id sanitization", () => {
+    const messages: AgentMessage[] = [
+      {
+        role: "toolResult",
+        toolCallId: "call_function_av7cbkigmk7x1",
+        toolUseId: "call_function_av7cbkigmk7x1",
+        toolName: "read",
+        content: [{ type: "text", text: "stale" }],
+        isError: false,
+      } as never,
+    ];
+
+    expect(
+      sanitizeReplayToolCallIdsForStream({
+        messages,
+        mode: "strict",
+        repairToolUseResultPairing: true,
+      }),
+    ).toEqual([]);
+  });
+
+  it("keeps matched assistant and tool-result ids aligned", () => {
+    const rawId = "call_function_av7cbkigmk7x1";
+    const messages: AgentMessage[] = [
+      {
+        role: "assistant",
+        content: [{ type: "toolUse", id: rawId, name: "read", input: { path: "." } }],
+      } as never,
+      {
+        role: "toolResult",
+        toolCallId: rawId,
+        toolUseId: rawId,
+        toolName: "read",
+        content: [{ type: "text", text: "ok" }],
+        isError: false,
+      } as never,
+    ];
+
+    const out = sanitizeReplayToolCallIdsForStream({
+      messages,
+      mode: "strict",
+      repairToolUseResultPairing: true,
+    });
+
+    expect(out).toMatchObject([
+      {
+        role: "assistant",
+        content: [{ type: "toolUse", id: "callfunctionav7cbkigmk7x1", name: "read" }],
+      },
+      {
+        role: "toolResult",
+        toolCallId: "callfunctionav7cbkigmk7x1",
+        toolUseId: "callfunctionav7cbkigmk7x1",
+        toolName: "read",
+      },
+    ]);
+  });
+
+  it("keeps real tool results for aborted assistant spans", () => {
+    const rawId = "call_function_av7cbkigmk7x1";
+    const out = sanitizeReplayToolCallIdsForStream({
+      messages: [
+        {
+          role: "assistant",
+          stopReason: "aborted",
+          content: [{ type: "toolUse", id: rawId, name: "read", input: { path: "." } }],
+        } as never,
+        {
+          role: "toolResult",
+          toolCallId: rawId,
+          toolUseId: rawId,
+          toolName: "read",
+          content: [{ type: "text", text: "partial" }],
+          isError: false,
+        } as never,
+        {
+          role: "user",
+          content: [{ type: "text", text: "retry" }],
+        } as never,
+      ],
+      mode: "strict",
+      repairToolUseResultPairing: true,
+    });
+
+    expect(out).toMatchObject([
+      {
+        role: "assistant",
+        stopReason: "aborted",
+        content: [{ type: "toolUse", id: "callfunctionav7cbkigmk7x1", name: "read" }],
+      },
+      {
+        role: "toolResult",
+        toolCallId: "callfunctionav7cbkigmk7x1",
+        toolUseId: "callfunctionav7cbkigmk7x1",
+        toolName: "read",
+      },
+      {
+        role: "user",
+      },
+    ]);
+  });
+});

src/agents/pi-embedded-runner/run/attempt.tool-call-normalization.ts

Lines changed: 24 additions & 1 deletion
@@ -6,7 +6,11 @@ import {
   isRedactedSessionsSpawnAttachment,
   sanitizeToolUseResultPairing,
 } from "../../session-transcript-repair.js";
-import { extractToolCallsFromAssistant } from "../../tool-call-id.js";
+import {
+  extractToolCallsFromAssistant,
+  sanitizeToolCallIdsForCloudCodeAssist,
+  type ToolCallIdMode,
+} from "../../tool-call-id.js";
 import { normalizeToolName } from "../../tool-policy.js";
 import { shouldAllowProviderOwnedThinkingReplay } from "../../transcript-policy.js";
 import type { TranscriptPolicy } from "../../transcript-policy.js";
@@ -868,6 +872,25 @@ export function wrapStreamFnTrimToolCallNames(
   };
 }
 
+export function sanitizeReplayToolCallIdsForStream(params: {
+  messages: AgentMessage[];
+  mode: ToolCallIdMode;
+  allowedToolNames?: Set<string>;
+  preserveNativeAnthropicToolUseIds?: boolean;
+  preserveReplaySafeThinkingToolCallIds?: boolean;
+  repairToolUseResultPairing?: boolean;
+}): AgentMessage[] {
+  const sanitized = sanitizeToolCallIdsForCloudCodeAssist(params.messages, params.mode, {
+    preserveNativeAnthropicToolUseIds: params.preserveNativeAnthropicToolUseIds,
+    preserveReplaySafeThinkingToolCallIds: params.preserveReplaySafeThinkingToolCallIds,
+    allowedToolNames: params.allowedToolNames,
+  });
+  if (!params.repairToolUseResultPairing) {
+    return sanitized;
+  }
+  return sanitizeToolUseResultPairing(sanitized);
+}
+
 export function wrapStreamFnSanitizeMalformedToolCalls(
   baseFn: StreamFn,
   allowedToolNames?: Set<string>,
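One reason to re-run pairing after the ID rewrite: a tool result whose assistant tool call is no longer in the transcript (for example after compaction) would otherwise go out with a rewritten ID that matches nothing. The sketch below shows only the orphan-dropping part of that idea. The `SketchMessage` type and the `dropOrphanToolResults` helper are hypothetical simplifications for illustration; the body of the real `sanitizeToolUseResultPairing` is not part of this diff and may do considerably more.

```typescript
// Simplified, hypothetical message shapes (not the real AgentMessage type).
type SketchMessage =
  | { role: "assistant"; content: Array<{ type: "toolUse"; id: string; name: string }> }
  | { role: "toolResult"; toolCallId: string; toolName: string }
  | { role: "user"; content: string };

// Keep a tool result only if an earlier assistant message declared a
// tool call with the same ID; everything else passes through unchanged.
function dropOrphanToolResults(messages: SketchMessage[]): SketchMessage[] {
  const declaredIds = new Set<string>();
  const out: SketchMessage[] = [];
  for (const msg of messages) {
    if (msg.role === "assistant") {
      for (const block of msg.content) {
        declaredIds.add(block.id);
      }
      out.push(msg);
    } else if (msg.role === "toolResult") {
      if (declaredIds.has(msg.toolCallId)) {
        out.push(msg);
      }
      // Orphaned results are dropped silently in this sketch.
    } else {
      out.push(msg);
    }
  }
  return out;
}
```

Running the ID rewrite first and the pairing pass second means both sides of a pair are compared using the same sanitized form, which mirrors the order used by `sanitizeReplayToolCallIdsForStream` above.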

src/agents/pi-embedded-runner/run/attempt.ts

Lines changed: 13 additions & 15 deletions
@@ -115,7 +115,6 @@ import { resolveSystemPromptOverride } from "../../system-prompt-override.js";
 import { buildSystemPromptParams } from "../../system-prompt-params.js";
 import { buildSystemPromptReport } from "../../system-prompt-report.js";
 import { resolveAgentTimeoutMs } from "../../timeout.js";
-import { sanitizeToolCallIdsForCloudCodeAssist } from "../../tool-call-id.js";
 import { UNKNOWN_TOOL_THRESHOLD } from "../../tool-loop-detection.js";
 import {
   resolveTranscriptPolicy,
@@ -225,6 +224,7 @@ import {
   wrapStreamFnRepairMalformedToolCallArguments,
 } from "./attempt.tool-call-argument-repair.js";
 import {
+  sanitizeReplayToolCallIdsForStream,
   wrapStreamFnSanitizeMalformedToolCalls,
   wrapStreamFnTrimToolCallNames,
 } from "./attempt.tool-call-normalization.js";
@@ -1251,25 +1251,23 @@ export async function runEmbeddedAttempt(
         if (!Array.isArray(messages)) {
           return inner(model, context, options);
         }
-        const allowProviderOwnedThinkingReplay = shouldAllowProviderOwnedThinkingReplay({
-          modelApi: (model as { api?: unknown })?.api as string | null | undefined,
-          policy: transcriptPolicy,
-        });
-        const sanitized = sanitizeToolCallIdsForCloudCodeAssist(
-          messages as AgentMessage[],
+        const nextMessages = sanitizeReplayToolCallIdsForStream({
+          messages: messages as AgentMessage[],
           mode,
-          {
-            preserveNativeAnthropicToolUseIds: transcriptPolicy.preserveNativeAnthropicToolUseIds,
-            preserveReplaySafeThinkingToolCallIds: allowProviderOwnedThinkingReplay,
-            allowedToolNames,
-          },
-        );
-        if (sanitized === messages) {
+          allowedToolNames,
+          preserveNativeAnthropicToolUseIds: transcriptPolicy.preserveNativeAnthropicToolUseIds,
+          preserveReplaySafeThinkingToolCallIds: shouldAllowProviderOwnedThinkingReplay({
+            modelApi: (model as { api?: unknown })?.api as string | null | undefined,
+            policy: transcriptPolicy,
+          }),
+          repairToolUseResultPairing: transcriptPolicy.repairToolUseResultPairing,
+        });
+        if (nextMessages === messages) {
           return inner(model, context, options);
         }
         const nextContext = {
           ...(context as unknown as Record<string, unknown>),
-          messages: sanitized,
+          messages: nextMessages,
         } as unknown;
         return inner(model, nextContext as typeof context, options);
       };
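The wrapper pattern in the last hunk (forward the original context untouched when the transform hands back the same array, otherwise pass a shallow copy carrying the new messages) can be sketched in isolation. `wrapWithMessageTransform` and `SketchContext` are hypothetical names, and the single-argument `inner` is a simplification; the stream function in the diff is called as `inner(model, context, options)`.

```typescript
// Hypothetical, simplified context shape for illustration only.
type SketchContext = { messages?: unknown; [key: string]: unknown };

// Wraps `inner` so that `transform` runs on context.messages first.
// The original context object is never mutated.
function wrapWithMessageTransform<R>(
  inner: (context: SketchContext) => R,
  transform: (messages: unknown[]) => unknown[],
): (context: SketchContext) => R {
  return (context) => {
    const messages = context.messages;
    if (!Array.isArray(messages)) {
      // Nothing to transform; pass the call through as-is.
      return inner(context);
    }
    const next = transform(messages);
    if (next === messages) {
      // Same array reference means no change, so skip the copy.
      return inner(context);
    }
    return inner({ ...context, messages: next });
  };
}
```

The reference-equality check only saves the copy when the transform returns its input array unchanged. Whether the repair pass in this commit preserves that reference on a no-op is not visible in the diff.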

src/plugins/provider-replay-helpers.test.ts

Lines changed: 0 additions & 2 deletions
@@ -93,7 +93,6 @@ describe("provider replay helpers", () => {
   });
 
   it("builds hybrid anthropic or openai replay policy", () => {
-    // Sonnet 4.6 preserves thinking blocks even when flag is set
     const sonnet46Policy = buildHybridAnthropicOrOpenAIReplayPolicy(
       {
         provider: "minimax",
@@ -107,7 +106,6 @@ describe("provider replay helpers", () => {
     });
     expect(sonnet46Policy).not.toHaveProperty("dropThinkingBlocks");
 
-    // Legacy model still drops
     expect(
       buildHybridAnthropicOrOpenAIReplayPolicy(
         {
