agents: repair empty Gemini tool names on every outbound request #48748

Open
maxtongwang wants to merge 2 commits into openclaw:main from maxtongwang:fix/gemini-empty-tool-name-repair

Conversation

@maxtongwang maxtongwang commented Mar 17, 2026

Summary

  • Problem: Gemini 3 (preview) can return functionCall blocks with empty name fields. When this happens, pi-ai stores name: "" in the assistant message and pi-agent-core creates a toolResult with toolName: "". On the next turn, google-shared.js serializes this as functionResponse.name: "". Gemini rejects the entire request with HTTP 400 INVALID_ARGUMENT:
    GenerateContentRequest.contents[N].parts[0].function_response.name: Name cannot be empty.
    
    In longer sessions, every subsequent turn in the same session fails — the empty-name entries accumulate in history and all get rejected together.
  • Why it matters: Any user on a Gemini 3 model hitting a multi-tool session gets a hard 400 crash mid-conversation with no recovery path. Tracked in issue #16263 ("Bug: Gemini provider generates empty causing 400 errors").
  • Root cause: sanitizeSessionHistory (which calls sanitizeToolCallInputs + sanitizeToolUseResultPairing) handles this case correctly, but it only runs once at attempt start. Tool call → result cycles within the agent loop build new messages and pass them directly to the provider, bypassing the repair.
  • What changed: Added a streamFn wrapper (enabled via new repairToolNamesOnEveryTurn flag in TranscriptPolicy, set true for all Google model APIs) that re-runs sanitizeToolCallInputs + sanitizeToolUseResultPairing on every outbound request. Empty-name tool calls are dropped from the assistant message; the orphaned tool results are then dropped as well.
  • What did NOT change: Non-Google providers are unaffected. No behavior change when all tool call names are valid. The fix is purely defensive.
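
A minimal sketch of the two-step repair described above, assuming a simplified message shape. `Msg`, `dropEmptyNameToolCalls`, and `dropOrphanedToolResults` are hypothetical stand-ins, not the real `sanitizeToolCallInputs` / `sanitizeToolUseResultPairing`, which do more than this (for example, inserting synthetic results for unmatched calls).

```typescript
// Hypothetical, simplified message shapes (not the real pi-ai types).
type ToolCall = { id: string; name: string };
type Msg =
  | { role: "user"; text: string }
  | { role: "assistant"; toolCalls: ToolCall[] }
  | { role: "toolResult"; toolCallId: string; toolName: string };

// Step 1: drop tool calls whose name is empty.
function dropEmptyNameToolCalls(messages: Msg[]): Msg[] {
  return messages.map((m) =>
    m.role === "assistant"
      ? { ...m, toolCalls: m.toolCalls.filter((c) => c.name !== "") }
      : m,
  );
}

// Step 2: drop tool results that no longer match any surviving tool call.
function dropOrphanedToolResults(messages: Msg[]): Msg[] {
  const liveIds = new Set<string>();
  for (const m of messages) {
    if (m.role === "assistant") m.toolCalls.forEach((c) => liveIds.add(c.id));
  }
  return messages.filter((m) => m.role !== "toolResult" || liveIds.has(m.toolCallId));
}

const history: Msg[] = [
  { role: "user", text: "search and read" },
  { role: "assistant", toolCalls: [{ id: "a", name: "web_search" }, { id: "b", name: "" }] },
  { role: "toolResult", toolCallId: "a", toolName: "web_search" },
  { role: "toolResult", toolCallId: "b", toolName: "" },
];

const repairedHistory = dropOrphanedToolResults(dropEmptyNameToolCalls(history));
console.log(repairedHistory.length); // 3: the empty-name call and its result are gone
```

The valid `web_search` call and its result survive in the same assistant turn, which matches the edge case listed under Human Verification.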

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution

Linked Issue/PR

User-visible / Behavior Changes

Google model sessions that previously crashed with "Name cannot be empty" 400 errors after the first tool call now recover gracefully — empty-name tool calls and their results are silently dropped, allowing the conversation to continue.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local OpenClaw gateway
  • Model/provider: google/gemini-3-flash-preview via google-generative-ai API
  • Integration/channel: Discord

Steps

  1. Configure OpenClaw with google/gemini-3-flash-preview as the primary model.
  2. Send a message that causes the agent to use multiple tools (e.g., web search + a file read).
  3. Observe subsequent turns in the same session.

Expected

Conversation continues normally even after Gemini returns a function call with an empty name field.

Actual (before fix)

```
LLM error: {
  "error": {
    "code": 400,
    "message": "* GenerateContentRequest.contents[2].parts[0].function_response.name: Name cannot be empty.
* GenerateContentRequest.contents[2].parts[1].function_response.name: Name cannot be empty.
* GenerateContentRequest.contents[4].parts[0].function_response.name: Name cannot be empty.
* GenerateContentRequest.contents[8].parts[0].function_response.name: Name cannot be empty.
...",
    "status": "INVALID_ARGUMENT"
  }
}
```

Every subsequent turn in the session accumulated more empty-name entries and continued to fail.

Evidence

  • Failing test/log before + passing after

Regression test added: "drops empty-name tool calls and their paired tool results (Gemini #16263)" in session-transcript-repair.test.ts. Verifies the two-step repair pipeline correctly handles the empty-name scenario end-to-end.

Policy test added: "enables repairToolNamesOnEveryTurn for Google APIs" and "disables repairToolNamesOnEveryTurn for non-Google providers" in transcript-policy.test.ts.

Live verification: fix deployed on a local OpenClaw instance with google/gemini-3-flash-preview. Multi-turn tool use sessions that previously hard-crashed at turn 2+ now run to completion.

Human Verification (required)

  • Verified scenarios: multi-turn Gemini 3 Flash Preview session via Discord with tool use; confirmed 400 errors no longer occur
  • Edge cases checked: Anthropic + OpenAI sessions unaffected (flag disabled for non-Google); empty-name tool calls correctly cleaned without dropping valid tool calls in the same message
  • What I did not verify: Gemini 2.5 / Gemini 2.0 with the same edge case (they may not trigger empty-name function calls in practice)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: set repairToolNamesOnEveryTurn: false in resolveTranscriptPolicy
  • Files/config to restore: src/agents/transcript-policy.ts
  • Known bad symptoms: none expected; the wrapper is a no-op when all tool call names are valid

Risks and Mitigations

  • Risk: slight per-turn overhead from running sanitizeToolCallInputs + sanitizeToolUseResultPairing on every Google request
    • Mitigation: both functions return the input reference unchanged when no repair is needed (O(n) scan but no allocation), so the hot path is cheap
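
The mitigation relies on a reference-equality contract: a sanitizer hands back the very same array when nothing needs fixing. A minimal sketch of that copy-on-write pattern follows; `dropEmptyNames` and the `Call` type are hypothetical and only illustrate the contract, not the real sanitizers.

```typescript
// Hypothetical element type for illustration only.
type Call = { id: string; name: string };

// Scans once; allocates a new array only after the first invalid entry is found.
function dropEmptyNames(calls: Call[]): Call[] {
  let out: Call[] | null = null;
  for (let i = 0; i < calls.length; i++) {
    const call = calls[i]!;
    if (call.name === "") {
      // First bad entry: copy everything kept so far, then keep filtering.
      if (out === null) out = calls.slice(0, i);
      continue;
    }
    if (out !== null) out.push(call);
  }
  // No bad entries: return the input reference so callers can compare with `===`.
  return out ?? calls;
}

const validCalls: Call[] = [{ id: "a", name: "read_file" }];
const mixedCalls: Call[] = [{ id: "a", name: "read_file" }, { id: "b", name: "" }];

console.log(dropEmptyNames(validCalls) === validCalls); // true: nothing allocated
console.log(dropEmptyNames(mixedCalls) === mixedCalls); // false: a repaired copy
```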

🤖 AI-assisted (Claude Sonnet 4.6 + human review). Fix was validated in live production before being cleaned up for upstream.

Gemini 3 (preview) can return functionCall blocks with empty name fields.
When this happens, pi-ai stores name="" in the assistant message and
pi-agent-core creates a toolResult with toolName="". On the next turn,
google-shared.js serializes this as functionResponse.name="" and Gemini
rejects the whole request with HTTP 400 INVALID_ARGUMENT:

  "GenerateContentRequest.contents[N].parts[0].function_response.name:
   Name cannot be empty."

sanitizeSessionHistory (which calls sanitizeToolCallInputs +
sanitizeToolUseResultPairing) correctly handles this case, but it only
runs once at attempt start. Tool call → result cycles within the agent
loop build new messages and pass them directly to the provider, bypassing
the repair.

Fix: add a streamFn wrapper (enabled via new repairToolNamesOnEveryTurn
flag in TranscriptPolicy, set true for all Google model APIs) that re-runs
sanitizeToolCallInputs + sanitizeToolUseResultPairing on every outbound
request. Empty-name tool calls are dropped from the assistant message; the
now-orphaned tool results are then dropped as well. This mirrors the
existing sanitizeToolCallIds wrapper added for Mistral.

Verified: fixes the Gemini 3 Flash Preview "Name cannot be empty" 400s
in live production sessions with multi-turn tool use.

Refs: openclaw#16263

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@openclaw-barnacle (bot) added the agents (Agent runtime and tooling) and size: S labels on Mar 17, 2026
@maxtongwang
Author

The two failing CI jobs (check and secrets) are pre-existing failures on origin/main. I confirmed this against the latest main CI run, which shows both jobs failing with identical errors before this PR existed.

check failure: TypeScript errors in src/api/channelContentConfig.ts (unrelated to this PR — no changes to that file).

secrets (zizmor) failure: Bash heredoc parsing error in the CI script itself — not triggered by any workflow file changes in this PR.

My changes (src/agents/transcript-policy.ts, src/agents/pi-embedded-runner/run/attempt.ts, and the two test files) are clean: pnpm tsgo passes on them, pnpm format:fix produces no diff, and all new/related tests pass locally.

@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR fixes a hard 400 crash that affected all multi-turn Gemini 3 sessions that involved tool use. Gemini 3 (preview) can return functionCall blocks with empty name fields; these get persisted in the session transcript and, on the next LLM call, cause the Google API to reject the entire request with INVALID_ARGUMENT: Name cannot be empty. The fix adds a streamFn wrapper — enabled via the new repairToolNamesOnEveryTurn flag in TranscriptPolicy, set to true for all Google model APIs — that re-runs sanitizeToolCallInputs + sanitizeToolUseResultPairing on every outbound request, dropping empty-name tool calls and their orphaned results before they reach the API.

Key changes:

  • transcript-policy.ts: Adds repairToolNamesOnEveryTurn: boolean to TranscriptPolicy; set to isGoogle in resolveTranscriptPolicy. Consistent with the existing pattern for other provider-specific flags.
  • attempt.ts: Adds the streamFn wrapper at the correct point in the existing wrapper chain (after sanitizeToolCallIds, before the yield-abort and name-trimming wrappers). The pattern mirrors the established dropThinkingBlocks and sanitizeToolCallIds wrappers exactly.
  • Tests: Regression test covers the full two-step repair pipeline; policy tests verify the flag is scoped to Google APIs only.
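
The policy change could look roughly like the sketch below. Only `TranscriptPolicy`, `repairToolNamesOnEveryTurn`, and `isGoogle` come from the PR; the function name `resolveTranscriptPolicySketch`, the `"google-"` prefix check, and the `"some-other-api"` identifier are assumptions made for illustration.

```typescript
// Only the flag discussed in this PR is shown; the real TranscriptPolicy has more fields.
type TranscriptPolicy = { repairToolNamesOnEveryTurn: boolean };

// Assumption: Google model APIs are recognizable by a "google-" prefix on the API id.
function resolveTranscriptPolicySketch(modelApi: string): TranscriptPolicy {
  const isGoogle = modelApi.startsWith("google-");
  return { repairToolNamesOnEveryTurn: isGoogle };
}

console.log(resolveTranscriptPolicySketch("google-generative-ai").repairToolNamesOnEveryTurn); // true
console.log(resolveTranscriptPolicySketch("some-other-api").repairToolNamesOnEveryTurn); // false
```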

Minor observations:

  • The early-return check repaired2 === messages is functionally correct (relies on both sanitize functions returning the input reference unchanged when making no changes), but would be more self-documenting as repaired1 === messages && repaired2 === repaired1.
  • sanitizeToolUseResultPairing does more than drop orphans: it also inserts synthetic error results for unmatched tool calls. This is the existing behavior from sanitizeSessionHistory, so it's consistent, but it means the wrapper is slightly broader in scope than what the code comment describes for the #16263 case. Consider only calling sanitizeToolUseResultPairing when repaired1 !== messages to limit the change to the exact fix needed.

Confidence Score: 4/5

  • Safe to merge; the fix is correct, well-tested, and the hot path is a no-op for well-formed transcripts.
  • The fix closely mirrors existing wrapper patterns, is gated behind a Google-only flag, adds no new network calls or permissions, and is backed by both a targeted regression test and live production verification. The two minor style observations (reference-equality comment clarity and the broader-than-necessary sanitizeToolUseResultPairing invocation) do not affect correctness.
  • The new wrapper block in src/agents/pi-embedded-runner/run/attempt.ts (lines 2060–2084) is the only net-new logic worth a second glance; everything else is formatting or test additions.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2075

Comment:
**Short-circuit check compares `repaired2` against `messages`, not `repaired1`**

The early-return condition `repaired2 === messages` works correctly because both `sanitizeToolCallInputs` and `sanitizeToolUseResultPairing` return the input reference unchanged when they make no modifications. However, a small clarification:

- If step 1 changes the array (`repaired1 !== messages`) but step 2 makes no further changes, then `repaired2 === repaired1` and `repaired1 !== messages`, so the condition is still false and the correctly repaired array is forwarded. ✓
- Only when _both_ steps are no-ops is `repaired2 === messages` true, allowing the fast path.

This is technically correct, but the intent might be clearer if the check explicitly tests that neither step mutated the array:

```suggestion
          if (repaired1 === messages && repaired2 === repaired1) {
```

This makes the "no repair needed" invariant explicit and won't silently misfire if either sanitizer's reference-equality contract ever changes.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2074

Comment:
**`sanitizeToolUseResultPairing` may insert synthetic tool results**

`sanitizeToolUseResultPairing` not only drops orphaned results — it also inserts synthetic `isError: true` "missing tool result" entries for any assistant tool call that lacks a matching result in the history. This means the wrapper does slightly more than what the comment at line 2069–2070 describes.

In practice this should be benign (the attempt-start `sanitizeSessionHistory` will already have repaired any mismatches), but the in-flight messages built by the agent loop during a turn are not guaranteed to be fully paired before this wrapper runs. If a mid-turn incomplete assistant message were somehow in `context.messages`, this could synthesize unexpected results.

Consider narrowing the wrapper to only `sanitizeToolCallInputs` (the actual fix for #16263) and only calling `sanitizeToolUseResultPairing` when the first step actually dropped something:

```typescript
const repaired1 = sanitizeToolCallInputs(messages as AgentMessage[], {
  allowedToolNames: repairAllowedToolNames,
});
if (repaired1 === messages) {
  return inner(model, context, options);
}
const repaired2 = sanitizeToolUseResultPairing(repaired1);
```

This is strictly targeted at the reported issue and avoids the (admittedly unlikely) side-effect of synthetic result injection.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: e0709f5

Address Greptile review findings:
- Only call sanitizeToolUseResultPairing when sanitizeToolCallInputs
  actually dropped something (repaired1 !== messages). This avoids
  the side-effect of synthetic result injection on normal turns where
  all tool call names are already valid.
- Restructure to fast-path on repaired1 === messages, making the
  no-repair invariant explicit rather than relying on a compound
  reference-equality check across both steps.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@maxtongwang
Author

Addressed both Greptile P2 findings in 6894e4f:

  • Short-circuit check: restructured to fast-path on repaired1 === messages (before calling sanitizeToolUseResultPairing), making the "no repair needed" invariant explicit.
  • Synthetic result injection risk: sanitizeToolUseResultPairing is now only called when sanitizeToolCallInputs actually dropped something (repaired1 !== messages), avoiding any inadvertent synthetic result injection on normal turns.
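
Put together, the restructured control flow might be sketched as follows. `StreamFn`, `Msg`, and both sanitizer stubs are simplified, hypothetical stand-ins (the real `streamFn` takes `model`, `context`, and `options`); the point is only that step 2 runs solely when step 1 changed something.

```typescript
// Hypothetical, simplified shapes; the real streamFn signature differs.
type Msg = { role: "assistant" | "toolResult"; toolName: string };
type StreamFn = (messages: Msg[]) => string;

let pairingRuns = 0;

// Stub for step 1: returns the input reference when every tool call name is valid.
function sanitizeInputsStub(messages: Msg[]): Msg[] {
  const kept = messages.filter((m) => m.role !== "assistant" || m.toolName !== "");
  return kept.length === messages.length ? messages : kept;
}

// Stub for step 2: drops empty-name results and counts how often it runs.
function sanitizePairingStub(messages: Msg[]): Msg[] {
  pairingRuns++;
  return messages.filter((m) => m.role !== "toolResult" || m.toolName !== "");
}

function withToolNameRepair(inner: StreamFn): StreamFn {
  return (messages) => {
    const repaired1 = sanitizeInputsStub(messages);
    if (repaired1 === messages) return inner(messages); // fast path: no pairing pass
    return inner(sanitizePairingStub(repaired1));
  };
}

const send = withToolNameRepair((messages) => `sent ${messages.length} message(s)`);

console.log(send([{ role: "assistant", toolName: "read_file" }])); // sent 1 message(s)
console.log(
  send([
    { role: "assistant", toolName: "read_file" },
    { role: "toolResult", toolName: "read_file" },
    { role: "assistant", toolName: "" },
    { role: "toolResult", toolName: "" },
  ]),
); // sent 2 message(s)
console.log(pairingRuns); // 1: the pairing pass ran only for the second request
```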

@maxtongwang
Author

All CI failures on this run are pre-existing on origin/main and unrelated to this PR:

  • secrets / check: same failures as run 23179681175 on main before this PR was opened.
  • checks-windows: CHAT_CHANNEL_ORDER is not iterable in src/auto-reply/reply/elevated-allowlist-matcher.ts — not a file this PR touches.
  • checks (contracts/extensions/channels/test): failures in src/channels/plugins/contracts/, extension AxiosError 400s — not files this PR touches.
  • extension-fast (*): all in extension code unrelated to this PR.

This PR only modifies: src/agents/transcript-policy.ts, src/agents/pi-embedded-runner/run/attempt.ts, src/agents/transcript-policy.test.ts, and src/agents/session-transcript-repair.test.ts.

Labels

agents (Agent runtime and tooling), size: S
