fix: validate base64 image data before sending to LLM APIs #18219
Grynn wants to merge 1 commit into openclaw:main
Additional Comments (1)
Path: src/agents/tool-images.ts
Line: 47:56
Comment:
Base64 strings with embedded whitespace (newlines, spaces) are valid in many contexts but will be rejected here.
RFC 4648 lets a referencing specification direct decoders to ignore non-alphabet characters, and MIME (RFC 2045) encoders insert line breaks every 76 characters. If upstream code passes base64 with embedded whitespace, this will incorrectly reject it.
Consider normalizing whitespace before validation:
```suggestion
function isStrictBase64(str: string): boolean {
  if (!str) {
    return false;
  }
  // Strip whitespace such as MIME line breaks before validating
  const normalized = str.replace(/\s/g, '');
  // Reject input that is empty once whitespace is removed
  if (normalized.length === 0) {
    return false;
  }
  // Must be a multiple of 4 and only contain valid chars
  if (normalized.length % 4 !== 0) {
    return false;
  }
  return /^[A-Za-z0-9+/]*={0,2}$/.test(normalized);
}
```
Anthropic's API rejects base64 data that Node.js Buffer.from(s, 'base64') silently accepts. This causes 'invalid base64 data' errors that crash sessions.

Changes to sanitizeContentBlocksImages in tool-images.ts:
- Add validateAndCleanBase64() that strips embedded whitespace (MIME line breaks), validates that the length is a multiple of 4, and checks for valid chars
- Add stripDataUrlPrefix() to handle data URLs (data:image/...;base64,) that may arrive from some code paths
- Gracefully replace invalid image blocks with text placeholders instead of letting them propagate to the API

New test cases:
- Invalid base64 data (special chars) → replaced with text
- Data URL prefix → stripped and processed normally
- MIME-style line breaks → cleaned and processed normally
- Truncated base64 (e.g. from session cleanup) → replaced with text
- Empty/whitespace-only data → replaced with text
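The two helpers named in the commit message could look roughly like the sketch below. The PR's actual implementation is not shown on this page, so the function bodies, the `string | null` return convention, and the prefix regex are assumptions made for illustration only.

```typescript
// Hypothetical sketch: names come from the commit message, bodies are assumed.

// Matches a leading "data:image/<subtype>;base64," prefix (case-insensitive).
const DATA_URL_PREFIX = /^data:image\/[a-z0-9.+-]+;base64,/i;

// Remove a data URL prefix if present; otherwise return the input unchanged.
function stripDataUrlPrefix(data: string): string {
  return data.replace(DATA_URL_PREFIX, "");
}

// Return cleaned base64, or null when the payload cannot be valid base64.
function validateAndCleanBase64(data: string): string | null {
  // Drop the data URL prefix, then any embedded whitespace (MIME line breaks).
  const cleaned = stripDataUrlPrefix(data.trim()).replace(/\s+/g, "");
  // Empty, whitespace-only, or truncated input fails the length check.
  if (cleaned.length === 0 || cleaned.length % 4 !== 0) {
    return null;
  }
  // Standard alphabet only, with at most two trailing '=' padding characters.
  return /^[A-Za-z0-9+\/]+={0,2}$/.test(cleaned) ? cleaned : null;
}
```

Returning `null` rather than throwing lets the caller decide how to substitute a placeholder, which matches the "gracefully replace" behaviour the commit message describes.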
298c909 to 9659a03
Rebased & improved
Rebased on current
Changes in this update:
Root cause analysis
The error
This fix adds validation and cleaning at the sanitization layer, gracefully replacing invalid images with text placeholders before they reach the API.
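As a rough illustration of replacing invalid images at the sanitization layer, the sketch below maps over content blocks and swaps any image with unusable base64 for a text placeholder. The block shapes, the `replaceInvalidImages` name, and the placeholder wording are hypothetical; the real types in tool-images.ts may differ.

```typescript
// Hypothetical block shapes for illustration only.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

// Minimal validator: strip whitespace, then check length and alphabet.
function cleanBase64(data: string): string | null {
  const cleaned = data.replace(/\s+/g, "");
  const valid =
    cleaned.length > 0 &&
    cleaned.length % 4 === 0 &&
    /^[A-Za-z0-9+\/]+={0,2}$/.test(cleaned);
  return valid ? cleaned : null;
}

// Keep text blocks as-is, clean valid images, replace invalid ones with text.
function replaceInvalidImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((block): ContentBlock => {
    if (block.type !== "image") {
      return block;
    }
    const cleaned = cleanBase64(block.data);
    if (cleaned === null) {
      // Assumed placeholder text; the PR does not quote its exact wording here.
      return { type: "text", text: "[image omitted: invalid base64 data]" };
    }
    return { ...block, data: cleaned };
  });
}
```

Because the bad block is replaced rather than dropped, the message keeps the same number of blocks and the model still sees that an image was present.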
Summary
Adds strict base64 validation in sanitizeContentBlocksImages() to prevent invalid base64 data from crashing sessions when sent to LLM APIs.
Problem
When an image content block contains invalid base64 data, the Anthropic API rejects the request:
The session becomes permanently broken because the corrupted content is persisted in the session JSONL and replayed on every subsequent API call. Node.js Buffer.from(s, 'base64') silently ignores invalid characters, so the existing sanitization pipeline doesn't catch the issue before it hits the API.
Changes
src/agents/tool-images.ts:
- isStrictBase64(): RFC 4648 §4 compliant validator (correct charset + padding)
- stripDataUrlPrefix(): strips data:image/...;base64, prefixes that some code paths may leave in the data field
- sanitizeContentBlocksImages(): logs and omits invalid blocks gracefully (replaced with a text placeholder) instead of passing them to the API
src/agents/tool-images.e2e.test.ts:
Why this matters
This is defense-in-depth. The upstream @mariozechner/pi-ai Anthropic provider has a catch-all else clause in convertContentBlocks() and convertMessages() that treats any non-text block as an image without validating the data field. Once bad data slips through, the session is stuck in a permanent 400 error loop.
Fixes #18212
Related: #11475 (session stuck in permanent 400 error loop)
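The leniency described under Problem can be observed with a small round-trip check: decode with Buffer.from, re-encode, and compare with the input. This is only a demonstration of why a successful decode proves nothing; the helper name `survivesRoundTrip` is hypothetical and is not part of the PR.

```typescript
import { Buffer } from "node:buffer";

// True only when decoding and re-encoding reproduces the input exactly.
// Buffer.from does not throw on malformed base64, so the comparison is
// what exposes the problem, not an exception.
function survivesRoundTrip(data: string): boolean {
  return Buffer.from(data, "base64").toString("base64") === data;
}

// Well-formed, padded base64 survives the round trip.
const clean = survivesRoundTrip("aGVsbG8=");

// A stray '*' is not part of the alphabet, so the re-encoded string
// can never equal the original input.
const corrupted = survivesRoundTrip("aGVs*bG8=");

console.log({ clean, corrupted });
```

A round-trip check also reports a mismatch for input that is merely line-wrapped or unpadded, so it is stricter than the whitespace-normalizing validation discussed in the review comment.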
Greptile Summary
Added strict RFC 4648 base64 validation to prevent invalid image data from crashing sessions when sent to LLM APIs
- isStrictBase64() validator to catch malformed base64 before API submission
- stripDataUrlPrefix() to handle data URL prefixes that may be present in image blocks
Confidence Score: 4/5
Last reviewed commit: 298c909