fix: validate base64 image data before API submission #18263
steipete merged 3 commits into openclaw:main from
Adds explicit base64 format validation in sanitizeContentBlocksImages() to prevent invalid image data from being sent to the Anthropic API.

The Problem:
- Node's Buffer.from(str, "base64") silently ignores invalid characters
- Invalid base64 passes local validation but fails at Anthropic's stricter API
- Once corrupted data persists in session history, every API call fails

The Fix:
- Add validateAndNormalizeBase64() function that:
  - Strips data URL prefixes (e.g., "data:image/png;base64,...")
  - Validates base64 character set with regex
  - Checks for valid padding (0-2 '=' chars)
  - Validates length is proper for base64 encoding
- Invalid images are replaced with descriptive text blocks
- Prevents permanent session corruption

Tests:
- Rejects invalid base64 characters
- Strips data URL prefixes correctly
- Rejects invalid padding
- Rejects invalid length
- Handles empty data gracefully

Closes openclaw#18212

Co-Authored-By: Claude Opus 4.5 <[email protected]>
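To illustrate the first problem listed above, here is a minimal sketch showing that `Buffer.from` does not throw on corrupted input, and that a decode/re-encode round trip can expose the corruption. The helper name `survivesRoundTrip` is hypothetical and not part of this PR.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper (not from the PR): returns false when Buffer.from
// accepted the input only by dropping or reinterpreting characters.
function survivesRoundTrip(input: string): boolean {
  const decoded = Buffer.from(input, "base64"); // never throws on bad characters
  return decoded.toString("base64") === input;
}

const clean = Buffer.from("hello").toString("base64"); // "aGVsbG8="
const corrupted = clean.slice(0, 4) + "*" + clean.slice(4); // "*" is not a base64 character

console.log(survivesRoundTrip(clean)); // true
console.log(survivesRoundTrip(corrupted)); // false: re-encoding can never produce "*"
```

A round trip is only one way to detect the issue; the PR instead validates the string format directly, which avoids decoding large images twice.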
src/agents/tool-images.ts
Outdated
| const log = createSubsystemLogger("agents/tool-images"); |
| // Valid base64 character set (standard + URL-safe variants) |
| const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/; |
URL-safe base64 accepted but never normalized
The regex includes _ and - (URL-safe base64 variants), and the comment says "standard + URL-safe variants." However, the function returns the data with these characters intact — it never converts them to standard base64 (- → +, _ → /). The Anthropic API expects standard base64, so URL-safe encoded data would pass this validation but could still be rejected by the API, which is the exact scenario this PR aims to prevent.
Either reject URL-safe characters (remove _- from the regex) or normalize them:
| const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/; | |
| const BASE64_REGEX = /^[A-Za-z0-9+/=]*$/; |
If you want to accept URL-safe base64 gracefully, keep the permissive regex but add a normalization step after validation:
data = data.replace(/-/g, "+").replace(/_/g, "/");
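The normalization option from this comment can be sketched as a small standalone function. The name `toStandardBase64` is hypothetical; only the two `replace` calls come from the suggestion itself.

```typescript
// Hypothetical helper name; the replace() calls mirror the review suggestion.
// Maps the URL-safe alphabet ("-", "_") onto the standard one ("+", "/").
function toStandardBase64(data: string): string {
  return data.replace(/-/g, "+").replace(/_/g, "/");
}

console.log(toStandardBase64("-_-_")); // "+/+/"
console.log(toStandardBase64("aGVsbG8=")); // "aGVsbG8=" (already standard, unchanged)
```

Note that URL-safe encoders often omit `=` padding, so a normalized string may still fail a length check unless padding is restored separately. Whether to restore it is a design choice this sketch leaves open.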
src/agents/tool-images.ts
Outdated
| const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/; |
| /** |
| * Validates and normalizes base64 image data before processing. |
| * - Strips data URL prefixes (e.g., "data:image/png;base64,") |
| * - Validates base64 character set |
| * - Ensures the string is not empty after trimming |
| * |
| * Returns the cleaned base64 string or throws an error if invalid. |
| */ |
| function validateAndNormalizeBase64(base64: string): string { |
| let data = base64.trim(); |
| // Strip data URL prefix if present (e.g., "data:image/png;base64,...") |
| const dataUrlMatch = data.match(/^data:[^;]+;base64,(.*)$/i); |
| if (dataUrlMatch) { |
| data = dataUrlMatch[1].trim(); |
| } |
| if (!data) { |
| throw new Error("Base64 data is empty"); |
| } |
| // Check for valid base64 characters |
| // Node's Buffer.from silently ignores invalid chars, but Anthropic API rejects them |
| if (!BASE64_REGEX.test(data)) { |
Regex allows = padding anywhere in the string
BASE64_REGEX uses the character class [A-Za-z0-9+/=_-], which permits = at any position — not just at the end. A string like SGVs=bG8= passes the regex check and the padding check (which only looks at trailing =), but is not valid base64. Since the goal is to reject strings that Anthropic would reject, this gap could let malformed data through.
A cleaner approach is to validate the structure rather than just the character set:
| const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/; | |
| const BASE64_REGEX = /^[A-Za-z0-9+/]*={0,2}$/; |
This ensures = only appears as 0–2 trailing padding characters, and you can remove the separate padding check on lines 51–54.
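A short comparison of the two patterns quoted in this comment, run against the example string from the review. The constant names are chosen for this sketch only.

```typescript
// Both patterns are quoted from the review thread; the names are illustrative.
const PERMISSIVE_BASE64 = /^[A-Za-z0-9+/=_-]*$/; // "=" allowed anywhere
const STRICT_BASE64 = /^[A-Za-z0-9+/]*={0,2}$/; // "=" only as 0-2 trailing characters

const samples = ["SGVsbG8=", "SGVs=bG8=", "A="];
for (const sample of samples) {
  console.log(sample, {
    permissive: PERMISSIVE_BASE64.test(sample),
    strict: STRICT_BASE64.test(sample),
    lengthOk: sample.length % 4 === 0,
  });
}
// "SGVsbG8="  -> permissive: true, strict: true,  lengthOk: true
// "SGVs=bG8=" -> permissive: true, strict: false, lengthOk: false
// "A="        -> permissive: true, strict: true,  lengthOk: false
```

The last sample shows why the stricter regex does not make the length check redundant: `A=` has well-placed padding but is not a multiple of four characters long.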
- Use stricter regex: /^[A-Za-z0-9+/]*={0,2}$/ ensures = only at end
- Normalize URL-safe base64 to standard (- → +, _ → /)
- Added tests for padding in wrong position and URL-safe normalization
Co-Authored-By: Claude Opus 4.5 <[email protected]>
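Taken together, the steps described in this thread could be sketched as below. This is an approximation assembled from the PR description and the review suggestions, not the merged code: the function name carries a `Sketch` suffix, and the error messages other than "Base64 data is empty" are assumptions.

```typescript
// Pattern from the review suggestion: "=" only as 0-2 trailing characters.
const BASE64_PATTERN = /^[A-Za-z0-9+/]*={0,2}$/;

// Sketch only; the merged implementation may order or word these checks differently.
function validateAndNormalizeBase64Sketch(input: string): string {
  let data = input.trim();

  // Strip a data URL prefix such as "data:image/png;base64,".
  const dataUrlMatch = data.match(/^data:[^;]+;base64,(.*)$/i);
  if (dataUrlMatch) {
    data = (dataUrlMatch[1] ?? "").trim();
  }
  if (!data) {
    throw new Error("Base64 data is empty");
  }

  // Normalize URL-safe characters to the standard alphabet before validating.
  data = data.replace(/-/g, "+").replace(/_/g, "/");

  if (!BASE64_PATTERN.test(data)) {
    throw new Error("Base64 data has invalid characters or misplaced padding"); // assumed message
  }
  if (data.length % 4 !== 0) {
    throw new Error("Base64 data has invalid length"); // assumed message
  }
  return data;
}

console.log(validateAndNormalizeBase64Sketch("data:image/png;base64,aGVsbG8=")); // "aGVsbG8="
```

Normalizing before the regex check lets the strict pattern stay free of `-` and `_`, so one pattern covers both accepted alphabets.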
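Summary
The Problem
As described in #18212:
- Node's `Buffer.from(str, "base64")` silently ignores invalid characters
- Invalid base64 passes local validation but fails at Anthropic's stricter API
- Once corrupted data persists in session history, every API call fails

The Fix
Added `validateAndNormalizeBase64()` function in `src/agents/tool-images.ts` that:
- Strips data URL prefixes (e.g., `data:image/png;base64,...`)
- Validates the base64 character set
- Checks for valid padding (0-2 `=` chars at end)
- Validates that the length is proper for base64 encoding

Invalid images are replaced with text blocks explaining the error, preventing session corruption while preserving conversation flow.

Test plan
Added 5 new tests in `tool-images.e2e.test.ts`:
- Rejects invalid base64 characters
- Strips data URL prefixes correctly
- Rejects invalid padding
- Rejects invalid length
- Handles empty data gracefully

All 9 tests pass (4 existing + 5 new).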
Closes #18212
🤖 Generated with Claude Code
Greptile Summary
Adds base64 validation for image data before sending to the Anthropic API, preventing permanent session corruption from malformed base64 strings. Invalid images are gracefully replaced with descriptive text blocks.
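The replacement behavior summarized above might look roughly like the following. The block types, the function names, and the placeholder text are all hypothetical; the real `sanitizeContentBlocksImages` works on the project's own content block types, which are not shown in this thread.

```typescript
// Hypothetical block shapes for illustration; not the project's actual types.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

// Simplified stand-in for the PR's validation: charset, padding position, length.
function isPlausibleBase64(data: string): boolean {
  return data.length > 0 && data.length % 4 === 0 && /^[A-Za-z0-9+/]*={0,2}$/.test(data);
}

// Swap each invalid image for a text block so the bad data never reaches the API
// or the persisted session history.
function replaceInvalidImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((block): ContentBlock => {
    if (block.type !== "image" || isPlausibleBase64(block.data)) {
      return block;
    }
    return { type: "text", text: `[image omitted: invalid base64 data (${block.mimeType})]` };
  });
}

const sanitized = replaceInvalidImages([
  { type: "text", text: "screenshot attached" },
  { type: "image", data: "aGVsbG8=", mimeType: "image/png" },
  { type: "image", data: "SGVs=bG8=", mimeType: "image/png" },
]);
console.log(sanitized.map((block) => block.type)); // [ 'text', 'image', 'text' ]
```

Replacing rather than dropping the block keeps the message count and ordering stable, which matches the stated goal of preserving conversation flow.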
- Adds `validateAndNormalizeBase64()` in `src/agents/tool-images.ts` that strips data URL prefixes and validates the base64 character set, padding, and length
- Integrates validation into the `sanitizeContentBlocksImages` flow, replacing invalid images with text error blocks instead of allowing them through
- The regex accepts URL-safe characters (`_`, `-`) but does not normalize them to standard base64 (`+`, `/`), which could still cause Anthropic API rejections
- The regex allows `=` padding at any position in the string, not just at the end, allowing some malformed strings through

Confidence Score: 3/5
- `src/agents/tool-images.ts`: the `BASE64_REGEX` and validation logic need tightening

Last reviewed commit: 9df7e96