fix: validate base64 image data before API submission #18263

Merged
steipete merged 3 commits into openclaw:main from sriram369:fix/validate-base64-image-data
Feb 16, 2026

@sriram369 sriram369 commented Feb 16, 2026

Summary

  • Adds explicit base64 format validation before sending images to Anthropic API
  • Prevents permanent session corruption from invalid image data
  • Invalid images are gracefully replaced with descriptive text blocks

The Problem

As described in #18212:

  • Node's Buffer.from(str, "base64") silently ignores invalid characters
  • Invalid base64 passes local validation but fails at Anthropic's stricter API
  • Once corrupted data persists in session JSONL, every subsequent API call replays the bad data and fails
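
The gap between Node's lenient decoder and a strict consumer can be seen with a small check. This is a minimal sketch, not code from the PR; `decodesWithoutThrowing` and `roundTrips` are hypothetical helper names introduced only for illustration.

```ts
import { Buffer } from "node:buffer";

// Hypothetical helper: reports whether Node decodes the input without throwing.
function decodesWithoutThrowing(input: string): boolean {
  try {
    Buffer.from(input, "base64");
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: decode, re-encode, and compare with the input.
// A mismatch means the decoder dropped or altered something along the way.
function roundTrips(input: string): boolean {
  return Buffer.from(input, "base64").toString("base64") === input;
}

const wellFormed = "SGVsbG8="; // encodes "Hello"
const malformed = "SGVs!!bG8="; // "!" is outside the base64 alphabet

console.log(decodesWithoutThrowing(malformed)); // true
console.log(roundTrips(wellFormed)); // true
console.log(roundTrips(malformed)); // false
```

Because decoding never raises an error, a check built only on `Buffer.from` cannot tell these two inputs apart, which is why an explicit format check is needed before the data is stored or sent.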

The Fix

Added validateAndNormalizeBase64() function in src/agents/tool-images.ts that:

  • Strips data URL prefixes (e.g., data:image/png;base64,...)
  • Validates base64 character set with regex
  • Checks for valid padding (0-2 '=' chars at the end)
  • Validates that the length is possible for base64 (length modulo 4 is never 1)
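
The four checks listed above could be sketched roughly as follows. This is an illustration assembled from the bullet list, not the merged implementation: the name `validateBase64Sketch`, the standard-alphabet-only character class, and every error message except "Base64 data is empty" are assumptions.

```ts
// Assumed character class (standard alphabet only). It checks characters,
// not structure, so it does not constrain where "=" may appear.
const BASE64_CHARS = /^[A-Za-z0-9+/=]*$/;

// Hypothetical sketch of the validation steps; returns the cleaned string or throws.
function validateBase64Sketch(input: string): string {
  let data = input.trim();

  // 1. Strip a data URL prefix such as "data:image/png;base64,".
  const dataUrlMatch = data.match(/^data:[^;]+;base64,(.*)$/i);
  if (dataUrlMatch) {
    data = (dataUrlMatch[1] ?? "").trim();
  }
  if (!data) {
    throw new Error("Base64 data is empty");
  }

  // 2. Reject characters outside the base64 alphabet.
  if (!BASE64_CHARS.test(data)) {
    throw new Error("Base64 data contains invalid characters");
  }

  // 3. Allow at most two trailing "=" padding characters.
  const padding = data.length - data.replace(/=+$/, "").length;
  if (padding > 2) {
    throw new Error("Base64 data has invalid padding");
  }

  // 4. Encoded output never has a length with remainder 1 modulo 4.
  if (data.length % 4 === 1) {
    throw new Error("Base64 data has invalid length");
  }

  return data;
}

console.log(validateBase64Sketch("data:image/png;base64,SGVsbG8=")); // SGVsbG8=
```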

Invalid images are replaced with text blocks explaining the error, preventing session corruption while preserving conversation flow.
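
The replacement step might look like the sketch below. The block shapes (`TextBlock`, `ImageBlock`), the function name `replaceInvalidImages`, the placeholder wording, and the stand-in validator are all hypothetical; the real types and messages in `src/agents/tool-images.ts` may differ.

```ts
// Hypothetical block shapes for illustration only.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

// Runs each image through a validator; on failure, swaps in a text block
// so the bad payload never reaches the session history or the API.
function replaceInvalidImages(
  blocks: ContentBlock[],
  validate: (base64: string) => string,
): ContentBlock[] {
  return blocks.map((block): ContentBlock => {
    if (block.type !== "image") return block;
    try {
      return { ...block, data: validate(block.data) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return { type: "text", text: `[image omitted: ${reason}]` };
    }
  });
}

// Stand-in validator for the example (not the PR's function).
const strictOnly = (s: string): string => {
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(s)) throw new Error("invalid base64");
  return s;
};

const cleaned = replaceInvalidImages(
  [
    { type: "image", data: "SGVsbG8=", mimeType: "image/png" },
    { type: "image", data: "not base64!", mimeType: "image/png" },
  ],
  strictOnly,
);
console.log(cleaned.map((b) => b.type)); // [ 'image', 'text' ]
```

Keeping a text block in place of the image preserves the position of the content in the conversation, so later turns still see that an image was attempted and why it was dropped.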

Test plan

Added 5 new tests in tool-images.e2e.test.ts:

  • Rejects invalid base64 characters
  • Strips data URL prefixes correctly
  • Rejects invalid padding
  • Rejects invalid length
  • Handles empty data gracefully

All 9 tests pass (4 existing + 5 new).

Closes #18212

🤖 Generated with Claude Code

Greptile Summary

Adds base64 validation for image data before sending to the Anthropic API, preventing permanent session corruption from malformed base64 strings. Invalid images are gracefully replaced with descriptive text blocks.

  • Introduces validateAndNormalizeBase64() in src/agents/tool-images.ts that strips data URL prefixes, validates base64 character set, padding, and length
  • Integrates the validation into the existing sanitizeContentBlocksImages flow, replacing invalid images with text error blocks instead of allowing them through
  • The base64 regex accepts URL-safe characters (_, -) but does not normalize them to standard base64 (+, /), which could still cause Anthropic API rejections
  • The regex also permits = padding at any position in the string, not just at the end, allowing some malformed strings through
  • Adds 5 new e2e tests covering the validation cases

Confidence Score: 3/5

  • The PR addresses a real problem with good error recovery, but the regex has gaps that could let some malformed base64 through
  • The overall approach is correct and the error recovery (replacing invalid images with text blocks) is well done. However, the base64 validation regex has two gaps: it accepts URL-safe characters without normalizing them, and permits padding characters mid-string. These could still allow invalid data to reach the API, partially undermining the PR's goal.
  • src/agents/tool-images.ts — the BASE64_REGEX and validation logic need tightening

Last reviewed commit: 9df7e96

Adds explicit base64 format validation in sanitizeContentBlocksImages()
to prevent invalid image data from being sent to the Anthropic API.

The Problem:
- Node's Buffer.from(str, "base64") silently ignores invalid characters
- Invalid base64 passes local validation but fails at Anthropic's stricter API
- Once corrupted data persists in session history, every API call fails

The Fix:
- Add validateAndNormalizeBase64() function that:
  - Strips data URL prefixes (e.g., "data:image/png;base64,...")
  - Validates base64 character set with regex
  - Checks for valid padding (0-2 '=' chars)
  - Validates length is proper for base64 encoding
- Invalid images are replaced with descriptive text blocks
- Prevents permanent session corruption

Tests:
- Rejects invalid base64 characters
- Strips data URL prefixes correctly
- Rejects invalid padding
- Rejects invalid length
- Handles empty data gracefully

Closes openclaw#18212

Co-Authored-By: Claude Opus 4.5 <[email protected]>

@openclaw-barnacle bot added the labels "agents" (Agent runtime and tooling) and "size: S" on Feb 16, 2026

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

const log = createSubsystemLogger("agents/tool-images");

// Valid base64 character set (standard + URL-safe variants)
const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/;

URL-safe base64 accepted but never normalized

The regex includes _ and - (URL-safe base64 variants), and the comment says "standard + URL-safe variants." However, the function returns the data with these characters intact; it never converts them to standard base64 (- → +, _ → /). The Anthropic API expects standard base64, so URL-safe encoded data would pass this validation but could still be rejected by the API, which is the exact scenario this PR aims to prevent.

Either reject URL-safe characters (remove _- from the regex) or normalize them:

Suggested change
const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/;
const BASE64_REGEX = /^[A-Za-z0-9+/=]*$/;

If you want to accept URL-safe base64 gracefully, keep the permissive regex but add a normalization step after validation:

data = data.replace(/-/g, "+").replace(/_/g, "/");
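
Put together, the second option (validate permissively, then normalize) might look like this. It is a sketch under stated assumptions: `acceptUrlSafeBase64` and `PERMISSIVE_BASE64` are hypothetical names, the error message is invented, and restoring padding that URL-safe encoders sometimes omit is deliberately left out.

```ts
// Permissive check: standard alphabet plus the URL-safe "-" and "_".
const PERMISSIVE_BASE64 = /^[A-Za-z0-9+/=_-]*$/;

// Hypothetical helper: accept either alphabet, always return the standard one.
function acceptUrlSafeBase64(input: string): string {
  if (!PERMISSIVE_BASE64.test(input)) {
    throw new Error("Base64 data contains invalid characters");
  }
  // Map the URL-safe characters onto their standard equivalents.
  return input.replace(/-/g, "+").replace(/_/g, "/");
}

console.log(acceptUrlSafeBase64("-_8=")); // +/8=
console.log(acceptUrlSafeBase64("SGVsbG8=")); // SGVsbG8=
```

Normalizing after validation keeps the function's output in a single alphabet, so downstream code never has to care which variant the caller supplied.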

Comment on lines 21 to 46
const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/;

/**
 * Validates and normalizes base64 image data before processing.
 * - Strips data URL prefixes (e.g., "data:image/png;base64,")
 * - Validates base64 character set
 * - Ensures the string is not empty after trimming
 *
 * Returns the cleaned base64 string or throws an error if invalid.
 */
function validateAndNormalizeBase64(base64: string): string {
  let data = base64.trim();

  // Strip data URL prefix if present (e.g., "data:image/png;base64,...")
  const dataUrlMatch = data.match(/^data:[^;]+;base64,(.*)$/i);
  if (dataUrlMatch) {
    data = dataUrlMatch[1].trim();
  }

  if (!data) {
    throw new Error("Base64 data is empty");
  }

  // Check for valid base64 characters
  // Node's Buffer.from silently ignores invalid chars, but Anthropic API rejects them
  if (!BASE64_REGEX.test(data)) {

Regex allows = padding anywhere in the string

BASE64_REGEX uses the character class [A-Za-z0-9+/=_-], which permits = at any position — not just at the end. A string like SGVs=bG8= passes the regex check and the padding check (which only looks at trailing =), but is not valid base64. Since the goal is to reject strings that Anthropic would reject, this gap could let malformed data through.

A cleaner approach is to validate the structure rather than just the character set:

Suggested change
const BASE64_REGEX = /^[A-Za-z0-9+/=_-]*$/;
const BASE64_REGEX = /^[A-Za-z0-9+/]*={0,2}$/;

This ensures = only appears as 0–2 trailing padding characters, and you can remove the separate padding check on lines 51–54.
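
The difference between the two patterns can be checked directly. The sketch below uses the two regexes quoted in this review; the constant names `LOOSE` and `STRICT`, the helper `compare`, and the sample strings other than `SGVs=bG8=` are chosen here for illustration.

```ts
const LOOSE = /^[A-Za-z0-9+/=_-]*$/; // original character-class regex
const STRICT = /^[A-Za-z0-9+/]*={0,2}$/; // suggested structural regex

// Hypothetical helper: report how each pattern treats a candidate string.
function compare(candidate: string): { loose: boolean; strict: boolean } {
  return { loose: LOOSE.test(candidate), strict: STRICT.test(candidate) };
}

console.log(compare("SGVsbG8=")); // well-formed: both accept
console.log(compare("SGVs=bG8=")); // "=" mid-string: loose accepts, strict rejects
console.log(compare("SGVsbG8===")); // three "=": loose accepts, strict rejects
console.log(compare("SGVsb")); // impossible length: both accept
```

The last sample shows that the structural regex says nothing about length, so a separate length check is still useful alongside it.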

- Use stricter regex: /^[A-Za-z0-9+/]*={0,2}$/ ensures = only at end
- Normalize URL-safe base64 to standard (- → +, _ → /)
- Added tests for padding in wrong position and URL-safe normalization

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@steipete steipete merged commit 63fb998 into openclaw:main Feb 16, 2026
23 checks passed
@sebslight

Reverted in 69872ea (PR #19221).

This was an accidental merge; we’ll revisit the intended change in a new PR.

Labels

agents Agent runtime and tooling size: M

Development

Successfully merging this pull request may close these issues.

[Bug]: Invalid base64 image data crashes session — no validation before Anthropic API call
