

fix: validate base64 image data before sending to LLM APIs #18219

Open
Grynn wants to merge 1 commit into openclaw:main from Grynn:fix/validate-base64-image-data-18212



Grynn commented Feb 16, 2026

Summary

Adds strict base64 validation in sanitizeContentBlocksImages() to prevent invalid base64 data from crashing sessions when sent to LLM APIs.

Problem

When an image content block contains invalid base64 data, the Anthropic API rejects the request:

LLM request rejected: messages.116.content.1.image.source.base64: invalid base64 data

The session becomes permanently broken because the corrupted content is persisted in the session JSONL and replayed on every subsequent API call. Node.js Buffer.from(s, 'base64') silently ignores invalid characters, so the existing sanitization pipeline doesn't catch the issue before it hits the API.

Changes

src/agents/tool-images.ts:

  • Add isStrictBase64() — RFC 4648 §4 compliant validator (correct charset + padding)
  • Add stripDataUrlPrefix() — strips data:image/...;base64, prefixes that some code paths may leave in the data field
  • Validate base64 strictly in sanitizeContentBlocksImages() and log+omit invalid blocks gracefully (replaced with a text placeholder) instead of passing them to the API
  • Use detected MIME type from data URL prefix when available
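A minimal sketch of the two helpers listed above, assuming image data arrives as a plain string. The function names come from the PR description, but the bodies, the `StrippedImageData` type, and both regular expressions are illustrative assumptions, not the committed code.

```typescript
// Hypothetical sketch of the helpers named in the PR; not the committed code.

// Standard base64 alphabet (RFC 4648 §4); '=' padding (at most two) only at the end.
const STRICT_BASE64 = /^[A-Za-z0-9+/]*={0,2}$/;

function isStrictBase64(str: string): boolean {
  // Empty input and lengths that break the 4-character grouping are invalid
  if (str.length === 0 || str.length % 4 !== 0) {
    return false;
  }
  return STRICT_BASE64.test(str);
}

interface StrippedImageData {
  data: string;
  // MIME type taken from a data URL prefix, if one was present (assumed shape)
  mimeType?: string;
}

// Matches prefixes such as "data:image/png;base64,"
const DATA_URL_PREFIX = /^data:(image\/[A-Za-z0-9.+-]+);base64,/;

function stripDataUrlPrefix(raw: string): StrippedImageData {
  const match = DATA_URL_PREFIX.exec(raw);
  if (match === null) {
    // No prefix: pass the data through unchanged
    return { data: raw };
  }
  return { data: raw.slice(match[0].length), mimeType: match[1] };
}
```

Used together, `isStrictBase64(stripDataUrlPrefix(raw).data)` rejects both truncated payloads and stray characters such as `$`, while the returned `mimeType` can override whatever type the block claimed.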

src/agents/tool-images.e2e.test.ts:

  • Test: invalid base64 data is rejected gracefully (replaced with text block)
  • Test: data URL prefixes are stripped and image is processed normally
  • Test: empty/whitespace-only data is handled

Why this matters

This is defense-in-depth. The upstream @mariozechner/pi-ai Anthropic provider has a catch-all else clause in convertContentBlocks() and convertMessages() that treats any non-text block as an image without validating the data field. Once bad data slips through, the session is stuck in a permanent 400 error loop.

Fixes #18212
Related: #11475 (session stuck in permanent 400 error loop)

Greptile Summary

Added strict RFC 4648 base64 validation to prevent invalid image data from crashing sessions when sent to LLM APIs

  • Implemented isStrictBase64() validator to catch malformed base64 before API submission
  • Added stripDataUrlPrefix() to handle data URL prefixes that may be present in image blocks
  • Invalid base64 blocks are now gracefully replaced with text placeholders instead of causing permanent session failures
  • Comprehensive test coverage for invalid base64, data URL stripping, and empty data edge cases

Confidence Score: 4/5

  • Safe to merge with one minor consideration about whitespace handling
  • The implementation correctly solves the stated problem of preventing invalid base64 from reaching the API. The validation logic is sound, test coverage is comprehensive, and the defensive approach (replacing bad data with text placeholders) prevents session corruption. One style suggestion about RFC 4648 whitespace handling prevents the score from being a 5, but this is a minor enhancement rather than a blocking issue.
  • No files require special attention - the implementation is straightforward and well-tested

Last reviewed commit: 298c909


greptile-apps bot left a comment

2 files reviewed, 1 comment

greptile-apps bot commented Feb 17, 2026

Additional Comments (1)

src/agents/tool-images.ts, lines 47-56
Base64 strings with embedded whitespace (newlines, spaces) are common in practice, but this check rejects them.

RFC 4648 tells decoders to reject characters outside the alphabet unless the referring specification says otherwise, and MIME (RFC 2045) does say otherwise: its encoders wrap output at 76 characters per line and its decoders ignore the line breaks. If upstream code passes MIME-wrapped base64, this check will discard an image that could be recovered by stripping the whitespace.

Consider normalizing whitespace before validation (the caller must then send the normalized string, not the original):

```suggestion
function isStrictBase64(str: string): boolean {
  if (!str) {
    return false;
  }
  // Strip whitespace, e.g. MIME (RFC 2045) line breaks
  const normalized = str.replace(/\s/g, '');
  // Whitespace-only input normalizes to '' and must still be rejected
  if (normalized.length === 0) {
    return false;
  }
  // Must be a multiple of 4 and only contain valid chars
  if (normalized.length % 4 !== 0) {
    return false;
  }
  return /^[A-Za-z0-9+/]*={0,2}$/.test(normalized);
}
```

Anthropic's API rejects base64 data that Node.js Buffer.from(s,'base64')
silently accepts. This causes 'invalid base64 data' errors that crash
sessions.

Changes to sanitizeContentBlocksImages in tool-images.ts:
- Add validateAndCleanBase64() that strips embedded whitespace (MIME line
  breaks), validates length is multiple of 4, and checks for valid chars
- Add stripDataUrlPrefix() to handle data URLs (data:image/...;base64,)
  that may arrive from some code paths
- Gracefully replace invalid image blocks with text placeholders instead
  of letting them propagate to the API

New test cases:
- Invalid base64 data (special chars) → replaced with text
- Data URL prefix → stripped and processed normally
- MIME-style line breaks → cleaned and processed normally
- Truncated base64 (e.g. from session cleanup) → replaced with text
- Empty/whitespace-only data → replaced with text

Grynn force-pushed the fix/validate-base64-image-data-18212 branch from 298c909 to 9659a03 on February 19, 2026 11:33

Grynn commented Feb 19, 2026

Rebased & improved

Rebased on current main. The branch was 1178 commits behind, and upstream refactoring had removed the original changes.

Changes in this update:

  1. validateAndCleanBase64() replaces the old isStrictBase64() — now also strips embedded whitespace (MIME-style line breaks from RFC 2045 encoders) before validating, returning the cleaned string for downstream use.

  2. stripDataUrlPrefix() — unchanged; handles data:image/...;base64, prefixes.

  3. Two new test cases:

    • MIME-style line breaks in base64 → cleaned and processed normally
    • Truncated base64 (e.g. from session cleanup scripts) → gracefully replaced with text
  4. All 9 tests passing on vitest.e2e.config.ts
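The cleaning step in item 1 can be sketched as a function that returns the normalized string, or `null` when the input cannot be repaired. This shape is an assumption based on the description above (strip whitespace, check length, check characters); the actual signature of `validateAndCleanBase64()` in the PR may differ.

```typescript
// Hypothetical sketch of validateAndCleanBase64(); the real signature may differ.
// Returns the whitespace-free base64 string, or null if it is still invalid.
function validateAndCleanBase64(raw: string): string | null {
  // Drop MIME-style line breaks and any other whitespace
  const cleaned = raw.replace(/\s+/g, "");
  // Empty or whitespace-only input has nothing to send
  if (cleaned.length === 0) {
    return null;
  }
  // Truncated data usually breaks the 4-character grouping.
  // A cut that lands exactly on a 4-character boundary is NOT caught here.
  if (cleaned.length % 4 !== 0) {
    return null;
  }
  // Only the standard alphabet, with '=' padding allowed solely at the end
  if (!/^[A-Za-z0-9+/]*={0,2}$/.test(cleaned)) {
    return null;
  }
  return cleaned;
}

// Usage: a MIME-wrapped payload is repaired, a truncated one is rejected
console.log(validateAndCleanBase64("aGVsbG8g\r\nd29ybGQ=")); // prints aGVsbG8gd29ybGQ=
console.log(validateAndCleanBase64("aGVsbG8gd29ybG")); // prints null
```

Returning the cleaned string rather than a boolean means the value that was validated is the same value that gets sent downstream, so whitespace cannot slip through to the API after passing the check.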

Root cause analysis

The error messages.164.content.0.tool_result.content.1.image.source.base64: invalid base64 data occurs because:

  • sanitizeContentBlocksImages (which runs on ALL session history via sanitizeSessionHistory) currently passes image blocks through without validating base64 content
  • The Anthropic provider's convertContentBlocks in @mariozechner/pi-ai has a catch-all else clause that wraps ANY non-text block as {type: 'image', source: {type: 'base64', ...}}
  • If the base64 data is malformed (truncated, has data URL prefix, contains whitespace, or has invalid chars), Anthropic's API rejects it — unlike Node.js Buffer.from(s, 'base64') which silently tolerates such data
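To show where the check sits relative to the provider's catch-all, here is a minimal sketch of a sanitization pass over content blocks. The `ContentBlock` types, the `sanitizeImageBlocks` name, and the placeholder wording are hypothetical simplifications; the real `sanitizeContentBlocksImages` operates on the project's own session types.

```typescript
// Hypothetical, simplified block types; the project's real types differ.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

const DATA_URL_PREFIX = /^data:(image\/[A-Za-z0-9.+-]+);base64,/;

// Strip whitespace, then require the standard alphabet and 4-char grouping
function cleanBase64(raw: string): string | null {
  const cleaned = raw.replace(/\s+/g, "");
  if (cleaned.length === 0 || cleaned.length % 4 !== 0) return null;
  return /^[A-Za-z0-9+/]*={0,2}$/.test(cleaned) ? cleaned : null;
}

// Replace images the API would reject with a text placeholder, so one bad
// block in persisted history cannot fail every later request.
function sanitizeImageBlocks(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((block): ContentBlock => {
    if (block.type !== "image") return block;
    const match = DATA_URL_PREFIX.exec(block.data);
    const payload = block.data.replace(DATA_URL_PREFIX, "");
    // Prefer the MIME type from a data URL prefix when one is present
    const mimeType = match?.[1] ?? block.mimeType;
    const cleaned = cleanBase64(payload);
    if (cleaned === null) {
      console.warn(`omitting image block with invalid base64 (${payload.length} chars)`);
      return { type: "text", text: "[image omitted: invalid base64 data]" };
    }
    return { type: "image", data: cleaned, mimeType };
  });
}

const history: ContentBlock[] = [
  { type: "text", text: "screenshot attached" },
  { type: "image", data: "data:image/png;base64,aGVsbG8=", mimeType: "image/jpeg" },
  { type: "image", data: "not*base64", mimeType: "image/png" },
];
console.log(JSON.stringify(sanitizeImageBlocks(history)));
```

Substituting a text block instead of dropping the image keeps the message non-empty and leaves a visible trace in the transcript that an image was removed; only blocks that reach the provider's converter as `image` are ones that passed the check.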

This fix adds validation and cleaning at the sanitization layer, gracefully replacing invalid images with text placeholders before they reach the API.


Labels

agents (Agent runtime and tooling), size: S


Development

Successfully merging this pull request may close these issues.

[Bug]: Invalid base64 image data crashes session — no validation before Anthropic API call
