Media: reject spoofed input_image MIME payloads#38289

Merged
vincentkoc merged 3 commits into main from vincentkoc-code/input-image-mime-spoofing-fix on Mar 6, 2026

@vincentkoc
Member

Summary

  • Problem: normalizeInputImage trusted caller-declared non-HEIC image MIME types before the allowlist check, so a request could claim image/png while supplying concrete non-image bytes.
  • Why it matters: that let spoofed input_image payloads bypass the intended image MIME validation boundary on Gateway HTTP APIs.
  • What changed: image inputs now sniff bytes again before allowlist enforcement, reject concrete non-image detections for declared image/* payloads, and still keep HEIC/HEIF normalization scoped to actual HEIC inputs.
  • What did NOT change (scope boundary): this does not expand accepted MIME types, relax URL fetching policy, or change the existing HEIC -> JPEG normalization path.
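
The ordering described above (sniff first, reject concrete non-image detections, then enforce the allowlist) can be sketched roughly as follows. This is an illustrative sketch only, not the code in src/media/input-files.ts: `sniffMime`, `validateDeclaredImage`, and the contents of `ALLOWED_IMAGE_MIMES` are hypothetical stand-ins, and the tiny magic-byte table is far smaller than a real detector.

```typescript
// Hypothetical allowlist; the project's real list is not shown in this PR.
const ALLOWED_IMAGE_MIMES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp"]);

// Minimal magic-byte sniffer standing in for the project's MIME detection.
function sniffMime(bytes: Uint8Array): string | undefined {
  const startsWith = (sig: number[]) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x25, 0x50, 0x44, 0x46])) return "application/pdf"; // "%PDF"
  return undefined; // no concrete detection
}

function validateDeclaredImage(declaredMime: string, bytes: Uint8Array): string {
  const detectedMime = sniffMime(bytes);
  // Step 1: a declared image/* type with concretely non-image bytes is rejected.
  if (declaredMime.startsWith("image/") && detectedMime && !detectedMime.startsWith("image/")) {
    throw new Error(`Declared ${declaredMime} but detected ${detectedMime}`);
  }
  // Step 2: only then is the allowlist enforced on the declared type.
  if (!ALLOWED_IMAGE_MIMES.has(declaredMime)) {
    throw new Error(`Unsupported image MIME type: ${declaredMime}`);
  }
  // Non-HEIC images keep their declared MIME after validation.
  return declaredMime;
}
```

Note that an undetectable payload (where `sniffMime` returns `undefined`) is not rejected by step 1 in this sketch; only a concrete non-image detection is.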

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Gateway input_image requests now reject spoofed non-image payloads even when the request declares an allowed image MIME type.

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 / pnpm workspace
  • Model/provider: n/a
  • Integration/channel (if any): Gateway HTTP APIs
  • Relevant config (redacted): defaults

Steps

  1. Send an input_image base64 or URL request that declares image/png.
  2. Provide bytes that detect as application/pdf instead of an image.
  3. Observe Gateway validation.
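
For the base64 variant of these steps, a spoofed payload could be assembled along these lines. The `SpoofedImagePart` shape and its field names are assumptions made for illustration and are not the Gateway's confirmed request schema; only the idea (declare image/png, supply PDF bytes) comes from the steps above.

```typescript
// Hypothetical request part; field names are illustrative only.
interface SpoofedImagePart {
  type: "input_image";
  mediaType: string; // caller-declared MIME type
  data: string; // base64-encoded payload bytes
}

function buildSpoofedPart(): SpoofedImagePart {
  // A PDF header: sniffers key on the leading "%PDF" signature.
  const pdfHeader = "%PDF-1.4\n";
  return { type: "input_image", mediaType: "image/png", data: btoa(pdfHeader) };
}

const part = buildSpoofedPart();
// The declared type claims PNG, but the decoded bytes start with the PDF signature.
const looksLikePdf = atob(part.data).startsWith("%PDF");
```

With the fix in place, a payload like this is expected to be rejected by Gateway validation before it reaches a provider.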

Expected

  • Spoofed payloads are rejected before provider delivery.

Actual

  • The previously merged follow-up trusted declared non-HEIC MIME types and could accept spoofed payloads.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: HEIC base64 normalization, HEIC URL normalization, spoofed base64 image rejection, spoofed URL image rejection, OpenAI chat completions gateway path, OpenResponses parity validation.
  • Edge cases checked: non-HEIC images keep declared MIME after validation; HEIC image budget tests remain green.
  • What you did not verify: install smoke locally in Docker.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR.
  • Files/config to restore: src/media/input-files.ts, src/media/input-files.fetch-guard.test.ts, CHANGELOG.md
  • Known bad symptoms reviewers should watch for: valid image inputs getting rejected due to incorrect MIME detection.

Risks and Mitigations

  • Risk: MIME sniffing could reintroduce non-HEIC behavior drift.
    • Mitigation: tests assert non-HEIC images keep their declared MIME after validation while concrete non-image detections are rejected.

AI Assistance

  • AI-assisted: yes
  • Testing: fully tested on the focused Gateway/media suites listed below

Verification:

  • pnpm vitest run src/media/input-files.fetch-guard.test.ts src/gateway/openai-http.test.ts src/gateway/openresponses-parity.test.ts src/gateway/openai-http.image-budget.test.ts
  • pnpm exec oxfmt --check src/media/input-files.ts src/media/input-files.fetch-guard.test.ts CHANGELOG.md

vincentkoc self-assigned this Mar 6, 2026
openclaw-barnacle bot added the "size: S" and "maintainer" (Maintainer-authored PR) labels Mar 6, 2026
vincentkoc marked this pull request as ready for review March 6, 2026 19:34
vincentkoc merged commit 084dfd2 into main Mar 6, 2026
29 of 30 checks passed
vincentkoc deleted the vincentkoc-code/input-image-mime-spoofing-fix branch March 6, 2026 19:34
chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffeeba6938

Comment on lines +245 to +247
(detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
(HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
? (detectedMime ?? declaredMime)

P2: Honor detected non-HEIC image types before conversion

The new sourceMime selection ignores detectedMime whenever it is an image type other than HEIC/HEIF, so a payload declared as image/heic but sniffed as image/png or image/jpeg is still treated as HEIC and routed through convertHeicToJpeg. In practice (for both base64 and URL inputs with an incorrect mediaType or Content-Type), this can recompress valid non-HEIC images or cause them to fail outright, which regresses the prior behavior and contradicts the intent to scope HEIC normalization to actual HEIC bytes.

@greptile-apps
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR hardens normalizeInputImage in src/media/input-files.ts by unconditionally calling detectMime on all image inputs and rejecting payloads where the declared MIME is image/* but the detected bytes are a concrete non-image type. The security hardening goal is achieved by the guard on line 241, which correctly prevents non-image bytes from bypassing validation.

However, there is a logic regression in the sourceMime selection (lines 244–248). When declaredMime is "image/heic"/"image/heif" but detectedMime is a different image type (e.g., "image/jpeg"), neither ternary condition is satisfied and the code falls back to sourceMime = declaredMime. This causes convertHeicToJpeg to be called on non-HEIC image bytes, which will fail at runtime. The previous code used detectedMime ?? declaredMime for all HEIC-declared inputs, gracefully handling this case. A test case for this "declared HEIC + detected non-HEIC image" scenario is also missing.

The spoofing-rejection guard is correct and new tests verify the intended behaviour for spoofed base64 and URL payloads. The CHANGELOG.md update is clear and appropriately scoped.

Confidence Score: 2/5

  • The security hardening goal for spoofed MIME payloads is achieved, but a logic regression will cause runtime errors when HEIC is declared but non-HEIC image bytes are detected.
  • The spoofing-rejection guard (line 241) correctly prevents non-image bytes from slipping through, and new tests verify this behaviour. However, the sourceMime ternary (lines 244–248) has a regression: when declaredMime is HEIC but detectedMime is a different image type, the code incorrectly sets sourceMime = declaredMime and proceeds to call convertHeicToJpeg on non-HEIC bytes. This will cause runtime failures for valid images with mismatched type declarations. The fix is surgical but critical before merging.
  • src/media/input-files.ts (lines 244–248) requires the sourceMime ternary fix to handle the "declared HEIC + detected non-HEIC image" case. A test case for this scenario should also be added.

Last reviewed commit: ffeeba6

Comment on lines +244 to +248
const sourceMime =
(detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
(HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
? (detectedMime ?? declaredMime)
: declaredMime;

HEIC conversion triggered on non-HEIC image bytes when declared MIME is HEIC

When declaredMime is "image/heic" (or "image/heif") but detectedMime is a non-HEIC image type like "image/jpeg", the current logic sets sourceMime = declaredMime = "image/heic" and proceeds to call convertHeicToJpeg on non-HEIC bytes. This is a regression from the previous behavior.

Trace for declaredMime = "image/heic", detectedMime = "image/jpeg":

  • Guard check (line 241): both are image/* → no rejection (correct)
  • Part 1: "image/jpeg" && HEIC_INPUT_IMAGE_MIMES.has("image/jpeg") → false
  • Part 2: HEIC_INPUT_IMAGE_MIMES.has("image/heic") && !"image/jpeg" → false
  • Overall condition → false, takes the else branch
  • sourceMime = declaredMime = "image/heic"
  • Line 253: tries to convert JPEG bytes as HEIC

The old code handled this gracefully by using detectedMime ?? declaredMime for all HEIC-declared inputs, so a JPEG claimed as HEIC would be returned directly without conversion.

Fix: also use detectedMime when declaredMime is HEIC but a non-HEIC image was actually detected:

Suggested change
- const sourceMime =
-   (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
-   (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
-     ? (detectedMime ?? declaredMime)
-     : declaredMime;
+ const sourceMime =
+   (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
+   (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
+     ? (detectedMime ?? declaredMime)
+     : HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && detectedMime
+       ? detectedMime
+       : declaredMime;

This restores the old behaviour of using the detected MIME for HEIC-declared inputs while preserving all the new spoofing-rejection logic. A corresponding test case for "declared HEIC + detected non-HEIC image" is also missing.
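
To make the difference concrete, the two selection expressions can be wrapped in standalone functions and compared on the mismatched case. This is a sketch for illustration: the wrapper function names are hypothetical, and the set contents assume HEIC_INPUT_IMAGE_MIMES holds exactly "image/heic" and "image/heif". It could also serve as a starting point for the missing test case.

```typescript
// Assumed contents; the real set is defined in src/media/input-files.ts.
const HEIC_INPUT_IMAGE_MIMES = new Set(["image/heic", "image/heif"]);

// Selection as quoted from the reviewed commit.
function selectSourceMimeReviewed(declaredMime: string, detectedMime?: string): string {
  return (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
    (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
    ? (detectedMime ?? declaredMime)
    : declaredMime;
}

// Selection with the extra branch from the suggested change.
function selectSourceMimeSuggested(declaredMime: string, detectedMime?: string): string {
  return (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
    (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
    ? (detectedMime ?? declaredMime)
    : HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && detectedMime
      ? detectedMime
      : declaredMime;
}

// Declared HEIC, detected JPEG: the reviewed code keeps "image/heic",
// which would send JPEG bytes into the HEIC conversion path.
const reviewed = selectSourceMimeReviewed("image/heic", "image/jpeg"); // "image/heic"
// The suggested code follows the detected type instead.
const suggested = selectSourceMimeSuggested("image/heic", "image/jpeg"); // "image/jpeg"
// Non-HEIC declarations still keep their declared MIME under both versions.
const unchanged = selectSourceMimeSuggested("image/png", "image/jpeg"); // "image/png"
```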

mrosmarin added a commit to mrosmarin/openclaw that referenced this pull request Mar 6, 2026
* main: (37 commits)
  feat(gateway): add channel-backed readiness probes (openclaw#38285)
  CI: enable report-only Knip deadcode job
  Tooling: wire deadcode scripts to Knip
  Tooling: add Knip workspace config
  CI: skip detect-secrets on main temporarily
  Install Smoke: fetch docs base on demand
  CI: fetch base history on demand
  CI: add base-commit fetch helper
  Docs: clarify main secret scan behavior
  CI: keep full secret scans on main
  Docs: update secret scan reproduction steps
  CI: scope secret scans to changed files
  Media: reject spoofed input_image MIME payloads (openclaw#38289)
  chore: code/dead tests cleanup (openclaw#38286)
  Install Smoke: cache docker smoke builds
  Install Smoke: allow reusing prebuilt test images
  Install Smoke: shallow docs-scope checkout
  CI: shallow scope checkouts
  feat(onboarding): add web search to onboarding flow (openclaw#34009)
  chore: disable contributor labels
  ...
vincentkoc added a commit to BryanTegomoh/openclaw-fork that referenced this pull request Mar 8, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
Saitop pushed a commit to NomiciAI/openclaw that referenced this pull request Mar 8, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
dhoman pushed a commit to dhoman/chrono-claw that referenced this pull request Mar 11, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
senw-developers pushed a commit to senw-developers/va-openclaw that referenced this pull request Mar 17, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 20, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening

(cherry picked from commit 084dfd2)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 20, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening

(cherry picked from commit 084dfd2)