Skip to content

Validate actual base64 data size in FileUpload, not client-reported size#3816

Merged
jlowin merged 2 commits intomainfrom
fix/file-upload-size-bypass
Apr 11, 2026
Merged

Validate actual base64 data size in FileUpload, not client-reported size#3816
jlowin merged 2 commits intomainfrom
fix/file-upload-size-bypass

Conversation

@strawgate
Copy link
Copy Markdown
Collaborator

Summary

The store_files tool in FileUpload enforced max_file_size by checking the client-provided size field, which is untrusted input. A client could set size: 1 while sending a multi-megabyte base64 payload, completely bypassing the limit.

This PR computes the actual data size from the base64 payload length (len(data) * 3 // 4) and uses that for enforcement instead.

provider = FileUpload(max_file_size=100)

# Before: this bypassed the limit
spoofed = {"name": "big.bin", "size": 1, "data": base64.b64encode(b"x" * 500).decode(), ...}
await server.call_tool("Files___store_files", {"files": [spoofed]})  # accepted!

# After: rejected based on actual data size
await server.call_tool("Files___store_files", {"files": [spoofed]})  # ValueError: exceeds max size

Note: since MCP delivers all tool arguments as a single JSON payload, the data is already in server memory by the time the tool runs. This fix prevents oversized files from being stored, but transport-level request size limits (e.g. uvicorn's --limit-request-line) are needed to prevent receipt entirely.

Closes #3815

🤖 Generated with Claude Code

@marvin-context-protocol marvin-context-protocol Bot added bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality. security Security fixes: input validation, SSRF/LFI prevention, auth hardening, injection defenses. labels Apr 10, 2026
chatgpt-codex-connector[bot]

This comment was marked as resolved.

@strawgate strawgate force-pushed the fix/file-upload-size-bypass branch from 249ade9 to 40dff01 Compare April 10, 2026 16:02
chatgpt-codex-connector[bot]

This comment was marked as resolved.

@strawgate strawgate force-pushed the fix/file-upload-size-bypass branch from 40dff01 to 0730b1a Compare April 10, 2026 16:24
The store_files tool checked the client-provided `size` field to enforce
max_file_size, but this field is untrusted input. A client could set
size=1 while sending a multi-megabyte payload, bypassing the limit.

Now computes actual size from the base64 data length instead.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@strawgate strawgate force-pushed the fix/file-upload-size-bypass branch from 0730b1a to aec827c Compare April 10, 2026 16:25
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aec827c528

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +79 to +80
padding = b64.count("=", max(0, n - 2))
return n * 3 // 4 - padding
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce limits from strict decoded byte length

_b64_decoded_size uses n * 3 // 4 - trailing '=', which is only exact for canonical base64 without extra formatting, but this provider later decodes with permissive base64.b64decode(...) semantics. As a result, folded payloads (for example from base64.encodebytes, which inserts newlines) are overcounted and can be rejected even when their decoded size is within max_file_size (e.g. 100 decoded bytes can be computed as 102), while some non-canonical trailing = forms are undercounted by 1 byte. Because store_files now depends on this helper for enforcement, the gate is not checking the true decoded size for decodable inputs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data field arrives as a JSON string value (it is a property inside a tool call arguments dict), so it cannot contain raw newlines or other formatting — JSON string escaping guarantees canonical base64. base64.encodebytes() line-wrapping is not a concern here since the MCP transport serializes everything through JSON.

For non-canonical trailing = padding: standard base64 always produces 0, 1, or 2 = characters, and the count("=", max(0, n-2)) window handles all three cases exactly. A payload would need to be malformed (e.g. ====) to produce an off-by-one, and base64.b64decode would reject that anyway downstream in on_store.

Copy link
Copy Markdown
Member

@jlowin jlowin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct fix — computing actual base64 payload size instead of trusting client-reported values closes the bypass cleanly. Tests are thorough.

@jlowin jlowin merged commit c946664 into main Apr 11, 2026
10 of 11 checks passed
@jlowin jlowin deleted the fix/file-upload-size-bypass branch April 11, 2026 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality. security Security fixes: input validation, SSRF/LFI prevention, auth hardening, injection defenses.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FileUpload max_file_size check trusts client-reported size field

2 participants