fix: update fromDataURI regex to match RFC 2397 #10829

Merged
jasonsaayman merged 6 commits into axios:v1.x from abhu85:fix/10808-data-uri-rfc2397-v1x
May 5, 2026

Conversation

Contributor

@abhu85 abhu85 commented Apr 30, 2026

Summary

Update the fromDataURI regex to correctly match all valid RFC 2397 data URIs.

Problem

The current DATA_URL_PATTERN regex requires a semicolon-terminated media type segment, which rejects valid data URIs like data:;base64,MTIz and data:application/octet-stream,123 per RFC 2397.

Solution

Replace the regex with the stricter RFC 2397-compliant pattern from #10808 that validates type/subtype format, requires name=value parameters, and keeps ;base64 as a separate capture group. Omitted mediatypes are normalized to text/plain per RFC 2397 §3.

Blob.type behavior change

For inputs like data:text/plain;charset=utf-8;base64,..., the produced Blob.type changes from "text/plain" (the old regex truncated at the first ;) to "text/plain;charset=utf-8". The old behaviour silently dropped charset and other parameters.

Test Plan

  • Added tests for RFC 2397 canonical example (text/plain;charset=US-ASCII)
  • Added test for URL-encoded body decoding (hello%20world)
  • Added test for Blob type preserving full content type with parameters
  • Added test for omitted mediatype normalization to text/plain
  • Added test for datax: protocol rejection (Unsupported protocol)
  • Added test for missing comma (data:hi) rejection (ERR_INVALID_URL)
  • All 614 unit tests pass

Fixes #10808

Update the DATA_URL_PATTERN regex to correctly match all valid RFC 2397
data URIs. The previous regex required a semicolon-terminated media type
segment, which rejected valid data URIs like `data:;base64,MTIz` and
`data:application/octet-stream,123`.

Fixes axios#10808
@abhu85 abhu85 requested a review from jasonsaayman as a code owner April 30, 2026 08:07
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files

Confidence score: 3/5

  • There is a concrete regression risk in lib/helpers/fromDataURI.js: RFC 2397 shorthand inputs like data:;charset=UTF-8,... can now match, but produce a Blob with an invalid type because the implicit text/plain default is not normalized.
  • Given the medium severity (6/10) and high confidence (8/10), this is likely user-facing for consumers that rely on correct Blob MIME types, so the merge carries some functional risk.
  • Pay close attention to lib/helpers/fromDataURI.js - ensure shorthand data URI parsing normalizes omitted media types to RFC defaults before constructing the Blob type.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/helpers/fromDataURI.js">

<violation number="1" location="lib/helpers/fromDataURI.js:36">
P2: Valid RFC 2397 shorthand data URIs (e.g. `data:;charset=UTF-8,...`) are accepted by the new regex but produce an invalid Blob `type` because the omitted `text/plain` default is not normalized.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread lib/helpers/fromDataURI.js Outdated
When a data URI has parameters but no mediatype (e.g. data:;charset=UTF-8,...),
prepend text/plain as the default per RFC 2397 section 3.
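
The normalization this commit message describes could look roughly like the sketch below. `buildContentType` and its arguments are hypothetical names, not the helper's actual code, and returning an empty string when both the type and the parameters are absent is an assumption made only for illustration.

```javascript
// Hypothetical helper: builds the Blob content type from the captured
// type/subtype (may be empty) and the raw parameter string (";name=value...").
function buildContentType(type, params) {
  if (type) {
    // An explicit mediatype is kept as-is, with its parameters.
    return type + params;
  }
  if (params) {
    // RFC 2397 section 3: an omitted mediatype defaults to text/plain.
    return 'text/plain' + params;
  }
  // Assumption: nothing to normalize when both parts are absent.
  return '';
}

console.log(buildContentType('', ';charset=UTF-8'));
console.log(buildContentType('text/html', ''));
```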
Member

@jasonsaayman jasonsaayman left a comment

Thanks for the patch @abhu85, a couple of things to sort out before this can land.

The regex is looser than the one drafted in #10808:

lib/helpers/fromDataURI.js:7. The form you have, /^([^,]*?)(;base64)?,(.*)$/s, accepts shapes that are not valid mediatypes. data:not-a-mime,hello matches with mime="not-a-mime", and data:;notparam,x gets normalized to mime="text/plain;notparam". The issue specifies the stricter form for a reason:

const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

That requires type/subtype to actually contain a /, requires parameters to be name=value shaped, and keeps ;base64 as its own group. Please switch to that.
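
For reference, a small sketch of how the stricter pattern splits a few inputs. It assumes the pattern is applied to the part of the URI after the `data:` prefix; `splitDataURI` is a hypothetical helper used only for this illustration.

```javascript
// Stricter pattern quoted above (from #10808).
const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

// Hypothetical helper: splits the text after "data:" into its pieces,
// or returns null when the pattern does not match.
function splitDataURI(rest) {
  const match = DATA_URL_PATTERN.exec(rest);
  if (!match) return null;
  return {
    type: match[1] || '',        // type/subtype, must contain a "/"
    params: match[2] || '',      // zero or more ";name=value" segments
    isBase64: Boolean(match[3]), // true only for a trailing ";base64"
    body: match[4],              // everything after the first matching comma
  };
}

console.log(splitDataURI('text/plain;charset=utf-8;base64,SGVsbG8='));
console.log(splitDataURI(';base64,MTIz'));
console.log(splitDataURI('application/octet-stream,123'));
console.log(splitDataURI('not-a-mime,hello')); // null: no "/" in the type
console.log(splitDataURI(';notparam,x'));      // null: parameter is not name=value
```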

The test matrix from #10808 is not fully ported:

tests/unit/fromDataURI.test.js. The issue listed a specific set of cases that should be fixed. Currently missing:

  • data:text/plain;charset=US-ASCII,123, the canonical RFC example (charset param with a mediatype, no base64)
  • URL encoded body decoding, e.g. data:text/plain,hello%20world resolving to "hello world"
  • Blob output preserving the full content type including parameters, e.g. data:text/plain;charset=utf-8;base64,... giving blob.type === "text/plain;charset=utf-8". The existing normalisation test only covers the parameter-only case
  • datax:,hi throws Unsupported protocol. The current notadata:uri test exercises a different code path in parseProtocol
  • data:hi (no comma) throws Invalid URL
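
The missing cases can be exercised against a simplified stand-in, sketched below. `decodeDataURI` is not the real `fromDataURI`: the protocol parsing, the plain `Error` objects, and the returned shape are all assumptions made so the example runs on its own in Node.js.

```javascript
const STRICT_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

// Simplified stand-in for fromDataURI (names and error shapes are assumptions).
function decodeDataURI(uri) {
  const colon = uri.indexOf(':');
  const protocol = colon === -1 ? '' : uri.slice(0, colon);
  if (protocol !== 'data') {
    throw new Error('Unsupported protocol ' + protocol);
  }
  const match = STRICT_PATTERN.exec(uri.slice(colon + 1));
  if (!match) {
    const err = new Error('Invalid URL');
    err.code = 'ERR_INVALID_URL';
    throw err;
  }
  const encoding = match[3] ? 'base64' : 'utf8';
  const bytes = Buffer.from(decodeURIComponent(match[4]), encoding);
  return {
    // Full content type including parameters, as it would be passed to Blob.
    contentType: (match[1] || '') + match[2],
    text: bytes.toString('utf8'),
  };
}

// Cases from the list above.
console.log(decodeDataURI('data:text/plain;charset=US-ASCII,123'));
console.log(decodeDataURI('data:text/plain,hello%20world').text); // "hello world"
console.log(decodeDataURI('data:text/plain;charset=utf-8;base64,SGVsbG8=').contentType);

for (const bad of ['datax:,hi', 'data:hi']) {
  try {
    decodeDataURI(bad);
  } catch (err) {
    console.log(bad, '->', err.message);
  }
}
```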

Call out the Blob.type behaviour shift in the PR body:
For inputs like data:text/plain;charset=utf-8;base64,... the produced Blob.type changes from "text/plain" (the old regex truncated at the ;) to "text/plain;charset=utf-8". This is a fix; the old behaviour was silently dropping the charset, but anyone who asserted on the old truncated value will see a diff. One line in the description is enough, so it ends up in the changelog cleanly at release time.

Two small nits:

  • lib/helpers/fromDataURI.js:38. const isBase64 = match[2] ? 'base64' : ''; does not really earn the conversion. The downstream use only needs truthiness. Either drop it or rename to encoding and assign 'base64' or 'utf8' directly so the next ternary collapses.
  • lib/helpers/fromDataURI.js:36. The dense one-liner is correct, but a small if/else with a one-line comment pointing at RFC 2397 §3 reads cleaner.
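
A rough sketch of the first nit, assuming the body is decoded with Node's `Buffer.from`. `decodeBody` and `base64Flag` are hypothetical names; `base64Flag` stands in for the `;base64` capture group, and the surrounding code in the helper may differ.

```javascript
// Hypothetical excerpt illustrating the suggested rename.
function decodeBody(body, base64Flag) {
  // Before (per the nit): const isBase64 = base64Flag ? 'base64' : '';
  // followed by a second ternary at the call site.
  // After: assign the encoding name directly so the call site needs no ternary.
  const encoding = base64Flag ? 'base64' : 'utf8';
  return Buffer.from(decodeURIComponent(body), encoding);
}

console.log(decodeBody('MTIz', ';base64').toString());          // "123"
console.log(decodeBody('hello%20world', undefined).toString()); // "hello world"
```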

@jasonsaayman jasonsaayman added the priority::medium, commit::fix, and status::changes-requested labels May 1, 2026
@Qodo-Free-For-OSS

Hi, fromDataURI only treats ;base64 as base64 when it is lowercase and immediately before the comma; other forms (e.g. ;BASE64 or ;base64 not-final) are accepted but decoded as UTF-8, producing incorrect output and potentially bypassing maxContentLength protections that estimate size using base64 semantics.

Severity: action required | Category: security

How to fix: Parse base64 token robustly

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

fromDataURI base64 detection is stricter/different than estimateDataURLDecodedBytes (case-sensitive and positional), which can lead to (1) incorrect decoding (UTF-8 instead of base64) for inputs that include a base64 token but not in the exact expected form, and (2) under-enforcement of maxContentLength due to estimator/decoder disagreement.

Issue Context

  • estimateDataURLDecodedBytes uses /;base64/i over the meta portion.
  • fromDataURI currently only treats the payload as base64 when match[2] is set by (;base64)?.

Fix Focus Areas

  • Make base64 detection case-insensitive and consistent with the estimator.

  • Enforce RFC2397 semantics: if a base64 token is present, it must be the final ;-segment before the comma; if base64 appears but is not final, reject as invalid rather than silently decoding as UTF-8.

  • Add unit tests for:

    • data:text/plain;BASE64,SGVsbG8= decoding correctly
    • data:text/plain;base64;charset=utf-8,SGVsbG8= being rejected (or handled explicitly)
  • lib/helpers/fromDataURI.js[7-40]

  • lib/helpers/estimateDataURLDecodedBytes.js[17-20]

  • lib/adapters/http.js[457-492]

  • lib/adapters/fetch.js[195-209]

  • tests/unit/fromDataURI.test.js[10-61]
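
One way to read the suggestion above, as a hedged sketch only: `hasBase64Token` is a hypothetical helper, `meta` is assumed to be the text between `data:` and the first comma, and throwing a plain `Error` for a non-final token is an illustrative choice rather than what the library does.

```javascript
// Hypothetical helper: reports whether the metadata ends with a base64 token.
function hasBase64Token(meta) {
  // Drop the leading type/subtype part; the rest are ";"-separated segments.
  const segments = meta.split(';').slice(1);
  const index = segments.findIndex((s) => s.toLowerCase() === 'base64');
  if (index === -1) {
    return false; // no base64 token at all
  }
  if (index !== segments.length - 1) {
    // Token present but not the final segment: reject instead of
    // silently decoding the body as UTF-8.
    throw new Error('Invalid URL');
  }
  return true;
}

console.log(hasBase64Token('text/plain;BASE64'));        // true
console.log(hasBase64Token('text/plain;charset=utf-8')); // false
try {
  hasBase64Token('text/plain;base64;charset=utf-8');
} catch (err) {
  console.log('rejected:', err.message);
}
```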

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Found by Qodo code review

- Switch to type/subtype-aware regex from axios#10808
- Require name=value parameters, separate ;base64 group
- Add tests: charset param, URL-encoded body, Blob type
  preservation, datax: rejection, missing comma rejection
- Normalize omitted mediatype to text/plain per RFC 2397 §3
Contributor Author

abhu85 commented May 4, 2026

All points addressed in e2cbe05:

Regex: Switched to the stricter form from #10808 — requires type/subtype with /, parameters as name=value, and ;base64 as its own group. data:not-a-mime,hello and data:;notparam,x no longer match.

Test matrix: Added the missing cases:

  • text/plain;charset=US-ASCII canonical RFC example
  • hello%20world URL-encoded body decoding
  • text/plain;charset=utf-8;base64,... Blob type preserving full content type
  • datax:,hi throws Unsupported protocol
  • data:hi (no comma) throws ERR_INVALID_URL

PR body: Added Blob.type behavior change note for changelog.

Nits:

  • isBase64 replaced with encoding — assigned 'base64' or 'utf8' directly, collapsing the downstream ternary
  • One-liner replaced with if/else + RFC 2397 §3 comment

All 614 unit tests pass.

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/helpers/fromDataURI.js">

<violation number="1" location="lib/helpers/fromDataURI.js:47">
P2: RFC 2397’s default `text/plain;charset=US-ASCII` is not applied for `data:,`, so Blob metadata stays empty instead of using the RFC default.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Comment thread lib/helpers/fromDataURI.js
@jasonsaayman jasonsaayman merged commit 5061879 into axios:v1.x May 5, 2026
23 checks passed

Labels

commit::fix (The PR is related to a bugfix), priority::medium (A medium priority), status::changes-requested (A reviewer requested changes to the PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix fromDataURI regex to match RFC 2397 (no runtime dep)

3 participants