fix: update fromDataURI regex to match RFC 2397#10829
Conversation
Update the DATA_URL_PATTERN regex to correctly match all valid RFC 2397 data URIs. The previous regex required a semicolon-terminated media type segment, which rejected valid data URIs like `data:;base64,MTIz` and `data:application/octet-stream,123`. Fixes axios#10808
There was a problem hiding this comment.
1 issue found across 2 files
Confidence score: 3/5
- There is a concrete regression risk in
lib/helpers/fromDataURI.js: RFC 2397 shorthand inputs likedata:;charset=UTF-8,...can now match, but produce a Blob with an invalidtypebecause the implicittext/plaindefault is not normalized. - Given the medium severity (6/10) and high confidence (8/10), this is likely user-facing for consumers that rely on correct Blob MIME types, so the merge carries some functional risk.
- Pay close attention to
lib/helpers/fromDataURI.js- ensure shorthand data URI parsing normalizes omitted media types to RFC defaults before constructing the Blob type.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lib/helpers/fromDataURI.js">
<violation number="1" location="lib/helpers/fromDataURI.js:36">
P2: Valid RFC 2397 shorthand data URIs (e.g. `data:;charset=UTF-8,...`) are accepted by the new regex but produce an invalid Blob `type` because the omitted `text/plain` default is not normalized.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
When a data URI has parameters but no mediatype (e.g. data:;charset=UTF-8,...), prepend text/plain as the default per RFC 2397 section 3.
jasonsaayman
left a comment
There was a problem hiding this comment.
Thanks for the patch @abhu85, a couple of things to sort out before this can land.
The regex is looser than the one drafted in #10808:
lib/helpers/fromDataURI.js:7. The form you have, /^([^,]*?)(;base64)?,(.*)$/s, accepts shapes that are not valid mediatypes. data:not-a-mime,hello matches with mime="not-a-mime", and data:;notparam,x gets normalized to mime="text/plain;notparam". The issue lies in the stricter form for a reason:
const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;That requires type/subtype to actually contain a /, requires parameters to be name=value shaped, and keeps ;base64 as its own group. Please switch to that.
The test matrix from #10808 is not fully ported:
tests/unit/fromDataURI.test.js. The issue listed a specific set of cases that should be fixed. Currently missing:
data:text/plain;charset=US-ASCII,123, the canonical RFC example (charset param with a mediatype, no base64)- URL encoded body decoding, e.g.
data:text/plain,hello%20worldresolving to"hello world" - Blob output preserving the full content type including parameters, e.g.
data:text/plain;charset=utf-8;base64,...givingblob.type === "text/plain;charset=utf-8". The existing normalisation test only covers the parameter-only case datax:,hithrowsUnsupported protocol. The currentnotadata:uritest exercises a different code path inparseProtocoldata:hi(no comma) throwsInvalid URL
###mall out the Blob.type behaviour shift in the PR body:
For inputs like data:text/plain;charset=utf-8;base64,... the produced Blob.type changes from "text/plain" (the old regex truncated at the ;) to "text/plain;charset=utf-8". This is a fix; the old behaviour was silently dropping the charset, but anyone who asserted on the old truncated value will see a diff. One line in the description is enough, so it ends up in the changelog cleanly at release time.
Two small nits:
lib/helpers/fromDataURI.js:38.const isBase64 = match[2] ? 'base64' : '';does not really earn the conversion. The downstream use only needs truthiness. Either drop it or rename toencodingand assign'base64'or'utf8'directly so the next ternary collapses.lib/helpers/fromDataURI.js:36. The dense one-liner is correct, but a small if/else with a one-line comment pointing at RFC 2397 §3 reads cleaner.
|
Hi, Severity: action required | Category: security How to fix: Parse base64 token robustly Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful. Found by Qodo code review |
- Switch to type/subtype-aware regex from axios#10808 - Require name=value parameters, separate ;base64 group - Add tests: charset param, URL-encoded body, Blob type preservation, datax: rejection, missing comma rejection - Normalize omitted mediatype to text/plain per RFC 2397 §3
|
All points addressed in e2cbe05: Regex: Switched to the stricter form from #10808 — requires Test matrix: Added the missing cases:
PR body: Added Blob.type behavior change note for changelog. Nits:
All 614 unit tests pass. |
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lib/helpers/fromDataURI.js">
<violation number="1" location="lib/helpers/fromDataURI.js:47">
P2: RFC 2397’s default `text/plain;charset=US-ASCII` is not applied for `data:,`, so Blob metadata stays empty instead of using the RFC default.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Summary
Update the
fromDataURIregex to correctly match all valid RFC 2397 data URIs.Problem
The current
DATA_URL_PATTERNregex requires a semicolon-terminated media type segment, which rejects valid data URIs likedata:;base64,MTIzanddata:application/octet-stream,123per RFC 2397.Solution
Replace the regex with the stricter RFC 2397-compliant pattern from #10808 that validates
type/subtypeformat, requiresname=valueparameters, and keeps;base64as a separate capture group. Omitted mediatypes are normalized totext/plainper RFC 2397 §3.Blob.type behavior change
For inputs like
data:text/plain;charset=utf-8;base64,..., the producedBlob.typechanges from"text/plain"(the old regex truncated at the first;) to"text/plain;charset=utf-8". The old behaviour silently dropped charset and other parameters.Test Plan
Fixes #10808