Skip to content

fix: Harden input handling#2163

Merged
dcodeIO merged 3 commits intomasterfrom
patch/input-hardening
Apr 27, 2026
Merged

fix: Harden input handling#2163
dcodeIO merged 3 commits intomasterfrom
patch/input-hardening

Conversation

@dcodeIO
Copy link
Copy Markdown
Member

@dcodeIO dcodeIO commented Apr 27, 2026

Tightens validation and code generation handling for input edge cases, and refreshes generated fixtures and TypeScript definitions.

@Xvush
Copy link
Copy Markdown

Xvush commented Apr 27, 2026

reviewed the utf8 hardening. the 2-byte, 3-byte, and 4-byte overlong cases are all rejected correctly against the GHSA scenarios -- path traversal (0xC0 0xAF), NUL injection (0xC0 0x80, 0xE0 0x80 0x80), overlong slash via E0/F0 lead (0xE0 0x81 0xAF, 0xF0 0x80 0x80 0xAF) all return U+FFFD now.

two observations on edge cases that the current diff does not cover:

1. lone surrogates in the 3-byte branch. input 0xED 0xA0 0x80 decodes to U+D800 (a high surrogate that should not appear standalone in well-formed UTF-8). RFC 3629 §3 forbids encoding U+D800-U+DFFF. the current c3 >= 0x800 check rejects below-U+0800 overlongs but does not reject the surrogate range. concrete:

const utf8 = require('@protobufjs/utf8');
const t = Buffer.from([0xED, 0xA0, 0x80]);
console.log(utf8.read(t, 0, t.length).charCodeAt(0).toString(16)); // "d800"

minimal patch alongside the existing c3 >= 0x800:

-            str += c3 >= 0x800 ? String.fromCharCode(c3) : replacementChar;
+            str += (c3 >= 0x800 && (c3 < 0xD800 || c3 > 0xDFFF))
+                ? String.fromCharCode(c3)
+                : replacementChar;

2. above-U+10FFFF in the 4-byte branch. input 0xF4 0x90 0x80 0x80 decodes to U+110000, which is outside the Unicode codespace (max U+10FFFF). the current t2 < 0x10000 check catches 4-byte-encoded values that should have been encoded in fewer bytes, but does not catch values that exceed the Unicode max. the bit math at line 64 (String.fromCharCode(0xDC00 + (t2 & 0x3FF))) still emits surrogates without bounds checking.

minimal patch:

-            if (t2 < 0x10000)
+            if (t2 < 0x10000 || t2 > 0x10FFFF)
                 str += replacementChar;
             else {

i can send the regression test cases i drafted in the private fork (commit 366d608 on fix/overlong-utf8-decode, 53/53 pass via tape tests/util_utf8.js) as a follow-up commit if useful for coverage. the suite covers the four GHSA scenarios (path traversal, XSS, SQLi, NUL injection), per-class lead-byte assertions (0xC0/0xC1, 3-byte overlong, surrogates 0xED, > U+10FFFF), 7-case cross-conformance with Buffer.toString("utf8"), and a separate fallback-path test that forces the manual decoder by passing plain number[]. the existing fixture at tests/data/util_utf8/utf8.txt covers roundtrip on valid input but does not exercise these invalid-byte paths.

nothing in this PR contradicts the report's findings on the 2/3/4-byte overlong rejection -- the approach is cleaner than my draft. just flagging the two edge cases above for completeness before merge.

if you'd prefer, i can open a follow-up PR against patch/input-hardening with these two edge-case fixes + the regression test suite -- your call on whether it's cleaner to land here or in a separate PR.

@dcodeIO dcodeIO removed the request for review from alexander-fenster April 27, 2026 21:50
@dcodeIO dcodeIO merged commit 6eb3a3b into master Apr 27, 2026
4 checks passed
@github-actions github-actions Bot mentioned this pull request Apr 27, 2026
@Tofandel
Copy link
Copy Markdown

Tofandel commented Apr 28, 2026

This seems to be broken

protos:build:protos: > pbjs --force-message --null-semantics -t static-module -w wrapper.js --force-number --dependency protobufjs/minimal.js --es6 -o ./protos.js ../../../protos/*
protos:build:protos:
node_modules/protobufjs-cli/pbjs.js:256
protos:build:protos: throw err;
protos:build:protos: ^
protos:build:protos:
protos:build:protos: TypeError: Cannot read properties of undefined (reading 'reservedRe')

Edit: version mismatch between protobufjs-cli and protobufjs. Some version constraints when making incompatible changes would be welcome

@dcodeIO dcodeIO deleted the patch/input-hardening branch April 29, 2026 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants