fix: Harden input handling by dcodeIO · Pull Request #2163 · protobufjs/protobuf.js

dcodeIO · 2026-04-27T10:28:09Z

Tightens validation and code generation handling for input edge cases, and refreshes generated fixtures and TypeScript definitions.

Xvush · 2026-04-27T11:07:31Z

reviewed the utf8 hardening. the 2-byte, 3-byte, and 4-byte overlong cases are all rejected correctly against the GHSA scenarios -- path traversal (0xC0 0xAF), NUL injection (0xC0 0x80, 0xE0 0x80 0x80), overlong slash via E0/F0 lead (0xE0 0x81 0xAF, 0xF0 0x80 0x80 0xAF) all return U+FFFD now.

two observations on edge cases that the current diff does not cover:

1. lone surrogates in the 3-byte branch. input 0xED 0xA0 0x80 decodes to U+D800 (a high surrogate that should not appear standalone in well-formed UTF-8). RFC 3629 §3 forbids encoding U+D800-U+DFFF. the current c3 >= 0x800 check rejects below-U+0800 overlongs but does not reject the surrogate range. concrete:

const utf8 = require('@protobufjs/utf8');
const t = Buffer.from([0xED, 0xA0, 0x80]);
console.log(utf8.read(t, 0, t.length).charCodeAt(0).toString(16)); // "d800"

minimal patch alongside the existing c3 >= 0x800:

-            str += c3 >= 0x800 ? String.fromCharCode(c3) : replacementChar;
+            str += (c3 >= 0x800 && (c3 < 0xD800 || c3 > 0xDFFF))
+                ? String.fromCharCode(c3)
+                : replacementChar;

2. above-U+10FFFF in the 4-byte branch. input 0xF4 0x90 0x80 0x80 decodes to U+110000, which is outside the Unicode codespace (max U+10FFFF). the current t2 < 0x10000 check catches 4-byte-encoded values that should have been encoded in fewer bytes, but does not catch values that exceed the Unicode max. the bit math at line 64 (String.fromCharCode(0xDC00 + (t2 & 0x3FF))) still emits surrogates without bounds checking.

minimal patch:

-            if (t2 < 0x10000)
+            if (t2 < 0x10000 || t2 > 0x10FFFF)
                 str += replacementChar;
             else {

i can send the regression test cases i drafted in the private fork (commit 366d608 on fix/overlong-utf8-decode, 53/53 pass via tape tests/util_utf8.js) as a follow-up commit if useful for coverage. the suite covers the four GHSA scenarios (path traversal, XSS, SQLi, NUL injection), per-class lead-byte assertions (0xC0/0xC1, 3-byte overlong, surrogates 0xED, > U+10FFFF), 7-case cross-conformance with Buffer.toString("utf8"), and a separate fallback-path test that forces the manual decoder by passing plain number[]. the existing fixture at tests/data/util_utf8/utf8.txt covers roundtrip on valid input but does not exercise these invalid-byte paths.

nothing in this PR contradicts the report's findings on the 2/3/4-byte overlong rejection -- the approach is cleaner than my draft. just flagging the two edge cases above for completeness before merge.

if you'd prefer, i can open a follow-up PR against patch/input-hardening with these two edge-case fixes + the regression test suite -- your call on whether it's cleaner to land here or in a separate PR.

Tofandel · 2026-04-28T15:00:13Z

This seems to be broken

protos:build:protos: > pbjs --force-message --null-semantics -t static-module -w wrapper.js --force-number --dependency protobufjs/minimal.js --es6 -o ./protos.js ../../../protos/*
protos:build:protos:
node_modules/protobufjs-cli/pbjs.js:256
protos:build:protos: throw err;
protos:build:protos: ^
protos:build:protos:
protos:build:protos: TypeError: Cannot read properties of undefined (reading 'reservedRe')

Edit: version mismatch between protobufjs-cli and protobufjs. Some version constraints when making incompatible changes would be welcome

fix: Harden input handling

1b026ce

dcodeIO requested a review from alexander-fenster April 27, 2026 10:28

dcodeIO added 2 commits April 27, 2026 16:21

Replace code points above U+10FFFF

2537371

Merge remote-tracking branch 'origin/master' into patch/input-hardening

db554f8

dcodeIO removed the request for review from alexander-fenster April 27, 2026 21:50

dcodeIO merged commit 6eb3a3b into master Apr 27, 2026
4 checks passed

github-actions Bot mentioned this pull request Apr 27, 2026

chore: release master #2147

Merged

This was referenced Apr 28, 2026

fix: Update CLI peer dependency #2188

Merged

fix: Merge singular message fields while decoding #2195

Merged

dcodeIO deleted the patch/input-hardening branch April 29, 2026 15:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Harden input handling#2163

fix: Harden input handling#2163
dcodeIO merged 3 commits intomasterfrom
patch/input-hardening

dcodeIO commented Apr 27, 2026

Uh oh!

Xvush commented Apr 27, 2026

Uh oh!

Uh oh!

Tofandel commented Apr 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

dcodeIO commented Apr 27, 2026

Uh oh!

Xvush commented Apr 27, 2026

Uh oh!

Uh oh!

Tofandel commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Tofandel commented Apr 28, 2026 •

edited

Loading