Skip to content

Commit de95cca

Browse files
Fix potentially cubic-time regex in parsePatch (kpdecker#647)
* Add a test for the existing handling of Mercurial patches (This is the feature involved in kpdecker#644, and I want to try to avoid regressions. It previously had no automated tests.) * Fix ReDOS * Add release notes
1 parent b9aeede commit de95cca

3 files changed

Lines changed: 149 additions & 4 deletions

File tree

release-notes.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
- [#631](https://github.com/kpdecker/jsdiff/pull/631) - **fix support for using an `Intl.Segmenter` with `diffWords`**. This has been almost completely broken since the feature was added in v6.0.0, since it would outright crash on any text that featured two consecutive newlines between a pair of words (a very common case).
66
- [#635](https://github.com/kpdecker/jsdiff/pull/635) - **small tweaks to tokenization behaviour of `diffWords`** when used *without* an `Intl.Segmenter`. Specifically, the soft hyphen (U+00AD) is no longer considered to be a word break, and the multiplication and division signs (`×` and `÷`) are now treated as punctuation instead of as letters / word characters.
77
- [#641](https://github.com/kpdecker/jsdiff/pull/641) - **the format of file headers in `createPatch` etc. patches can now be customised somewhat**. It now takes a `headerOptions` option that can be used to disable the file headers entirely, or omit the `Index:` line and/or the underline. In particular, this was motivated by a request to make jsdiff patches compatible with react-diff-view, which they now are if produced with `headerOptions: FILE_HEADERS_ONLY`.
8+
- [#647](https://github.com/kpdecker/jsdiff/pull/647) and [#TODO] - **fix ReDOS vulnerabilities in `parsePatch`**. Previously, adversarially-crafted patch headers could take cubic time to parse; now, `parsePatch` should reliably take linear time. (Handling of headers that include the line break characters `\r`, `\u2028`, or `\u2029` in non-trailing positions is also now more reasonable as side effect of the fix.)
89

910
## 8.0.2
1011

src/patch/parse.ts

Lines changed: 21 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -23,10 +23,27 @@ export function parsePatch(uniDiff: string): StructuredPatch[] {
2323
break;
2424
}
2525

26-
// Diff index
27-
const header = (/^(?:Index:|diff(?: -r \w+)+)\s+(.+?)\s*$/).exec(line);
28-
if (header) {
29-
index.index = header[1];
26+
// Try to parse the line as a diff header, like
27+
// Index: README.md
28+
// or
29+
// diff -r 9117c6561b0b -r 273ce12ad8f1 .hgignore
30+
// or
31+
// Index: something with multiple words
32+
// and extract the filename (or whatever else is used as an index name)
33+
// from the end (i.e. 'README.md', '.hgignore', or
34+
// 'something with multiple words' in the examples above).
35+
//
36+
// TODO: It seems awkward that we indiscriminately trim off trailing
37+
// whitespace here. Theoretically, couldn't that be meaningful -
38+
// e.g. if the patch represents a diff of a file whose name ends
39+
// with a space? Seems wrong to nuke it.
40+
// But this behaviour has been around since v2.2.1 in 2015, so if
41+
// it's going to change, it should be done cautiously and in a new
42+
// major release, for backwards-compat reasons.
43+
// -- ExplodingCabbage
44+
const headerMatch = (/^(?:Index:|diff(?: -r \w+)+)\s+/).exec(line);
45+
if (headerMatch) {
46+
index.index = line.substring(headerMatch[0].length).trim();
3047
}
3148

3249
i++;

test/patch/parse.js

Lines changed: 127 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -172,6 +172,133 @@ Index: test2
172172
}]);
173173
});
174174

175+
it('should parse the header format used by Mercurial', function() {
176+
// (At least, Mercurial is the only tool I could find that uses this
177+
// format. Claude was unable to suggest any other tool that would produce
178+
// this format, and I don't know of any either. See
179+
// https://claude.ai/share/51e202d0-9da0-4dfa-a4a4-d6c6b476b300.)
180+
//
181+
// Support for this got added by Kevin in commit:
182+
// 0c9dd6d0e622d8a32b441b45baa797a7e86a4c55
183+
//
184+
// I find it a bit odd that (at the time of adding this test) our header
185+
// parsing has special handling for Mercurial's diff format but does not
186+
// support Git's format (given Git is much more popular). I also find it
187+
// a bit odd that we discard the information in the header about what
188+
// revisions are being diffed and preserve only the filename (which is
189+
// available anyway via the lines below, and exposed by us in the
190+
// oldFileName and newFileName fields). But for now I am just trying to
191+
// document and test the current state of things.
192+
//
193+
// -- ExplodingCabbage
194+
195+
// (Patch below was produced by running `hg diff -r 0 -r 1` in the
196+
// Mercurial repo for Mercurial itself.)
197+
const patchStr = `diff -r 9117c6561b0b -r 273ce12ad8f1 .hgignore
198+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
199+
+++ b/.hgignore Tue May 03 13:27:13 2005 -0800
200+
@@ -0,0 +1,1 @@
201+
+.*~
202+
diff -r 9117c6561b0b -r 273ce12ad8f1 README
203+
--- a/README Tue May 03 13:16:10 2005 -0800
204+
+++ b/README Tue May 03 13:27:13 2005 -0800
205+
@@ -69,6 +69,10 @@
206+
207+
Network support (highly experimental):
208+
209+
+ # pull the self-hosting hg repo
210+
+ foo$ hg init
211+
+ foo$ hg merge http://selenic.com/hg/
212+
+
213+
# export your .hg directory as a directory on your webserver
214+
foo$ ln -s .hg ~/public_html/hg-linux
215+
216+
@@ -76,5 +80,10 @@
217+
bar$ hg merge http://foo/~user/hg-linux
218+
219+
This is just a proof of concept of grabbing byte ranges, and is not
220+
- expected to perform well.
221+
+ expected to perform well. Fixing this needs some pipelining to reduce
222+
+ the number of round trips. See zsync for a similar approach.
223+
224+
+ Another approach which does perform well right now is to use rsync.
225+
+ Simply rsync the remote repo to a read-only local copy and then do a
226+
+ local pull.
227+
+
228+
`;
229+
230+
const patchObj = parsePatch(patchStr);
231+
expect(patchObj).to.deep.equals([
232+
{
233+
// Parsed from line `diff -r 9117c6561b0b -r 273ce12ad8f1 .hgignore`:
234+
index: '.hgignore',
235+
// Parsed from line `--- /dev/null Thu Jan 01 00:00:00 1970 +0000`:
236+
oldFileName: '/dev/null',
237+
oldHeader: 'Thu Jan 01 00:00:00 1970 +0000',
238+
// Parsed from line `+++ b/.hgignore Tue May 03 13:27:13 2005 -0800`:
239+
newFileName: 'b/.hgignore',
240+
newHeader: 'Tue May 03 13:27:13 2005 -0800',
241+
hunks: [
242+
{
243+
oldStart: 1,
244+
oldLines: 0,
245+
newStart: 1,
246+
newLines: 1,
247+
lines: [
248+
'+.*~'
249+
]
250+
}
251+
]
252+
},
253+
{
254+
index: 'README',
255+
oldFileName: 'a/README',
256+
oldHeader: 'Tue May 03 13:16:10 2005 -0800',
257+
newFileName: 'b/README',
258+
newHeader: 'Tue May 03 13:27:13 2005 -0800',
259+
hunks: [
260+
{
261+
oldStart: 69,
262+
oldLines: 6,
263+
newStart: 69,
264+
newLines: 10,
265+
lines: [
266+
'',
267+
' Network support (highly experimental):',
268+
'',
269+
'+ # pull the self-hosting hg repo',
270+
'+ foo$ hg init',
271+
'+ foo$ hg merge http://selenic.com/hg/',
272+
'+',
273+
' # export your .hg directory as a directory on your webserver',
274+
' foo$ ln -s .hg ~/public_html/hg-linux',
275+
''
276+
]
277+
},
278+
{
279+
oldStart: 76,
280+
oldLines: 5,
281+
newStart: 80,
282+
newLines: 10,
283+
lines: [
284+
' bar$ hg merge http://foo/~user/hg-linux',
285+
'',
286+
' This is just a proof of concept of grabbing byte ranges, and is not',
287+
'- expected to perform well.',
288+
'+ expected to perform well. Fixing this needs some pipelining to reduce',
289+
'+ the number of round trips. See zsync for a similar approach.',
290+
'',
291+
'+ Another approach which does perform well right now is to use rsync.',
292+
'+ Simply rsync the remote repo to a read-only local copy and then do a',
293+
'+ local pull.',
294+
'+'
295+
]
296+
}
297+
]
298+
}
299+
]);
300+
});
301+
175302
it('should parse multiple files without the Index line', function() {
176303
expect(parsePatch(
177304
`--- from\theader1

0 commit comments

Comments
 (0)