Improve regular expression for removing trailing slashes#103
Improve regular expression for removing trailing slashes#103rvagg merged 2 commits intoisaacs:masterfrom ericcornelissen:patch-1
Conversation
st.js
Outdated
| // trailing slash removal to fix Node.js v23 bug | ||
| // https://github.com/nodejs/node/pull/55527 | ||
| // can be removed when this is resolved and released | ||
| return path.join(this.path, u.replace(/\/+$/, '')) | ||
| return path.join(this.path, u.replace(/(?<!\/)\/+$/, '')) |
There was a problem hiding this comment.
I think the negative look-behind gets us in the wrong place here because it'll not match multiple slashes, voiding the + all together. I think the right answer here is to just avoid regexes entirely; plus the original comment should be changed since we've now essentially baked in normalised paths that help us get consistent rendering when returning a directory listing.
| // trailing slash removal to fix Node.js v23 bug | |
| // https://github.com/nodejs/node/pull/55527 | |
| // can be removed when this is resolved and released | |
| return path.join(this.path, u.replace(/\/+$/, '')) | |
| return path.join(this.path, u.replace(/(?<!\/)\/+$/, '')) | |
| // Normalise paths by removing trailing slashes | |
| // This ensures consistent paths for directory content rendering | |
| while (u.length > 0 && u[u.length - 1] === '/') { | |
| u = u.slice(0, -1) | |
| } | |
| return path.join(this.path, u) |
There was a problem hiding this comment.
I think the negative look-behind gets us in the wrong place here because it'll not match multiple slashes, voiding the
+all together.
That is incorrect, you can test it with let u = "test//"; u.replace(/(?<!\/)\/+$/, '') which will return "test". This is because it will match 1-or-more / starting from a / that is preceded by a non-/. The negative look-behind only applies to the start of \/+, not every \/ in the sequence (though I understand the confusion).
I think the right answer here is to just avoid regexes entirely;
Having said the above, happy to adopt your change if that's what you prefer.
plus the original comment should be changed since we've now essentially baked in normalised paths that help us get consistent rendering when returning a directory listing.
And I'll update this regardless of whether we keep the regex or not.
There was a problem hiding this comment.
sorry, i didn't realise I needed a follow-up here -- tbh I'd prefer to just avoid the regex, I think it's going to be much quicker anyway and it's much more self-documenting than making the regex more complicated. I appreciate the effort though!
|
Well, thanks for the education, I didn't realise we didn't have $ anchored optimisations in V8, so your critique makes sense in that light. But as per the comment, I think the solution is to just not use regexes for something this simple. |
Never heard this term before 🤔 Are you referring to V8's non-backtracking RegExp engine? |
|
No, just the fact that there's a |
Update the regular expression that is used to remove trailing slashes
from the path in `Mount#getPath` to avoid super-linear runtimes. I
believe this does not affect the `st` export because the only path to
this API is through `Mount#serve` which sanitizes the `req.url` using
`path.normalize` (in `Mount#getUriPath`). However, the `Mount` class
is also exported directly, through which `getPath` could be called on
unsanitized inputs.
The super-linear runtime could occur because the regex engine will
backtrack through a repetition of '/'s if it is not at the end. This
is resolved by anchoring a repetition '/' on a non-'/' with a negative
lookbehind.
The issue can be tested and reproduced using this script:
const st = require('st')
const path = require('path')
const mount = new st.Mount({
path: path.join(__dirname, '/static'),
url: '/static'
})
for (let i = 0; i < 90_000; i += 10_000) {
console.time('timer')
mount.getPath(`a${"/".repeat(i)}a`)
console.timeEnd('timer')
}
Co-authored-by: Rod Vagg <[email protected]>
Update the regular expression that is used to remove trailing slashes from the path in
Mount#getPath- introduced in #98 - to avoid super-linear runtimes1. I believe this does not affect thestexport because the only path to this API is throughMount#servewhich sanitizes thereq.urlusingpath.normalize(inMount#getUriPath). However, theMountclass is also exported directly, through whichgetPathcould be called on unsanitized inputs.The super-linear runtime could occur because the regex engine will backtrack through a repetition of '/'s if it is not at the end. This is resolved by anchoring a repetition '/' on a non-'/' with a negative lookbehind.
The issue can be tested and reproduced using this script:
Footnotes
while some consider this an availability problem, I decided to contribute this update in the open because 1) it only affects an undocumented API, and 2) this project lacks a security policy. ↩