gh-69589, gh-84774: Fix path normalization in urllib.parse.urljoin()#126679
serhiy-storchaka wants to merge 2 commits into python:main
…e.urljoin()
* Preserve double slashes in path.
* Fix the case when the base path is relative and the relative reference path starts with '..'.
Force-pushed from 659a910 to 26f0b9e.
self.checkJoin('http://a', scheme + './//', 'http://a//')

self.checkJoin('b/c', '', 'b/c')
self.checkJoin('b/c', '//', 'b/c')
RFC 3986 carefully distinguishes between undefined and empty components. The reference // has an empty authority, not an undefined one, so we should take the "if defined(R.authority)" branch in §5.2.2, and the result should be //.
(This is independent of the discussion of #96015, I think.)
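To make the undefined-versus-empty distinction concrete, here is a minimal sketch that splits a reference with the regular expression from RFC 3986, Appendix B, and recomposes it following §5.3. The helper names split_rfc3986 and recompose are hypothetical and are not part of urllib.parse; None stands in for "undefined". For comparison, urlsplit() reports netloc as '' in both cases.

```python
import re
from urllib.parse import urlsplit

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


def split_rfc3986(ref):
    """Hypothetical helper: split ref into five components.

    A component that is absent is returned as None (undefined);
    a component that is present but empty is returned as ''.
    """
    m = URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def recompose(scheme, authority, path, query, fragment):
    """Hypothetical helper: component recomposition per RFC 3986, section 5.3."""
    result = ''
    if scheme is not None:
        result += scheme + ':'
    if authority is not None:
        result += '//' + authority
    result += path
    if query is not None:
        result += '?' + query
    if fragment is not None:
        result += '#' + fragment
    return result


# '//' has a defined but empty authority; 'b/c' has no authority at all.
empty_authority = split_rfc3986('//')[1]
undefined_authority = split_rfc3986('b/c')[1]

# urlsplit() reports '' for netloc in both cases, so the difference is lost.
same_netloc = urlsplit('//').netloc == urlsplit('b/c').netloc
```

With the distinction kept, recompose(*split_rfc3986('///w')) round-trips to '///w' rather than collapsing to '/w'.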
Yes, I know. I left undefined and empty authority undistinguished for compatibility. We will likely change this in a separate issue.
self.checkJoin('b/c', '//v', '//v')
self.checkJoin('b/c', '//v/w', '//v/w')
self.checkJoin('b/c', '/w', '/w')
self.checkJoin('b/c', '///w', '/w')
Same; the result should be ///w.
self.checkJoin('b/c', '../../w', 'w')
self.checkJoin('b/c', '../../../w', 'w')
self.checkJoin('b/c', 'w/.', 'b/w/')
self.checkJoin('b/c', '../w/.', 'w/')
self.checkJoin('b/c', '../../w/.', 'w/')
self.checkJoin('b/c', '../../../w/.', 'w/')
self.checkJoin('b/c', '..', '')
self.checkJoin('b/c', '../..', '')
self.checkJoin('b/c', '../../..', '')
Although these fall outside the direct scope of the pseudocode defined in RFC 3986 because b/c is not an absolute base URI, they violate the obvious expectation that urljoin should be associative. See
Given non–RFC 3986 input where the base URI is path-relative (undefined scheme, undefined authority, and path not beginning with /), we should preserve extra initial .. components in the output:
- self.checkJoin('b/c', '../../w', 'w')
- self.checkJoin('b/c', '../../../w', 'w')
- self.checkJoin('b/c', 'w/.', 'b/w/')
- self.checkJoin('b/c', '../w/.', 'w/')
- self.checkJoin('b/c', '../../w/.', 'w/')
- self.checkJoin('b/c', '../../../w/.', 'w/')
- self.checkJoin('b/c', '..', '')
- self.checkJoin('b/c', '../..', '')
- self.checkJoin('b/c', '../../..', '')
+ self.checkJoin('b/c', '../../w', '../w')
+ self.checkJoin('b/c', '../../../w', '../../w')
+ self.checkJoin('b/c', 'w/.', 'b/w/')
+ self.checkJoin('b/c', '../w/.', 'w/')
+ self.checkJoin('b/c', '../../w/.', '../w/')
+ self.checkJoin('b/c', '../../../w/.', '../../w/')
+ self.checkJoin('b/c', '..', '')
+ self.checkJoin('b/c', '../..', '..')
+ self.checkJoin('b/c', '../../..', '../..')
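The behavior proposed above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code in this PR: join_relative is a hypothetical name, both arguments are assumed to be path-relative (no scheme, authority, query, or fragment), and empty path segments other than a trailing one are not treated specially.

```python
def join_relative(base, ref):
    """Hypothetical sketch: join two path-relative references.

    Leading '..' segments that cannot be matched against the base
    are kept in the output instead of being dropped.
    """
    if not ref:
        return base
    if ref.startswith('/'):
        raise ValueError('this sketch only handles path-relative references')

    # Drop the last segment of the base (the "file"), then append ref.
    segments = base.split('/')[:-1] + ref.split('/')

    out = []
    trailing_slash = False
    for seg in segments:
        trailing_slash = False
        if seg == '.':
            # '.' adds nothing; if it is last, the result ends with '/'.
            trailing_slash = True
        elif seg == '..':
            if out and out[-1] != '..':
                # Cancel the previous ordinary segment.
                out.pop()
                trailing_slash = True
            else:
                # Nothing left to cancel: preserve the '..'.
                out.append('..')
        else:
            out.append(seg)

    path = '/'.join(out)
    if trailing_slash and out:
        path += '/'
    return path


# Associativity check on one example: both groupings agree.
left = join_relative(join_relative('d/e/f', 'b/c'), '../../w')
right = join_relative('d/e/f', join_relative('b/c', '../../w'))
```

Here both left and right evaluate to 'd/w'. If the inner join dropped the unmatched '..' and returned 'w', the right-hand grouping would give 'd/e/w' instead, which is the associativity violation described in the comment.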