GH-123599: url2pathname(): handle authority section in file URL#126844
barneygale merged 26 commits into python:main
… on POSIX

Adjust `urllib.request.url2pathname()` to parse the URL authority and path with `urlsplit()` on POSIX. If the authority is empty or resolves to the current host, it is ignored and the URL path is used as the pathname. If not, we raise `URLError`.
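As a rough illustration of the POSIX behaviour described in the commit message, here is a minimal sketch. It is not the code that was merged. `is_local_authority()` and `url2pathname_sketch()` are hypothetical names, and treating "resolves to the current host" as "its IPv4 address from `socket.gethostbyname()` is one of this machine's addresses" is an assumption. The `is_local` parameter exists only so the host check can be swapped out without network access.

```python
import socket
from urllib.error import URLError
from urllib.parse import unquote, urlsplit


def is_local_authority(authority):
    """Hypothetical helper: True if *authority* is empty, 'localhost',
    or resolves to one of this machine's IPv4 addresses (simplified)."""
    if authority in ("", "localhost"):
        return True  # the common cases need no lookup
    try:
        # May perform a DNS query and block for a while.
        address = socket.gethostbyname(authority)
    except (OSError, UnicodeError):
        return False  # unresolvable names are treated as non-local
    local = {"127.0.0.1"}
    try:
        local.update(socket.gethostbyname_ex(socket.gethostname())[2])
    except (OSError, UnicodeError):
        pass
    return address in local


def url2pathname_sketch(url, is_local=is_local_authority):
    """Sketch of POSIX url2pathname(): split off the authority with
    urlsplit(), drop it if local, otherwise raise URLError."""
    parts = urlsplit(url)
    if not is_local(parts.netloc):
        raise URLError(f"file URL authority is not local: {parts.netloc!r}")
    return unquote(parts.path)
```

With the default check, `url2pathname_sketch("//localhost/etc/hosts")` returns `"/etc/hosts"` without touching the network, because empty and `localhost` authorities short-circuit. Any other authority triggers a lookup, which is where blocking can come from.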
Title changed: "url2pathname(): handle non-empty authority section on POSIX" → "url2pathname(): handle authority section on POSIX"
Title changed: "url2pathname(): handle authority section on POSIX" → "url2pathname(): handle authority section in file URL"
AA-Turner left a comment:
Two reviews for the price of one!
Co-authored-by: Adam Turner <[email protected]>
Quick note on timing for this PR and #125866. If possible, I'd like to land this PR in time for 3.14 beta 1, in about 6 weeks' time. It would mean we restrict backwards-incompatible changes to 3.14, with none planned for 3.15. I think that would be better for users who might be affected by the changes, e.g. folks who previously wrapped the […]

I'll wait until 3.15 before I look at #125866 so I don't overload 3.14. That change will be 100% backwards-compatible. Happy to rethink if anyone is unhappy with this plan. Cheers
serhiy-storchaka left a comment:
Some changes may not be so innocent:
- `url2pathname()` now performs network requests (and may hang for a while).
- `url2pathname()` can now raise `URLError`.
- The result of `gethostbyname_ex()` and `gethostname()` is cached. Previously, you could reset this by setting `FileHandler.names = None`.
- There may be a difference in handling authorities with a port. It is not covered by tests.
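Two of the review points can be seen directly with `urllib.parse`. When a port is present, the raw authority (`netloc`) and the bare `hostname` differ, so a check keyed on one can disagree with a check keyed on the other. On caching, the old `FileHandler.names` class attribute could be reset to `None`. The memoised `local_addresses()` below is a hypothetical stand-in, not what the PR does, that only refreshes when `cache_clear()` is called.

```python
import functools
import socket
from urllib.parse import urlsplit


def authority_parts(url):
    """Return (netloc, hostname, port) as urlsplit() reports them."""
    parts = urlsplit(url)
    return parts.netloc, parts.hostname, parts.port


# Hypothetical cached lookup of this machine's IPv4 addresses. Once it has
# been called, the result only changes after an explicit cache_clear().
@functools.lru_cache(maxsize=None)
def local_addresses():
    try:
        return frozenset(socket.gethostbyname_ex(socket.gethostname())[2])
    except OSError:
        return frozenset()
```

For example, `authority_parts("file://localhost:8080/etc/hosts")` gives `("localhost:8080", "localhost", 8080)`, while `authority_parts("file:///etc/hosts")` gives `("", None, None)`.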
Co-authored-by: Bénédikt Tran <[email protected]>
Thank you for the reviews!
That seems to be needed per RFC 8089 section 3 (ref), and less explicitly section 2. And it only comes up if the hostname isn't empty or "localhost". Even so, perhaps we should put the new behaviour behind an argument like […]
I think this is preferable to returning a nonsense local path, e.g. […]
Fixed!
I've added the test cases you suggested.
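The replies above lean on RFC 8089, which treats an absent or empty authority, and the name "localhost", as the local machine. A minimal sketch of that classification (without any host resolution) is shown next to a naive translation that just strips the scheme, to show the kind of "nonsense local path" the reply warns about. Both `is_local_file_uri()` and `naive_path()` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def is_local_file_uri(uri):
    """True if the file URI has no authority, an empty one, or 'localhost'."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # Host names are case-insensitive, so compare in lower case.
    return parts.netloc.lower() in ("", "localhost")


def naive_path(uri):
    """Hypothetical translation that drops the scheme and ignores the authority."""
    return uri.removeprefix("file:")
```

Here `naive_path("file://server/share/f.txt")` yields `"//server/share/f.txt"`. POSIX leaves a leading double slash implementation-defined, and Linux resolves it as `/server/share/f.txt` on the local disk, which is why raising can be preferable.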
Thank you both :)
In `urllib.request.url2pathname()`, if the authority resolves to the current host, discard it. If an authority is present but resolves somewhere else, then on Windows we return a UNC path (as before), and on other platforms we raise `URLError`.

Affects `pathlib.Path.from_uri()` in the same way.

I'm indebted to Eryk Sun for his analysis.
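The merge description splits the non-local case by platform. The sketch below models that split with a hypothetical `windows` flag so both branches can run on any OS. The drive-letter handling is deliberately simplified, and the default locality check only accepts an empty authority or `localhost` (no host resolution). `file_url_to_path()` is an illustrative name, not part of `urllib`.

```python
from urllib.error import URLError
from urllib.parse import unquote, urlsplit


def _simple_is_local(authority):
    # Simplified stand-in: no DNS lookup, only the RFC 8089 local forms.
    return authority.lower() in ("", "localhost")


def file_url_to_path(url, *, windows, is_local=_simple_is_local):
    """Hypothetical sketch: a local authority is dropped; a remote one becomes
    a UNC path on Windows and raises URLError elsewhere."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    if not is_local(parts.netloc):
        if windows:
            # Keep the remote host in the path as \\server\share\...
            return "\\\\" + parts.netloc + path.replace("/", "\\")
        raise URLError(f"file URL host is not local: {parts.netloc!r}")
    if windows:
        # '/C:/dir/f.txt' becomes C:\dir\f.txt; '|' drive forms are not handled.
        if len(path) >= 3 and path[0] == "/" and path[2] == ":":
            path = path[1:]
        return path.replace("/", "\\")
    return path
```

Keeping the UNC result on Windows matches the "(as before)" note above, while the POSIX branch has no equivalent path syntax for a remote host, so it raises instead.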
Path.from_uri() doesn't work if the URI contains host component #123599