Revert "Revert "Use urllib handler for s3:// and gs://, improve url_exists through HEAD requests (#34324)""#34498
Conversation
5ca84bc to
6affc28
Compare
6affc28 to
cfb9ff1
Compare
… `url_exists` through HEAD requests (spack#34324)"" This reverts commit 8035eeb.
In the past 4 years the initial HEAD request was never used to prevent a GET request on wrong content-type. So let's just remove the HEAD request then.
cfb9ff1 to
8cf56eb
Compare
|
@spackbot run pipeline |
|
I've started that pipeline for you! |
| # It would be nice to do this with the HTTP Accept header to avoid | ||
| # one round-trip. However, most servers seem to ignore the header | ||
| # if you ask for a tarball with Accept: text/html. | ||
| req.get_method = lambda: "HEAD" |
There was a problem hiding this comment.
In the commit you said broke stuff, is it the check if accept_content_type: that was the problem? I think that was just to avoid always checking for text/html, but I'm not following what the problem was. At any rate, if you say there's no need for this anymore anyway, I'm fine with this particular change.
There was a problem hiding this comment.
How it's supposed to work: in certain cases (spack checksum), it should only send GET requests if the content-type is something specific, which is obtained through a cheap, initial HEAD request. It was broken cause the GET request was no longer conditional. In the general case it's not necessary to send a HEAD request first, and I think this is how this function is used almost always -- this was not broken.
scottwittenburg
left a comment
There was a problem hiding this comment.
Other than the bit I commented on (the bit that's not just reverting a revert), this all looks as good to me today as it did last week 😆 Thanks @haampie!
… `url_exists` through HEAD requests (spack#34324)"" (spack#34498) This reverts commit 8035eeb. And also removes logic around an additional HEAD request to prevent a more expensive GET request on wrong content-type. Since large files are typically an attachment and only downloaded when reading the stream, it's not an optimization that helps much, and in fact the logic was broken since the GET request was done unconditionally.
… `url_exists` through HEAD requests (spack#34324)"" (spack#34498) This reverts commit 8035eeb. And also removes logic around an additional HEAD request to prevent a more expensive GET request on wrong content-type. Since large files are typically an attachment and only downloaded when reading the stream, it's not an optimization that helps much, and in fact the logic was broken since the GET request was done unconditionally.
revert-revert-final-FINAL-2.docx
This reverts commit 8035eeb.
It also now removes a dubious initial
HEADrequest inread_from_url, which used to prevent a follow-upGETrequest, but was broken 4 years ago in fce1c41. The idea there was to prevent a GET request to potentially large files; this is typically not an issue since attachments are often downloaded only once you start reading from the stream. So, let's just remove.