Always explicitly disable gzip automatic decompression on reqwest client used by object_store #6843
// Reqwest will remove the `Content-Length` header if it is configured to
// transparently decompress the body via the non-default `gzip` feature.
builder = builder.no_gzip();
Does this mean that gzipped content will be left gzipped?
So if I request a resource that the server gzips in its response, the result I get from ObjectStore::get would also be gzipped 🤔
Yes, that is correct. Sorry, I could have made this clearer.
All this affects is what happens when the response has the header Content-Encoding: gzip, which HTTP servers will usually only send when the request has the header Accept-Encoding: gzip. In that case, if the gzip feature is enabled, reqwest transparently decodes the body as a gzip stream and removes the Content-Length header. The no_gzip function explicitly disables that behavior even if the feature is enabled.
For object store APIs, the client will just return the bytes of the object as they are (including objects that are gzipped).
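To make the header behavior described above concrete, here is a minimal sketch that uses only the standard library. It is a hypothetical model, not reqwest's actual code: the function name `visible_headers` and the `auto_gzip` flag are assumptions made for illustration. It shows why a caller that relies on Content-Length loses that information once transparent gzip decoding is active.

```rust
use std::collections::HashMap;

/// Hypothetical model of what a caller sees after the HTTP client has
/// processed a response. `auto_gzip` stands in for "the gzip feature is
/// enabled and `no_gzip()` was not called".
fn visible_headers(
    server_headers: &HashMap<String, String>,
    auto_gzip: bool,
) -> HashMap<String, String> {
    let mut headers = server_headers.clone();
    let is_gzip = headers
        .get("content-encoding")
        .map(|v| v.eq_ignore_ascii_case("gzip"))
        .unwrap_or(false);
    if auto_gzip && is_gzip {
        // The body is decoded on the fly, so the headers that described
        // the compressed body are dropped before the caller sees them.
        headers.remove("content-encoding");
        headers.remove("content-length");
    }
    headers
}

fn main() {
    let mut server = HashMap::new();
    server.insert("content-encoding".to_string(), "gzip".to_string());
    server.insert("content-length".to_string(), "1024".to_string());

    // Transparent decompression on: the length is no longer available.
    let decoded = visible_headers(&server, true);
    assert!(!decoded.contains_key("content-length"));

    // Decompression disabled: the headers pass through untouched.
    let raw = visible_headers(&server, false);
    assert_eq!(raw.get("content-length").map(String::as_str), Some("1024"));
}
```

In this model, disabling decompression corresponds to the `builder.no_gzip()` call in the diff: the object's bytes and its headers reach the caller exactly as the server sent them.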
…lient used by object_store (apache#6843)
* Explicitly disable gzip on reqwest client used by object_store
* Add comment
* Add integration test for checking reqwest gzip feature
* Fix lint
* Add comment explaining why gzip feature is enabled
Which issue does this PR close?
Closes apache/arrow-rs-object-store#32
Rationale for this change
Fixes an issue where enabling a non-default feature (`gzip`) for `reqwest` would cause `object_store` to stop working if using the HTTP store against an HTTP server that supports gzip response compression.
What changes are included in this PR?
Call the `no_gzip` method on the reqwest `ClientBuilder` to ensure that even if the `gzip` feature is enabled, the `object_store` client will not use the transparent decompression logic.
I considered making this an option instead of always setting it, but since this is such a frustrating footgun to encounter and debug, I think it's better to always set it unless there is a compelling reason not to.
Are there any user-facing changes?
My understanding is that most/all users interacting with object stores do not want the `gzip` compression logic (since none of the major cloud object store providers support it), so this change should not be breaking.