Always explicitly disable gzip automatic decompression on reqwest client used by object_store #6843

Merged
tustvold merged 5 commits into apache:main from phillipleblanc:phillip/241206-disable-gzip
Dec 11, 2024

@phillipleblanc (Contributor) commented Dec 6, 2024

Which issue does this PR close?

Closes apache/arrow-rs-object-store#32

Rationale for this change

Fixes an issue where enabling a non-default feature (gzip) for reqwest would cause object_store to stop working if using the HTTP store against an HTTP server that supports gzip response compression.

What changes are included in this PR?

Call the no_gzip method on the reqwest ClientBuilder to ensure that even if the gzip feature is enabled, the object_store client will not use the transparent decompression logic.

I considered making this an option instead of always setting it, but since this is such a frustrating footgun to encounter and debug, I think it's better to always set it unless there is a compelling reason not to.

Are there any user-facing changes?

My understanding is that most/all users interacting with object stores do not want the gzip compression logic (since none of the major cloud object store providers support it), so this change should not be breaking.


// Reqwest will remove the `Content-Length` header if it is configured to
// transparently decompress the body via the non-default `gzip` feature.
builder = builder.no_gzip();

Contributor

Does this mean that gzipped data content will be left gzipped?

So if I request a resource that the server gzips in its response, the result I get from ObjectStore::get would also be gzipped? 🤔

@phillipleblanc (Contributor, Author) commented Dec 7, 2024

Yes, that is correct. Sorry, I could have made this clearer.

All this affects is what happens when the response has the header Content-Encoding: gzip, which HTTP servers will usually only send when the request has the header Accept-Encoding: gzip. In that case, if the gzip feature is enabled, reqwest will transparently decode the body as a gzip stream and remove the Content-Length header. The no_gzip function explicitly disables that behavior even if the feature is enabled.

Object store APIs will just return the bytes of the object as they are (including objects that are gzipped).
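
To make the header behavior described above concrete, here is a minimal standalone sketch in plain Rust. It is a simplified model, not reqwest's actual implementation: `visible_headers`, `gzip_response_headers`, and the `auto_gunzip` flag are hypothetical names introduced only for illustration. The assumption that `Content-Encoding` is stripped along with `Content-Length` is based on reqwest's documentation for its gzip option rather than anything stated in this thread.

```rust
use std::collections::BTreeMap;

/// Response headers, keyed by lowercase header name.
type Headers = BTreeMap<String, String>;

/// Hypothetical model of the headers a caller sees after the HTTP client
/// has processed a response. `auto_gunzip` stands in for "the gzip feature
/// is enabled and `no_gzip` was not called".
fn visible_headers(server: &Headers, auto_gunzip: bool) -> Headers {
    let mut seen = server.clone();
    let is_gzip = server
        .get("content-encoding")
        .map_or(false, |v| v.eq_ignore_ascii_case("gzip"));
    if auto_gunzip && is_gzip {
        // The body is decoded on the fly, so the original length no longer
        // describes what the caller reads; both headers are dropped.
        seen.remove("content-encoding");
        seen.remove("content-length");
    }
    seen
}

/// Illustrative helper: the headers a server might send for a gzipped body
/// of `len` bytes.
fn gzip_response_headers(len: u64) -> Headers {
    let mut headers = Headers::new();
    headers.insert("content-encoding".to_string(), "gzip".to_string());
    headers.insert("content-length".to_string(), len.to_string());
    headers
}
```

With `auto_gunzip` set to `false` (the analogue of calling `no_gzip`), `visible_headers(&gzip_response_headers(1024), false)` keeps both headers, so a caller that relies on `Content-Length` still finds it. With `true`, both are gone, which is the situation this PR avoids.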

@tustvold merged commit 50cf8bd into apache:main on Dec 11, 2024
alamb pushed a commit to alamb/arrow-rs that referenced this pull request Mar 20, 2025
…lient used by object_store (apache#6843)

* Explicitly disable gzip on reqwest client used by object_store

* Add comment

* Add integration test for checking reqwest gzip feature

* Fix lint

* Add comment explaining why gzip feature is enabled

Linked issue: object_store errors when reqwest gzip feature is enabled