treq unable to determine the length of a response
When using treq, I've noticed that it is often unable to determine the length of the response.
Reproducible using:
import requests
import treq
from twisted.internet import reactor

def done(response):
    # With treq this prints UNKNOWN_LENGTH rather than a byte count.
    print(response.length)
    reactor.stop()

treq.get("https://travis-ci.org/pypa/warehouse.png?branch=master").addCallback(done)
reactor.run()

# For comparison, requests reports the Content-Length header.
print(requests.get("https://travis-ci.org/pypa/warehouse.png?branch=master").headers["Content-Length"])
Output:
twisted.web.iweb.UNKNOWN_LENGTH
1492
Curl:
$ curl https://travis-ci.org/pypa/warehouse.png\?branch\=master --location -I
HTTP/1.1 301 Moved Permanently
Content-length: 0
Content-Type: text/html;charset=utf-8
Location: https://api.travis-ci.org/pypa/warehouse.png?branch=master
Connection: keep-alive
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Content-Type, Cache-Control, Expires, Etag, Last-Modified
Age: 0
Cache-Control: no-cache
Content-Disposition: inline; filename="passing.png"
Content-length: 1461
Content-Type: image/png
Date: Mon, 03 Mar 2014 22:35:18 GMT
Etag: "33e721b0e117a07064572eb8537344a6"
Expires: Mon, 03 Mar 2014 22:35:17 GMT
Last-Modified: Mon, 03 Mar 2014 06:27:20 GMT
Pragma: no-cache
Server: nginx/1.5.7
Status: 200 OK
Strict-Transport-Security: max-age=31536000
Vary: Accept,Accept-Encoding
X-Accepted-Oauth-Scopes: public
X-Content-Digest: aba9e7b121a52e3fdbbfd0b060dba6a3bbcf1bed
X-Endpoint: Travis::Api::App::Endpoint::Repos
X-Oauth-Scopes: public
X-Pattern: /:owner_name/:name
X-Rack-Cache: miss, store
Connection: keep-alive
Do you want treq to lie to you in the same way requests does?
import requests
r = requests.get("https://travis-ci.org/pypa/warehouse.png?branch=master")
print(r.headers['content-length'])  # length reported by the header
print(len(r.content))               # length of the body requests returns
↪︎ python r.py
1492
1461
gzip is... not good apparently?
Requests isn't lying. Content-Length is the transfer length of the body, not the size of the decoded body. In requests, r.content is the body after it has been un-gzipped. If you read from r.raw instead, you get matching values:
import requests
r = requests.get("https://travis-ci.org/pypa/warehouse.png?branch=master", stream=True)
print(r.headers['content-length'])  # transfer length from the header
print(len(r.raw.read()))            # raw bytes, still gzipped
1492
1492
The only reason len(r.content) and r.headers['Content-Length'] don't match is that requests un-gzipped the body for us.
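The same gap can be reproduced offline with just the standard library. This is a minimal sketch, assuming a made-up, incompressible payload in place of the real badge (the random bytes and the 1461-byte size are stand-ins chosen for illustration). A PNG is already compressed, so gzip framing adds bytes instead of saving them, which is likely why the header value above is the larger of the two numbers.

```python
import gzip
import random

# Hypothetical stand-in for the PNG body: seeded random bytes,
# which gzip cannot shrink, much like already-compressed image data.
rng = random.Random(0)
body = bytes(rng.getrandbits(8) for _ in range(1461))

# What a server sending "Content-Encoding: gzip" would put on the wire.
wire_bytes = gzip.compress(body)

# Content-Length describes the bytes on the wire ...
content_length = len(wire_bytes)

# ... while a client that decodes for you hands back the original body.
decoded = gzip.decompress(wire_bytes)

print(content_length, len(decoded))
```

Reading the undecoded stream, as r.raw does, corresponds to len(wire_bytes) here, so it agrees with the header.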
Agent distinguishes between connection headers and end-to-end headers, and doesn't expose the connection headers via IResponse.headers.
In this case response.length is UNKNOWN_LENGTH because you actually have a _GzipDecoder.
(Unfortunately making a _GzipDecoder actually overwrites the original response's length attribute, https://github.com/twisted/twisted/blob/trunk/twisted/web/client.py#L1505)
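To illustrate why a decoding wrapper ends up with no length, here is a toy sketch. It is not Twisted's code: PlainResponse, DecodingResponse, and the string sentinel are hypothetical stand-ins for the response, the _GzipDecoder, and twisted.web.iweb.UNKNOWN_LENGTH. The point is only that the decoded size cannot be known until the stream has been consumed.

```python
import gzip
import zlib

# Stand-in sentinel; the real one is twisted.web.iweb.UNKNOWN_LENGTH.
UNKNOWN_LENGTH = "UNKNOWN_LENGTH"


class PlainResponse:
    """Hypothetical response holding the bytes exactly as received."""

    def __init__(self, wire_bytes):
        self.wire_bytes = wire_bytes
        self.length = len(wire_bytes)  # would come from Content-Length


class DecodingResponse:
    """Hypothetical wrapper that gunzips the body as it is read."""

    def __init__(self, original):
        self.original = original
        # The decoded size is unknown before reading, so the wrapper
        # reports a sentinel instead of reusing the wire length.
        self.length = UNKNOWN_LENGTH

    def read(self, chunk_size=64):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
        data = self.original.wire_bytes
        parts = []
        for start in range(0, len(data), chunk_size):
            parts.append(decoder.decompress(data[start:start + chunk_size]))
        parts.append(decoder.flush())
        return b"".join(parts)


plain = PlainResponse(gzip.compress(b"hello world" * 10))
decoded = DecodingResponse(plain)
print(plain.length, decoded.length, len(decoded.read()))
```

In this sketch the wire length is still reachable through decoded.original.length; whether the real decoder keeps the original value accessible is not something this example establishes.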
Probably what you want can be achieved by disabling content decoding in treq?
Will that allow me to still fetch the content with Content-Encoding: gzip? I suppose so since I could just add that header myself.