treq unable to determine the length of a response

Open dstufft opened this issue 12 years ago • 5 comments

When using treq, I've noticed that it is often unable to determine the length of the response.

Reproducible Using:

import requests
import treq
from twisted.internet import reactor


def done(response):
    print(response.length)
    reactor.stop()

treq.get("https://travis-ci.org/pypa/warehouse.png?branch=master").addCallback(done)

reactor.run()

print(requests.get("https://travis-ci.org/pypa/warehouse.png?branch=master").headers["Content-Length"])

Output:

twisted.web.iweb.UNKNOWN_LENGTH
1492

Curl:

$ curl https://travis-ci.org/pypa/warehouse.png\?branch\=master --location -I
HTTP/1.1 301 Moved Permanently
Content-length: 0
Content-Type: text/html;charset=utf-8
Location: https://api.travis-ci.org/pypa/warehouse.png?branch=master
Connection: keep-alive

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Content-Type, Cache-Control, Expires, Etag, Last-Modified
Age: 0
Cache-Control: no-cache
Content-Disposition: inline; filename="passing.png"
Content-length: 1461
Content-Type: image/png
Date: Mon, 03 Mar 2014 22:35:18 GMT
Etag: "33e721b0e117a07064572eb8537344a6"
Expires: Mon, 03 Mar 2014 22:35:17 GMT
Last-Modified: Mon, 03 Mar 2014 06:27:20 GMT
Pragma: no-cache
Server: nginx/1.5.7
Status: 200 OK
Strict-Transport-Security: max-age=31536000
Vary: Accept,Accept-Encoding
X-Accepted-Oauth-Scopes: public
X-Content-Digest: aba9e7b121a52e3fdbbfd0b060dba6a3bbcf1bed
X-Endpoint: Travis::Api::App::Endpoint::Repos
X-Oauth-Scopes: public
X-Pattern: /:owner_name/:name
X-Rack-Cache: miss, store
Connection: keep-alive

dstufft • Mar 03 '14 22:03

Do you want treq to lie to you in the same way requests does?

import requests

r = requests.get("https://travis-ci.org/pypa/warehouse.png?branch=master")

print(r.headers['content-length'])
print(len(r.content))

↪︎ python r.py
1492
1461

dreid • Mar 03 '14 22:03

gzip is... not good apparently?

alex • Mar 03 '14 22:03

Requests isn't lying. Content-Length is the transfer length of the body, not the size of the decoded body. In requests, r.content is the body after it has been un-gzipped. If you read the raw stream instead, you get the same values:

import requests

r = requests.get("https://travis-ci.org/pypa/warehouse.png?branch=master", stream=True)

print(r.headers['content-length'])
print(len(r.raw.read()))

Output:

1492
1492

The only reason len(r.content) and r.headers['Content-Length'] don't match is that requests un-gzipped the body for us.
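
The same mismatch can be reproduced without any network access. The snippet below is only a sketch using the standard library; `body` is a made-up payload, not the Travis badge. It compresses well, so the wire length comes out smaller than the decoded length, whereas the PNG above is already compressed and gzip made it slightly larger (1492 vs 1461).

```python
import gzip

# Hypothetical payload standing in for a response body.
body = b"treq and requests disagree about length. " * 50

# The bytes that travel on the wire when the body is gzip-encoded.
wire_bytes = gzip.compress(body)

# Content-Length describes the bytes on the wire ...
content_length = len(wire_bytes)
# ... while len(r.content) describes the body after decoding.
decoded_length = len(gzip.decompress(wire_bytes))

print(content_length, decoded_length)
```

Either number is a legitimate "length"; they simply measure different things.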

dstufft • Mar 03 '14 23:03

Agent makes a distinction between connection headers and end-to-end headers, and doesn't expose the connection headers via IResponse.headers.

In this case response.length is UNKNOWN_LENGTH because you actually have a _GzipDecoder.

(Unfortunately making a _GzipDecoder actually overwrites the original response's length attribute, https://github.com/twisted/twisted/blob/trunk/twisted/web/client.py#L1505)
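
To illustrate why a decoding wrapper might report an unknown length, here is a rough pure-Python sketch. It is not Twisted's code: `FakeResponse`, `GzipDecodingResponse`, `iter_chunks`, and the string sentinel are all hypothetical stand-ins. The point is only that a streaming decoder cannot know the decoded size until the whole body has been consumed.

```python
import gzip
import zlib

# Hypothetical stand-in for twisted.web.iweb.UNKNOWN_LENGTH.
UNKNOWN_LENGTH = "UNKNOWN_LENGTH"


class FakeResponse:
    """Hypothetical response with a known wire length and chunked body."""

    def __init__(self, wire_bytes, chunk_size=16):
        self.length = len(wire_bytes)  # what Content-Length would say
        self._chunks = [wire_bytes[i:i + chunk_size]
                        for i in range(0, len(wire_bytes), chunk_size)]

    def iter_chunks(self):
        return iter(self._chunks)


class GzipDecodingResponse:
    """Hypothetical wrapper that un-gzips the body chunk by chunk."""

    def __init__(self, original):
        self._original = original
        # The decoded size is unknown until the stream is fully read,
        # so the wrapper replaces the wire length with a sentinel.
        self.length = UNKNOWN_LENGTH

    def iter_chunks(self):
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
        for chunk in self._original.iter_chunks():
            yield decoder.decompress(chunk)
        yield decoder.flush()


body = b"x" * 1461
raw = FakeResponse(gzip.compress(body))
decoded = GzipDecodingResponse(raw)

decoded_body = b"".join(decoded.iter_chunks())
print(raw.length, decoded.length, len(decoded_body))
```

In this sketch the wrapped response still knows its wire length; it is the wrapper that hides it, which mirrors the complaint above.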

Probably what you want can be achieved by disabling content decoding in treq?

dreid • Mar 03 '14 23:03

Will that allow me to still fetch the content with Content-Encoding: gzip? I suppose so since I could just add that header myself.
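
For what it's worth, the header a client adds to ask for compression is Accept-Encoding; Content-Encoding is what the server sends back. If automatic decoding were switched off, the decoding step could be done by hand along these lines. This is a sketch only: `decode_body` is a hypothetical helper, and `headers` is assumed to be a plain dict with lowercase keys rather than any particular library's header object.

```python
import gzip


def decode_body(headers, raw_body):
    """Undo gzip when the response's Content-Encoding says it was applied.

    Hypothetical helper; `headers` is assumed to be a dict with
    lowercase keys.
    """
    encoding = headers.get("content-encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


# Request header that asks the server for a gzip-compressed response.
request_headers = {"Accept-Encoding": "gzip"}

# Made-up response data standing in for what the server would return.
payload = b"image bytes would go here"
response_headers = {"content-encoding": "gzip"}
wire_bytes = gzip.compress(payload)

decoded = decode_body(response_headers, wire_bytes)
print(len(wire_bytes), len(decoded))
```

With this approach the wire length stays available before decoding, and the decoded length is known once `decode_body` returns.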

dstufft • Mar 03 '14 23:03