aiohttp/http_parser.py: pure-python fix for big chunk decoding (#1899)#1900
Anvil wants to merge 4 commits into aio-libs:master
Conversation
|
I can rebase on 2.0 if required. |
|
Could you add a test? |
|
I'm trying... |
…ibs#1899) Signed-off-by: Damien Nadé <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #1900 +/- ##
==========================================
- Coverage 97.06% 97.05% -0.02%
==========================================
Files 37 37
Lines 7560 7560
Branches 1314 1314
==========================================
- Hits 7338 7337 -1
- Misses 100 101 +1
Partials 122 122
|
Signed-off-by: Damien Nadé <[email protected]>
|
OK, this is the best I could come up with. I know it does the job; I don't know if it's minimal enough. |
|
The failure in the tests is the C parser failing, not the Python one. |
|
I agree with the concern about the memory footprint. And I guess this feature should be implemented in both parsers anyway, in order to preserve similar behaviour. As for the problem itself, I think it is the server that should be fixed so that it does not send chunks that are too big. Otherwise, there could be some magic constant that the user explicitly changes in order to receive big chunks. But it is certainly not a good idea to have support for those enabled by default for everyone, without any way to disable it. |
|
I don't think this is a bug at all. From my perspective, partial chunks are totally correct. |
|
Moreover, a proxy could change the transfer encoding; it could even compress/decompress the body. |
|
I agree, developers should not rely on transfer-encoding. |
|
@Anvil do you have a strong reason for supporting this behavior? |
|
(Ok, for reference and reminder, chunk encoding is …) I think that you cannot "fix" servers and tell them "don't send big chunks". No offense, but the choice is yours to make.

Now about the bug itself. Because it is a bug. This is not about big chunks, this is about partial chunks. The thing is, the aiohttp client/parser detects that the content is chunk-encoded (through the HTTP headers) and starts interpreting the data, catching it and stripping the size information out of the communication. In case two chunks are received at once, aiohttp knows how many bytes to expect and nicely performs the split. Seeing that, we thought that it was aiohttp's job to handle all cases and to return us chunks, not raw data. Sounded like a very nice feature. It turned out that when a chunk is partially received, iter_any yields partial content. And I think this defeats the purpose of the generator; this is not exactly reliable behavior.

IMHO, this patch brings consistency back, at the cost of a potentially big memory footprint, yes. My opinion is that if someone does not want to have the whole chunk in memory, they should disable chunk parsing in aiohttp and implement their own way to handle chunks.

Also, if I should not rely on transfer-encoding, then, with all due respect, neither should you, nor the aiohttp client. (If my opinion matters on the subject, yes, both parsers should implement this.) Regards. (Edit: fixed |
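For reference, the buffering behavior being argued for can be sketched in a few lines. This is a hypothetical illustration, not aiohttp's actual internals (the `ChunkAssembler` name and its API are invented): a feeder that accumulates raw chunked-transfer-encoded bytes and yields only *complete* chunk bodies, holding partial data until the rest arrives.

```python
class ChunkAssembler:
    """Accumulate raw chunked-transfer-encoded bytes, yield whole chunks.

    Illustrative sketch only: no trailer handling, no error recovery.
    """

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        """Feed raw bytes; return a list of fully received chunk bodies."""
        self._buf += data
        chunks = []
        while True:
            # Each chunk starts with a "<hex size>\r\n" line.
            head, sep, rest = self._buf.partition(b"\r\n")
            if not sep:
                break  # size line itself is not complete yet
            size = int(head.split(b";")[0], 16)  # ignore chunk extensions
            if size == 0:
                break  # "0\r\n\r\n" terminates the body
            # The chunk body is followed by its own trailing CRLF.
            if len(rest) < size + 2:
                break  # body only partially received: keep buffering
            chunks.append(rest[:size])
            self._buf = rest[size + 2:]
        return chunks
```

The memory-footprint objection above is visible here: `self._buf` grows without bound until the full chunk has been received.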
|
I wouldn't mind adding this feature if it were possible to limit it to one use case, but this would affect everyone. Probably in your case it is OK to load everything into memory, but I am sure that for most users it would be a very big surprise. Also, discovering that anyone can kill your service without much effort would not be a very pleasant experience with the framework. |
|
What about adding a max-read-size parameter like the one in the read() functions? We could set it to something like 1 MB by default; if the chunk size exceeds the limit, a partial result is returned. And if the user sets it to None, it means that they are willing to accept the risk. That's the same as open("xxx", "r").read(). |
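As a rough sketch of the semantics proposed above (the `max_chunk_size` name, the 1 MiB default, and the function itself are hypothetical, not an existing aiohttp API): buffer incoming data for the current chunk, but flush a partial result as soon as the buffer reaches the limit, and buffer without bound when the limit is None.

```python
DEFAULT_MAX_CHUNK_SIZE = 2 ** 20  # hypothetical 1 MiB default

def assemble_chunk(pending, data, max_chunk_size=DEFAULT_MAX_CHUNK_SIZE):
    """Append data to the pending chunk buffer.

    Returns (output, pending): once the buffer reaches max_chunk_size,
    it is flushed as a partial chunk, mirroring read(n); with
    max_chunk_size=None the whole chunk is buffered, whatever its size.
    """
    pending += data
    if max_chunk_size is not None and len(pending) >= max_chunk_size:
        return pending, b""  # flush a partial chunk to cap memory use
    return b"", pending      # keep buffering until the chunk completes
```

Under this scheme the default is safe for everyone, and only a user who explicitly passes None opts into unbounded buffering.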
|
And I would also like to point out that, for usages that yield multiple JSON objects through chunked-encoded data, it is not necessary to depend on the chunk boundaries to split them. JSON objects (data surrounded by {}) can be split by a state machine, e.g. by counting the braces in a smart way; this technique is used in the JSON-RPC protocol. |
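A minimal sketch of such a brace-counting splitter (illustrative only: it skips braces inside string literals and handles backslash escapes, but it is not a full JSON validator and ignores top-level non-object values):

```python
def split_json_objects(stream):
    """Split a string of concatenated JSON objects by counting braces,
    ignoring braces that appear inside string literals."""
    objs = []
    depth = 0
    start = None
    in_string = False
    escaped = False
    for i, ch in enumerate(stream):
        if in_string:
            if escaped:
                escaped = False       # this character was escaped
            elif ch == "\\":
                escaped = True        # next character is escaped
            elif ch == '"':
                in_string = False     # string literal ends
        elif ch == '"':
            in_string = True          # string literal begins
        elif ch == "{":
            if depth == 0:
                start = i             # a new top-level object begins
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                objs.append(stream[start:i + 1])
    return objs
```

Because the splitter only looks at the byte stream, it works the same whether the objects arrive one per chunk, several per chunk, or split across chunk boundaries.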
|
I could live with chunk splitting by default, while supporting an opt-in flag for brave guys who have unlimited memory and don't care about attacks. |
What do these changes do?
It should be a pure-Python fix for #1899.
The .pyx implementation is not done. Not sure I'll be able to do that.
Are there changes in behavior for the user?
response.content.iter_any() will no longer yield partial chunks
Related issue number
#1899
Checklist
CONTRIBUTORS.txt
CHANGES.rst — use the #issue_number format at the end of the changelog message. Use the Pull Request number if there are no issues for the PR, or if the PR covers the issue only partially.