Skip to content

make HTTPResponse.stream use read1 when amt=None#3216

Open
smason wants to merge 4 commits intourllib3:mainfrom
smason:stream-via-read1
Open

make HTTPResponse.stream use read1 when amt=None#3216
smason wants to merge 4 commits intourllib3:mainfrom
smason:stream-via-read1

Conversation

@smason
Copy link
Contributor

@smason smason commented Nov 27, 2023

Following up on #3186 (and suggested in #2125) this uses the new read1 method when streaming a non-chunked response and amt=None.

This would mean that psf/requests#5536 could also be closed.

@smason
Copy link
Contributor Author

smason commented Nov 27, 2023

@illia-v A hopefully simple followup to my last PR. This makes the actual change to the behavior of stream(None) — hopefully in a way that people expect

Not sure why all those tests timed out! I rebased to main branch before making this, I see there's been a bit of a rework of the test infrastructure recently could that be related?

@illia-v
Copy link
Member

illia-v commented Nov 29, 2023

@saschpe thanks! The test failures look to be a result of this PR.

The case is pretty similar to hanging you described in python/cpython#112064.
I guess this while loop iterates many times needlessly because self._fp is not closed when all data is read:

while not is_fp_closed(self._fp) or len(self._decoded_buffer) > 0:

@smason
Copy link
Contributor Author

smason commented Nov 30, 2023

@illia-v I was only running the few tests I cared about locally as the whole suite takes a long time and missed that unintended breakage, should be fixed now.

I tried to rework stream to look more like a conventional read loop, but the behavior of read auto-closing the stream in a caller observable manner seems to go against this. AFAIK the modern idiomatic way would look like:

while data := fp.read(amt):
    yield data

but this raises on the final read due to the response having been implicitly closed in the read before, which seems awkward.

Am wondering whether this is worth trying to clean up closed handling separately?

Comment on lines 1029 to 1032
if not data:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the loop is broken here after read1 was used, self._fp remains not closed even though all data has been read. Closing it may be a better way to break the loop. What do you think about checking self.length_remaining after calling read1 and, if it is 0 and not self.closed, calling self._fp.close()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the loop is broken here after read1 was used, self._fp remains not closed even though all data has been read.

Hadn't realised the semantics around what gets closed and when, have done a bit more reading now. Have had a bit of a refactor of the change.

Closing it may be a better way to break the loop. What do you think about checking self.length_remaining after calling read1 and, if it is 0 and not self.closed, calling self._fp.close()?

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Good point!
Looking at CPython's code, it's possible to get in the situation at least if self._fp is closed between read1 calls. However, you have to respect self.enforce_content_length as _raw_read does.

@smason
Copy link
Contributor Author

smason commented Dec 6, 2023

rebased due to all the bumping in dependencies and test changes

Comment on lines 1029 to 1032
if not data:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels as though it could raise an IncompleteRead if it finishes and length_remaining indicates there's missing data (as _raw_read does), any preferences on this / other changes

Good point!
Looking at CPython's code, it's possible to get in the situation at least if self._fp is closed between read1 calls. However, you have to respect self.enforce_content_length as _raw_read does.

@illia-v
Copy link
Member

illia-v commented Dec 15, 2023

Since we discovered some problems with http.client.HTTPResponse.read1 while working on this PR, more confidence needs to be gained to make the change to HTTPResponse.stream.

In the meanwhile, I created #3235 and #3236 to fix two blocking issues.

@illia-v illia-v mentioned this pull request Jan 3, 2024
@nateprewitt
Copy link
Member

I wanted to check in on this PR. It looks like the Python landscape and urllib3 have changed a bit since there was last discussion. I'm seeing the previously outstanding concerns as:

Raising IncompleteRead correctly: Handled in this PR
Closing connection when read1 used: #3235
Memory efficiency concerns: #3236

Is there anything else outstanding I'm missing?

@illia-v
Copy link
Member

illia-v commented Feb 13, 2026

@nateprewitt IIRC this was blocked by python/cpython#115998

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants