Skip to content

[python] feat: Allow streaming large response bodies#16789

Closed
robertschweizer wants to merge 2 commits intoOpenAPITools:masterfrom
robertschweizer:api-response-improvements
Closed

[python] feat: Allow streaming large response bodies#16789
robertschweizer wants to merge 2 commits intoOpenAPITools:masterfrom
robertschweizer:api-response-improvements

Conversation

@robertschweizer
Copy link
Copy Markdown
Contributor

Closes #16640

I split off a first refactoring commit, the commit messages might be helpful for review

PR checklist

  • Read the contribution guidelines.
  • Pull Request title clearly describes the work in the pull request and Pull Request description provides details about how to validate the work. Missing information here may result in delayed response from the community.
  • Run the following to build the project and update samples:
    ./mvnw clean package 
    ./bin/generate-samples.sh ./bin/configs/*.yaml
    ./bin/utils/export_docs_generators.sh
    
    Commit all changed files.
    This is important, as CI jobs will verify all generator outputs of your HEAD commit as it would merge with master.
    These must match the expectations made by your contribution.
    You may regenerate an individual generator by passing the relevant config(s) as an argument to the script, for example ./bin/generate-samples.sh bin/configs/java*.
    For Windows users, please run the script in Git BASH.
  • File the PR against the correct branch: master (upcoming 7.1.0 minor release - breaking changes with fallbacks), 8.0.x (breaking changes without fallbacks)
  • If your PR is targeting a particular programming language, @mention the technical committee members, so they are more likely to review the pull request.

@wing328 @multani

@robertschweizer
Copy link
Copy Markdown
Contributor Author

This is still missing a test.

Please let me know if you like the approach, then I will add a test.

@robertschweizer robertschweizer changed the title feat: Allow streaming large response bodies [python] feat: Allow streaming large response bodies Oct 11, 2023
Comment on lines 65 to 72
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ApiResponse.raw_data should be None if _preload_content=True, but that would be a (avoidable) breaking change. So I added this ugly code for backwards-compatibility.

We could eventually offer an extra ApiResponse.read() method, which can be properly async.

Should I raise a DeprecationWarning in this code path? Then we should provide an ApiResponse.read() method or tell users to read from the backend objects.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read() method would be needed for the asyncio client, it's not possible to have an async property and "hide" the loading logic into the property code itself.

So yeah, I would say the deprecation makes sense to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will set raw_data to None if _preload_content=False in the aiohttp client (see #16789 (comment)).

For the urllib3 client, I will keep returning self.urllib3_response.data if _preload_content=False, but raise a deprecation warning from now on.

And I'll add the read() method.

On the next major release, we can make raw_data always be None if _preload_content=False, so users have to consciously load data with the new read() method.

@robertschweizer robertschweizer force-pushed the api-response-improvements branch from 91dd0c4 to af0de02 Compare October 11, 2023 10:55
@robertschweizer
Copy link
Copy Markdown
Contributor Author

In the CI, a python-pydantic-v1-aiohttp example is being modified. For me locally, regenerating example clients does not modify python-pydantic-v1-aiohttp.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should pass the "body" parameter from the deserialize function here too, and use it instead of response.data below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a file should not be decoded, but written as it is received from the server. I changed this to write the binary response, before it was opening the file with "wb" mode, but writing the decoded response I believe.

Deserialization should maybe be split into _deserialize and _write_file methods to make this clearer, but maybe in a later PR?

Copy link
Copy Markdown
Contributor Author

@robertschweizer robertschweizer Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "file" responses are still decoded if _preload_content=True, which is probably a separate bug. Decoding might fail on some response bodies, even though writing the file is not a problem.

Copy link
Copy Markdown
Contributor

@multani multani Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If needed, it's probably worth adding more dedicated tests on this specific feature: the parsing of the response and how the files are loaded afterward is a bit intricate and it would best to make sure there are no (involuntary) regressions introduced.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"file" responses are an OpenAPI v2 feature (https://swagger.io/docs/specification/2-0/describing-responses/#response-that-returns-a-file). OpenAPI v3 only has "bytearrays": https://swagger.io/docs/specification/describing-responses/#response-that-returns-a-file

I cannot find any OpenAPI v2 example schema to test this with, but the openapi-generator Github page states that OpenAPI v2 is still supported.

Comment on lines 66 to 69
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should really not do that here: if a client is using asyncio, I think it will expect to have full control over the async requests. Calling run_until_complete here makes the code synchronous which can really affect how asynchronous clients work.

Either load the data before during the caller asynchronous method, or transform the property into a method that would be marked async for the asyncio client.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I haven't used asyncio myself so far, so I don't have a clue what I'm doing here :)

I added this to provide backwards compatibility for users accessing ApiResponse.raw_data when using _preload_content=False. I just realized that setting _preload_content=False is not working for the aiohttp generator on current master, so I guess there is no backwards-compatibility needed here.

I will remove this code.

Comment on lines 65 to 72
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read() method would be needed for the asyncio client, it's not possible to have an async property and "hide" the loading logic into the property code itself.

So yeah, I would say the deprecation makes sense to me.

* Stop modifying RESTResponse instances to clarify data flow
* Improve type annotations
* Clarify that _preload_content must be True if _return_http_data_only is True
@robertschweizer robertschweizer force-pushed the api-response-improvements branch 2 times, most recently from c297106 to b0c011e Compare October 13, 2023 09:37
@robertschweizer robertschweizer force-pushed the api-response-improvements branch from b0c011e to 7b056db Compare October 13, 2023 09:59
@robertschweizer
Copy link
Copy Markdown
Contributor Author

This is still missing a test.

Please let me know if you like the approach, then I will add a test.

I added tests for the new response streaming support.

@robertschweizer robertschweizer force-pushed the api-response-improvements branch from 7b056db to 36fafe1 Compare October 13, 2023 10:07
@wing328
Copy link
Copy Markdown
Member

wing328 commented Oct 15, 2023

FYI. The java client (okhttp-gson) supports streaming response via x-streaming: true extension in the operation together with the supportStreaming option enabled (ideally this option could be eliminated when x-streaming is found in the openapi spec)

@wing328
Copy link
Copy Markdown
Member

wing328 commented Oct 15, 2023

FYI. We also have a Slack channel for python related discussion: https://openapi-generator.slack.com/archives/C05TAA87CCX/p1697337580770109

Please join us to discuss upcoming breaking changes to the Python client generator

@robertschweizer
Copy link
Copy Markdown
Contributor Author

Currently, this PR does not introduce any breaking changes. The failing CI is probably doing something wrong. I think it can be merged as-is if the CI is fixed.

I'm not sure if that makes sense though, because most of the code touched here is also severely changed in #16802. If no release is tagged before merging #16802, it probably doesn't make sense to merge this PR here at all.

@robertschweizer
Copy link
Copy Markdown
Contributor Author

Fixed in #16802

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[REQ][python] Allow streaming large response bodies

3 participants