refactor: check endpoint handler is async only once#2536
Merged
Kludex merged 3 commits intoKludex:masterfrom Apr 20, 2024
Merged
refactor: check endpoint handler is async only once#2536Kludex merged 3 commits intoKludex:masterfrom
Kludex merged 3 commits intoKludex:masterfrom
Conversation
We improve the dispatch in the routing module to only check once whether the handler is async. This gives an improvement of 2.5% (sync), 1.82% (async) in the number of requests/s. The average latency decreased 1.6% (sync) and 1.5% (async).
Note that we had to use a cast in the helper function, as the typeguard does not work for the negative case. In the main branch the code is working without a cast, because the typeguard return type is in practice `AwaitableCAllable[Any]`, which end up swallowing the other types in the union.
Benchmarking
We use a simple json app, with both a sync and async endpoint, and the wrk tool to get the measurements.
The measuerements were done on a Macbook Pro with M1 chip, 16GB of memory and macOS 12.3. The Python version used for the tests is Python 3.12.2, and the uvicorn version is 0.27.1
Before
```
$ wrk http://localhost:8000/sync
Running 10s test @ http://localhost:8000/sync
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 733.77us 55.57us 3.21ms 78.35%
Req/Sec 6.84k 147.06 7.15k 87.13%
137474 requests in 10.10s, 18.35MB read
Requests/sec: 13610.69
Transfer/sec: 1.82MB
$ wrk http://localhost:8000/async
Running 10s test @ http://localhost:8000/async
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 717.14us 49.05us 1.83ms 71.11%
Req/Sec 7.00k 112.14 7.36k 76.24%
140613 requests in 10.10s, 18.77MB read
Requests/sec: 13922.97
Transfer/sec: 1.86MB
````
After
```
$ wrk http://localhost:8000/sync
Running 10s test @ http://localhost:8000/sync
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 721.34us 202.40us 11.13ms 99.32%
Req/Sec 7.01k 230.04 7.62k 94.00%
139558 requests in 10.00s, 18.63MB read
Requests/sec: 13956.14
Transfer/sec: 1.86MB
$ wrk http://localhost:8000/async
Running 10s test @ http://localhost:8000/async
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 706.04us 109.90us 7.46ms 98.30%
Req/Sec 7.12k 136.09 7.39k 90.59%
143188 requests in 10.10s, 19.12MB read
Requests/sec: 14176.95
Transfer/sec: 1.89MB
```
The app used for the test is as follows
```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route
import uvicorn
async def async_page(request):
return JSONResponse({'status': 'ok'})
async def sync_page(request):
return JSONResponse({'status': 'ok'})
app = Starlette(routes=[
Route('/async', async_page),
Route('/sync', sync_page),
])
if __name__ == "__main__":
uvicorn.run("app:app", port=8000, log_level="critical")
```
Kludex
requested changes
Mar 16, 2024
Owner
Kludex
left a comment
There was a problem hiding this comment.
I'm not sure if those changes are worth it, but...
Contributor
Author
Could you share why you think the changes are not worth it? Any improvements on removing blocking code would help all applications, as by removing unneeded operations we reduce the amount of blocking code, which affects (tail) latencies. This is a no brainer if the code in question is in fact unneeded because it's doing the same thing over an over again. |
Kludex
approved these changes
Apr 20, 2024
|
Same optimization can be applied here: But is this snippet executed more than once? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
We improve the dispatch in the routing module to only check once whether the handler is async. This gives an improvement of 2.5% (sync), 1.82% (async) in the number of requests/s. The average latency decreased 1.6% (sync) and 1.5% (async).
Note that we had to use a cast in the helper function, as the typeguard does not work for the negative case. In the main branch the code is working without a cast, because the typeguard return type is in practice
AwaitableCallable[Any], which end up swallowing the other types in the union.Benchmarking
We use a simple json app, with both a sync and async endpoint, and the
wrktool to get the measurements. WE also show the profile shown by thesnakeviztool.The measurements were done on a Macbook Pro with M1 chip, 16GB of memory and macOS 12.3. The Python version used for the tests is Python 3.12.2, and the uvicorn version is 0.27.1
Before
After
Test App
The app used for the test is as follows
Checklist
The two points above are not checked as this is just refactoring code, and it does not require any new tests or docs. If required I could add tests for the added helper function
_to_async.