Fix infinite callback loop when time is not moving forward#10151
bdraco merged 5 commits into aio-libs:master
Conversation
If the keepalive handler is called too soon, it reschedules itself. The test used `now <= close_time`, which means that an exactly on-time notification is treated as "too soon", causing an automatic rescheduling. For real systems the time will eventually advance and break the loop, but with async-solipsism, time doesn't advance until there is some reason to sleep and the loop is infinite. Closes aio-libs#10149.
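A minimal, self-contained sketch of the comparison bug (illustrative names, not the actual aiohttp source): with `now <= close_time`, a callback that fires exactly at the deadline counts as "too soon" and gets rescheduled, and on a frozen clock that repeats forever.

```python
def run_keepalive(time_fn, close_time, strict=True, max_reschedules=5):
    """Simulate the keep-alive handler loop.

    Returns the number of reschedules before the connection is closed,
    or None if the reschedule limit is hit (i.e. it would spin forever
    on a frozen clock, which is what async-solipsism exposes).
    """
    reschedules = 0
    while True:
        now = time_fn()
        # Fixed comparison: only strictly-before-the-deadline is "too soon".
        # Buggy comparison: exactly on time also counts as "too soon".
        too_soon = now < close_time if strict else now <= close_time
        if not too_soon:
            return reschedules  # deadline reached: close the connection
        reschedules += 1
        if reschedules >= max_reschedules:
            return None


# With a frozen clock sitting exactly on the deadline:
frozen = lambda: 10.0
buggy = run_keepalive(frozen, 10.0, strict=False)  # loops forever -> None
fixed = run_keepalive(frozen, 10.0, strict=True)   # closes immediately -> 0
```

On a real clock the buggy variant eventually escapes once `time_fn()` advances past the deadline, which is why this only hangs under a mocked, non-advancing clock.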
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files

@@            Coverage Diff            @@
##           master   #10151    +/-   ##
=======================================
  Coverage   98.75%   98.75%
=======================================
  Files         122      122
  Lines       36954    36997      +43
  Branches     4411     4413       +2
=======================================
+ Hits        36494    36538      +44
  Misses        313      313
+ Partials      147      146       -1

View full report in Codecov by Sentry.
CodSpeed Performance Report: Merging #10151 will not alter performance.
We need to make it very clear this is not a production problem, so we don't end up with unexpected questions or discussions. Additionally, we haven't committed to supporting async-solipsism at this time, so I changed the title to better reflect that.
I think we should change the fragment to packaging instead of bugfix, since it's for testing and doesn't affect production.
Is there any chance of a regression test without adding a new dependency?
I think a test could be added using
Sure, I'll do that. But are you sure aiohttp isn't used in any environment in which the timer precision might be low enough that it causes the event to be re-scheduled unnecessarily (which won't be an infinite loop, but will cause more work than necessary)? I'm thinking of environments that deliberately degrade timer precision to prevent Spectre-type side-channel attacks.
For what it's worth, I've never tested async-solipsism on non-Linux OSes, so it might not be something you want the test suite to depend on at this point.
That seems unlikely as we haven't had any issue reports, and even
We don't have any maintainers familiar with it either, so it's not something we could effectively troubleshoot, and we wouldn't want to depend on it right now.
At the request of @bdraco
I think we only need to patch
I'm not so sure: it doesn't touch any packaging metadata, mechanisms, or downstream expectations. That would be misleading, in my opinion. I suppose we could go for

But yeah, a regression test would be most welcome here...
Ok, renamed to
I added a test to ensure the keep-alive expires on time. I don't love it, but since
Backport to 3.11: 💚 backport PR created ✅ Backported as #10173 🤖 @patchback
Co-authored-by: J. Nick Koston <[email protected]> (cherry picked from commit 7c12b1a)
Backport to 3.12: 💚 backport PR created ✅ Backported as #10174 🤖 @patchback
Co-authored-by: J. Nick Koston <[email protected]> (cherry picked from commit 7c12b1a)
…ime is not moving forward (#10173) Co-authored-by: Bruce Merry <[email protected]> Fixes #10149.
…ime is not moving forward (#10174) Co-authored-by: Bruce Merry <[email protected]> Fixes #10149.
Hi, I have a single machine (x86_64-linux) where

Any ideas what could cause this? Given that everything timing-related should be mocked, and the build is done via a Nix derivation which produces a consistent environment with the same precise dependency versions across multiple machines, I'm a bit at a loss to explain the failure. Not sure what to add without some ideas. As said, I can't really reproduce the failure elsewhere myself so far…
No idea, you'd have to do some debugging.
I haven't touched aiohttp in quite a while, but does this test use real TCP sockets? If so, timing isn't going to be deterministic: even with localhost connections, there is a delay between userspace sending data into one end of the socket and the kernel signalling that it's ready on the other end.
The code waits for the response to be read, so the socket should already be idle. It's then just yielding to the loop twice in order to trigger the timeout code, which should reset it. So I wouldn't expect there to be a timing dependency, at a glance.
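For readers unfamiliar with the yield-twice pattern mentioned above, `asyncio.sleep(0)` suspends the coroutine for one loop iteration without advancing time, giving already-due callbacks a chance to run. A generic illustration (not the aiohttp test itself):

```python
import asyncio

async def main():
    ran = []
    loop = asyncio.get_running_loop()
    # Schedule a callback for the next loop iteration.
    loop.call_soon(ran.append, "callback")
    # Yield to the event loop twice: once so the queued callback runs,
    # and once more for anything that callback schedules in turn.
    await asyncio.sleep(0)
    await asyncio.sleep(0)
    return ran

result = asyncio.run(main())
```

No wall-clock time needs to pass for this to work, which is why the test should be deterministic in principle.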
I'm trying to build the aiohttp package, and I'm having exactly the same issue as @sternenseemann. I tried debugging and found out that

If I modify the test and shift the patched loop.time return value even a microsecond further:

I can't understand why this is happening. From what I've seen in the code,

The only possible reason I could think of is that when the test was written, asyncio performed the timer handler calls at exactly the specified time, but later Python changed this behavior, and the timer is now called only when the time exceeds the scheduled value, not when it's exactly equal. But I don't know how to verify this hypothesis. Can you suggest any ideas I could try? I'm using Python 3.12.9 on a RHEL10-derived OS.
I think this is fine if that gets the test working for everyone.
That's clearly not possible, as we are running every commit on Python 3.10-3.14 on Windows, macOS and Ubuntu, and the test is still passing on all of them (plus PyPy as well). Maybe RHEL has some patches to Python that affect the behaviour (the previous poster did not mention which distro they used).
But that would be completely against the original idea of the test: checking how the library behaves when the time is exactly identical. Would it not? That was what the pull request changed:
Hm... Thanks for the info, that's really puzzling, but useful to know. I actually wanted to ask what the target python version was, but completely forgot. |
Oh yes, you're right, that is exactly what it's trying to test. You said the function wasn't getting called at all, which is down to the asyncio loop implementation. I guess it's really an implementation detail whether the loop calls the function at the recorded time or immediately after... So, looking through the implementation, it breaks and doesn't run the callback if the time is
Or possibly such a small number that it gets lost due to floating-point precision during the addition (for reference, my machine returns

This suggests the original issue reported with async-solipsism isn't actually resolved for some machines. I'd suggest trying to get a similar change into asyncio to remove the
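The comparison in question (a simplified sketch paraphrasing the shape of CPython's `BaseEventLoop._run_once`, not the real code) only fires a scheduled handle if its deadline is strictly below `time() + clock_resolution`, and that float addition is exactly where the resolution can vanish:

```python
def due_handles(scheduled_whens, now, clock_resolution=1e-9):
    """Return the deadlines the loop would fire this iteration.

    Mirrors the shape of CPython's check:
        end_time = self.time() + self._clock_resolution
        if handle._when >= end_time: break
    """
    end_time = now + clock_resolution  # float add: may round back to `now`
    return [when for when in sorted(scheduled_whens) if when < end_time]


# Small clock value: a handle scheduled exactly "now" is due.
assert due_handles([10.0], now=10.0) == [10.0]

# Large clock value (hundreds of days of loop time): the same
# exactly-on-time handle is NOT due, because now + 1e-9 == now.
big = 2.0 ** 24  # seconds
assert due_handles([big], now=big) == []
```

So whether an exactly-on-time callback runs in a given iteration can depend on the magnitude of `loop.time()` itself.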
My two machines report
I haven't followed the entire context of the discussion, but on my machine it looks like
No, like for the others who posted above, it returns 1e-09 on the machine where I perform the build.
It's only using the resolution value and then adding to loop.time(). So unless someone is taking 194 days to execute the tests...
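The 194-day figure checks out with a quick float-precision calculation (assuming a clock resolution of 1e-09, as reported by the machines above): once `loop.time()` reaches 2**24 seconds, the spacing between adjacent floats exceeds twice the resolution, so adding 1e-9 rounds back to the same value.

```python
import math

RESOLUTION = 1e-9  # clock resolution reported above

# Threshold: the smallest power of two where the gap between adjacent
# floats (the ulp) is big enough that adding 1e-9 rounds away entirely.
threshold = 2.0 ** 24  # seconds

print(threshold / 86400)  # roughly 194 days of "loop time"

assert threshold + RESOLUTION == threshold            # the shift is lost
assert (threshold - 1) + RESOLUTION > threshold - 1   # just below: survives
assert math.ulp(threshold) / 2 > RESOLUTION           # why it's lost
```

2**24 seconds is about 194.2 days, matching the uptime observations later in the thread.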
Hmm, then I have no idea. Try hacking those asyncio lines and check what the values are, and whether removing the
OK, that is interesting. I don't know why the event loop timer has this value, and what time it's expected to be, but when I run the test,

Now the big question is why the loop timer is using the machine's uptime, and whether that is intentional or not.

P. S. I can see in the code that
So, basically, the test depends on how long the machine was running since last reboot... |
Oh, for some reason I thought loop.time() counted from when the loop started running. My suggestion to try and get the
That would fit. It is succeeding on the machine with 12 days of uptime and failing on the machine with 467 days of uptime.
To be honest, I have no idea why they even implemented the comparison like this. I found the commit where it was done, and originally they did have just

Here is the commit, made back in 2014 (Python 3.4):

In fact, after I looked through git blame, that part has been done and redone multiple times. Initially there was no extra shift, then it was added, then removed, then added in a slightly different way, then removed again… And unfortunately, the discussions about the reasons for these changes are a bit above my level of knowledge. But since it was discussed and redesigned multiple times, I think they knew what they were doing, and there probably are serious reasons to keep the asyncio behavior the way it's implemented now.

However, I concocted a dirty fix for this aiohttp test to overcome the floating-point resolution issue:

Can you think of any possible issues or side effects of this approach? Actually, maybe asyncio itself could adopt a similar fix, but I'm not ready to go to the Python developers with such a suggestion yet…
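One robust variant of such a workaround (an illustrative sketch, not the actual patch discussed above): instead of adding the clock resolution to the mocked `loop.time()` value, which can round away, step to the next representable float, which by construction always advances.

```python
import math

def just_past(deadline: float) -> float:
    """Smallest float strictly greater than `deadline` (math.nextafter, 3.9+)."""
    return math.nextafter(deadline, math.inf)


t = 2.0 ** 24            # ~194 days of loop time, where the rounding bites
assert t + 1e-9 == t     # naive shift is lost to rounding
assert just_past(t) > t  # nextafter always yields a strictly larger value
```

The caveat, as noted earlier in the thread, is that the original test deliberately exercised the exactly-equal case, and any forward shift changes what is being tested.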
P. S. I found the relevant suggestion for an asyncio fix in Python's GitHub:
They don't appear to have done the maths in that issue. They are saying that it'd likely become an issue at some point if it was epoch time. They don't seem to realise that it's about 6 months before it breaks... |
I guess it's worth trying to create another issue for them, where the exact details will be explained. The only thing is: can you suggest any real-life scenarios that are affected by this asyncio issue? Right now I have the impression that the only effect is that with high uptime the scheduled timers will be called just a little bit later than they could have been (one extra "tick" will be skipped). But timers never guaranteed the exact time of the calls; I don't think asyncio is designed for RTOS applications. We only have the failure because the test verifies very specific conditions. I'm not sure that the Python developers would consider changing the standard library just because "one third-party module in combination with another third-party module used to fail, and now one of the tests relies on something it should have never relied upon, and fails, so please modify asyncio behavior that has been there for 8 years and did not bother anybody"...
Yeah, it's probably only a minor efficiency loss on long-running machines. But I don't see any advantage to the alternative of not running tasks immediately when they are literally scheduled to run now.
I mean, the reason to add the clock resolution is surely to help callbacks get run before their scheduled time passes, so it makes no sense to have some things not run at the scheduled time when you're literally trying to make them run slightly early in most cases.
Good news! I've sent the fix suggestion to the Python development team, and they have accepted it. I backported the patch to our version of Python 3.12, and with it applied, the aiohttp tests all pass successfully. For reference, here is the link to the accepted pull request: python/cpython#145706
What do these changes do?
If the keepalive handler is called too soon, it reschedules itself. The test used now <= close_time, which means that an exactly on-time notification is treated as "too soon", causing an automatic rescheduling. For real systems the time will eventually advance and break the loop, but with async-solipsism, time doesn't advance until there is some reason to sleep and the loop is infinite.

Are there changes in behavior for the user?
This will fix infinite loops when using async-solipsism.
Is it a substantial burden for the maintainers to support this?
No. This does not increase the amount of code at all.
Related issue number
Fixes #10149.
Checklist
- CONTRIBUTORS.txt (already there)
- Add a news fragment into the CHANGES/ folder; name it <issue_or_pr_num>.<type>.rst (e.g. 588.bugfix.rst); if you don't have an issue number, change it to the pull request number after creating the PR
  - .bugfix: A bug fix for something the maintainers deemed an improper undesired behavior that got corrected to match pre-agreed expectations.
  - .feature: A new behavior, public APIs. That sort of stuff.
  - .deprecation: A declaration of future API removals and breaking changes in behavior.
  - .breaking: When something public is removed in a breaking way. Could be deprecated in an earlier release.
  - .doc: Notable updates to the documentation structure or build process.
  - .packaging: Notes for downstreams about unobvious side effects and tooling. Changes in the test invocation considerations and runtime assumptions.
  - .contrib: Stuff that affects the contributor experience. e.g. Running tests, building the docs, setting up the development environment.
  - .misc: Changes that are hard to assign to any of the above categories.
Make sure to use full sentences with correct case and punctuation, for example:

Use the past tense or the present tense in a non-imperative mood, referring to what's changed compared to the last released version of this project.