
[WIP] Implement file descriptor removal and waiting in asyncio hub #924

Closed
4383 wants to merge 1 commit into eventlet:master from 4383:asyncio-wait

Conversation

@4383
Member

@4383 4383 commented Mar 1, 2024

We recently observed (#912) a difference in behavior between the asyncio hub and the other eventlet hubs.

The asyncio hub behaved differently because bad file descriptors were not properly removed from the lists of reader and writer listeners.

To clean up these bad file descriptors, we have to implement an awaitable wait method specific to the asyncio hub.

This patch should fix #912.

I prefer to propose this as a separate patch to isolate topics and simplify reviews.
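The cleanup idea described above (probing each registered descriptor and dropping the ones the kernel reports as bad) can be sketched in isolation like this. This is a minimal standalone illustration, not code from the patch; `find_bad_fds` is a hypothetical helper name:

```python
import os
import select

def find_bad_fds(fds):
    """Return the descriptors that select() reports as bad (EBADF)."""
    bad = []
    for fd in fds:
        try:
            # Zero timeout: we only probe validity, we never block.
            select.select([fd], [], [], 0)
        except (OSError, ValueError):
            bad.append(fd)
    return bad

r, w = os.pipe()
os.close(w)
print(find_bad_fds([r]))  # read end is still open: nothing reported
os.close(r)
print(find_bad_fds([r]))  # read end is now closed: flagged as bad
```

In the hub, the flagged descriptors would then be removed from the reader/writer listener lists rather than merely collected.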

#912

@codecov

codecov bot commented Mar 1, 2024

Codecov Report

Attention: Patch coverage is 19.56522%, with 37 lines in your changes missing coverage. Please review.

Project coverage is 54%. Comparing base (0a130e2) to head (b13a262).

Files Patch % Lines
eventlet/hubs/asyncio.py 19% 37 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #924    +/-   ##
======================================
- Coverage      56%    54%    -2%     
======================================
  Files          89     89            
  Lines        9728   9773    +45     
  Branches     1812   1822    +10     
======================================
- Hits         5461   5347   -114     
- Misses       3892   4054   +162     
+ Partials      375    372     -3     
Flag Coverage Δ
ipv6 23% <19%> (-1%) ⬇️
py310asyncio ?
py310epolls 53% <19%> (-1%) ⬇️
py310poll 53% <19%> (-1%) ⬇️
py310selects 53% <19%> (-1%) ⬇️
py311asyncio ?
py311epolls 53% <19%> (-1%) ⬇️
py312asyncio ?
py312epolls 51% <19%> (-1%) ⬇️
py37asyncio ?
py37epolls 51% <19%> (-1%) ⬇️
py38asyncio ?
py38epolls 53% <19%> (-1%) ⬇️
py38openssl 51% <19%> (-1%) ⬇️
py38poll 53% <19%> (-1%) ⬇️
py38selects 53% <19%> (-1%) ⬇️
py39asyncio ?
py39dnspython1 51% <19%> (-1%) ⬇️
py39epolls 53% <19%> (-1%) ⬇️
py39poll 53% <19%> (-1%) ⬇️
py39selects 52% <19%> (-1%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@4383 4383 requested review from itamarst and tipabu March 1, 2024 11:25
@4383 4383 changed the title Implement file descriptor removal and waiting in asyncio hub [WIP] Implement file descriptor removal and waiting in asyncio hub Mar 1, 2024
@4383
Member Author

4383 commented Mar 1, 2024

I need to address the job errors. Will have a look soon.
Do not hesitate to share your feelings/suggestions.

@4383
Member Author

4383 commented Mar 4, 2024

The failing test (test_cancelling_future_kills_greenthread) only fails when it is launched among other tests.

If I run this (failing) test in isolation, it succeeds every time I run it...

.tox/py312-asyncio/bin/py.test \
tests/asyncio_test.py::test_cancelling_future_kills_greenthread
============================= test session starts ==============================
platform linux -- Python 3.12.2, pytest-8.0.2, pluggy-1.4.0
rootdir: /home/dev/app
plugins: cov-4.1.0
collected 1 item

tests/asyncio_test.py .                                                  [100%]

============================== 1 passed in 0.21s ===============================

@4383
Member Author

4383 commented Mar 4, 2024

Again, I observe two different behaviors depending on whether I run the test (`test_cancelling_future_kills_greenthread`) alone [1] or not [2].

If I run the problematic test among other tests, the test does not seem to run the go inner method, so the CancelledError thrown by the greenthread is not properly caught, and the test breaks...


  File "/home/dev/app/eventlet/event.py", line 173, in _do_send
    waiter.switch(result)
  File "/home/dev/app/eventlet/greenthread.py", line 267, in main
    self._resolve_links()
  File "/home/dev/app/eventlet/greenthread.py", line 283, in _resolve_links
    f(self, *ca, **ckw)
  File "/home/dev/app/eventlet/asyncio.py", line 42, in _got_result
    gthread.wait()
  File "/home/dev/app/eventlet/greenthread.py", line 224, in wait
    return self._exit_event.wait()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/app/eventlet/event.py", line 131, in wait
    current.throw(*self._exc)
  File "/home/dev/app/eventlet/greenthread.py", line 264, in main
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/app/eventlet/asyncio.py", line 54, in _run
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dev/app/tests/asyncio_test.py", line 236, in go
    async def go():
asyncio.exceptions.CancelledError

async def go():
    try:
        await gthread
    except asyncio.CancelledError:
        return "good"
    else:
        return "bad"

def green():
    phases.append(1)
    future.cancel()
    eventlet.sleep(1)
    # This should never be reached:
    phases.append(2)

This strange behavior seems to be introduced by the changes I proposed in this pull request.

If I run the same test independently of the other tests, then it succeeds... See my previous comment.

[1] .tox/py312-asyncio/bin/py.test tests/asyncio_test.py::test_cancelling_future_kills_greenthread
[2] .tox/py312-asyncio/bin/py.test tests/asyncio_test.py

@4383
Member Author

4383 commented Mar 5, 2024

Here is a reproducer, entering a PDB post-mortem on the broken test.
We can observe that the go task has been cancelled. However, we can also observe that a task remains pending. The pending task is related to the clock used by hubs (which defaults to monotonic).

https://github.com/eventlet/eventlet/blob/0a130e2077ae7f5fd4f250b0bf8a9037000fc6bd/eventlet/hubs/hub.py#L124C12-L126

And which is called in the asyncio hub module:

self.fire_timers(self.clock())

$ .tox/py312-asyncio/bin/py.test --pdb  tests/asyncio_test.py
...
        phases = []
        def green():
            phases.append(1)
            future.cancel()
            eventlet.sleep(1)
            # This should never be reached:
            phases.append(2)
    
        gthread = eventlet.spawn(green)
    
        async def go():
            try:
                await gthread
            except asyncio.CancelledError:
                return "good"
            else:
                return "bad"
    
        future = asyncio.ensure_future(go())
>       assert spawn_for_awaitable(future).wait() == "good"

tests/asyncio_test.py:244: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
eventlet/greenthread.py:224: in wait
    return self._exit_event.wait()
eventlet/event.py:124: in wait
    result = hub.switch()
eventlet/hubs/hub.py:310: in switch
    return self.greenlet.switch()
eventlet/greenthread.py:264: in main
    result = function(*args, **kwargs)
eventlet/asyncio.py:56: in _run
    return future.result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _

>   async def go():
E   asyncio.exceptions.CancelledError

tests/asyncio_test.py:235: CancelledError
>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>
> /home/dev/app/tests/asyncio_test.py(235)go()
-> async def go():
(Pdb) asyncio.all_tasks()
{<Task pending name='Task-1' coro=<Hub.run.<locals>.async_run() running at /home/dev/app/eventlet/hubs/asyncio.py:141> cb=[_run_until_complete_cb() at /usr/local/lib/python3.12/asyncio/base_events.py:181] created at /usr/local/lib/python3.12/asyncio/tasks.py:695>}
(Pdb) u
> /home/dev/app/eventlet/asyncio.py(56)_run()
-> return future.result()
(Pdb) future
<Task cancelled name='Task-17' coro=<test_cancelling_future_kills_greenthread.<locals>.go() done, defined at /home/dev/app/tests/asyncio_test.py:235> created at /usr/local/lib/python3.12/asyncio/tasks.py:695>

Contributor

@itamarst itamarst left a comment


I think this approach is fundamentally not the way to do it.

all_fds = list(self.listeners[self.READ]) + list(self.listeners[self.WRITE])
for fd in all_fds:
    try:
        select.select([fd], [], [], 0)
Contributor


Using select() is problematic. It has a fixed max numeric value for file descriptors. So if the max is e.g. 512 and you pass in file descriptor 637, it won't be able to do anything with that descriptor.

To avoid this you want to use poll() if available, and only fall back to select() if poll() is not available (which effectively means Windows).
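The suggested probe might look roughly like this. It is a minimal sketch of the reviewer's suggestion, not eventlet code; `fd_is_bad` is a hypothetical helper name:

```python
import os
import select

def fd_is_bad(fd):
    """Probe one descriptor, preferring poll() over select().

    poll() has no FD_SETSIZE cap, so it also works for descriptors whose
    numeric value exceeds select()'s fixed limit.
    """
    if hasattr(select, "poll"):  # poll() is missing on Windows
        p = select.poll()
        p.register(fd, select.POLLIN)
        # poll() flags an invalid descriptor with POLLNVAL instead of raising.
        return any(events & select.POLLNVAL for _, events in p.poll(0))
    try:
        select.select([fd], [], [], 0)  # fallback, limited to FD_SETSIZE
        return False
    except (OSError, ValueError):
        return True

r, w = os.pipe()
print(fd_is_bad(r))  # open descriptor
os.close(r)
os.close(w)
print(fd_is_bad(r))  # closed descriptor
```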

Member Author


Oh interesting, I was not aware of that threshold. Indeed poll seems more appropriate.
To be honest, I copy/pasted this logic from the other hubs and tried to adapt it for asyncio. That means all the other hubs are affected by the limitation of select().


        self.loop.run_until_complete(async_run())

    async def wait(self, seconds=None):
Contributor


You're duplicating a whole bunch of asyncio logic, and moreover doing it in a way that makes it much less scalable because e.g. you're not using epoll on Linux. And the issues with select()'s max file descriptor limit. This does not seem like the way to solve this problem.

Contributor


Or, let me rephrase. You are completely breaking the asyncio event loop by blocking. So now any normal asyncio code will only be run intermittently.
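The reviewer's point can be demonstrated in isolation, independently of eventlet: a blocking call inside any coroutine stalls every other task on the loop, whereas an await yields control back to it. The helper names below are illustrative only:

```python
import asyncio
import time

ticks = []

async def ticker():
    # Tries to tick every 0.05s; starves whenever the loop is blocked.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def blocking():
    time.sleep(0.3)  # blocks the whole event loop

async def cooperative():
    await asyncio.sleep(0.3)  # yields to the loop

async def first_gap(other):
    ticks.clear()
    await asyncio.gather(ticker(), other())
    return ticks[1] - ticks[0]

# While blocking() holds the loop, the ticker cannot run for ~0.3s;
# with cooperative() the gap stays near the intended 0.05s.
blocked = asyncio.run(first_gap(blocking))
smooth = asyncio.run(first_gap(cooperative))
print(blocked > smooth)
```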

Member Author


That could explain the failure I observed on the asyncio tests you originally implemented. See the job failure.

@itamarst
Contributor

itamarst commented Mar 7, 2024

I can see if I can implement this a different way.

@4383
Member Author

4383 commented Mar 7, 2024

I can see if I can implement this a different way.

Sure, do not hesitate, you are far more skilled than me on asyncio things. I think we need this file descriptor management, but I surely implemented it in a bad way. So do not hesitate to go wild! ;)

@itamarst
Contributor

I think we decided not to go down this route.

@itamarst itamarst closed this Mar 13, 2024