
[WIP] Implement file descriptor removal and waiting in asyncio hub #924

Closed
4383 wants to merge 1 commit into eventlet:master from 4383:asyncio-wait

Conversation

@4383
Member

@4383 4383 commented Mar 1, 2024

We recently observed (#912) a difference in behavior between the asyncio hub and the other eventlet hubs.

The asyncio hub behaved differently because bad file descriptors were not properly removed from the lists of reader and writer listeners.

To clean up these bad file descriptors, we have to implement an awaitable wait method specific to the asyncio hub.

This patch should fix #912.

I prefer to propose this as a separate patch to isolate topics and simplify reviews.
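The cleanup idea described above (probing each registered descriptor and dropping the ones the kernel reports as bad) can be sketched in isolation like this. This is a minimal standalone illustration, not code from the patch; `find_bad_fds` is a hypothetical helper name:

```python
import os
import select

def find_bad_fds(fds):
    """Return the descriptors that select() reports as bad (EBADF)."""
    bad = []
    for fd in fds:
        try:
            # Zero timeout: we only probe validity, we never block.
            select.select([fd], [], [], 0)
        except (OSError, ValueError):
            bad.append(fd)
    return bad

r, w = os.pipe()
os.close(w)
print(find_bad_fds([r]))  # read end is still open: nothing reported
os.close(r)
print(find_bad_fds([r]))  # read end is now closed: flagged as bad
```

In the hub, the flagged descriptors would then be removed from the reader/writer listener lists rather than merely collected.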

#912

@codecov

codecov bot commented Mar 1, 2024

Codecov Report

Attention: Patch coverage is 19.56522%, with 37 lines in your changes missing coverage. Please review.

Project coverage is 54%. Comparing base (0a130e2) to head (b13a262).

Files Patch % Lines
eventlet/hubs/asyncio.py 19% 37 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #924    +/-   ##
======================================
- Coverage      56%    54%    -2%     
======================================
  Files          89     89            
  Lines        9728   9773    +45     
  Branches     1812   1822    +10     
======================================
- Hits         5461   5347   -114     
- Misses       3892   4054   +162     
+ Partials      375    372     -3     
Flag Coverage Δ
ipv6 23% <19%> (-1%) ⬇️
py310asyncio ?
py310epolls 53% <19%> (-1%) ⬇️
py310poll 53% <19%> (-1%) ⬇️
py310selects 53% <19%> (-1%) ⬇️
py311asyncio ?
py311epolls 53% <19%> (-1%) ⬇️
py312asyncio ?
py312epolls 51% <19%> (-1%) ⬇️
py37asyncio ?
py37epolls 51% <19%> (-1%) ⬇️
py38asyncio ?
py38epolls 53% <19%> (-1%) ⬇️
py38openssl 51% <19%> (-1%) ⬇️
py38poll 53% <19%> (-1%) ⬇️
py38selects 53% <19%> (-1%) ⬇️
py39asyncio ?
py39dnspython1 51% <19%> (-1%) ⬇️
py39epolls 53% <19%> (-1%) ⬇️
py39poll 53% <19%> (-1%) ⬇️
py39selects 52% <19%> (-1%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@4383 4383 requested review from itamarst and tipabu March 1, 2024 11:25
@4383 4383 changed the title Implement file descriptor removal and waiting in asyncio hub [WIP] Implement file descriptor removal and waiting in asyncio hub Mar 1, 2024
@4383
Member Author

4383 commented Mar 1, 2024

I need to address the job errors. Will have a look soon.
Do not hesitate to share your feelings/suggestions.

@4383
Member Author

4383 commented Mar 4, 2024

The failing test (test_cancelling_future_kills_greenthread) only fails when it is launched among other tests.

If I run this (failing) test in isolation, it succeeds every time I run it...

.tox/py312-asyncio/bin/py.test \
tests/asyncio_test.py::test_cancelling_future_kills_greenthread
============================= test session starts ==============================
platform linux -- Python 3.12.2, pytest-8.0.2, pluggy-1.4.0
rootdir: /home/dev/app
plugins: cov-4.1.0
collected 1 item

tests/asyncio_test.py .                                                  [100%]

============================== 1 passed in 0.21s ===============================

@4383
Member Author

4383 commented Mar 4, 2024

Again, I observe two different behaviors depending on whether I run the test (`test_cancelling_future_kills_greenthread`) alone [1] or not [2].

If I run the problematic test among other tests, the test does not seem to run the go inner method, so the CancelledError thrown by the greenthread is not properly caught, and the test breaks...


  File "/home/dev/app/eventlet/event.py", line 173, in _do_send
    waiter.switch(result)
  File "/home/dev/app/eventlet/greenthread.py", line 267, in main
    self._resolve_links()
  File "/home/dev/app/eventlet/greenthread.py", line 283, in _resolve_links
    f(self, *ca, **ckw)
  File "/home/dev/app/eventlet/asyncio.py", line 42, in _got_result
    gthread.wait()
  File "/home/dev/app/eventlet/greenthread.py", line 224, in wait
    return self._exit_event.wait()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/app/eventlet/event.py", line 131, in wait
    current.throw(*self._exc)
  File "/home/dev/app/eventlet/greenthread.py", line 264, in main
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/app/eventlet/asyncio.py", line 54, in _run
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dev/app/tests/asyncio_test.py", line 236, in go
    async def go():
asyncio.exceptions.CancelledError

async def go():
    try:
        await gthread
    except asyncio.CancelledError:
        return "good"
    else:
        return "bad"

def green():
    phases.append(1)
    future.cancel()
    eventlet.sleep(1)
    # This should never be reached:
    phases.append(2)

This strange behavior seems to be introduced by the changes I proposed in this pull request.

If I run the same test independently of the other tests, then it succeeds... See my previous comment.

[1] .tox/py312-asyncio/bin/py.test tests/asyncio_test.py::test_cancelling_future_kills_greenthread
[2] .tox/py312-asyncio/bin/py.test tests/asyncio_test.py

@4383
Member Author

4383 commented Mar 5, 2024

Here is a reproducer, entering a PDB post-mortem on the broken test.
We can observe that the go task has been cancelled. However, we can also observe that a task remains pending. The pending task is related to the clock used by hubs (which defaults to monotonic).

https://github.com/eventlet/eventlet/blob/0a130e2077ae7f5fd4f250b0bf8a9037000fc6bd/eventlet/hubs/hub.py#L124C12-L126

And which is called in the asyncio hub module:

self.fire_timers(self.clock())

$ .tox/py312-asyncio/bin/py.test --pdb  tests/asyncio_test.py
...
        phases = []
        def green():
            phases.append(1)
            future.cancel()
            eventlet.sleep(1)
            # This should never be reached:
            phases.append(2)
    
        gthread = eventlet.spawn(green)
    
        async def go():
            try:
                await gthread
            except asyncio.CancelledError:
                return "good"
            else:
                return "bad"
    
        future = asyncio.ensure_future(go())
>       assert spawn_for_awaitable(future).wait() == "good"

tests/asyncio_test.py:244: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
eventlet/greenthread.py:224: in wait
    return self._exit_event.wait()
eventlet/event.py:124: in wait
    result = hub.switch()
eventlet/hubs/hub.py:310: in switch
    return self.greenlet.switch()
eventlet/greenthread.py:264: in main
    result = function(*args, **kwargs)
eventlet/asyncio.py:56: in _run
    return future.result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _

>   async def go():
E   asyncio.exceptions.CancelledError

tests/asyncio_test.py:235: CancelledError
>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>
> /home/dev/app/tests/asyncio_test.py(235)go()
-> async def go():
(Pdb) asyncio.all_tasks()
{<Task pending name='Task-1' coro=<Hub.run.<locals>.async_run() running at /home/dev/app/eventlet/hubs/asyncio.py:141> cb=[_run_until_complete_cb() at /usr/local/lib/python3.12/asyncio/base_events.py:181] created at /usr/local/lib/python3.12/asyncio/tasks.py:695>}
(Pdb) u
> /home/dev/app/eventlet/asyncio.py(56)_run()
-> return future.result()
(Pdb) future
<Task cancelled name='Task-17' coro=<test_cancelling_future_kills_greenthread.<locals>.go() done, defined at /home/dev/app/tests/asyncio_test.py:235> created at /usr/local/lib/python3.12/asyncio/tasks.py:695>

Contributor

@itamarst itamarst left a comment


I think this approach is fundamentally not the way to do it.

all_fds = list(self.listeners[self.READ]) + list(self.listeners[self.WRITE])
for fd in all_fds:
    try:
        select.select([fd], [], [], 0)
Contributor


Using select() is problematic. It has a fixed max numeric value for file descriptors. So if the max is e.g. 512 and you pass in file descriptor 637, it won't be able to do anything with that descriptor.

To avoid this you want to use poll() if available, and only fall back to select() if poll() is not available (which effectively means Windows).
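The suggested probe might look roughly like this. It is a minimal sketch of the reviewer's suggestion, not eventlet code; `fd_is_bad` is a hypothetical helper name:

```python
import os
import select

def fd_is_bad(fd):
    """Probe one descriptor, preferring poll() over select().

    poll() has no FD_SETSIZE cap, so it also works for descriptors whose
    numeric value exceeds select()'s fixed limit.
    """
    if hasattr(select, "poll"):  # poll() is missing on Windows
        p = select.poll()
        p.register(fd, select.POLLIN)
        # poll() flags an invalid descriptor with POLLNVAL instead of raising.
        return any(events & select.POLLNVAL for _, events in p.poll(0))
    try:
        select.select([fd], [], [], 0)  # fallback, limited to FD_SETSIZE
        return False
    except (OSError, ValueError):
        return True

r, w = os.pipe()
print(fd_is_bad(r))  # open descriptor
os.close(r)
os.close(w)
print(fd_is_bad(r))  # closed descriptor
```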

Member Author


Oh interesting, I was not aware of that threshold. Indeed poll seems more appropriate.
To be honest, I copy/pasted this logic from the other hubs and tried to adapt it for asyncio. That means all the other hubs are affected by the limitation of select().


        self.loop.run_until_complete(async_run())

    async def wait(self, seconds=None):
Contributor


You're duplicating a whole bunch of asyncio logic, and moreover doing it in a way that makes it much less scalable because e.g. you're not using epoll on Linux. And the issues with select()'s max file descriptor limit. This does not seem like the way to solve this problem.

Contributor


Or, let me rephrase. You are completely breaking the asyncio event loop by blocking. So now any normal asyncio code will only be run intermittently.
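The reviewer's point can be demonstrated in isolation, independently of eventlet: a blocking call inside any coroutine stalls every other task on the loop, whereas an await yields control back to it. The helper names below are illustrative only:

```python
import asyncio
import time

ticks = []

async def ticker():
    # Tries to tick every 0.05s; starves whenever the loop is blocked.
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

async def blocking():
    time.sleep(0.3)  # blocks the whole event loop

async def cooperative():
    await asyncio.sleep(0.3)  # yields to the loop

async def first_gap(other):
    ticks.clear()
    await asyncio.gather(ticker(), other())
    return ticks[1] - ticks[0]

# While blocking() holds the loop, the ticker cannot run for ~0.3s;
# with cooperative() the gap stays near the intended 0.05s.
blocked = asyncio.run(first_gap(blocking))
smooth = asyncio.run(first_gap(cooperative))
print(blocked > smooth)
```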

Member Author


That could explain the failure I observed on the asyncio tests you originally implemented. See the job failure.

@itamarst
Contributor

itamarst commented Mar 7, 2024

I can see if I can implement this a different way.

@4383
Member Author

4383 commented Mar 7, 2024

I can see if I can implement this a different way.

Sure, do not hesitate, you are far more skilled than me on asyncio things. I think we need this file descriptor management, but I surely implemented it in a bad way. So do not hesitate to go wild! ;)

@itamarst
Contributor

I think we decided not to go down this route.

@itamarst itamarst closed this Mar 13, 2024