
Fix wsgi.server shutdown for in-flight requests#590

Closed
dbishop wants to merge 4 commits into master from unknown repository

Conversation


@dbishop dbishop commented Oct 31, 2019

While doing some work on OpenStack Swift, I noticed that in-flight
requests were getting their sockets closed by eventlet when the listen
socket was closed. In that setup, there is a server process which binds
the listen socket and N forked-off worker processes which all run the
wsgi.server loop, calling accept() and handling requests.

I noticed that nothing was ever setting the connection state to
STATE_REQUEST (i.e., a constant left unused by the previous patch). Upon
closer inspection, it turns out that the socket cleanup code for
STATE_IDLE sockets was also getting applied to sockets with in-progress
requests. This caused code using the socket to get an EPIPE and exit
the request's greenthread without completing the request.

With some instrumentation added, and without this patch's fix, the new
test observed the code doing this:

(7544) wsgi starting up on http://127.0.0.1:36745
(7544) calling accept
(7544) accept got <eventlet.greenio.base.GreenSocket object at 0x7f7650d73290>, ('127.0.0.1', 50194)
(7544) accepted ('127.0.0.1', 50194)
(7544) completed req ('127.0.0.1', 50194)
(7544) calling accept
(7544) got exception 22
(7544) re-raising!
(7544) wsgi exiting, calling pool.waitall()
Traceback (most recent call last):
  File "/tmp/swiftstack-eventlet/eventlet/wsgi.py", line 602, in handle_one_response
    write(b'')
  File "/tmp/swiftstack-eventlet/eventlet/wsgi.py", line 541, in write
    wfile.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 403, in sendall
    tail = self.send(data, flags)
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 397, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 384, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe

127.0.0.1 - - [31/Oct/2019 01:08:05] "PUT /foo-bar HTTP/1.1" 200 0 0.000696
(7544) wsgi exited, is_accepting=True

In the new tests, not having the fix manifests as all greenthreads
exiting before the client finishes the in-flight request.

The WebSocketWSGI app gets a private class attribute that tells
the wsgi.server() code to consider its "requests" always idle, which I
believe was the primary intent of the original patch.
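The idle-vs-request state tracking described above can be sketched roughly like this. This is a hypothetical, simplified model for illustration only: the STATE_* names come from the patch, but `stop_gracefully` and the conn_state layout shown here are invented stand-ins, not eventlet's actual code.

```python
# Invented sketch of per-connection state bookkeeping: each connection
# carries a mutable conn_state list of [address, socket, state], and a
# graceful stop closes only the connections that are between requests.
STATE_IDLE = 1
STATE_REQUEST = 2
STATE_CLOSE = 3


def stop_gracefully(connections):
    """Close idle connections immediately; leave in-flight requests alone.

    `connections` maps a socket (any hashable stand-in here) to its
    conn_state list. Returns the sockets that were closed.
    """
    closed = []
    for sock, conn_state in list(connections.items()):
        if conn_state[2] == STATE_IDLE:
            conn_state[2] = STATE_CLOSE
            closed.append(sock)
            del connections[sock]
    return closed


# An in-flight request (STATE_REQUEST) survives shutdown; the idle
# keep-alive connection is snipped. The bug was that the in-flight
# connection was never marked STATE_REQUEST, so it got closed too.
conns = {
    'keepalive': ['127.0.0.1', 'keepalive', STATE_IDLE],
    'inflight': ['127.0.0.1', 'inflight', STATE_REQUEST],
}
assert stop_gracefully(conns) == ['keepalive']
assert 'inflight' in conns
```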


codecov-io commented Oct 31, 2019

Codecov Report

Merging #590 into master will decrease coverage by <1%.
The diff coverage is n/a.


@@          Coverage Diff          @@
##           master   #590   +/-   ##
=====================================
- Coverage      46%    46%   -1%     
=====================================
  Files          81     81           
  Lines        7978   7978           
  Branches     1366   1366           
=====================================
- Hits         3701   3682   -19     
- Misses       4019   4034   +15     
- Partials      258    262    +4
Flag Coverage Δ
#ipv6 ?
#py27epolls 49% <ø> (-1%) ⬇️
#py27poll ?
#py27selects 48% <ø> (-1%) ⬇️
#py34epolls 42% <ø> (-1%) ⬇️
#py34poll ?
#py34selects ?
#py35epolls 42% <ø> (-1%) ⬇️
#py35poll 42% <ø> (-1%) ⬇️
#py35selects 42% <ø> (-1%) ⬇️
#py36epolls 42% <ø> (-1%) ⬇️
#py36poll 42% <ø> (-1%) ⬇️
#py36selects 42% <ø> (-1%) ⬇️
#py37epolls 42% <ø> (-1%) ⬇️
#py37poll ?
#py37selects 42% <ø> (-1%) ⬇️
Impacted Files Coverage Δ
eventlet/backdoor.py 87% <0%> (-7%) ⬇️
eventlet/hubs/__init__.py 76% <0%> (-6%) ⬇️
eventlet/green/ssl.py 71% <0%> (-2%) ⬇️
eventlet/hubs/hub.py 90% <0%> (-1%) ⬇️
eventlet/greenio/base.py 87% <0%> (-1%) ⬇️
eventlet/green/http/client.py 35% <0%> (-1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac7917c...2f157d3.

...which apparently won't let you interpolate ints into bytestrings.

Also remove some extraneous comments in the test.

codecov-io commented Oct 31, 2019

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ac7917c) 46% compared to head (a905bf8) 46%.
Report is 114 commits behind head on master.

@@          Coverage Diff          @@
##           master   #590   +/-   ##
=====================================
- Coverage      46%    46%   -1%     
=====================================
  Files          81     81           
  Lines        7978   7977    -1     
  Branches     1366   1366           
=====================================
- Hits         3701   3690   -11     
- Misses       4019   4026    +7     
- Partials      258    261    +3     
Flag Coverage Δ
ipv6 15% <ø> (-1%) ⬇️
py27epolls 49% <ø> (-1%) ⬇️
py27poll 49% <ø> (-1%) ⬇️
py27selects 48% <ø> (-1%) ⬇️
py34epolls 42% <ø> (-1%) ⬇️
py34poll 42% <ø> (-1%) ⬇️
py34selects 42% <ø> (-1%) ⬇️
py35epolls 42% <ø> (-1%) ⬇️
py35poll 42% <ø> (-1%) ⬇️
py35selects 42% <ø> (-1%) ⬇️
py36epolls 42% <ø> (-1%) ⬇️
py36poll 42% <ø> (-1%) ⬇️
py36selects ?
py37epolls 42% <ø> (-1%) ⬇️
py37poll 42% <ø> (-1%) ⬇️
py37selects 42% <ø> (-1%) ⬇️

Flags with carried forward coverage won't be shown.



@clayg clayg left a comment


I'm not sure about the websocket case at all - but I'll see about a repro client-side script based off these tests to try to understand the problem for wsgi servers on master better, at least

Reduce the chance of future regressions and show that the client socket is
unusable after an in-flight request is finished.
openstack-gerrit pushed a commit to openstack/openstack that referenced this pull request Nov 14, 2019
* Update swift from branch 'master'
  - Merge "Seamlessly reload servers with SIGUSR1"
  - Seamlessly reload servers with SIGUSR1
    
    Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
    (instead of a SIGHUP).  The server forks off a synchronized child to
    wait to close the old listen socket(s) until the new server has started
    up and bound its listen socket(s).  The new server is exec'ed from the
    old one so its PID doesn't change.  This makes Systemd happier, so a
    ReloadExec= stanza can now be used.
    
The seamless part means that incoming connections will always get
    accepted either by the old server or the new one.  This eliminates
    client-perceived "downtime" during server reloads, while allowing the
    server to fully reload, re-reading configuration, becoming a fresh
    Python interpreter instance, etc.  The SO_REUSEPORT socket option has
    already been getting used, so nothing had to change there.
    
    This patch also includes a non-invasive fix for a current eventlet bug;
    see eventlet/eventlet#590
    That bug prevents a SIGHUP "reload" from properly servicing existing
    requests before old worker processes close sockets and exit.  The
existing probe tests missed this, but the new ones, in this patch, caught
    it.
    
    New probe tests cover both old SIGHUP "reload" behavior as well as the
    new SIGUSR1 seamless reload behavior.
    
    Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
openstack-gerrit pushed a commit to openstack/swift that referenced this pull request Nov 14, 2019
Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
(instead of a SIGHUP).  The server forks off a synchronized child to
wait to close the old listen socket(s) until the new server has started
up and bound its listen socket(s).  The new server is exec'ed from the
old one so its PID doesn't change.  This makes Systemd happier, so a
ReloadExec= stanza can now be used.

The seamless part means that incoming connections will always get
accepted either by the old server or the new one.  This eliminates
client-perceived "downtime" during server reloads, while allowing the
server to fully reload, re-reading configuration, becoming a fresh
Python interpreter instance, etc.  The SO_REUSEPORT socket option has
already been getting used, so nothing had to change there.

This patch also includes a non-invasive fix for a current eventlet bug;
see eventlet/eventlet#590
That bug prevents a SIGHUP "reload" from properly servicing existing
requests before old worker processes close sockets and exit.  The
existing probe tests missed this, but the new ones, in this patch, caught
it.

New probe tests cover both old SIGHUP "reload" behavior as well as the
new SIGUSR1 seamless reload behavior.

Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
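The signal-driven reload trigger described in the commit message can be illustrated with a minimal sketch. Everything below is invented for illustration (it is not Swift's implementation): it only shows registering a SIGUSR1 handler that records a reload request, which the real server's main loop would act on by forking a socket-holding child and re-exec'ing itself.

```python
# Hypothetical sketch: record that a reload was requested via SIGUSR1.
# The fork/exec and listen-socket handoff machinery is elided.
import os
import signal

reload_requested = False


def on_sigusr1(signum, frame):
    # Signal handlers should do as little as possible; just set a flag
    # for the main loop to notice.
    global reload_requested
    reload_requested = True


signal.signal(signal.SIGUSR1, on_sigusr1)

# Simulate an operator running `kill -USR1 <pid>` against this process.
os.kill(os.getpid(), signal.SIGUSR1)
assert reload_requested
```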

@tipabu tipabu left a comment


Yeah, this seems much more in line with #188's goal of having some

mechanism to ensure that in progress requests are serviced and cleanly disconnected, but that persistent connections are 'snipped' between requests

I worry a little about the lack of a Timeout around https://github.com/eventlet/eventlet/blob/v0.25.1/eventlet/wsgi.py#L616-L633 but I'm not entirely sure that we're any worse off than we were... I might need to see what kind of trouble a slow client with a large payload can cause.

Might be worth resetting to STATE_IDLE just before that, instead of waiting to get out of handle_one_request.

eventlet/wsgi.py Outdated
        self.raw_requestline = self._read_request_line()
        # WebSocketWSGI needs us, here, to consider the connection always idle
        if not getattr(self.server.app, '_WSGI_APP_ALWAYS_IDLE', False):
            self.conn_state[2] = STATE_REQUEST

Oh hey, we weren't actually using that before, were we...


@tipabu tipabu left a comment


OK, I've thought about it some more, and the discard logic is (probably) the minority case -- odds are pretty good that the WSGI app actually needs to read the whole body, in which case moving the state transition doesn't help at all.

I've gotten more nervous about how we plumb in the websocket logic, though -- there may be several layers between eventlet and the WSGI app that got wrapped, no? For routing and such...

I think something like this would work better? I must admit, I don't really have any websockets experience.

diff --git a/eventlet/websocket.py b/eventlet/websocket.py
index 7e8933f..4bd7880 100644
--- a/eventlet/websocket.py
+++ b/eventlet/websocket.py
@@ -77,11 +77,6 @@ class WebSocketWSGI(object):
     the time of closure.
     """
 
-    # This tells the wsgi server code that handling a "request" for this WSGI
-    # app should always be considered "idle" and thus able to be shutdown "mid
-    # request" which for websockets means all the time, I guess.
-    _WSGI_APP_ALWAYS_IDLE = True
-
     def __init__(self, handler):
         self.handler = handler
         self.protocol_version = None
@@ -131,6 +126,10 @@ class WebSocketWSGI(object):
                            [('Connection', 'close'), ] + headers)
             return [body]
 
+        # We're ready to switch protocols; flag the connection
+        # as idle to play well with a graceful stop
+        environ['eventlet.set_idle']()
+
         try:
             self.handler(ws)
         except socket.error as e:
diff --git a/eventlet/wsgi.py b/eventlet/wsgi.py
index 336afd6..38e27c3 100644
--- a/eventlet/wsgi.py
+++ b/eventlet/wsgi.py
@@ -419,9 +419,7 @@ class HttpProtocol(BaseHTTPServer.BaseHTTPRequestHandler):
             self.protocol_version = self.server.max_http_version
 
         self.raw_requestline = self._read_request_line()
-        # WebSocketWSGI needs us, here, to consider the connection always idle
-        if not getattr(self.server.app, '_WSGI_APP_ALWAYS_IDLE', False):
-            self.conn_state[2] = STATE_REQUEST
+        self.conn_state[2] = STATE_REQUEST
         if not self.raw_requestline:
             self.close_connection = 1
             return
@@ -730,6 +728,12 @@ class HttpProtocol(BaseHTTPServer.BaseHTTPRequestHandler):
             chunked_input=chunked)
         env['eventlet.posthooks'] = []
 
+        # WebSocketWSGI needs a way to flag the connection as idle,
+        # since it may never fall out of handle_one_request
+        def set_idle():
+            self.conn_state[2] = STATE_IDLE
+        env['eventlet.set_idle'] = set_idle
+
         return env
 
     def finish(self):
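The `environ['eventlet.set_idle']` hook proposed in that diff can be shown with a toy example. This is a sketch under the assumptions of the diff, not eventlet's actual implementation: `make_environ` and `websocket_like_app` are invented names, and the conn_state list is a simplified stand-in.

```python
# Toy illustration: the server plumbs a set_idle callback into the WSGI
# environ, and a protocol-upgrading app (like a websocket handler) calls
# it once it stops behaving like a normal HTTP request/response cycle,
# so a graceful stop will not wait on (or kill) the long-lived connection.
STATE_IDLE = 1
STATE_REQUEST = 2


def make_environ(conn_state):
    """Build a minimal environ carrying the set_idle hook (invented helper)."""
    env = {}

    def set_idle():
        conn_state[2] = STATE_IDLE

    env['eventlet.set_idle'] = set_idle
    return env


def websocket_like_app(environ):
    # After the handshake succeeds, flag the connection as idle.
    environ['eventlet.set_idle']()
    # ... long-lived protocol loop would run here ...


conn_state = ['127.0.0.1', None, STATE_REQUEST]
websocket_like_app(make_environ(conn_state))
assert conn_state[2] == STATE_IDLE
```

A nice property of this design, compared with the `_WSGI_APP_ALWAYS_IDLE` class attribute, is that it works no matter how many middleware layers wrap the app, since the environ dict is passed all the way through.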


dbishop commented Dec 27, 2019 via email


@tipabu tipabu left a comment


Seems sane to me, though it might be good for someone not involved with Swift to take a look 😉


@temoto temoto left a comment


thanks


tipabu commented Feb 5, 2024

Resolved conflicts and re-opened as #912 (since the source repo got deleted).

@tipabu tipabu closed this Feb 5, 2024
6 participants