Fix wsgi.server shutdown for in-flight requests#590
Conversation
While doing some work on OpenStack Swift, I noticed that in-flight requests were getting their sockets closed by eventlet when the listen socket was closed. In that case, there is a server process which bound the listen socket and N forked-off worker processes which all run the wsgi.server loop, calling accept() and handling requests.
I noticed that nothing was ever setting the connection state to STATE_REQUEST (i.e. it was an unused constant in the previous patch). Upon closer inspection, it turns out that the socket cleanup code for STATE_IDLE sockets got applied to a socket with an in-progress request. This caused code using the socket to get an EPIPE and exit the request's greenthread without completing the request.
With some instrumentation in the code, and without this patch's fix, the new test observed the code doing this:
(7544) wsgi starting up on http://127.0.0.1:36745
(7544) calling accept
(7544) accept got <eventlet.greenio.base.GreenSocket object at 0x7f7650d73290>, ('127.0.0.1', 50194)
(7544) accepted ('127.0.0.1', 50194)
(7544) completed req ('127.0.0.1', 50194)
(7544) calling accept
(7544) got exception 22
(7544) re-raising!
(7544) wsgi exiting, calling pool.waitall()
Traceback (most recent call last):
  File "/tmp/swiftstack-eventlet/eventlet/wsgi.py", line 602, in handle_one_response
    write(b'')
  File "/tmp/swiftstack-eventlet/eventlet/wsgi.py", line 541, in write
    wfile.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 403, in sendall
    tail = self.send(data, flags)
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 397, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 384, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe
127.0.0.1 - - [31/Oct/2019 01:08:05] "PUT /foo-bar HTTP/1.1" 200 0 0.000696
(7544) wsgi exited, is_accepting=True
In the new tests, not having the fix manifests as all greenthreads exiting before the client finishes the in-flight request.
For the WebSocketWSGI app, there is a private class attribute that tells the wsgi.server() code to consider its "requests" always idle, which I believe was the primary intent of the original patch.
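The invariant behind the fix can be sketched in a few lines of Python. This is a hypothetical illustration with names modeled on the patch's STATE_* constants, not eventlet's actual code: a graceful shutdown may only close connections that are idle between requests, never one with a request in flight.

```python
# Hypothetical sketch (not eventlet's actual code): shutdown closes only
# sockets whose tracked state is idle; in-flight requests are left alone.
STATE_IDLE, STATE_REQUEST, STATE_CLOSE = 'idle', 'request', 'close'

class Conn:
    def __init__(self, name):
        self.name = name
        self.state = STATE_IDLE
        self.closed = False

def begin_request(conn):
    conn.state = STATE_REQUEST   # mid-request: protected from shutdown

def finish_request(conn):
    conn.state = STATE_IDLE      # between requests: fair game again

def graceful_shutdown(conns):
    closed = []
    for conn in conns:
        if conn.state == STATE_IDLE:   # snip only idle keep-alive conns
            conn.closed = True
            conn.state = STATE_CLOSE
            closed.append(conn.name)
    # in-flight connections are untouched; pool.waitall() reaps them later
    return closed

idle = Conn('idle-keepalive')
busy = Conn('in-flight')
begin_request(busy)
closed = graceful_shutdown([idle, busy])
print(closed)  # only the idle keep-alive connection gets closed
```

The bug was that, with nothing ever setting STATE_REQUEST, every connection looked idle to this kind of check, so the in-flight socket got closed too.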
Codecov Report
@@ Coverage Diff @@
## master #590 +/- ##
=====================================
- Coverage 46% 46% -1%
=====================================
Files 81 81
Lines 7978 7978
Branches 1366 1366
=====================================
- Hits 3701 3682 -19
- Misses 4019 4034 +15
- Partials 258 262 +4
...which apparently won't let you interpolate ints into bytestrings. Also remove some extraneous comments in the test.
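The bytestring note refers to %-interpolation on bytes, which Python 3 only regained in 3.5 (PEP 461); on 3.0 through 3.4, `b'%d' % n` raises a TypeError, so the portable workaround is to format a str and encode it. A quick illustration of the portable form:

```python
# On Python 3.0-3.4, b'%d' % 42 raises TypeError; bytes %-formatting only
# came back in Python 3.5 (PEP 461). Formatting a str and encoding it
# works on every Python version.
n = 42
portable = ('Content-Length: %d\r\n' % n).encode('ascii')
print(portable)
```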
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #590 +/- ##
=====================================
- Coverage 46% 46% -1%
=====================================
Files 81 81
Lines 7978 7977 -1
Branches 1366 1366
=====================================
- Hits 3701 3690 -11
- Misses 4019 4026 +7
- Partials 258 261 +3
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
clayg
left a comment
I'm not sure about the websocket case at all, but I'll see about a repro client-side script based off these tests to try to understand the problem for wsgi servers on master better, at least.
Reduce the chance of regression in the future and show that the client socket is unusable after an in-flight request is finished.
* Update swift from branch 'master'
- Merge "Seamlessly reload servers with SIGUSR1"
- Seamlessly reload servers with SIGUSR1
Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
(instead of a SIGHUP). The server forks off a synchronized child to
wait to close the old listen socket(s) until the new server has started
up and bound its listen socket(s). The new server is exec'ed from the
old one so its PID doesn't change. This makes systemd happier, so an
ExecReload= stanza can now be used.
The seamless part means that incoming connections will always get
accepted either by the old server or the new one. This eliminates
client-perceived "downtime" during server reloads, while allowing the
server to fully reload, re-reading configuration, becoming a fresh
Python interpreter instance, etc. The SO_REUSEPORT socket option has
already been getting used, so nothing had to change there.
This patch also includes a non-invasive fix for a current eventlet bug;
see eventlet/eventlet#590
That bug prevents a SIGHUP "reload" from properly servicing existing
requests before old worker processes close sockets and exit. The
existing probe tests missed this, but the new ones, in this patch, caught
it.
New probe tests cover both old SIGHUP "reload" behavior as well as the
new SIGUSR1 seamless reload behavior.
Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
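The seamless handoff described above hinges on SO_REUSEPORT: the old and the new server process can both hold a bound listen socket on the same port during the reload window, so accept() never goes dark. A minimal illustration of that property (not Swift's actual code; requires Linux 3.9+ or another OS with SO_REUSEPORT):

```python
import socket

def bind_listen_socket(port):
    # SO_REUSEPORT lets a second process (here, a second socket) bind the
    # same port while the first bind is still live, enabling the handoff.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(('127.0.0.1', port))
    sock.listen(128)
    return sock

old_sock = bind_listen_socket(0)          # port 0: kernel picks a free port
port = old_sock.getsockname()[1]
new_sock = bind_listen_socket(port)       # second bind succeeds, old still open
same_port = old_sock.getsockname()[1] == new_sock.getsockname()[1]
print(same_port)
old_sock.close()
new_sock.close()
```

Without SO_REUSEPORT, the second bind would fail with EADDRINUSE until the old socket closed, leaving a window where connections are refused.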
tipabu
left a comment
There was a problem hiding this comment.
Yeah, this seems much more in line with #188's goal of having some mechanism to ensure that in-progress requests are serviced and cleanly disconnected, but that persistent connections are 'snipped' between requests.
I worry a little about the lack of a Timeout around https://github.com/eventlet/eventlet/blob/v0.25.1/eventlet/wsgi.py#L616-L633 but I'm not entirely sure that we're any worse off than we were... I might need to see what kind of trouble a slow client with a large payload can cause.
Might be worth resetting to STATE_IDLE just before that, instead of waiting to get out of handle_one_request.
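The missing-Timeout worry above is about the body-discard loop running unbounded against a slow client. In an eventlet server the natural guard would be eventlet.Timeout; this stdlib-only stand-in (hypothetical, not code from the PR) just shows the shape of such a guard:

```python
import io
import time

def discard_body(readable, chunk=8192, seconds=5.0):
    # Drain the unread request body, but bail out if a slow client with a
    # large payload keeps us here past the deadline (stand-in for wrapping
    # the loop in eventlet.Timeout).
    deadline = time.monotonic() + seconds
    while readable.read(chunk):
        if time.monotonic() > deadline:
            return False   # gave up: caller should close the connection
    return True            # body fully drained within the deadline

drained = discard_body(io.BytesIO(b'x' * 100_000))
print(drained)  # a fast in-memory reader drains well within the deadline
```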
eventlet/wsgi.py
Outdated
        self.raw_requestline = self._read_request_line()
        # WebSocketWSGI needs us, here, to consider the connection always idle
        if not getattr(self.server.app, '_WSGI_APP_ALWAYS_IDLE', False):
            self.conn_state[2] = STATE_REQUEST
There was a problem hiding this comment.
Oh hey, we weren't actually using that before, were we...
tipabu
left a comment
OK, I've thought about it some more, and the discard logic is (probably) the minority case -- odds are pretty good that the WSGI app actually needs to read the whole body, in which case moving the state transition doesn't help at all.
I've gotten more nervous about how we plumb in the websocket logic, though -- there may be several layers between eventlet and the WSGI app that got wrapped, no? For routing and such...
I think something like this would work better? I must admit, I don't really have any websockets experience.
diff --git a/eventlet/websocket.py b/eventlet/websocket.py
index 7e8933f..4bd7880 100644
--- a/eventlet/websocket.py
+++ b/eventlet/websocket.py
@@ -77,11 +77,6 @@ class WebSocketWSGI(object):
     the time of closure.
     """

-    # This tells the wsgi server code that handling a "request" for this WSGI
-    # app should always be considered "idle" and thus able to be shutdown "mid
-    # request" which for websockets means all the time, I guess.
-    _WSGI_APP_ALWAYS_IDLE = True
-
     def __init__(self, handler):
         self.handler = handler
         self.protocol_version = None
@@ -131,6 +126,10 @@ class WebSocketWSGI(object):
                 [('Connection', 'close'), ] + headers)
             return [body]

+        # We're ready to switch protocols; flag the connection
+        # as idle to play well with a graceful stop
+        environ['eventlet.set_idle']()
+
         try:
             self.handler(ws)
         except socket.error as e:
diff --git a/eventlet/wsgi.py b/eventlet/wsgi.py
index 336afd6..38e27c3 100644
--- a/eventlet/wsgi.py
+++ b/eventlet/wsgi.py
@@ -419,9 +419,7 @@ class HttpProtocol(BaseHTTPServer.BaseHTTPRequestHandler):
         self.protocol_version = self.server.max_http_version
         self.raw_requestline = self._read_request_line()
-        # WebSocketWSGI needs us, here, to consider the connection always idle
-        if not getattr(self.server.app, '_WSGI_APP_ALWAYS_IDLE', False):
-            self.conn_state[2] = STATE_REQUEST
+        self.conn_state[2] = STATE_REQUEST
         if not self.raw_requestline:
             self.close_connection = 1
             return
@@ -730,6 +728,12 @@ class HttpProtocol(BaseHTTPServer.BaseHTTPRequestHandler):
             chunked_input=chunked)
         env['eventlet.posthooks'] = []

+        # WebSocketWSGI needs a way to flag the connection as idle,
+        # since it may never fall out of handle_one_request
+        def set_idle():
+            self.conn_state[2] = STATE_IDLE
+        env['eventlet.set_idle'] = set_idle
+
         return env

     def finish(self):
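From an application's point of view, the proposed environ['eventlet.set_idle'] hook would be used like this. A hypothetical WSGI app (names and stub plumbing are illustrative, not from the PR): once the app has effectively taken over the connection, e.g. after a websocket upgrade, it flags the connection idle so a graceful stop won't wait on it.

```python
# Hypothetical WSGI app using the proposed environ['eventlet.set_idle'] hook.
def long_lived_app(environ, start_response):
    set_idle = environ.get('eventlet.set_idle')
    if set_idle is not None:
        set_idle()  # this connection no longer counts as an in-flight request
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'switched over\n']

# Exercise the app with a stub environ to show the hook being called.
calls = []
environ = {'eventlet.set_idle': lambda: calls.append('idle')}
body = long_lived_app(environ, lambda status, headers: None)
print(calls)
```

Because the hook travels in the WSGI environ, it survives any number of middleware layers wrapping the app, which addresses the plumbing concern about routing layers between eventlet and the final app.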
Looks much better to me!
tipabu
left a comment
Seems sane to me, though it might be good for someone not involved with Swift to take a look 😉
Resolved conflicts and re-opened as #912 (since the source repo got deleted).