
Fix wsgi.server shutdown for in-flight requests#590

Closed
dbishop wants to merge 4 commits into master from unknown repository

Conversation


@dbishop dbishop commented Oct 31, 2019

While doing some work on OpenStack Swift, I noticed that in-flight
requests were getting their sockets closed by eventlet when the listen
socket was closed. In that setup, there is a server process which binds
the listen socket and N forked-off worker processes which all run the
wsgi.server loop, calling accept() and handling requests.

I noticed that nothing was ever setting the connection state to
STATE_REQUEST (i.e., a constant left unused by the previous patch). Upon
closer inspection, it turns out that the socket cleanup code for
STATE_IDLE sockets was also getting applied to sockets with in-progress
requests. This caused code using the socket to get an EPIPE and exit
the request's greenthread without completing the request.

With some instrumentation added, and without this patch's fix, the new
test observed the code doing this:

(7544) wsgi starting up on http://127.0.0.1:36745
(7544) calling accept
(7544) accept got <eventlet.greenio.base.GreenSocket object at 0x7f7650d73290>, ('127.0.0.1', 50194)
(7544) accepted ('127.0.0.1', 50194)
(7544) completed req ('127.0.0.1', 50194)
(7544) calling accept
(7544) got exception 22
(7544) re-raising!
(7544) wsgi exiting, calling pool.waitall()
Traceback (most recent call last):
  File "/tmp/swiftstack-eventlet/eventlet/wsgi.py", line 602, in handle_one_response
    write(b'')
  File "/tmp/swiftstack-eventlet/eventlet/wsgi.py", line 541, in write
    wfile.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 403, in sendall
    tail = self.send(data, flags)
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 397, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/tmp/swiftstack-eventlet/eventlet/greenio/base.py", line 384, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe

127.0.0.1 - - [31/Oct/2019 01:08:05] "PUT /foo-bar HTTP/1.1" 200 0 0.000696
(7544) wsgi exited, is_accepting=True

In the new tests, not having the fix manifests as all greenthreads
exiting before the client finishes the in-flight request.

The WebSocketWSGI app gets a private class attribute that tells
the wsgi.server() code to consider its "requests" always idle, which I
believe was the primary intent of the original patch.
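The idle-vs-request state tracking described above can be sketched roughly like this. This is a hypothetical, simplified model for illustration only: the STATE_* names come from the patch, but `stop_gracefully` and the conn_state layout shown here are invented stand-ins, not eventlet's actual code.

```python
# Invented sketch of per-connection state bookkeeping: each connection
# carries a mutable conn_state list of [address, socket, state], and a
# graceful stop closes only the connections that are between requests.
STATE_IDLE = 1
STATE_REQUEST = 2
STATE_CLOSE = 3


def stop_gracefully(connections):
    """Close idle connections immediately; leave in-flight requests alone.

    `connections` maps a socket (any hashable stand-in here) to its
    conn_state list. Returns the sockets that were closed.
    """
    closed = []
    for sock, conn_state in list(connections.items()):
        if conn_state[2] == STATE_IDLE:
            conn_state[2] = STATE_CLOSE
            closed.append(sock)
            del connections[sock]
    return closed


# An in-flight request (STATE_REQUEST) survives shutdown; the idle
# keep-alive connection is snipped. The bug was that the in-flight
# connection was never marked STATE_REQUEST, so it got closed too.
conns = {
    'keepalive': ['127.0.0.1', 'keepalive', STATE_IDLE],
    'inflight': ['127.0.0.1', 'inflight', STATE_REQUEST],
}
assert stop_gracefully(conns) == ['keepalive']
assert 'inflight' in conns
```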


codecov-io commented Oct 31, 2019

Codecov Report

Merging #590 into master will decrease coverage by <1%.
The diff coverage is n/a.


@@          Coverage Diff          @@
##           master   #590   +/-   ##
=====================================
- Coverage      46%    46%   -1%     
=====================================
  Files          81     81           
  Lines        7978   7978           
  Branches     1366   1366           
=====================================
- Hits         3701   3682   -19     
- Misses       4019   4034   +15     
- Partials      258    262    +4
Flag Coverage Δ
#ipv6 ?
#py27epolls 49% <ø> (-1%) ⬇️
#py27poll ?
#py27selects 48% <ø> (-1%) ⬇️
#py34epolls 42% <ø> (-1%) ⬇️
#py34poll ?
#py34selects ?
#py35epolls 42% <ø> (-1%) ⬇️
#py35poll 42% <ø> (-1%) ⬇️
#py35selects 42% <ø> (-1%) ⬇️
#py36epolls 42% <ø> (-1%) ⬇️
#py36poll 42% <ø> (-1%) ⬇️
#py36selects 42% <ø> (-1%) ⬇️
#py37epolls 42% <ø> (-1%) ⬇️
#py37poll ?
#py37selects 42% <ø> (-1%) ⬇️
Impacted Files Coverage Δ
eventlet/backdoor.py 87% <0%> (-7%) ⬇️
eventlet/hubs/__init__.py 76% <0%> (-6%) ⬇️
eventlet/green/ssl.py 71% <0%> (-2%) ⬇️
eventlet/hubs/hub.py 90% <0%> (-1%) ⬇️
eventlet/greenio/base.py 87% <0%> (-1%) ⬇️
eventlet/green/http/client.py 35% <0%> (-1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac7917c...2f157d3.

...which apparently won't let you interpolate ints into bytestrings.

Also remove some extraneous comments in the test.

codecov-io commented Oct 31, 2019

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ac7917c) 46% compared to head (a905bf8) 46%.
Report is 114 commits behind head on master.

@@          Coverage Diff          @@
##           master   #590   +/-   ##
=====================================
- Coverage      46%    46%   -1%     
=====================================
  Files          81     81           
  Lines        7978   7977    -1     
  Branches     1366   1366           
=====================================
- Hits         3701   3690   -11     
- Misses       4019   4026    +7     
- Partials      258    261    +3     
Flag Coverage Δ
ipv6 15% <ø> (-1%) ⬇️
py27epolls 49% <ø> (-1%) ⬇️
py27poll 49% <ø> (-1%) ⬇️
py27selects 48% <ø> (-1%) ⬇️
py34epolls 42% <ø> (-1%) ⬇️
py34poll 42% <ø> (-1%) ⬇️
py34selects 42% <ø> (-1%) ⬇️
py35epolls 42% <ø> (-1%) ⬇️
py35poll 42% <ø> (-1%) ⬇️
py35selects 42% <ø> (-1%) ⬇️
py36epolls 42% <ø> (-1%) ⬇️
py36poll 42% <ø> (-1%) ⬇️
py36selects ?
py37epolls 42% <ø> (-1%) ⬇️
py37poll 42% <ø> (-1%) ⬇️
py37selects 42% <ø> (-1%) ⬇️

Flags with carried forward coverage won't be shown.



@clayg clayg left a comment


I'm not sure about the websocket case at all - but I'll see about a repro client-side script based off these tests to try to understand the problem for wsgi servers on master better, at least

Reduce the chance of future regressions and show that the client socket is
unusable after an in-flight request is finished.
openstack-gerrit pushed a commit to openstack/openstack that referenced this pull request Nov 14, 2019
* Update swift from branch 'master'
  - Merge "Seamlessly reload servers with SIGUSR1"
  - Seamlessly reload servers with SIGUSR1
    
    Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
    (instead of a SIGHUP).  The server forks off a synchronized child to
    wait to close the old listen socket(s) until the new server has started
    up and bound its listen socket(s).  The new server is exec'ed from the
    old one so its PID doesn't change.  This makes Systemd happier, so a
    ReloadExec= stanza can now be used.
    
The seamless part means that incoming connections will always get
    accepted either by the old server or the new one.  This eliminates
    client-perceived "downtime" during server reloads, while allowing the
    server to fully reload, re-reading configuration, becoming a fresh
    Python interpreter instance, etc.  The SO_REUSEPORT socket option has
    already been getting used, so nothing had to change there.
    
    This patch also includes a non-invasive fix for a current eventlet bug;
    see eventlet/eventlet#590
    That bug prevents a SIGHUP "reload" from properly servicing existing
    requests before old worker processes close sockets and exit.  The
existing probe tests missed this, but the new ones, in this patch, caught
    it.
    
    New probe tests cover both old SIGHUP "reload" behavior as well as the
    new SIGUSR1 seamless reload behavior.
    
    Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
openstack-gerrit pushed a commit to openstack/swift that referenced this pull request Nov 14, 2019
Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
(instead of a SIGHUP).  The server forks off a synchronized child to
wait to close the old listen socket(s) until the new server has started
up and bound its listen socket(s).  The new server is exec'ed from the
old one so its PID doesn't change.  This makes Systemd happier, so a
ReloadExec= stanza can now be used.

The seamless part means that incoming connections will always get
accepted either by the old server or the new one.  This eliminates
client-perceived "downtime" during server reloads, while allowing the
server to fully reload, re-reading configuration, becoming a fresh
Python interpreter instance, etc.  The SO_REUSEPORT socket option has
already been getting used, so nothing had to change there.

This patch also includes a non-invasive fix for a current eventlet bug;
see eventlet/eventlet#590
That bug prevents a SIGHUP "reload" from properly servicing existing
requests before old worker processes close sockets and exit.  The
existing probe tests missed this, but the new ones, in this patch, caught
it.

New probe tests cover both old SIGHUP "reload" behavior as well as the
new SIGUSR1 seamless reload behavior.

Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
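The signal-driven reload trigger described in the commit message can be illustrated with a minimal sketch. Everything below is invented for illustration (it is not Swift's implementation): it only shows registering a SIGUSR1 handler that records a reload request, which the real server's main loop would act on by forking a socket-holding child and re-exec'ing itself.

```python
# Hypothetical sketch: record that a reload was requested via SIGUSR1.
# The fork/exec and listen-socket handoff machinery is elided.
import os
import signal

reload_requested = False


def on_sigusr1(signum, frame):
    # Signal handlers should do as little as possible; just set a flag
    # for the main loop to notice.
    global reload_requested
    reload_requested = True


signal.signal(signal.SIGUSR1, on_sigusr1)

# Simulate an operator running `kill -USR1 <pid>` against this process.
os.kill(os.getpid(), signal.SIGUSR1)
assert reload_requested
```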

@tipabu tipabu left a comment


Yeah, this seems much more in line with #188's goal of having some

mechanism to ensure that in progress requests are serviced and cleanly disconnected, but that persistent connections are 'snipped' between requests

I worry a little about the lack of a Timeout around https://github.com/eventlet/eventlet/blob/v0.25.1/eventlet/wsgi.py#L616-L633 but I'm not entirely sure that we're any worse off than we were... I might need to see what kind of trouble a slow client with a large payload can cause.

Might be worth resetting to STATE_IDLE just before that, instead of waiting to get out of handle_one_request.

eventlet/wsgi.py Outdated
        self.raw_requestline = self._read_request_line()
        # WebSocketWSGI needs us, here, to consider the connection always idle
        if not getattr(self.server.app, '_WSGI_APP_ALWAYS_IDLE', False):
            self.conn_state[2] = STATE_REQUEST

Oh hey, we weren't actually using that before, were we...


@tipabu tipabu left a comment


OK, I've thought about it some more, and the discard logic is (probably) the minority case -- odds are pretty good that the WSGI app actually needs to read the whole body, in which case moving the state transition doesn't help at all.

I've gotten more nervous about how we plumb in the websocket logic, though -- there may be several layers between eventlet and the WSGI app that got wrapped, no? For routing and such...

I think something like this would work better? I must admit, I don't really have any websockets experience.

diff --git a/eventlet/websocket.py b/eventlet/websocket.py
index 7e8933f..4bd7880 100644
--- a/eventlet/websocket.py
+++ b/eventlet/websocket.py
@@ -77,11 +77,6 @@ class WebSocketWSGI(object):
     the time of closure.
     """
 
-    # This tells the wsgi server code that handling a "request" for this WSGI
-    # app should always be considered "idle" and thus able to be shutdown "mid
-    # request" which for websockets means all the time, I guess.
-    _WSGI_APP_ALWAYS_IDLE = True
-
     def __init__(self, handler):
         self.handler = handler
         self.protocol_version = None
@@ -131,6 +126,10 @@ class WebSocketWSGI(object):
                            [('Connection', 'close'), ] + headers)
             return [body]
 
+        # We're ready to switch protocols; flag the connection
+        # as idle to play well with a graceful stop
+        environ['eventlet.set_idle']()
+
         try:
             self.handler(ws)
         except socket.error as e:
diff --git a/eventlet/wsgi.py b/eventlet/wsgi.py
index 336afd6..38e27c3 100644
--- a/eventlet/wsgi.py
+++ b/eventlet/wsgi.py
@@ -419,9 +419,7 @@ class HttpProtocol(BaseHTTPServer.BaseHTTPRequestHandler):
             self.protocol_version = self.server.max_http_version
 
         self.raw_requestline = self._read_request_line()
-        # WebSocketWSGI needs us, here, to consider the connection always idle
-        if not getattr(self.server.app, '_WSGI_APP_ALWAYS_IDLE', False):
-            self.conn_state[2] = STATE_REQUEST
+        self.conn_state[2] = STATE_REQUEST
         if not self.raw_requestline:
             self.close_connection = 1
             return
@@ -730,6 +728,12 @@ class HttpProtocol(BaseHTTPServer.BaseHTTPRequestHandler):
             chunked_input=chunked)
         env['eventlet.posthooks'] = []
 
+        # WebSocketWSGI needs a way to flag the connection as idle,
+        # since it may never fall out of handle_one_request
+        def set_idle():
+            self.conn_state[2] = STATE_IDLE
+        env['eventlet.set_idle'] = set_idle
+
         return env
 
     def finish(self):
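The `environ['eventlet.set_idle']` hook proposed in that diff can be shown with a toy example. This is a sketch under the assumptions of the diff, not eventlet's actual implementation: `make_environ` and `websocket_like_app` are invented names, and the conn_state list is a simplified stand-in.

```python
# Toy illustration: the server plumbs a set_idle callback into the WSGI
# environ, and a protocol-upgrading app (like a websocket handler) calls
# it once it stops behaving like a normal HTTP request/response cycle,
# so a graceful stop will not wait on (or kill) the long-lived connection.
STATE_IDLE = 1
STATE_REQUEST = 2


def make_environ(conn_state):
    """Build a minimal environ carrying the set_idle hook (invented helper)."""
    env = {}

    def set_idle():
        conn_state[2] = STATE_IDLE

    env['eventlet.set_idle'] = set_idle
    return env


def websocket_like_app(environ):
    # After the handshake succeeds, flag the connection as idle.
    environ['eventlet.set_idle']()
    # ... long-lived protocol loop would run here ...


conn_state = ['127.0.0.1', None, STATE_REQUEST]
websocket_like_app(make_environ(conn_state))
assert conn_state[2] == STATE_IDLE
```

A nice property of this design, compared with the `_WSGI_APP_ALWAYS_IDLE` class attribute, is that it works no matter how many middleware layers wrap the app, since the environ dict is passed all the way through.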


dbishop commented Dec 27, 2019 via email


@tipabu tipabu left a comment


Seems sane to me, though it might be good for someone not involved with Swift to take a look 😉


@temoto temoto left a comment


thanks


tipabu commented Feb 5, 2024

Resolved conflicts and re-opened as #912 (since the source repo got deleted).

@tipabu tipabu closed this Feb 5, 2024
6 participants