Honor the timeout setting of the underlying stream#69
Honor the timeout setting of the underlying stream#69jini-zh wants to merge 2 commits intocl-plus-ssl:masterfrom
Conversation
|
Travis has found an error in the IO-WAIT implementation under ECL. I fixed it with b036b50. There are also failed tests under CCL because its IO-WAIT returns NIL which is treated as timeout by the new version of ENSURE-SSL-FUNCALL. The CCL IO-WAIT also emits CCL::COMMUNICATION-DEADLINE-EXPIRED on deadline. Maybe the timeout should be signalled by IO-WAITs instead of ENSURE-SSL-FUNCALL? I'm waiting for your comments. |
|
@jini-zh , I'm not ready to comment - need to study all that. |
|
@jini-zh, implementation aside, in terms of cl+ssl API, what changes for the user? |
|
@jini-zh, so you are trying to solve the problem of hunchentoot not expiring keep-alive connections? Achieving that via the mix of lisp implementation-dependent, sometimes incomplete, interfaces to sockopt and socket read/write functions, mixed into usocket and cl+ssl doesn't seem reliable. I'd suggest implementing a background thread in hunchentoot which periodically loops over the keep-alive connections and closes timed-out ones. This would work on all implementations and would be more reliable. cc @hanshuebner Only that the single-threaded version of hunchentoot will not be fixed by such a thread. Therefore we will do what we can in cl+ssl too. But setting socket timeout on an underlying socket, then wrapping it into an OpenSSL stream and hoping the wrapper stream will inherit the same timeout behavior doesn't seem correct. Probably the timeout configuration should be done via cl+ssl API directly. @jini-zh, can you show in hunchentoot code how "the thread waits for further requests"? What function do you expect to signal the new TIMEOUT condition? |
|
@avodonosov I would be kind of reluctant to recommend accessing CL+SSL streams from multiple threads at the same time, as I would not trust that CL+SSL and the OpenSSL API would like it if they're being called concurrently. If I were to implement the background, timeout-handling thread, I'd make it close the socket using a direct operating system call and then let the other thread that might be waiting on the CL+SSL socket deal with the hanging read/write system call returning an error. That approach would still require investigating whether a In that sense, using socket options instead of implementing the timeout in Lisp is safer as well, and I would generally think that this would be the overall best approach if all that is desired is eventual timeout, and not correct deadline behavior. |
In terms of cl+ssl API, a signal is emitted when the underlying stream times out. The signal is sent through SIGNAL, so if it isn't caught, execution proceeds as before the patch. I don't know what is The Right Thing here, but I think that there should be some way to catch a timeout in the wrapped stream. The approach I took seemed to be the least invasive and backwards compatible. By the way, in the current master branch, there is some logic regarding deadlines on stream input/output. Does it work? There is no way out of the loop in ENSURE-SSL-FUNCALL except with a condition of some sort. The following code raises a condition of TYPE-ERROR because SB-SYS:WAIT-UNTIL-FD-USABLE eventually gets a negative timeout parameter, but that's too obscure to rely on. However, SSL-STREAM-DEADLINE is not exported, so maybe I'm talking about an unfinished feature here.
Yes.
Well, this is the way the timeout is set in the Hunchentoot code, and also the way the deadlines are implemented in CL+SSL (INPUT-WAIT and OUTPUT-WAIT depend on the implementation). I don't like the solution with the timeout-handling thread. For one thing, it's going to be tricky to access the thread socket, fd-stream or ssl-stream. Currently the ssl-stream is bound to a Hunchentoot dynamic variable. I will need to manage a list of streams and work around possible race conditions. For the other thing, having a thread messing with other threads just to make them terminate on timeout doesn't feel right. If you won't accept the patch, I will probably just override CL+SSL::INPUT-WAIT in my project.
When a client connects to the server, generic function HUCHENTOOT:PROCESS-CONNECTION is called with two arguments: ACCEPTOR which is an instance of a subclass of HUNCHENTOOT:ACCEPTOR, and SOCKET which is an instance of USOCKET:STREAM-USOCKET. PROCESS-CONNECTION calls generic function INITIALIZE-CONNECTION-STREAM which wraps SOCKET with a CL+SSL::SSL-STREAM if ACCEPTOR is also an instance of a subclass of HUNCHENTOOT:SSL-ACCEPTOR (in other words, if SSL is enabled). Then PROCESS-CONNECTION processes requests in a loop, blocking while reading from the socket. In SBCL, when SSL is disabled, the reading is performed on an instance of SB-SYS:FD-STREAM. SB-SYS:IO-TIMEOUT is raised on timeout. In the :AROUND method of PROCESS-CONNECTION, it is first converted to USOCKET:TIMEOUT-ERROR by the WITH-MAPPED-CONDITIONS macro and then logged with the WITH-CONDITIONS-CAUGHT-AND-LOGGED macro. After that the thread is allowed to terminate. When SSL is enabled, the timeout is ignored, and the thread is blocked in SB-SYS:WAIT-UNTIL-FD-USABLE called from CL+SSL::INPUT-WAIT until the socket is closed gracefully. With the patch applied, I expect to get CL+SSL:TIMEOUT in GET-REQUEST-DATA called in PROCESS-CONNECTION. Then I can handle the timeout with the following code: where SSL-WEBSERVER is a subclass of HUNCHENTOOT:SSL-WEBSERVER (the symbol belongs to another package). Given the process with SSL disabled described above, I now think that this is not the proper approach. A better way would be to raise an error in CL+SSL::IO-WAIT: SB-SYS:IO-TIMEOUT in SBCL and CCL::COMMUNICATION-DEADLINE-EXPIRED in CCL (unless there is a better suited condition in CCL, I've got to check its documentation). I think that's the way it is supposed to work with deadlines in CCL. @avodonosov, would you like me to implement it? |
|
@hanshuebner, what is the "correct deadline behavior"?
That is a great API.
Exactly! They seem to be called deadlines. And it is the third part of the equation we need to solve in this issue. It was implemented in this commit: 5bd5225 I even suspect your goal may be achieved by implementing ssl-stream-deadline for your stream, which would take into account the underlying stream deadline. Why the timeout not honored today? Because (sb-sys:wait-until-fd-usable ...) is called with nil timeout, where nil is the value of (ssl-stream-deadline stream), right? So the problem we address here is not when OpenSSL reads the socket, but when we wait with (sb-sys:wait-until-fd-usable ...). The case of OpenSSL read encoutering timeout deserves attention too. What will ssl-read return if the socke read it calls returns EWOULDBLOCK because of the timeout? How will our ensure-ssl-funcall handle the ssl-read return value? |
|
@avodonosov sorry, I am having a very busy week. I will come back to this topic at the weekend. |
I know it isn't. I was hoping you'd help me to find a better solution. I didn't want to be intrusive.
I was thinking of a more general approach. In fact, I'm surprised that nobody complained about stale threads with SBCL + Hunchentoot + SSL so far. I suspect that I'm doing something wrong, but it also might the undermanning of the Common Lisp community. Anyway, I found a better solution (see below).
Right.
Correct.
SBCL tests whether a read from an FD-STREAM not associated with a regular file would block with a call to I found that if UNWRAP-STREAM-P is set to NIL in the call to MAKE-SSL-SERVER-STREAM, then the OpenSSL library is arranged to read from Lisp streams through a I am going to test that next week. If I get no stale threads, I will submit a patch to Hunchentoot. Would you want me to implement the timeout in CL+SSL anyway? |
Maybe people implement SSL by putting nginx or another reverse proxy in front of Hunchentoot.
If that's an answer to my question, then I meant a different thing. Not when lisp code waits for input, but when OpenSSL code reads the file descriptor provided, and encounters the timeout.
There are some deadline examples in https://github.com/cl-plus-ssl/cl-plus-ssl/blob/90619d23f7deca0949ba41f3c3a99921c8643a1f/test.lisp. Maybe that can shed some light on the question.
Years ago I heard on the mailing list that lisp BIOs are unstable. That's why people use file descriptor approach with cl+ssl. But maybe that is imprecise or just outdated info. You can try.
Fixed, thanks.
I would be glad. |
Yeah, that looks quite plausible.
As per
Yeah, it can. In READ-DEADLINE I see that the CCL version of the code expects CCL::COMMUNICATION-DEADLINE-EXPIRED which is raised in INPUT-WAIT, while the SBCL version expects SB-SYS:DEADLINE-TIMEOUT which is raised from SBCL internals when the call is wrapped in SB-SYS:WITH-DEADLINE (it is). I noticed that the CCL INPUT-WAIT does use the underlying stream/socket timeout through the call to STREAM-DEADLINE. Apparently, such a functionality was intended in general, but wasn't implemented for SBCL. SB-SYS:WITH-DEADLINE was used instead.
It has been working perfectly for three days under the load of a few users per hour not counting the bots.
Cool. I think I now know how to do it. I will just copy the CCL approach. I will also clean up a few things: INPUT-WAIT and OUTPUT-WAIT share too much code. Do you mind if I start afresh with eventual |
|
-f is ok |
|
Done. I tested it for a week in production, works fine. I also ran the code in CCL and ECL. All my tests and the tests from fiveam succeeded. |
|
I gave it a thought but couldn't imagine anything important to add. The new behaviour is intuitive and is what I would expect from cl+ssl in the first place. Perhaps you want to make public the new generic functions STREAM-INPUT-TIMEOUT, STREAM-OUTPUT-TIMEOUT and STREAM-DEADLINE? In this case I suggest adding to the documentation the following lines:
|
|
to handle? |
|
@jini-zh , btw, do you use also a hunchentoot patch, or the current hunchentoot works as is with the cl+ssl modification you did? |
|
Is that better?
Yes, I was testing with unpatched Hunchentoot. It works as is. |
|
Yes, better, thanks. |
|
@avodonosov I have exported the functions and updated the documentation. |
src/ffi.lisp
Outdated
| (deadline->timeout deadline))) | ||
| (timeout-error (case direction | ||
| (:input 'ccl::input-timeout) | ||
| (:output 'ccl::output-timeout)))) |
There was a problem hiding this comment.
This style of indentation (creating an aligned "second column" of values) is not used in cl+ssl code. Merging it will increase inconsistency in code formatting.
There was a problem hiding this comment.
Okay, I will make it consistent.
src/ffi.lisp
Outdated
| (:output (stream-output-timeout socket))))) | ||
| (deadline-timeout (when deadline | ||
| (coerce (deadline->timeout deadline) | ||
| 'single-float)))) |
There was a problem hiding this comment.
why coerce, and why exactly here (CCL version doesn't have it - something SBCL specific?)
There was a problem hiding this comment.
I thought that SB-SYS:WAIT-UNTIL-FD-USABLE requires a SINGLE-FLOAT as a timeout. But I was wrong; Hucnhentoot does the coercion because the TIMEOUT slot in FD-STREAM is declared as (or single-float null). I will remove the coercion.
src/ffi.lisp
Outdated
| (coerce (deadline->timeout deadline) | ||
| 'single-float)))) | ||
| (when (and deadline (minusp deadline-timeout)) | ||
| (error 'sb-sys:deadline-timeout :seconds deadline-timeout)) |
There was a problem hiding this comment.
Signaling this condition without providing sb-sys:defer-deadline and sb-sys:cancel-deadline restarts violates the SBCL API we are trying to emulate.
There was a problem hiding this comment.
It can be done, although we have to decide whether we want to store the new deadline in the stream object. If you don't want to support the deadlines coming from SSL-STREAM, this branch of code can be deleted.
src/ffi.lisp
Outdated
| (when (and timeout (minusp timeout)) | ||
| (error 'sb-sys:io-timeout :seconds timeout)) | ||
| (unless (if deadline | ||
| (sb-sys:with-deadline (:seconds deadline-timeout) |
There was a problem hiding this comment.
I'm in doubt about signaling sb-sys:deadline-timeout without the user requesting it by wrapping io calls into sb-sys:with-deadline himself.
The proposed change essentially treats cl+ssl::ssl-stream-deadline slot set on a cl+ssl stream as a request to do (sb-sys:with-deadline ) inside the standard cl stream io function implementation (e.g. cl:read-char, cl:read-sequcence, etc).
Maybe original cl+ssl authors thought in this direction, but support for cl+ssl::ssl-stream-deadline was not implemented properly (the current SBCL implementation seems to lead to a spin loop waiting in case cl+ssl::ssl-stream-deadline expires, instead of signaling the sb-sys:deadline-timeout condition), and the symbol is not exported from the cl+ssl package.
As the hunchentoo's case motivating the cl+ssl fix relies on sb-sys:io-timeout, I'm thinking about limiting the cl+ssl fix by timeouts only (and the PR title speaks about timeouts only as well)
There was a problem hiding this comment.
I think it comes from Clozure. I don't understand why there may be two timeouts associated with a stream, one set via STREAM-INPUT-TIMEOUT or STREAM-OUTPUT-TIMEOUT, and the other with STREAM-DEADLINE. Apparently CL+SSL was designed with Clozure deadlines in mind. I back up the idea of dropping deadlines support in SSL-STREAM, at least for SBCL.
An implementation-dependent timeout might be set of the socket stream representing the SSL connection. This patch makes CL+SSL to respect this timeout while waiting for input/output.
Also describe them in the manual.
|
See if you like this version. I left deadlines support for Clozure. |
|
@jini-zh, sorry for the delay. I'd like to change something, but it's easier to do myyself than asking in the comments. I hope to do that maybe this weekend. |
|
@jini-zh , for you to understand the reason for the delay. 100% backwards compatibility is needed. When I say I'm not going to extend the current deadline functionality, it doesn't mean to drop it. What worked before should work now. The same for timeouts. The code which didn't have timeouts previously should not get them after the pull request. If the new timeout semantics is needed, this shoud be specified by a new parameter in the make-ssl-(client or server)-stream functions. Also, I'm trying to rely on the old testcases in the test.lisp file - make them passing now and make sure they are not boken after this change. And I have very little free time, unfortunately. |
8d920fe to
300ab3a
Compare
a7d1bcd to
701e645
Compare
7da553f to
f1ac20a
Compare
f2b0e3b to
ed25ffc
Compare
58cba6b to
48f88ce
Compare
93d302f to
13d824e
Compare
7f45344 to
d3e3cc9
Compare
Hunchentoot with an SSL acceptor and one thread per connection taskmaster results in stale threads on the server. This taskmaster handles each connection in its own thread. The thread opens a socket with the help of the acceptor, reads and processes the request and then waits with a specified timeout for possible further requests. The timeout is set when the socket is opened in an implementation defined way (see set-timeouts.lisp in the Hunchentoot code). In SBCL, the timeout is set by changing the value in the SB-SYS:FD-STREAM structure. CL+SSL wraps the SB-SYS:FD-STREAM returned by USOCKET with an SSL-STREAM which is then used for all input/output. In particular, when the thread waits for further requests, it ends up in the INPUT-WAIT function which calls SB-SYS:WAIT-UNTIL-FD-USABLE with TIMEOUT set to NIL. I'm not sure why, but it seems that if the connection is lost due to a network failure, the socket is never notified about it, and the thread just hangs forever.
ca79378 fixes that by using the timeout parameter of the underlying stream along with the deadline parameter of the SSL-STREAM to calculate the proper argument to SB-SYS:WAIT-UNTIL-FD-USABLE.
I don't know Clozure, so I didn't modify its branch of code, just made it look similar to SBCL.
That didn't solve the problem, however, because ENSURE-SSL-FUNCALL waits forever for the proper end of communication. I'm not sure what would be the best API to inform the caller about the timeout. 07650c4 makes ENSURE-SSL-FUNCALL to signal a TIMEOUT condition with a RETRY restart established.
The following code demonstrates the problem. If the client never ends communication, the server thread hangs forever. With the patches applied, it catches the TIMEOUT signal and terminates.