In TLSProxy::Proxy::clientstart(), don't cut off all communication at once #5067
levitte wants to merge 2 commits into openssl:master
Conversation
|
Hopefully, this will fix the strange test failures we've seen on Travis lately, and possibly similar issues I've observed on Cygwin as well.
|
I am not sure if a
util/perl/TLSProxy/Proxy.pm
I think here we need to disable SIGPIPE with:
$SIG{PIPE} = "IGNORE";
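As a standalone illustration of why that matters (hypothetical demo code, not part of Proxy.pm): with SIGPIPE ignored, a write to a dead pipe returns an error the script can inspect, instead of the signal silently killing the perl process mid-test.

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Hypothetical demo, not Proxy.pm code: with SIGPIPE ignored, writing
# to a pipe whose read end is closed fails with EPIPE instead of
# terminating the whole process.
$SIG{PIPE} = "IGNORE";

pipe(my $rd, my $wr) or die "pipe: $!";
close $rd;                             # the reader goes away
my $n = syswrite($wr, "record\n");     # would raise SIGPIPE otherwise
my $got_epipe = (!defined $n && $! == EPIPE) ? 1 : 0;
print $got_epipe ? "survived: EPIPE\n" : "unexpected: $!\n";
```

Without the $SIG{PIPE} line, the default disposition terminates the process, which is exactly the kind of silent death that makes a failing test loop hard to diagnose.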
Yeah, but note how in the change here, I avoid writing to a socket that has been removed from
|
So if you wouldn't mind running your loop with this PR, that would be great!
|
And sure, if it's really needed, I will add an ignore of SIGPIPE.
util/perl/TLSProxy/Proxy.pm
I wonder why the "Connection closed" is no longer in the V=1 trace.
No "Server connection closed" or "Client connection closed" either?
So, why is the new print going to STDERR, and guarded by $self->debug?
You know, good point. I'll add that back...
Yes, not a single "Server|Client connection closed", only one "Error on client end: Connection reset by peer".
|
I am now running this PR with $SIG{PIPE} = "IGNORE" added.
|
For maximum data coverage, I'd encourage you to continue that test loop, and then do another without it. No pressure, though.
|
My first build+test at commit c288c60fe8347e662688f9644e56ce29941e9b86 managed to get a But that used my normal build invocation that lacked
|
Yep, got the same after 49 iterations.
|
I wonder what the cost/benefit analysis would be for having output from the different processes sent to separate logfiles, or prefixed with a pid+timestamp or something.
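A minimal sketch of the pid+timestamp idea (the helper name plog is made up for illustration, not an existing TLSProxy function):

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: prefix every diagnostic line with pid and
# timestamp, so the interleaved output of proxy, s_server and
# s_client can be told apart afterwards.
sub plog {
    my ($fh, $msg) = @_;
    my $stamp = strftime("%H:%M:%S", localtime);
    print {$fh} "[$$ $stamp] $msg\n";
}

plog(\*STDOUT, "Proxy started");
```

Routing each process's output through such a helper (or giving each its own logfile handle) would make post-mortem analysis of a failed iteration much easier than untangling raw interleaved lines.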
|
Good questions. Feel free to experiment; it's a good thing if a few more people get acquainted with the TLSProxy code. Me, I need to call it a night...
|
With this patch the failure mode is a bit different: "No progress made", and then the test breaks immediately.
|
I'm interested in that log file.
|
So, as requested: the log file, and strace output from a41a612 with your patch. Do you understand how to read the strace file, or do you need help?
|
fail-sslrecords.txt |
|
In the attached strace file I see something that I don't understand. grep ^26374 test10-strace.log prints the proxy process. What I think I see there is that the client does: but the proxy receives only the test and then ECONNRESET. The SIGPIPE patch does not fix this.
|
That's exactly the behavior I have observed on my cygwin installation. Why my patch makes it worse, I cannot say... Gut feeling: I'm guessing that the client is taken down before all packets are actually sent out, and that this cancels all packet queues. I don't know why this happens yet, and I don't have time today, but I would start looking at
|
This should probably be raised as a separate issue, BTW... |
levitte force-pushed from c288c60 to d7f2467
|
I think I have now seen test failures (without this patch) where
|
I tried this to force the client to send the ALERT after the server exits, but it did not work: make test passes without this patch...
|
Ok, sounds like this isn't necessary then. Closing... |
|
Oh, err, wait, I seem to be getting mixed messages... @bernd-edlinger, are you telling me I should merge this (i.e. listen to your approval) or that just closing was the right thing? |
|
Yes, no... I had run the test_sslrecords loop on another machine (faster, with a more recent linux kernel). But my other experiment shows that the server does not race with the client. Currently I have no idea what went wrong. When I start tracing too much
|
For completeness, this is how the test failed after 4939 iterations: interesting, where is the "DONE", did the client just crash?
|
And a second fail after 1289 iterations: this time there is a "DONE" followed by "[test", "[tset", No progress. It is theoretically possible that there was a packet loss, due to the RST. The linux version is: Linux version 4.13.0-26-lowlatency (buildd@lgw01-amd64-031) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1
|
While I wrote this, the very same test ran more than 10,000 iterations, so I cannot reproduce it any more.
This made me realise that when the client end shuts down its writing end, this patch keeps the proxy connection to the server up, and vice versa. It's incomplete in that respect, and then we end up in exactly the situation quoted there: there's nothing more coming from the client end, and the server end keeps hanging, waiting for more. I'll fix that.
Whenever the client or the server would cut off communication, the proxy would immediately take down the proxy entirely. However, it's very possible that just because one side stopped communicating, the other side might still have a packet on its way. Since the proxy runs filters, and most of all checks for specific records to see if the communication was successful or not, it seems important that none of them are ignored. Therefore, recode the loop to simply ignore the channel where communication has stopped, but continue to receive data on the other until that one shuts down as well.
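A hypothetical sketch of what that recoded loop amounts to (the function name and structure are made up for illustration; the real code lives in util/perl/TLSProxy/Proxy.pm):

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical sketch, not the actual Proxy.pm code: when one side
# hits EOF, only that handle leaves the select set; the other side
# keeps being drained until it closes too, so late records still
# reach the filters instead of being dropped.
sub pump_until_both_closed {
    my ($client, $server, $on_record) = @_;
    my $sel = IO::Select->new($client, $server);
    while ($sel->count() > 0) {
        for my $fh ($sel->can_read(1)) {
            my $n = sysread($fh, my $buf, 16384);
            if (!$n) {              # EOF (or error) on this side only
                $sel->remove($fh);  # stop watching it ...
                next;               # ... but keep reading the other side
            }
            my $peer = ($fh == $client) ? $server : $client;
            $on_record->($fh, $buf);  # filters still see every record
            syswrite($peer, $buf);    # forward unchanged
        }
    }
}
```

Driving this with two socketpairs as stand-ins for the client and server legs shows the intended behavior: a record written just before the client closes still reaches the server side and the filter callback, and the loop only ends once both legs have shut down.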
|
... thinking more about it, it's not entirely clear to me that doing so will make this change differ very much from the code before this change... |
levitte force-pushed from d7f2467 to 878cd18
|
But anyway, the fix is committed, and the whole thing rebased on top of fresh master.
|
Well, yes, with the latest patch the behavior changed slightly, but it's still not ok. So maybe there is still a problem in the client shutdown sequence.
|
Okay, I have been able to reproduce it and capture the conversation with tcpdump. The capture shows the problem starts in the server. Funny: there is first a RST and then something like an ALERT from the server. And why is the proxy adding one trailing byte to each packet?
Hmm, this was actually from the client. |
Not sure what you mean there... |
|
I mean that in the capture file, packet 15 is from server -> proxy; packet 16, what is that?
See #5370. So what is happening with this PR now? |
Dunno, really... with all the other changes we made, it seems this one isn't needed any more, possibly because it didn't address the actual issue. I'd say that's grounds for just closing it, right? |
|
The capture in my previous comment shows a sort of race condition; I think the packets do not arrive in the same order as they are sent to the loopback device. I think this happens way too rarely to be worth the effort. If we really want to fix that, we would not only forward the FIN packets
|
So I think the conclusion from the above is that this should be closed. |