
[RFC] gnrc_tcp_recv(): fix connection closed by remote host not handled#10899

Closed
benemorius wants to merge 2 commits into RIOT-OS:master from benemorius:pr/gnrc_tcp-remoteclosed

Conversation

@benemorius
Member

Contribution description

I'm not certain this is right, but it fixed the problem I was having and it seems correct. Without this PR, I don't see how application code calling gnrc_tcp_recv() is notified that the remote host has closed the connection. The documentation doesn't say what the function should return when the remote end has closed. Currently it just keeps blocking until the timeout expires (or else returns -EAGAIN), so the caller has no way to tell that the remote host has closed the connection. This seems wrong.

The way recv() works on Linux is that it returns 0 once the remote end has closed. The application usually responds by closing its own end once it has no more data to send. It seems to me that gnrc_tcp_recv() should likewise return 0 so that the application can then call gnrc_tcp_close().

Otherwise, the connection gets stuck half open because the application never learns that it should close it. That is, tcb->state gets stuck in FSM_STATE_CLOSE_WAIT, and new connections from a client are rejected until the RIOT node is reset.

Testing procedure

We don't have anything in tests or examples that uses gnrc_tcp_recv() in a way that can test what happens when the remote host closes the connection. What I'm testing with is a telnet server that I implemented against gnrc_tcp. I can connect with a telnet client and then disconnect, but without this PR subsequent connection attempts are refused. With this PR, the connection is closed properly and subsequent connections are successful.

The telnet server would probably compile for many platforms, but unfortunately I don't have it isolated in a clean branch right now, and that branch carries some hacks in core and sys, so it's a very non-ideal way to test this.

Normally I'd stop and write a stripped-down test for this, but I don't have much time, and this seems like something we should try to address before the 2019.01 release, unless I'm simply mistaken about how it's supposed to work. In any case, I wanted to hear what others think, since I don't know when I'll have time to continue with it.

@PeterKietzmann PeterKietzmann added Area: network Area: Networking Area: tests Area: tests and testing framework labels Jan 30, 2019
Member

@brummer-simon brummer-simon left a comment


Hi benemorius,

I took a brief look into the TCP RFC regarding the handling of the receive call (https://tools.ietf.org/html/rfc793#page-59). Your bug fix is not consistent with the specified behavior.

It is possible that an incoming packet with FIN set contains payload. This means there could be received data in the TCB. The early return would make already received data in CLOSE-WAIT state unavailable to the calling code.

I'll take a more detailed look into the recv behavior and implementation next weekend and give you more detailed feedback on the topic.

Regarding the lack of test code: there is already a PR in progress addressing this issue.
I currently don't have much spare time to work on that testing PR, but if you want to help, you are more than welcome. Here is its current state: https://github.com/brummer-simon/RIOT/tree/gnrc_tcp-improve_tests.

@brummer-simon
Member

On the topic of "new connections from a client are rejected until the riot node is reset": that might be valid behavior. By default, RIOT allows only a single TCB to be allocated at a time (though this can be changed via a define).

Depending on the order of the incoming packets during the shutdown sequence, the RIOT TCB can enter a state (TIME-WAIT) where it has to wait out any remaining packets in flight associated with the closed connection (the default wait is two minutes). Until that time has passed, the TCB is not allowed to be reused.
For further details see: https://tools.ietf.org/html/rfc793#page-39

On a Linux system, a new TCB would be allocated and used to accept the new connection. On RIOT this is not an option; you are essentially forced to wait until the TCB can be reused.

Are you sure that the behavior you are observing is not caused by this waiting period?

@benemorius
Member Author

benemorius commented Jan 30, 2019

On the Topic of "new connections from a client are rejected until the riot node is reset". That might be valid behavior.

Yes, I'm sure that this part is expected behavior, but it isn't caused by being stuck in TIME-WAIT. The TIME-WAIT state is never actually reached, since it can't be entered until after gnrc_tcp_close() is called.

The unexpected part is being unable to call gnrc_tcp_close() due to gnrc_tcp_recv() not giving indication that it has entered CLOSE-WAIT.

@benemorius
Member Author

benemorius commented Jan 30, 2019

I think you're right about the early return. I can fix that but I wanted to wait for confirmation that there's really a bug here before going any further. Maybe everything is fine and I'm just using it wrong. I just can't see any other way that I'm supposed to detect the CLOSE-WAIT state and call gnrc_tcp_close().

@brummer-simon
Member

brummer-simon commented Jan 30, 2019

I'll look into the entire issue next weekend (I want to see how CLOSE-WAIT is handled in the internal FSM). After that, I can give you more detailed feedback on the issue.

But it certainly looks like you stumbled upon a bug.

@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR labels Jan 31, 2019
@brummer-simon
Member

brummer-simon commented Feb 3, 2019

Hi benemorius,

After looking at the current implementation I think you found something there.

As you said, the problem is that the caller is blocked in the recv() call until the next data-carrying packet arrives. If no such packet arrives, the call fails with the timeout you requested. So far so good.
But in the case where you have already read everything from the receive buffer and the last packet carried the FIN flag, it makes no sense to wait for new payload, because the connection is already terminated.

I would suggest the following semantics for your fix:
The RFC says that even when the TCB is in the CLOSE-WAIT state, a recv call must still allow reading from the receive buffer. So first check whether the TCB is in CLOSE-WAIT. If so, try to read data from the receive buffer (_fsm_call_recv() in gnrc_tcp_fsm.c). If the buffer is not empty (return code > 0, the number of bytes read), return that code. If the buffer was empty, return an appropriate return code instead.

On the return code: currently either the number of bytes read or an error code is returned, so using zero to indicate that the connection was closed is fine. Please update the documentation in the header file, though, since it is a small change in semantics.

Additionally, please add something to _fsm_call_recv():
As soon as data has been read from the receive buffer, there is a check whether the receive buffer can now hold new data >= MSS. If so, a packet is sent to the peer announcing the new free window size. Please add a check so that this window update is not sent in the CLOSE-WAIT state. It doesn't make sense to reopen the receive window, because the peer has announced that it has nothing more to send. I can't find anything on this in the RFC, but I think it prevents unnecessary packets from being sent.

@stale

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@miri64 miri64 removed the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@miri64
Member

miri64 commented Oct 3, 2019

See #12367

@miri64 miri64 closed this Oct 3, 2019