Nice work solving these problems @Mellvik! Fixes look good to me. I am a bit interested to know whether ktcp shows any signs of running out of memory with the expanded send window and max retransmit queue sizes, given the number of simultaneous connections you're testing with.
Thanks, @ghaerr. Two heavy outgoing file transfers now run fine at great speed - same segment and through the router, even when disturbed by other activities. The litmus test that really pushes ktcp and the retransmit implementation is 3 incoming telnet sessions running. IOW - the jury is still out ...
@ghaerr, as to the dynamic task array: I ported it and it seems to work fine, but as we're running low on memory, processes start to hang. The ktcp problem prevented me from diving into this one right away, but just out of curiosity - and before diving in: how much has this feature been tested? My setting is 20 tasks, by the way.
Oh geez, that's too bad. It looks as though you're on to something, though. This first fix only doubles the initial RTO in one case; perhaps RTO needs to be calculated over a longer time period. I don't have enough data on how RTO and retransmit work to suggest anything intelligent at this point.
I tested the feature very heavily and believe it works quite well, as I was careful not to change anything structurally other than using a pointer. Moving the task array into far memory and/or allocating different stack sizes per task turns out to be extremely complicated, so unfortunately tasks remain a heavy user of kernel heap. I added dynamic inodes later in an attempt to allow a smaller inode store to increase kernel memory for tasks. Ultimately, all of this points to the huge need to get all driver buffers out of the kernel heap unless absolutely necessary. A rule of thumb: one kernel buffer is about the same as an additional task in terms of data usage.
OK, I'm beginning to understand how this works.

IOW - the RTT/RTO mechanism simply doesn't lend itself to this particular scenario and we need to come up with something else, either a different mechanism or an add-on, the latter being preferred until we see how this can work in practice. I found out from commented-out code in

Anyway, now that the picture is clear(er), it is possible to think about solutions.
@Mellvik, super writeup and explanation on RTT/RTO. I don't think I ever fully understood it myself, but this helps. To help me understand, is this PR fix trying to solve multiple problems (e.g. increasing max send window to allow for more time for ACKs so that efficiency is increased, and also trying a new RTT/RTO algorithm because of our previous multiple-ACK problem/hack), or something else? It seems the initial issue was that you thought the send window size increase could solve the efficiency issue and allow removal of the read delay kluge hack. Did that get partially solved, or are you finding that somehow retransmits are getting involved in all of this, thus leading to the RTT/RTO calc issues?
Actually, @ghaerr - with the upcoming update to the PR, both are fixed. It is purely incidental that the two fixes end up in virtually adjacent lines in the code. The original problem was fixed in the PR as-is, and it's truly a relief to have found and fixed (testing now) the second one - finally. The fix needs testing in a real high-latency environment to make sure nothing is broken. Planned for tomorrow. Then back to the dynamic task array testing. Thanks.
This sounds like a potentially good idea. If the problem is that the retransmit buffer is unloading everything all at once when a single RTO expires, perhaps the first retrans packet should be sent, and then either the remaining RTT/RTOs adjusted and/or the send window itself decreased so that the retrans packets go out more slowly. That is, when the machine-to-machine communication "clogs up" and effectively halts, awaiting a retransmit (or provoking other problems with a faked double ACK), the sender might want to start sending quite differently than it otherwise would have: starting quite slow with a single packet, then speeding up slowly. This could possibly be done using RTT/RTO fixups like you're thinking, but applying them to all the packets in the retransmit queue as well, adjusting their first/next retrans start time. I hope I'm making sense here; I probably need to dive in to help more specifically. I sure wish there was an easy way to duplicate this using emulators, as of course localhost connections don't even hit the NIC cards.
Thanks @ghaerr - you are making sense. -M
You're definitely on to something, @ghaerr. Got me thinking through the whole thing again. It's tempting, now that the problem is (more) understood, to throw in a fix and move on - only to find that the not-so-well-understood consequences bite back later. I've rigged my test environment to have a machine in Amsterdam log into the local TLVC system to get a picture of 'normal' jitter, delays and occasional retransmits. ktcp handles it well and this becomes - to me - a baseline, as in 'if it ain't broken, don't fix it'.

Btw - from the above, ktcp does not really dump all the packets in the retrans buffer on the recipient, it just looks that way. They all time out because the timeout is too short. So the idea of letting the first retrans go and then adjusting the timeout in the remaining packets is in fact very viable. More about that later.

Here's the thing, the challenge: we're dealing with a special case, one that breaks the rules of 'normality' sufficiently to create real problems. How do we (1) recognize this particular situation and (2) handle it? Before doing more experimentation (not that it has been in vain, I've learned a lot), those 2 problems must be assessed, the first one by carefully studying the logs/traces (again). I'll have something to that effect tomorrow.
This PR fixes the `ktcp` performance problem described in #67 by increasing the allowed max send window. The problem popped up only on relatively fast systems and only if there was a router between the TLVC system and the peer. The fix eliminates a wait/wakeup cycle and allows the TLVC sender to send packets more or less continuously, reaching the same level of performance as if the hosts were on the same network segment.

Also fixed is an old `ktcp` problem which caused lots of unnecessary retransmissions when sending to a slow host. The retransmit timeout calculation simply didn't work in such settings. The fix is to check the round-trip time before setting the first retransmit timeout. If the peer system is found to be very slow or heavily loaded, the initial timeout is doubled.

Included:
- `af_inet.c` file.