TCP/IP Essentials A Lab-Based Approach
Chapter 6
TCP Study
Shivendra Panwar, Shiwen Mao, Jeong-dong Ryoo, and Yihan Li
TCP Overview
Transport layer protocol
Provides connection-oriented, reliable service to applications, such as HTTP, email, FTP, telnet.
Only supports unicast.
Features:
  Error control
  Flow control
  Congestion control
Panwar, Mao, Ryoo, Li: TCP/IP Essentials
TCP Header Format
IP header (20 bytes) | TCP header (20 bytes) | TCP data
TCP Header Fields
Source Port Number: 16 bits. The port number of the source process.
Destination Port Number: 16 bits. The port number of the destination process.
Sequence Number: 32 bits. Identifies the byte in the stream of data from the sending TCP to the receiving TCP that the first byte of data in this segment represents.
Acknowledgement Number: 32 bits. The next sequence number that the host wants to receive.
TCP Header Fields
Header Length: 4 bits. The length of the header in 32-bit words.
Reserved: 6 bits. Reserved for future use.
Window Size: 16 bits. The maximum number of bytes that a receiver can accept.
TCP Checksum: 16 bits. Covers both the TCP header and TCP data.
TCP Header Fields
Flags: 6 bits
  URG: an urgent message is being carried.
  ACK: the acknowledgment number is valid.
  PSH: a notification from the sender to the receiver that it should pass all the data received to the application as soon as possible.
  RST: signals a request to reset the TCP connection.
  SYN: set when initiating a connection.
  FIN: set to terminate a connection.
Urgent Pointer: 16 bits. If the URG flag is set, the pointer points to the last byte of the urgent message in the TCP payload.
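The field layout described on these slides can be illustrated with a short parser. This is a minimal sketch, assuming a 20-byte header without options; the function name `parse_tcp_header` and the dictionary keys are hypothetical names chosen for illustration.

```python
import struct

# Flag bit positions within the low 6 bits of the flags byte.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (options are ignored)."""
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # Header Length is the top 4 bits, counted in 32-bit words.
        "header_len_bytes": (offset_byte >> 4) * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if flag_byte & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

For example, a SYN segment built with `struct.pack("!HHIIBBHHH", 1234, 80, 100, 0, 5 << 4, 0x02, 65535, 0, 0)` decodes to a 20-byte header with only the SYN flag set.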
TCP Connection
Source and destination port numbers identify the sending and receiving application processes, respectively.
Socket: the combination of an IP address and a port number.
A TCP connection is uniquely identified by the two end sockets.
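As a small illustration of why the pair of end sockets is the connection identifier, the sketch below keys a connection table by (local socket, remote socket). The `Socket` type, the `connection_key` helper, and the addresses are hypothetical and used only for this example.

```python
from typing import NamedTuple, Tuple

class Socket(NamedTuple):
    ip: str
    port: int

def connection_key(local: Socket, remote: Socket) -> Tuple[Socket, Socket]:
    """A connection is identified by its two end sockets."""
    return (local, remote)

server = Socket("10.0.0.1", 80)
# Two clients on the same host, different ephemeral ports.
conn_a = connection_key(server, Socket("10.0.0.2", 40000))
conn_b = connection_key(server, Socket("10.0.0.2", 40001))

# Both connections share the server socket but remain distinct entries.
table = {conn_a: "connection A", conn_b: "connection B"}
```

The same server socket appears in both keys, yet the table holds two entries because the remote sockets differ.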
TCP Connection Management
TCP connection establishment: the two end TCP modules
  Allocate required resources for the connection, and
  Negotiate the values of the parameters used, such as
    Maximum segment size
    Receiving buffer size
    Initial sequence number (ISN)
TCP connection termination
TCP timers
TCP Connection Establishment
Three-way handshake
An end host initiates a TCP connection by sending a packet with
  ISN, n, in the sequence number field
  An empty payload field
  MSS, and TCP receiving window size
  SYN flag bit set
The other end replies with a SYN packet with
  ACK=n+1
  Its own ISN, m
  MSS, and TCP receiving window size
The initiating host sends an acknowledgement: ACK=m+1
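The sequence and acknowledgement numbers exchanged in the handshake can be traced with a short sketch. It assumes 32-bit sequence numbers that wrap around; the function name `three_way_handshake` and the dictionary layout are hypothetical, and MSS and window negotiation are left out.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around

def three_way_handshake(n: int, m: int) -> list:
    """Return the three segments for client ISN n and server ISN m."""
    syn = {"flags": {"SYN"}, "seq": n}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": m, "ack": (n + 1) % SEQ_SPACE}
    ack = {"flags": {"ACK"}, "seq": (n + 1) % SEQ_SPACE,
           "ack": (m + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

segments = three_way_handshake(1000, 5000)
```

With n = 1000 and m = 5000, the second segment carries ACK = 1001 and the third carries ACK = 5001, matching the n+1 and m+1 rules.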
TCP Connection Termination
A TCP connection is full duplex. Each end of the connection has to shut down its one-way data flow.
After termination is performed, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime (MSL) to wait for delayed segments.
If an unrecoverable error is detected, either end can close the TCP connection by sending a RST segment.
TCP Connection Termination
Four-way handshake
  TCP Half-Close
    One end TCP sends a packet with the FIN flag set.
    The other end acknowledges the FIN segment.
    The data flow in the opposite direction still works.
  Do Half-Close in the opposite direction.
TCP Timers
TCP Connection Establishment Timer
  The maximum period of time TCP keeps on trying to build a connection before it gives up.
TCP Retransmission Timer
  Retransmit if no ACK is received for a TCP segment when this timer expires.
Delayed ACK Timer
  Used for delayed ACK in TCP interactive data flow.
TCP Keepalive Timer
  It reminds a station to check if the other end is still alive when a TCP connection has been idle for a long time.
TCP Timers
TCP Persist Timer
  Used in TCP flow control in the case of a fast transmitter and a slow receiver.
  While the advertised window size from the receiver is zero, the sender probes the receiver for its window size each time the timer expires.
  Uses the Exponential Backoff algorithm.
Two Maximum Segment Lifetime (2MSL) Wait Timer
  Used in TCP connection termination.
  It is the period of time that a TCP connection is kept alive after the last ACK packet of the four-way handshake is sent.
  Prevents the delayed segments of a previous TCP connection from being interpreted as part of a new connection that uses the same sockets.
TCP Data Flow
http://histrory.visualland.net/tcp_fast_retransmit.html
TCP Error Control
TCP segments may be lost or out of order.
  TCP uses IP service.
  IP is connectionless and unreliable.
TCP provides error control for application data by retransmitting lost or errored TCP segments.
Error Detection
Each data byte is assigned a unique sequence number.
TCP uses positive acknowledgments to inform the sender of the last correctly received byte.
Error detection is performed in each layer of the TCP/IP stack by means of header checksums, and errored packets are dropped.
If a segment is dropped, an acknowledgement for the first byte of that segment will be sent to the sender.
A gap in the received sequence numbers indicates a transmission loss or wrong order, and an acknowledgment for the first byte in the gap may be sent to the sender.
Error Detection
Selective acknowledgement
A window of TCP segments may be sent and received before an acknowledgment is received by the sender.
Selective acknowledgment (SACK) is used to report multiple lost segments.
The two ends use the TCP Sack-Permitted option to negotiate whether SACK is allowed while a TCP connection is being established.
The receiver uses the TCP Sack option to acknowledge all segments that have been successfully received in the last window of segments, and the sender can retransmit more than one lost segment at a time.
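To make the idea concrete, the sketch below derives the cumulative acknowledgement and the SACK blocks from the byte ranges a receiver currently holds. It is a simplified illustration: the function name `ack_and_sack` is hypothetical, ranges are half-open `(first_byte, next_byte)` pairs, and the limit on how many blocks fit into a real Sack option is ignored.

```python
def ack_and_sack(received, start=0):
    """Return (cumulative ACK, SACK blocks) for the received byte ranges."""
    # Merge overlapping or adjacent ranges first.
    merged = []
    for lo, hi in sorted(received):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])

    ack = start
    blocks = []
    for lo, hi in merged:
        if lo <= ack:
            ack = max(ack, hi)          # contiguous with in-order data
        else:
            blocks.append((lo, hi))     # received beyond a gap
    return ack, blocks

ack, blocks = ack_and_sack([(0, 1000), (2000, 3000), (3000, 4000), (5000, 6000)])
```

Here the cumulative ACK stays at 1000 while the blocks (2000, 4000) and (5000, 6000) tell the sender that exactly two ranges, 1000 to 2000 and 4000 to 5000, are missing and can both be retransmitted.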
TCP Retransmission
A retransmission timer is started for each TCP segment sent on the sender side.
If no ACK is received when the timer expires, the segment is retransmitted.
The value of the retransmission timer is critical to TCP performance.
  An overly small value causes frequent timeouts and unnecessary retransmissions.
  An overly large value causes a large delay when a segment is lost.
  The value should be larger than, but of the same order of magnitude as, the RTT.
TCP continuously measures the RTT and updates the retransmission timer value dynamically.
RTT Measurement
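The time difference between sending a segment and receiving the ACK is measured.
The measured delay is called one RTT measurement, denoted by $M$.
Compute the retransmission timeout (RTO):
  $RTT^s$: smoothed RTT, set to the first measured RTT, $RTT^s_0 = M_0$.
  $RTT^d$: smoothed RTT mean deviation, $RTT^d_0 = M_0/2$.
  $RTO_0 = RTT^s_0 + \max\{G, 4 \times RTT^d_0\}$, where $G$ is the timeout interval of the base timer.
  For the $i$th measured RTT value $M_i$ ($\alpha = 1/8$, $\beta = 1/4$):
    $RTT^d_i = (1-\beta)\, RTT^d_{i-1} + \beta\, |M_i - RTT^s_{i-1}|$
    $RTT^s_i = (1-\alpha)\, RTT^s_{i-1} + \alpha\, M_i$
    $RTO_i = RTT^s_i + \max\{G, 4 \times RTT^d_i\}$
If RTO is less than 1 sec, round up to 1 sec.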
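The RTO computation can be sketched in a few lines. This is an illustrative sketch that assumes the standard smoothed-RTT update with gains 1/8 and 1/4 (the deviation is updated before the smoothed RTT, using the previous smoothed value); the function name `rto_estimates` and the default base timer interval `g=0.5` seconds are assumptions for the example.

```python
def rto_estimates(samples, g=0.5, alpha=1 / 8, beta=1 / 4, min_rto=1.0):
    """Return the RTO (seconds) after each RTT measurement in `samples`."""
    rtt_s = samples[0]          # smoothed RTT starts at the first measurement
    rtt_d = samples[0] / 2      # mean deviation starts at half of it
    rtos = [max(min_rto, rtt_s + max(g, 4 * rtt_d))]
    for m in samples[1:]:
        # Deviation uses the previous smoothed RTT, so update it first.
        rtt_d = (1 - beta) * rtt_d + beta * abs(m - rtt_s)
        rtt_s = (1 - alpha) * rtt_s + alpha * m
        rtos.append(max(min_rto, rtt_s + max(g, 4 * rtt_d)))
    return rtos
```

With two measurements of 1.0 s each, the RTO starts at 3.0 s and drops to 2.5 s as the deviation estimate shrinks. A single 0.1 s measurement gives 0.6 s, which is rounded up to the 1 s minimum.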
Retransmission Timer
In some systems, a base timer that goes off at a fixed interval (e.g., every 500 ms) is used for RTT measurements.
The measured RTT is M = t × 500 ms if there are t base timer ticks during a measurement.
All RTO timeouts occur at the base timer ticks.
RTO Exponential Backoff
RTT measurement is not performed for a retransmitted TCP segment.
  Karn's Algorithm: RTTs and RTTd are not updated.
The Exponential Backoff algorithm is used to update RTO when the retransmission timer expires for a retransmitted segment.
  RTO is doubled for each retransmission, up to a maximum value of 64 sec.
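The doubling rule with its 64-second cap can be written as a short sketch. The function name `backoff_schedule` is a hypothetical name for this example.

```python
def backoff_schedule(initial_rto, retries, max_rto=64.0):
    """Return the RTO used for each successive retransmission attempt."""
    rto = initial_rto
    schedule = []
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)   # double, but never exceed the cap
    return schedule

schedule = backoff_schedule(1.5, 8)
```

Starting from 1.5 s, the timeouts grow as 1.5, 3, 6, 12, 24, 48 and then stay at 64 s.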
TCP Interactive Data Flow
TCP supports interactive data flow for interactive user applications
  telnet
  ssh
A user keystroke is first sent from the user to the server.
The server echoes the key back to the user and piggybacks the acknowledgment for the keystroke.
The user sends an acknowledgment to the server for the received echo segment, and displays the echoed key on the screen.
Design goals:
  Reduce the delay experienced by the user.
  Reduce the number of small segments to be more efficient.
Mechanisms:
  Delayed Acknowledgment
  Nagle Algorithm
Delayed Acknowledgment
The delayed acknowledgment timer goes off every K ms (e.g., 50 ms).
TCP delays sending the ACK for a data segment until the next tick of the delayed acknowledgment timer.
  If there is new data to send during this period, the ACK can be piggybacked with the data segment.
  Otherwise, an ACK segment is sent.
An ACK may be delayed from 0 ms up to K ms.
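The variable delay follows from where the data segment arrives relative to the timer ticks. The sketch below assumes integer millisecond timestamps and ticks at multiples of K; the function name `ack_send_time` is hypothetical.

```python
def ack_send_time(arrival_ms: int, k_ms: int = 50) -> int:
    """Time of the next delayed-ACK timer tick at or after the arrival."""
    remainder = arrival_ms % k_ms
    if remainder == 0:
        return arrival_ms                     # arrived exactly on a tick
    return arrival_ms + (k_ms - remainder)    # wait for the next tick

delays = [ack_send_time(t) - t for t in (100, 101, 120, 149)]
```

A segment arriving at 120 ms is acknowledged at the 150 ms tick, while one arriving exactly at 100 ms sees no added delay.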
Nagle algorithm
Each TCP connection can have only one small segment outstanding (not yet acknowledged).
TCP sends one byte and buffers all subsequent bytes until an acknowledgment for the first byte is received.
All buffered bytes are then sent in a single segment.
More efficient than sending multiple segments, each with one byte of data, at the cost of increased delay for the user.
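The buffering behavior can be illustrated with a small simulation. It is a simplified sketch: it assumes one byte per keystroke, a fixed round-trip time, and that buffered data never reaches a full segment size; the function name `nagle_segments` is hypothetical.

```python
def nagle_segments(keystroke_times, rtt):
    """Return (send_time, num_bytes) for each segment the sender emits."""
    segments = []
    buffered = 0
    ack_time = None   # when the ACK for the outstanding segment arrives
    for t in sorted(keystroke_times):
        # Process ACKs that arrived before this keystroke.
        while ack_time is not None and ack_time <= t:
            if buffered:
                segments.append((ack_time, buffered))  # flush the buffer
                buffered = 0
                ack_time += rtt
            else:
                ack_time = None                        # nothing outstanding
        if ack_time is None:
            segments.append((t, 1))                    # send immediately
            ack_time = t + rtt
        else:
            buffered += 1                              # wait for the ACK
    if buffered:
        segments.append((ack_time, buffered))
    return segments

result = nagle_segments([0, 10, 20, 30], rtt=25)
```

Four keystrokes typed 10 ms apart over a 25 ms round trip leave the sender in three segments instead of four, with the second segment carrying two bytes.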
TCP Bulk Data Flow
TCP supports bulk data flows, where a large number of bytes are sent through the TCP connection.
  Applications: email, FTP, HTTP.
Congestion may occur in the case of a fast transmitter and a slow receiver, and packets will be dropped when the buffer is full.
  The source always wants to increase its sending rate to achieve high throughput.
  To keep the packet loss rate low, the source rate should be bounded by the maximum rate that can be sustained without causing congestion or receiver buffer overflow.
Congestion Control and Flow Control
Congestion control and flow control are used to cope with congestion problems.
They let the source adapt to the buffer occupancies in the routers and the receiver.
TCP uses slow start and congestion avoidance to react to congestion in routers and to avoid receiver buffer overflow.
TCP Sliding Window Flow Control
The receiver advertises the maximum amount of data it can receive (the Advertised Window, or awnd). The sender is not allowed to send more data than the advertised window.
http://histrory.visualland.net/tcp_swnd.html
TCP Sliding Window Flow Control
The receiver notifies the sender of
  The next segment it expects to receive (Acknowledgement Number)
  The amount of data it can receive (Window Size)
The sliding window
  Wl moves to the right when a new segment is acknowledged.
  Wm moves to the right when new segments are sent.
  Wr moves
    To the right when a larger window is advertised by the receiver or when new segments are acknowledged,
    To the left when a smaller window is advertised.
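The movement of the three window edges can be sketched with a small sender-side model. This is an illustrative sketch in byte offsets; the class name `SenderWindow` and its method names are hypothetical, and Wr is taken to be Wl plus the most recently advertised window.

```python
class SenderWindow:
    """Tracks the left edge (wl), next byte to send (wm), and right edge (wr)."""

    def __init__(self, awnd: int):
        self.wl = 0          # first unacknowledged byte
        self.wm = 0          # next byte to be sent
        self.wr = awnd       # first byte the sender may not send

    def usable(self) -> int:
        """Bytes that may still be sent; zero if the window has shrunk."""
        return max(0, self.wr - self.wm)

    def send(self, nbytes: int) -> int:
        """Send up to nbytes, limited by the usable window."""
        sent = min(nbytes, self.usable())
        self.wm += sent
        return sent

    def on_ack(self, ackno: int, awnd: int) -> None:
        """Apply an acknowledgement number and a new advertised window."""
        self.wl = max(self.wl, ackno)
        self.wm = max(self.wm, self.wl)
        self.wr = self.wl + awnd

w = SenderWindow(awnd=4000)
first = w.send(3000)
second = w.send(3000)       # only 1000 bytes fit in the window
w.on_ack(2000, 4000)        # Wl and Wr both move right
```

After the acknowledgement, Wl is 2000, Wr is 6000, and 2000 more bytes may be sent. A later advertisement of a smaller window would move Wr to the left.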
TCP Congestion Control
TCP uses congestion control to adapt to network congestion and achieve a high throughput.
Usually the buffer in a router is shared by many TCP connections and other non-TCP data flows.
TCP needs to adjust its sending rate in reaction to the rate fluctuations of other flows sharing the same buffer.
A new TCP connection should increase its rate as quickly as possible to take all the available bandwidth. TCP should slow down its rate increase when the sending rate is higher than some threshold.
The sender can infer congestion when a retransmission timer goes off. The receiver reports congestion implicitly by sending duplicate acknowledgements.
TCP Congestion Control
The sender maintains two variables for congestion control:
  Congestion window size (cwnd): an upper bound on the sender rate.
  Slow start threshold (ssthresh)
Slow start and congestion avoidance
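The interplay of cwnd and ssthresh can be sketched as a per-round-trip simulation. This is a simplified sketch with several assumptions that are not stated on the slide: cwnd is counted in segments, it is updated once per round trip, and a loss detected by timeout sets ssthresh to half of cwnd (at least 2) and resets cwnd to 1. The function name `cwnd_evolution` is hypothetical.

```python
def cwnd_evolution(rounds, ssthresh, loss_rounds=()):
    """Return the cwnd (in segments) used in each round trip."""
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            # Assumed timeout reaction: halve the threshold, restart slow start.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: exponential growth
        else:
            cwnd += 1                        # congestion avoidance: linear
    return history

no_loss = cwnd_evolution(8, ssthresh=8)
with_loss = cwnd_evolution(8, ssthresh=8, loss_rounds={5})
```

Without loss, cwnd doubles up to the threshold of 8 and then grows by one segment per round. With a loss in round 5, cwnd falls back to 1 and slow start begins again.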
TCP Congestion Control
Fast retransmit
  After receiving three duplicate acknowledgments, the sender retransmits the segment without waiting for the retransmission timer to expire.
Fast recovery
  After the retransmission, congestion avoidance is performed, with initial cwnd = ssthresh + one segment size.
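The duplicate-acknowledgment trigger can be sketched as a simple counter over the stream of received ACK numbers. The function name `fast_retransmit_points` is hypothetical, and the sketch only detects when a retransmission would be triggered; it does not model the window adjustment.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return (index, ack_number) where the duplicate count hits the threshold."""
    triggers = []
    last_ack = None
    duplicates = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                triggers.append((i, ack))   # retransmit segment starting at ack
        else:
            last_ack = ack
            duplicates = 0
    return triggers

points = fast_retransmit_points([1000, 2000, 2000, 2000, 2000, 6000])
```

The first ACK for 2000 is new; the third repeat of it, at index 4, triggers the retransmission of the segment that starts at byte 2000.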
TCP Congestion Control
The evolution of cwnd and ssthresh for a TCP connection, including
  Slow start and congestion avoidance
    cwnd has two phases: an exponential increase phase and a linear increase phase.
    cwnd drops drastically when there is a packet loss.
  Fast retransmit and fast recovery, which occur at times around 610, 740, and 950.
Tuning the TCP/IP Kernel
TCP/IP parameters
  A set of default values may not be optimal for all applications.
  The network administrator may wish to turn on or off some TCP/IP functions for performance or security considerations.
Many Unix and Linux systems provide some flexibility in tuning the TCP/IP kernel.
Tuning the TCP/IP Kernel in Red Hat Linux
/sbin/sysctl is used to configure the Linux kernel parameters at runtime.
  The default kernel configuration file is /etc/sysctl.conf.
  Frequently used sysctl options:
    sysctl -a or sysctl -A: list all current values.
    sysctl -p file_name: load the sysctl settings from a configuration file.
    sysctl -w variable=value: change the value of a parameter.
TCP/IP related kernel parameters are stored in /proc/sys/net/ipv4/.
  The files can be modified directly to change settings.
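The relation between a sysctl variable name and its file under /proc/sys can be sketched as follows. The helper names `sysctl_to_proc_path` and `read_kernel_param` are hypothetical; the mapping assumes a plain dotted name such as `net.ipv4.tcp_sack` with no dots inside individual components, and reading only works on a Linux host where the file exists.

```python
def sysctl_to_proc_path(name: str) -> str:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")

def read_kernel_param(name: str) -> str:
    """Read the current value of a kernel parameter (Linux only)."""
    with open(sysctl_to_proc_path(name)) as f:
        return f.read().strip()

path = sysctl_to_proc_path("net.ipv4.tcp_sack")
```

The variable `net.ipv4.tcp_sack` therefore corresponds to the file /proc/sys/net/ipv4/tcp_sack, which is the same value that `sysctl -w net.ipv4.tcp_sack=...` would change.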
TCP Diagnostic Tools
The Distributed Benchmark System (DBS)
NIST Net
Tcpdump output of TCP packets
The Distributed Benchmark System (DBS)
A benchmark for TCP performance evaluation.
Can be used to run tests with multiple TCP connections or UDP flows.
Three tools:
  dbsc: the DBS test controller
  dbsd: the DBS daemon, running on each host participating in the test
  dbs_view: a Perl script file, used to plot the experiment results
The Distributed Benchmark System (DBS)
DBS uses a command file to describe the test setting, which specifies
How many TCP or UDP flows to generate The sender and receiver for each flow The traffic pattern and duration of each flow Which statistics to collect.
One host serves as the controller, running dbsc, and all other hosts are DBS hosts, running dbsd.
  (1) The controller reads the command file and sends instructions to all DBS hosts.
  (2) TCP connections are set up between the hosts and traffic is transmitted on them.
  (3) When the data transmissions are over, the controller collects statistics.
NIST Net
A Linux-based network emulator
Can be used to emulate various network conditions:
  Packet loss, duplication, delay and jitter, bandwidth limitations, network congestion
A host running NIST Net serves as a router between two subnets.
NIST Net works like a firewall. A user can specify a connection and enforce a policy on it.
Tcpdump Output of TCP Packets
General format
An example