Chapter 3
Transport Layer
Dr. Hala Hassan
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 principles of congestion control
3.7 TCP congestion control
Transport Layer 3-2
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
point-to-point:
• one sender, one receiver
reliable, in-order byte stream:
• no "message boundaries"
pipelined:
• TCP congestion and flow control set window size
full duplex data:
• bi-directional data flow in same connection
• MSS: maximum segment size
connection-oriented:
• handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
• sender will not overwhelm receiver
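To make "byte stream, no message boundaries" concrete, here is a minimal sketch under simplifying assumptions: a hypothetical helper `segment_stream` joins the application's writes into one stream and cuts it into pieces of at most MSS bytes, labelling each with the number of its first byte. The initial sequence number `isn` is chosen by the caller, 32-bit wraparound is ignored, and real TCP stacks decide segment sizes based on more factors than this.

```python
def segment_stream(writes, mss, isn):
    """Hypothetical helper: join application writes into one byte stream and
    cut it into segments of at most mss bytes. Returns (seq #, payload) pairs."""
    stream = b"".join(writes)  # write boundaries are not preserved
    return [
        (isn + offset, stream[offset:offset + mss])  # seq # = number of first byte
        for offset in range(0, len(stream), mss)
    ]


segments = segment_stream([b"hello", b"world!"], mss=4, isn=1000)
print(segments)  # [(1000, b'hell'), (1004, b'owor'), (1008, b'ld!')]
```

The two writes `hello` and `world!` end up split across segments, so the receiver cannot tell where one write ended and the next began.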
TCP segment structure
[Figure: TCP segment layout, 32 bits wide, fields listed row by row]
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | U A P R S F | receive window
• checksum | Urg data pointer
• options (variable length)
• application data (variable length)
field notes:
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
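The field layout above can be decoded with Python's standard `struct` module. This is a minimal sketch, not a full parser: `parse_tcp_header` is a hypothetical name, the segment is assumed to be raw TCP bytes (no IP header), the checksum is not verified, and the bits the slide marks "not used" are simply ignored.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(segment: bytes) -> dict:
    """Hypothetical helper: decode the fixed 20-byte TCP header plus options/data."""
    if len(segment) < 20:
        raise ValueError("TCP header is at least 20 bytes")
    # network byte order: 2 ports, seq #, ACK #, (head len + flags), rwnd, checksum, urg ptr
    src, dst, seq, ack, off_flags, rwnd, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4  # "head len" counts 32-bit words
    if header_len < 20 or header_len > len(segment):
        raise ValueError("bad header length")
    flags = off_flags & 0x3F            # low six bits: U A P R S F
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": {name for i, name in enumerate(FLAG_NAMES) if flags & (1 << i)},
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg,
        "options": segment[20:header_len],
        "data": segment[header_len:],
    }


raw = struct.pack("!HHIIHHHH", 1234, 80, 42, 79, (5 << 12) | 0x18, 4096, 0, 0) + b"C"
print(parse_tcp_header(raw)["flags"])  # ACK and PSH are set
```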
TCP seq. numbers, ACKs
sequence numbers:
• byte stream "number" of first byte in segment's data
acknowledgements:
• seq # of next byte expected from other side
• cumulative ACK
Q: how does the receiver handle out-of-order segments?
• A: TCP spec doesn't say, up to implementor
[Figure: outgoing segment from sender (source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer); incoming segment to sender carrying acknowledgement number A and rwnd; sender sequence number space with window size N: sent, ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
• user types 'C': A sends Seq=42, ACK=79, data = 'C'
• B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'
• A ACKs receipt of echoed 'C': Seq=43, ACK=80
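The telnet exchange can be replayed with a small sketch. The `Segment` class and `reply` helper are hypothetical illustrations; the only rule they encode is the one from the slides: a reply's ACK number is the sequence number of the next byte expected, i.e. the received segment's Seq plus its data length.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int          # number of the first data byte carried
    ack: int          # seq # of next byte expected from the other side
    data: bytes = b""


def reply(received: Segment, my_next_seq: int, data: bytes = b"") -> Segment:
    """Hypothetical helper: build a segment that cumulatively ACKs `received`."""
    return Segment(seq=my_next_seq, ack=received.seq + len(received.data), data=data)


# Starting sequence numbers (A: 42, B: 79) are the ones used in the scenario.
s1 = Segment(seq=42, ack=79, data=b"C")              # A -> B: user types 'C'
s2 = reply(s1, my_next_seq=79, data=b"C")            # B -> A: ACK 'C', echo 'C'
s3 = reply(s2, my_next_seq=s1.seq + len(s1.data))    # A -> B: ACK echoed 'C'
print(s2)  # Segment(seq=79, ack=43, data=b'C')
print(s3)  # Segment(seq=43, ack=80, data=b'')
```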
TCP round trip time, timeout
Q: how to set TCP timeout value?
• longer than RTT, but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother": average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
[Figure: SampleRTT and EstimatedRTT, RTT (milliseconds) vs. time (seconds); RTT: [Link] to [Link]]
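The EWMA update can be sketched in a few lines. The function name `ewma_rtt` is hypothetical, and seeding EstimatedRTT with the first sample is an assumption, since the slide does not say how the estimate starts.

```python
def ewma_rtt(samples, alpha=0.125):
    """Hypothetical helper: EstimatedRTT after each SampleRTT (EWMA)."""
    estimates = []
    est = None
    for sample in samples:
        # Assumption: the first sample seeds the estimate.
        est = sample if est is None else (1 - alpha) * est + alpha * sample
        estimates.append(est)
    return estimates


print(ewma_rtt([100, 200, 100]))  # [100, 112.5, 110.9375]
```

A single 200 ms spike moves the estimate only by α·(200 − 100) = 12.5 ms, which is the "smoothing" the slide asks for.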
TCP round trip time, timeout
timeout interval: EstimatedRTT plus "safety margin"
• large variation in EstimatedRTT → larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot\left|\text{SampleRTT} - \text{EstimatedRTT}\right|$$
(typically, β = 0.25)
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
(estimated RTT plus "safety margin")
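Combining both averages gives a timeout estimator. This is a sketch, not a complete RFC implementation: the class name `RttEstimator` is hypothetical; the first-sample seeding (DevRTT = SampleRTT/2) and updating DevRTT before EstimatedRTT follow RFC 6298, while that RFC's clock-granularity term and minimum timeout are omitted.

```python
class RttEstimator:
    """Hypothetical sketch of EstimatedRTT / DevRTT / TimeoutInterval."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt):
        if self.estimated_rtt is None:
            # First measurement: seed as in RFC 6298.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, so update it first.
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
print(est.update(100))  # 300.0  (100 + 4 * 50)
print(est.update(100))  # 250.0  (DevRTT shrinks to 37.5)
```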
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
• pipelined segments
• cumulative acks
• single retransmission timer
retransmissions triggered by:
• timeout events
• duplicate acks
let's initially consider simplified TCP sender:
• ignore duplicate acks
• ignore flow control, congestion control
TCP sender events:
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running
• think of timer as for oldest unacked segment
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if ack acknowledges previously unacked segments
• update what is known to be ACKed
• start timer if there are still unacked segments
TCP sender (simplified)
initially:
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum
    wait for event
event: data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., "send")
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer
event: timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer
event: ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase–1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else stop timer
    }
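The event handlers above translate almost line for line into an event-driven sketch. Assumptions: the class `SimplifiedTcpSender` is hypothetical, the timer is modelled as a flag rather than a real clock set to TimeOutInterval, segments handed to IP are just recorded in a list, and 32-bit sequence wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Hypothetical sketch of the simplified TCP sender: cumulative ACKs,
    one timer (modelled as a flag), no dup-ACK, flow or congestion control."""

    def __init__(self, initial_seq_num: int):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> payload of not-yet-ACKed segments
        self.timer_running = False
        self.sent = []              # (seq #, payload) passed to IP, for inspection

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))          # pass segment to IP ("send")
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True          # start timer

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)                # smallest not-yet-ACKed seq #
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True              # start timer again

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y                 # SendBase-1: last cumulatively ACKed byte
            # drop every segment whose last byte is below y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)  # start timer, or stop timer


# lost ACK scenario: Seq=92 (8 bytes), ACK lost, timeout, retransmit, ACK=100
sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)
sender.on_timeout()
sender.on_ack(100)
print(sender.send_base, sender.timer_running)  # 100 False
```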
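TCP: retransmission scenarios
lost ACK scenario:
• Host A (SendBase=92) sends Seq=92, 8 bytes of data
• Host B replies ACK=100, but the ACK is lost (X)
• A's timer expires; A resends Seq=92, 8 bytes of data
• B replies ACK=100; A sets SendBase=100
premature timeout:
• Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data
• Host B replies ACK=100 and ACK=120
• A's timer expires before ACK=100 arrives; A resends Seq=92, 8 bytes of data
• ACK=100 and ACK=120 arrive: SendBase=100, then SendBase=120
• B answers the retransmission with ACK=120; SendBase stays 120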
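TCP: retransmission scenarios
cumulative ACK scenario:
• Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data
• Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout
• ACK=120 cumulatively acknowledges both segments, so A goes on to send Seq=120, 15 bytes of data without retransmitting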
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver → TCP receiver action
• arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed → delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• arrival of in-order segment with expected seq #; one other segment has ACK pending → immediately send single cumulative ACK, ACKing both in-order segments
• arrival of out-of-order segment with higher-than-expected seq #; gap detected → immediately send duplicate ACK, indicating seq # of next expected byte
• arrival of segment that partially or completely fills gap → immediately send ACK, provided that segment starts at lower end of gap
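The four rules in the table can be sketched as a small state machine. Assumptions: the class `AckPolicyReceiver` is hypothetical; "delay" stands for starting the 500 ms delayed-ACK timer, which is not modelled; overlapping segments are not handled; and re-ACKing an old duplicate segment immediately is an assumption, since the table does not cover that case.

```python
class AckPolicyReceiver:
    """Hypothetical sketch of the receiver ACK rules.
    on_segment returns (action, ack_number)."""

    def __init__(self, rcv_next: int):
        self.rcv_next = rcv_next    # seq # of next expected byte
        self.out_of_order = {}      # buffered segments: seq # -> length
        self.ack_pending = False    # an in-order segment is waiting for its ACK

    def on_segment(self, seq: int, length: int):
        if seq > self.rcv_next:                      # gap detected
            self.out_of_order[seq] = length
            self.ack_pending = False                 # the duplicate ACK covers it
            return ("dup_ack", self.rcv_next)
        if seq < self.rcv_next:                      # old duplicate: not in the table
            return ("ack_now", self.rcv_next)        # assumption: re-ACK immediately
        had_gap = bool(self.out_of_order)
        self.rcv_next += length
        while self.rcv_next in self.out_of_order:    # absorb now-contiguous data
            self.rcv_next += self.out_of_order.pop(self.rcv_next)
        if had_gap or self.ack_pending:              # fills gap, or one ACK pending
            self.ack_pending = False
            return ("ack_now", self.rcv_next)
        self.ack_pending = True                      # earlier data already ACKed
        return ("delay", None)


r = AckPolicyReceiver(92)
print(r.on_segment(92, 8))    # ('delay', None)
print(r.on_segment(100, 20))  # ('ack_now', 120)
print(r.on_segment(140, 10))  # ('dup_ack', 120)
print(r.on_segment(120, 20))  # ('ack_now', 150)
```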
TCP fast retransmit
time-out period often relatively long:
• long delay before resending lost packet
detect lost segments via duplicate ACKs:
• sender often sends many segments back-to-back
• if segment is lost, there will likely be many duplicate ACKs
TCP fast retransmit:
• if sender receives 3 duplicate ACKs for the same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
• likely that unacked segment lost, so don't wait for timeout
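The triple-duplicate-ACK rule can be sketched as a counter. The class `FastRetransmitDetector` is hypothetical; it fires once, on the third duplicate (the fourth ACK carrying the same number), and ignoring older, reordered ACKs is a simplifying assumption.

```python
DUP_ACK_THRESHOLD = 3  # "triple duplicate ACKs"


class FastRetransmitDetector:
    """Hypothetical sketch: signal when to resend the oldest unacked segment."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack: int) -> bool:
        if ack == self.last_ack:
            self.dup_count += 1
            # fire exactly once, on the third duplicate
            return self.dup_count == DUP_ACK_THRESHOLD
        if self.last_ack is None or ack > self.last_ack:
            self.last_ack = ack      # new data ACKed: reset the count
            self.dup_count = 0
        return False                 # assumption: older, reordered ACKs are ignored


detector = FastRetransmitDetector()
print([detector.on_ack(a) for a in [100, 100, 100, 100]])
# [False, False, False, True]
```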
TCP fast retransmit
[Figure: Host A, Host B]
• Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data; Seq=100 is lost (X)
• Host A receives ACK=100 four times (the original ACK plus three duplicates) while its timeout is still running
• fast retransmit after sender receipt of triple duplicate ACK: A resends Seq=100, 20 bytes of data
TCP flow control
application may remove data from TCP socket buffers … slower than TCP receiver is delivering (sender is sending)
flow control:
• receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process; TCP socket receiver buffers and TCP code in the OS; IP code; data arriving from sender]
TCP flow control
receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
• RcvBuffer size set via socket options (typical default is 4096 bytes)
• many operating systems autoadjust RcvBuffer
sender limits amount of unacked ("in-flight") data to receiver's rwnd value
• guarantees receive buffer will not overflow
[Figure: receiver-side buffering: RcvBuffer holds buffered data (to application process) and rwnd free buffer space (filled by TCP segment payloads)]
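A sketch of the bookkeeping behind rwnd, under assumptions: the variable names `LastByteRcvd` and `LastByteRead` (rendered in snake_case) and both helper functions are hypothetical illustrations, following the common textbook convention that free space is the buffer size minus the bytes received but not yet read by the application.

```python
def advertised_rwnd(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Hypothetical helper: free space = RcvBuffer minus data buffered
    but not yet read by the application."""
    buffered = last_byte_rcvd - last_byte_read
    return rcv_buffer - buffered


def can_send(last_byte_sent: int, last_byte_acked: int, seg_len: int, rwnd: int) -> bool:
    """Hypothetical helper: sender keeps in-flight (unacked) bytes within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + seg_len <= rwnd


rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                               # 2096
print(can_send(5000, 4000, 1000, rwnd))   # True  (1000 + 1000 <= 2096)
print(can_send(5000, 4000, 1200, rwnd))   # False (1000 + 1200 > 2096)
```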
Connection Management
before exchanging data, sender/receiver "handshake":
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters
[Figure: client and server applications above the network, each holding connection state: ESTAB and connection variables: seq # client-to-server, seq # server-to-client, rcvBuffer size at server, client]
client side: Socket clientSocket = new Socket("hostname", "port number");
server side: Socket connectionSocket = [Link]();
Agreeing to establish a connection
2-way handshake:
• client: "Let's talk"; server: "OK"; both sides enter ESTAB
• client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB
Q: will 2-way handshake always work in network?
• variable delays
• retransmitted messages (e.g. req_conn(x)) due to message loss
• message reordering
• can't "see" other side
Agreeing to establish a connection
2-way handshake failure scenarios:
first scenario:
• client chooses x, sends req_conn(x), then retransmits req_conn(x)
• server enters ESTAB and replies acc_conn(x); client enters ESTAB
• connection x completes: client terminates, server forgets x
• the retransmitted req_conn(x) then arrives: server enters ESTAB again, a half open connection! (no client!)
second scenario:
• as above, and the client also sends data(x+1) and retransmits data(x+1); server accepts data(x+1)
• connection x completes: client terminates, server forgets x
• the delayed req_conn(x) and data(x+1) arrive: server enters ESTAB and accepts data(x+1) again
TCP 3-way handshake
client state and server state both start in LISTEN
• client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); enter SYNSENT
• server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y; ACKbit=1; ACKnum=x+1); enter SYN RCVD
• client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; enter ESTAB
• server: received ACK(y) indicates client is live; enter ESTAB
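The numbers exchanged in the handshake can be checked with a small sketch. The `Segment` dataclass and `three_way_handshake` function are hypothetical; the ISNs x and y are passed in by the caller here, whereas real stacks pick them unpredictably. Arithmetic is done modulo 2^32 because TCP sequence numbers are 32-bit, and the final ACK carries Seq=x+1 because the SYN consumes one sequence number.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(x: int, y: int):
    """Hypothetical sketch: SYN, SYNACK, ACK for client ISN x and server ISN y."""
    syn = Segment(syn=True, seq=x)                               # client -> SYNSENT
    synack = Segment(syn=True, ack=True, seq=y,
                     ack_num=(x + 1) % SEQ_SPACE)                # server -> SYN RCVD
    ack = Segment(ack=True, seq=(x + 1) % SEQ_SPACE,             # SYN used seq # x
                  ack_num=(y + 1) % SEQ_SPACE)                   # client -> ESTAB
    return syn, synack, ack                                      # server -> ESTAB on ACK


syn, synack, ack = three_way_handshake(100, 300)
print(synack.ack_num, ack.seq, ack.ack_num)  # 101 101 301
```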
TCP: closing a connection
client, server each close their side of connection
• send TCP segment with FIN bit = 1
respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
TCP: closing a connection
client state and server state both start in ESTAB
• client: [Link](); send FINbit=1, seq=x; enter FIN_WAIT_1 (can no longer send but can receive data)
• server: reply ACKbit=1; ACKnum=x+1; enter CLOSE_WAIT (can still send data)
• client: enter FIN_WAIT_2 (wait for server close)
• server: send FINbit=1, seq=y; enter LAST_ACK (can no longer send data)
• client: reply ACKbit=1; ACKnum=y+1; enter TIMED_WAIT (timed wait for 2*max segment lifetime), then CLOSED
• server: on receiving the ACK, enter CLOSED
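The close sequence can be sketched as two transition tables, one per side. The table and event names (`app_close`, `rcv_fin`, `rcv_ack`, `2msl_timeout`) are hypothetical labels for the steps above, and only the sequence shown on the slide is covered: simultaneous close and error paths are left out, so an unexpected event raises `KeyError`.

```python
# (state, event) -> (next state, segment to send or None); hypothetical labels
CLIENT_CLOSE = {
    ("ESTAB", "app_close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "rcv_ack"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "rcv_fin"): ("TIMED_WAIT", "send ACK"),
    ("TIMED_WAIT", "2msl_timeout"): ("CLOSED", None),
}
SERVER_CLOSE = {
    ("ESTAB", "rcv_fin"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "rcv_ack"): ("CLOSED", None),
}


def run(table, start, events):
    """Walk the table; return the final state and the segments sent."""
    state, actions = start, []
    for event in events:
        state, action = table[(state, event)]
        if action:
            actions.append(action)
    return state, actions


print(run(CLIENT_CLOSE, "ESTAB", ["app_close", "rcv_ack", "rcv_fin", "2msl_timeout"]))
# ('CLOSED', ['send FIN', 'send ACK'])
```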
Principles of congestion control
congestion:
informally: “too many sources sending too much
data too fast for network to handle”
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• output link capacity: R
• no retransmission
[Figure: Host A and Host B send original data at rate λin through one router with unlimited shared output link buffers; throughput λout. Plots: λout vs. λin, levelling off at R/2; delay vs. λin, growing as λin approaches R/2]
• maximum per-connection throughput: R/2
• large delays as arrival rate, λin, approaches capacity
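Read as a formula, the throughput plot for this scenario says that each of the two connections gets whatever it sends, up to half the link capacity. This is a restatement of the plot under the scenario's assumptions (two senders sharing capacity R, infinite buffers, no retransmission), not an additional result:

```latex
\lambda_{out} =
\begin{cases}
\lambda_{in}, & \lambda_{in} \le R/2 \\
R/2, & \lambda_{in} > R/2
\end{cases}
```

The cost shows up in the other plot: as λin approaches R/2, throughput stops growing while queueing delay keeps rising.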
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of timed-out packet
• application-layer input = application-layer output: λin = λout
• transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A sends λin (original data) and λ'in (original data, plus retransmitted data) through finite shared output link buffers toward Host B; throughput λout]
TCP congestion control: additive increase multiplicative decrease
approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS every RTT until loss detected
• multiplicative decrease: cut cwnd in half after loss
[Figure: cwnd (TCP sender congestion window size) vs. time: AIMD saw tooth behavior, probing for bandwidth; additively increase window size … until loss occurs (then cut window in half)]
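The saw tooth can be reproduced with a short simulation. Assumptions: the function `aimd` is hypothetical, cwnd is measured in MSS, losses are supplied as a set of RTT indices instead of being detected, and cwnd is never cut below 1 MSS.

```python
def aimd(rtts, loss_rtts, initial_cwnd=1.0):
    """Hypothetical sketch: cwnd (in MSS) at the start of each RTT under AIMD.
    loss_rtts: RTT indices in which a loss is detected (given, not simulated)."""
    cwnd = initial_cwnd
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_rtts:
            cwnd = max(cwnd / 2, 1.0)   # multiplicative decrease
        else:
            cwnd += 1                   # additive increase: +1 MSS per RTT
    return trace


print(aimd(8, {3}, initial_cwnd=10))
# [10, 11, 12, 13, 6.5, 7.5, 8.5, 9.5]
```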
TCP Congestion Control: details
[Figure: sender sequence number space: last byte ACKed | sent, not-yet ACKed ("in-flight") | last byte sent; the in-flight region is at most cwnd]
sender limits transmission:
• LastByteSent − LastByteAcked ≤ cwnd
• cwnd is dynamic, function of perceived network congestion
TCP sending rate:
• roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
$$\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}\ \text{bytes/sec}$$
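Both relations on this slide are one-liners. The helper names `sending_rate` and `may_send` are hypothetical, and the sample values (a 14600-byte window, a 250 ms RTT) are made up for illustration.

```python
def sending_rate(cwnd_bytes: float, rtt_s: float) -> float:
    """Hypothetical helper: approximate rate = cwnd bytes per RTT (bytes/sec)."""
    return cwnd_bytes / rtt_s


def may_send(last_byte_sent: int, last_byte_acked: int, seg_len: int, cwnd: int) -> bool:
    """Hypothetical helper: True if seg_len more bytes keep in-flight data within cwnd."""
    return (last_byte_sent - last_byte_acked) + seg_len <= cwnd


print(sending_rate(14600, 0.25))          # 58400.0 bytes/sec
print(may_send(10000, 4000, 1460, 7300))  # False: 6000 + 1460 > 7300
```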
TCP Slow Start
when connection begins, increase rate exponentially until first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment to Host B in the first RTT, two in the next, then four, over time]
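The "+1 MSS per ACK doubles cwnd per RTT" claim can be checked with a sketch. Assumptions: the function `slow_start` is hypothetical, there is no loss, and every MSS-sized segment is ACKed individually (no delayed ACKs), so a window of k segments yields k ACKs.

```python
def slow_start(rounds: int, mss: int = 1) -> list:
    """Hypothetical sketch: cwnd at the start of each RTT during slow start."""
    cwnd = mss                      # initially cwnd = 1 MSS
    per_rtt = []
    for _ in range(rounds):
        per_rtt.append(cwnd)
        acks = cwnd // mss          # one ACK per full segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # +1 MSS for every ACK received
    return per_rtt


print(slow_start(4))  # [1, 2, 4, 8]
```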
TCP: detecting, reacting to loss
loss indicated by timeout:
• cwnd set to 1 MSS
• window then grows exponentially (as in slow start) to threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP Reno
• dup ACKs indicate network capable of delivering some segments
• cwnd is cut in half, window then grows linearly
TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate ACKs)
TCP: switching from slow start to CA (congestion avoidance)
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
• variable ssthresh
• on loss event, ssthresh is set to 1/2 of cwnd just before loss event
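The loss reactions and the ssthresh switch can be combined into a per-RTT sketch in units of MSS. Assumptions: `CongestionState`, `on_loss` and `grow_one_rtt` are hypothetical names; slow start is capped at ssthresh for simplicity; and Reno's fast-recovery detail (temporarily inflating cwnd by 3 MSS) is omitted, keeping only the "cut in half" rule from the slides.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: float       # congestion window, in MSS
    ssthresh: float   # slow-start threshold, in MSS


def on_loss(s: CongestionState, event: str, variant: str) -> None:
    """Hypothetical sketch. event: 'timeout' or 'triple_dup_ack';
    variant: 'tahoe' or 'reno'."""
    s.ssthresh = s.cwnd / 2              # 1/2 of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        s.cwnd = 1                       # back to 1 MSS, then slow start
    else:
        s.cwnd = s.ssthresh              # Reno on 3 dup ACKs: cut in half


def grow_one_rtt(s: CongestionState) -> None:
    if s.cwnd < s.ssthresh:
        s.cwnd = min(2 * s.cwnd, s.ssthresh)  # slow start (capped, a simplification)
    else:
        s.cwnd += 1                           # congestion avoidance: +1 MSS per RTT


reno, tahoe = CongestionState(12, 64), CongestionState(12, 64)
on_loss(reno, "triple_dup_ack", "reno")
on_loss(tahoe, "triple_dup_ack", "tahoe")
reno_trace, tahoe_trace = [], []
for _ in range(4):
    grow_one_rtt(reno)
    grow_one_rtt(tahoe)
    reno_trace.append(reno.cwnd)
    tahoe_trace.append(tahoe.cwnd)
print(reno_trace)   # [7.0, 8.0, 9.0, 10.0]
print(tahoe_trace)  # [2, 4, 6.0, 7.0]
```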
TCP throughput
avg. TCP thruput as function of window size, RTT?
• ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
• avg. window size (# in-flight bytes) is ¾ W
• avg. thruput is ¾ W per RTT
$$\text{avg TCP thruput} = \frac{3}{4}\cdot\frac{W}{\text{RTT}}\ \text{bytes/sec}$$
[Figure: window oscillating between W/2 and W]
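The ¾ factor comes from the window ramping linearly from W/2 up to W, so its average is the midpoint (W/2 + W)/2 = 3W/4. The sketch below checks that numerically and evaluates the formula; `sawtooth_average` and `avg_tcp_throughput` are hypothetical names, and the window is assumed to grow by one unit per step within a single cycle.

```python
from statistics import mean


def sawtooth_average(w: int) -> float:
    """Hypothetical helper: average window over one AIMD cycle that climbs
    linearly (+1 per step) from W/2 to W before being halved."""
    return mean(range(w // 2, w + 1))


def avg_tcp_throughput(w_bytes: float, rtt_s: float) -> float:
    """Hypothetical helper: 3/4 * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_s


print(sawtooth_average(80))              # 60, i.e. 3/4 of 80
print(avg_tcp_throughput(80000, 0.25))   # 240000.0 bytes/sec
```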
Chapter 3: summary
principles behind transport layer
services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
instantiation, implementation in
the Internet
• UDP
• TCP