Transport Layer
Internet Transport-layer Protocols
• reliable, in-order delivery (TCP)
  – congestion control
  – flow control
  – connection setup
• unreliable, unordered delivery: UDP
  – no-frills extension of “best-effort” IP
• services not available:
  – delay guarantees
  – bandwidth guarantees
[Figure: protocol stacks at the two end hosts (application, transport, network, data link, physical) and at intermediate routers (network, data link, physical); the transport layer provides logical end-end transport between application processes]
UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service; UDP segments may be:
  – lost
  – delivered out of order to app
• connectionless:
  – no handshaking between UDP sender, receiver
  – each UDP segment handled independently of others

Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
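A minimal sketch of what “connectionless, no-frills” looks like in code, using Java’s DatagramSocket; the hostname and port are placeholders and the snippet is not from the slides:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpSendSketch {
    public static void main(String[] args) throws Exception {
        // no connection establishment, no connection state kept at the sender
        DatagramSocket socket = new DatagramSocket();
        byte[] data = "hello".getBytes("UTF-8");
        InetAddress dest = InetAddress.getByName("hostname");   // placeholder destination
        // each datagram is handled independently; delivery and ordering are not guaranteed
        socket.send(new DatagramPacket(data, data.length, dest, 9876));
        socket.close();
    }
}
```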
UDP: more
• often used for streaming (video/audio) or game apps
  – loss tolerant
  – rate sensitive
• other UDP uses
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery!

UDP segment format (32-bit rows): source port # | dest port #, then length | checksum, followed by the application data (message). The length field gives the length, in bytes, of the UDP segment, including the header.
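A sketch of that layout (class and method names are illustrative, not from the slides): the 8-byte header can be assembled with Java’s ByteBuffer, which writes in network byte order by default.

```java
import java.nio.ByteBuffer;

public class UdpHeaderSketch {
    // build the 8-byte UDP header in front of a payload
    static byte[] buildSegment(int srcPort, int dstPort, byte[] payload, int checksum) {
        int length = 8 + payload.length;              // length covers header + data
        ByteBuffer seg = ByteBuffer.allocate(length); // big-endian (network byte order) by default
        seg.putShort((short) srcPort);                // source port #
        seg.putShort((short) dstPort);                // dest port #
        seg.putShort((short) length);                 // length, in bytes, including header
        seg.putShort((short) checksum);               // checksum
        seg.put(payload);                             // application data (message)
        return seg.array();
    }
}
```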
UDP: checksum
Goal: detect “errors” (e.g., flipped bits) in
transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field

Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO: error detected
  – YES: no error detected. But maybe errors nonetheless? More later….
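A minimal sketch of the sender-side computation described above (names are illustrative): sum the data as 16-bit words with wraparound carry, then take the one’s complement.

```java
public class ChecksumSketch {
    // 1's complement sum of the segment viewed as 16-bit integers
    static int checksum(byte[] segment) {
        long sum = 0;
        for (int i = 0; i < segment.length; i += 2) {
            int hi = segment[i] & 0xFF;
            int lo = (i + 1 < segment.length) ? (segment[i + 1] & 0xFF) : 0; // pad odd length
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);   // wrap the carry back in (1's complement add)
        }
        return (int) (~sum & 0xFFFF);             // checksum = complement of the sum
    }

    public static void main(String[] args) {
        byte[] data = {0x12, 0x34, (byte) 0xAB, (byte) 0xCD};
        System.out.printf("checksum = 0x%04X%n", checksum(data));
        // receiver check: 1's complement sum of data plus checksum field should be 0xFFFF
    }
}
```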
TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver
[Figure: the application writes data into the sending socket’s TCP send buffer; TCP forms segments; the receiving TCP places data in its receive buffer, from which the application reads]
TCP Segment Structure
(32-bit rows)
• source port # | dest port #
• sequence number
• acknowledgement number
  – sequence and ACK numbers count by bytes of data (not segments!)
• head len | not used | flags U A P R S F | receive window
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection estab (setup, teardown commands)
  – receive window: # bytes rcvr willing to accept
• checksum | urg data pointer
  – Internet checksum (as in UDP)
• options (variable length)
• application data (variable length)
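A sketch (class and variable names are illustrative) of reading the fixed 20-byte portion of this header from raw bytes, following the field order above:

```java
import java.nio.ByteBuffer;

public class TcpHeaderSketch {
    static void parse(byte[] raw) {
        ByteBuffer h = ByteBuffer.wrap(raw);             // network byte order
        int srcPort = h.getShort() & 0xFFFF;             // source port #
        int dstPort = h.getShort() & 0xFFFF;             // dest port #
        long seq    = h.getInt() & 0xFFFFFFFFL;          // sequence number (counts bytes)
        long ack    = h.getInt() & 0xFFFFFFFFL;          // acknowledgement number
        int lenAndFlags = h.getShort() & 0xFFFF;         // header len | reserved | UAPRSF flags
        int headerLen   = ((lenAndFlags >> 12) & 0xF) * 4; // header length in bytes
        boolean ackFlag = (lenAndFlags & 0x10) != 0;     // ACK flag: ack # valid
        boolean syn     = (lenAndFlags & 0x02) != 0;     // SYN flag
        boolean fin     = (lenAndFlags & 0x01) != 0;     // FIN flag
        int rcvWindow = h.getShort() & 0xFFFF;           // receive window: bytes rcvr will accept
        int checksum  = h.getShort() & 0xFFFF;           // Internet checksum (as in UDP)
        int urgPtr    = h.getShort() & 0xFFFF;           // urgent data pointer (generally unused)
        System.out.printf("ports %d->%d seq=%d ack=%d hdr=%dB ACK=%b SYN=%b FIN=%b rwnd=%d csum=0x%04X urg=%d%n",
                srcPort, dstPort, seq, ack, headerLen, ackFlag, syn, fin, rcvWindow, checksum, urgPtr);
    }
}
```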
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• pipelined segments
• cumulative ACKs
• TCP uses single retransmission timer
• retransmissions are triggered by:
  – timeout events
  – duplicate ACKs
• initially consider simplified TCP sender:
  – ignore duplicate ACKs
  – ignore flow control, congestion control
TCP Sender Events:
Data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unACKed segment)
• expiration interval: TimeOutInterval

Timeout:
• retransmit segment that caused timeout
• restart timer

ACK rcvd:
• if it acknowledges previously unACKed segments:
  – update what is known to be ACKed
  – start timer if there are outstanding segments
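A hedged sketch (not the slides’ code) that strings these three events together for the simplified sender: one timer for the oldest unACKed segment, cumulative ACKs, timeout-driven retransmission. All names and the data structure are illustrative.

```java
import java.util.TreeMap;

public class SimplifiedTcpSender {
    private long nextSeqNum = 0;          // byte-stream number of next byte to send
    private long sendBase = 0;            // oldest unACKed byte
    private final TreeMap<Long, byte[]> unacked = new TreeMap<>(); // seq# -> segment data

    // data received from application
    void send(byte[] data) {
        unacked.put(nextSeqNum, data);    // segment's seq # = byte-stream # of its first byte
        transmit(nextSeqNum, data);
        if (unacked.size() == 1) startTimer();   // start timer if not already running
        nextSeqNum += data.length;
    }

    // timeout event
    void onTimeout() {
        transmit(sendBase, unacked.get(sendBase)); // retransmit segment that caused timeout
        startTimer();                              // restart timer
    }

    // cumulative ACK received
    void onAck(long ackNum) {
        if (ackNum > sendBase) {                   // acknowledges previously unACKed data
            unacked.headMap(ackNum).clear();       // update what is known to be ACKed
            sendBase = ackNum;
            if (unacked.isEmpty()) stopTimer();
            else startTimer();                     // restart timer for outstanding segments
        }
    }

    void transmit(long seq, byte[] data) { /* hand segment to IP (omitted) */ }
    void startTimer() { /* (re)arm the single retransmission timer (omitted) */ }
    void stopTimer()  { /* cancel the timer (omitted) */ }
}
```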
TCP: Retransmission Scenarios
[Figure, left (lost ACK scenario): Host A sends Seq=92, 8 bytes of data; Host B’s ACK=100 is lost; A times out and retransmits Seq=92, B re-sends ACK=100, and SendBase advances to 100]
[Figure, right (premature timeout): Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the timer expires before ACK=100 arrives, so A needlessly retransmits Seq=92; the cumulative ACK=120 then covers both segments and SendBase advances to 120]
TCP Retransmission Scenarios (more)
[Figure (cumulative ACK scenario): Host A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); ACK=100 is lost, but cumulative ACK=120 arrives before the timer expires, acknowledging both segments, so nothing is retransmitted and SendBase advances to 120]
TCP Flow Control
• receive side of TCP connection has a receive buffer:
  – arriving IP datagrams deposit TCP data into the buffer; the application process reads from it, leaving some (currently) unused buffer space
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching send rate to receiving application’s drain rate
TCP Flow Control: How it Works
(suppose TCP receiver discards out-of-order segments)
• unused buffer space:
  = rwnd
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• receiver: advertises unused buffer space by including rwnd value in segment header
• sender: limits # of unACKed bytes to rwnd
  – guarantees receiver’s buffer doesn’t overflow
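A small worked example of that formula, with assumed numbers (none of these values come from the slides):

```java
public class FlowControlExample {
    public static void main(String[] args) {
        int rcvBuffer    = 64 * 1024;  // receiver buffer size (assumed)
        int lastByteRcvd = 20_000;     // highest byte placed into the buffer (assumed)
        int lastByteRead = 12_000;     // highest byte read by the application (assumed)

        // receiver side: advertised window = free buffer space
        int rwnd = rcvBuffer - (lastByteRcvd - lastByteRead);
        System.out.println("advertised rwnd = " + rwnd + " bytes");

        // sender side: LastByteSent - LastByteAcked must stay <= rwnd
        int lastByteAcked = 12_000;
        int maxLastByteSent = lastByteAcked + rwnd;
        System.out.println("sender may send up through byte " + maxLastByteSent);
    }
}
```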
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  Socket clientSocket = new Socket("hostname", port#);
• server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
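A slightly fuller sketch of those two calls (the port number and wrapper class are assumptions, not from the slides); the three-way handshake itself happens inside the OS when the client’s Socket constructor connects and the server’s accept() returns.

```java
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectionSetupSketch {
    // server side: wait for a client to complete the handshake
    static void server() throws Exception {
        ServerSocket welcomeSocket = new ServerSocket(6789);
        Socket connectionSocket = welcomeSocket.accept(); // returns after SYN / SYNACK / ACK
        connectionSocket.close();
        welcomeSocket.close();
    }

    // client side: connection initiator
    static void client() throws Exception {
        Socket clientSocket = new Socket("hostname", 6789); // sends SYN, receives SYNACK, sends ACK
        clientSocket.close();                               // later: FIN/ACK teardown
    }
}
```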
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
[Figure: client sends FIN; server ACKs and then sends its own FIN; client ACKs and enters a timed wait before the connection is closed]
TCP Connection Management (cont.)
Step 3: client receives client server
FIN, replies with ACK. closing
FIN
– Enters “timed wait” -
will respond with ACK
to received FINs
closing
ACK
Step 4: server, receives
ACK. Connection closed. FIN
ACK
timed wait closed
closed
Principles of Congestion Control
Congestion:
• Informally: “too many sources sending too
much data too fast for network to handle”
• Different from flow control!
• Manifestations:
– Lost packets (buffer overflow at routers)
– Long delays (queueing in router buffers)
• A “top-10” problem!
Causes/costs of Congestion: Scenario 1
• two senders, two receivers
• one router with unlimited (infinite) shared output-link buffers
• no retransmission
• λin: original data from each host; λout: throughput delivered to the receivers
• large delays when congested
• there is a maximum achievable throughput
Causes/costs of Congestion: Scenario 2
• one router, finite (shared output-link) buffers
• sender retransmission of lost packet
• λin: original data; λ'in: original data, plus retransmitted data; λout: throughput delivered to the receiver
TCP Congestion Control:
• Goal: TCP sender should transmit as fast as
possible, but without congesting network
- Q: how to find rate just below congestion level?
• Decentralized: each TCP sender sets its own rate,
based on implicit feedback:
- ACK: segment received (a good thing!),
network not congested, so increase sending
rate
- lost segment: assume loss due to congested
network, so decrease sending rate
TCP Congestion Control: Bandwidth
Probing
• “Probing for bandwidth”: increase transmission
rate on receipt of ACK, until eventually loss
occurs, then decrease transmission rate
- continue to increase on ACK, decrease on loss (since
available bandwidth is changing, depending on other
connections in network)
[Figure: sending rate vs. time — the rate increases while ACKs are received and drops at each loss event (X), producing TCP’s “sawtooth” behavior]
• Q: how fast to increase/decrease?
- details to follow
TCP Congestion Control: details
• sender limits rate by limiting the number of unACKed bytes “in pipeline”:
  LastByteSent - LastByteAcked ≤ cwnd
  – cwnd: differs from rwnd (how, why?)
  – sender limited by min(cwnd, rwnd)
• roughly,
  rate = cwnd / RTT  bytes/sec
• cwnd is dynamic, a function of perceived network congestion
TCP Congestion Control: more details
segment loss event: reduce cwnd
• timeout: no response from receiver
  – cut cwnd to 1
• 3 duplicate ACKs: at least some segments getting through (recall fast retransmit)
  – cut cwnd in half, less aggressively than on timeout

ACK received: increase cwnd
• slowstart phase:
  – increase exponentially fast (despite name) at connection start, or following timeout
• congestion avoidance:
  – increase linearly
TCP Slow Start
• when connection begins, cwnd = 1 MSS
  – example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• increase rate exponentially until first loss event or until threshold reached
  – double cwnd every RTT
  – done by incrementing cwnd by 1 MSS for every ACK received
[Figure: Host A sends one segment, then two, then four in successive RTTs, doubling the amount in flight each round trip]
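A tiny numeric sketch of that doubling, using the slide’s example numbers and an assumed threshold (the threshold value is illustrative only):

```java
public class SlowStartSketch {
    public static void main(String[] args) {
        int mss = 500;                 // bytes (example from the slide)
        double rttSec = 0.2;           // 200 msec
        int cwnd = mss;                // slow start begins with cwnd = 1 MSS
        int ssthreshBytes = 16 * mss;  // assumed threshold, for illustration

        for (int rtt = 0; cwnd < ssthreshBytes; rtt++) {
            double rateKbps = cwnd * 8 / rttSec / 1000.0;  // roughly cwnd/RTT
            System.out.printf("RTT %d: cwnd = %d bytes, rate ~ %.0f kbps%n", rtt, cwnd, rateKbps);
            cwnd *= 2;                 // doubles each RTT (cwnd += 1 MSS per ACK)
        }
    }
}
```

The first line printed is 20 kbps, matching the slide’s initial-rate example.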
TCP: Congestion Avoidance
• when cwnd > ssthresh, grow cwnd linearly
  – increase cwnd by 1 MSS per RTT
  – approach possible congestion more slowly than in slowstart
  – implementation: cwnd = cwnd + MSS/cwnd for each ACK received

AIMD: Additive Increase, Multiplicative Decrease
• ACKs: increase cwnd by 1 MSS per RTT: additive increase
• loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease
Popular “flavors” of TCP
[Figure: congestion window size cwnd (in segments) vs. transmission round for TCP Tahoe and TCP Reno, with ssthresh marked before and after a loss event]
Summary: TCP Congestion Control
• when cwnd < ssthresh, sender in slow-start
phase, window grows exponentially.
• when cwnd >= ssthresh, sender is in
congestion-avoidance phase, window grows
linearly.
• when triple duplicate ACK occurs, ssthresh
set to cwnd/2, cwnd set to ~ ssthresh
• when timeout occurs, ssthresh set to cwnd/2,
cwnd set to 1 MSS.
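The rules in this summary can be collected into a small sketch (cwnd and ssthresh in MSS units; variable names and the initial ssthresh are assumptions, and fast-recovery details are omitted):

```java
public class CwndSketch {
    double cwnd = 1;        // congestion window, in MSS
    double ssthresh = 64;   // slow-start threshold, in MSS (assumed initial value)

    void onAck() {
        if (cwnd < ssthresh) {
            cwnd += 1;                 // slow start: +1 MSS per ACK, so cwnd doubles each RTT
        } else {
            cwnd += 1.0 / cwnd;        // congestion avoidance: ~ +1 MSS per RTT
        }
    }

    void onTripleDuplicateAck() {
        ssthresh = cwnd / 2;           // ssthresh set to cwnd/2
        cwnd = ssthresh;               // cwnd set to ~ ssthresh (multiplicative decrease)
    }

    void onTimeout() {
        ssthresh = cwnd / 2;           // ssthresh set to cwnd/2
        cwnd = 1;                      // cwnd set to 1 MSS; re-enter slow start
    }
}
```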