Chapter 3 V7.03
Transport Layer
Computer Networking: A Top Down Approach
7th Edition, Global Edition
Jim Kurose, Keith Ross
Pearson, April 2016

and delete slides (including this one) and slide content to suit your needs.
They obviously represent a lot of work on our part. In return for use, we only
ask the following:
• If you use these slides (e.g., in a class), that you mention their source
  (after all, we’d like people to use our book!)
• If you post any slides on a www site, that you note that they are adapted
  from (or perhaps identical to) our slides, and note our copyright of this
  material.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2016
J.F Kurose and K.W. Ross, All Rights Reserved
Transport Layer 2-1
Chapter 3: Transport Layer
our goals:
understand principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
learn about Internet transport layer protocols:
• UDP: connectionless transport
• TCP: connection-oriented reliable transport
• TCP flow/congestion control
provide logical communication between app processes running on different hosts
transport protocols run in end systems
• send side: breaks app messages into segments (MTU), passes to network layer
• rcv side: reassembles segments into messages, passes to app layer
more than one transport protocol available to apps
• Internet: TCP and UDP
[figure: logical end-end transport between two end systems, each with an application / transport / network / data link / physical stack]
Transport vs. network layer
network layer: logical communication between hosts
transport layer: logical communication between processes
• relies on, enhances, network layer services
household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
• hosts = houses
• processes = kids
• app messages = letters in envelopes
• transport protocol = Ann and Bill who demux to in-house siblings
• network-layer protocol = postal service
reliable, in-order delivery: TCP
• flow control
• connection setup
unreliable, unordered delivery: UDP
• no-frills extension of “best-effort” IP
services not available:
• delay guarantees
• bandwidth guarantees
[figure: logical end-end transport across a path of routers, each with network / data link / physical layers]
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 principles of congestion control
3.7 TCP congestion control
3.8 Evolution of transport-layer functionality
[figure: application processes P1, P2, P3, P4 running on host 1, host 2 and host 3]
How demultiplexing works
host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
host uses IP addresses & port numbers to direct segment to appropriate socket
[figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (payload)]
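As a rough illustration of directing a segment to the appropriate socket, the sketch below keys a lookup table on the full (source IP, source port, destination IP, destination port) tuple. The class names and the socket labels are hypothetical; the host labels A, B, C and ports 80, 9157, 5775 are reused from this section's figure. Keying on all four values is an assumption made for the connection-oriented case; a connectionless lookup would typically use only the destination pair.

```python
from typing import Dict, NamedTuple, Optional


class Segment(NamedTuple):
    """Only the header fields that demultiplexing looks at."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SocketTable:
    """Hypothetical per-host table mapping a 4-tuple to a socket label."""

    def __init__(self) -> None:
        self._sockets: Dict[Segment, str] = {}

    def register(self, key: Segment, socket_name: str) -> None:
        self._sockets[key] = socket_name

    def demux(self, seg: Segment) -> Optional[str]:
        # No matching socket: return None (a real host would reject the segment).
        return self._sockets.get(seg)


# Server B, port 80, with three connections taken from the figure.
table = SocketTable()
table.register(Segment("A", 9157, "B", 80), "socket-1")
table.register(Segment("C", 5775, "B", 80), "socket-2")
table.register(Segment("C", 9157, "B", 80), "socket-3")
```

Note that the segments from A and from C both carry source port 9157 and destination port 80, yet land on different sockets because the source IP differs.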
[figure: demultiplexing examples with processes P1 to P4 on three hosts, each with an application / transport / network / link / physical stack]
[figure: connection-oriented demux: host with IP address A, server with IP address B, host with IP address C. Segments shown:]
• source IP,port: B,80; dest IP,port: A,9157
• source IP,port: A,9157; dest IP,port: B,80
• source IP,port: C,5775; dest IP,port: B,80
• source IP,port: C,9157; dest IP,port: B,80
[figure: UDP segment format: length, checksum; application data (payload)]
why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small header size
• no congestion control: UDP can blast away as fast as desired
wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1   (17-bit raw sum; the carry out of the top bit is added back at the low end)
sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
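The wraparound step can be checked with a few lines of code. This is a minimal sketch of a 16-bit one's-complement checksum; the two input words in `WORDS` are an assumption (the slide's addend rows did not survive), chosen because their raw sum reproduces the 17-bit wraparound row above.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wraparound
    return total


def internet_checksum(words):
    """Checksum is the bitwise complement of the one's-complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


# Assumed addends: their plain sum equals the 17-bit wraparound row.
WORDS = [0b1110011001100110, 0b1101010101010101]

print(format(sum(WORDS), "017b"))                    # raw 17-bit sum
print(format(ones_complement_sum16(WORDS), "016b"))  # sum after wraparound
print(format(internet_checksum(WORDS), "016b"))      # checksum
```

A receiver can run the same sum over the words plus the checksum; an all-ones result means no error was detected.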
[figure: reliable data transfer FSMs, send side (sender) and receive side (receiver)]
sender states: "Wait for call from above", "Wait for ACK or NAK"
sender:
  rdt_rcv(rcvpkt) && isNAK(rcvpkt)
      udt_send(sndpkt)
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      udt_send(NAK)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      udt_send(ACK)
[figure: FSM fragment with sequence numbers; sender states "Wait for call 1 from above", "Wait for ACK or NAK 1"]
sender:
  rdt_send(data)
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
      udt_send(sndpkt)
receiver, on accepting a packet:
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
[figure: sender/receiver timeline, events in order: Tx Pkt0; Rx Pkt0; timeout, reTx Pkt0; ACK0; Tx Pkt1; Rx Pkt1; ACK1; Tx new Pkt0; Miss new Pkt0]
$D_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^9\ \text{bits/sec}} = 8\ \text{microsecs}$
U_sender: utilization, the fraction of time sender busy sending
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{.008}{30.008} = 0.00027$
if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link
network protocol limits use of physical resources!
with three packets sent per RTT, utilization increases by a factor of 3:
$U_{sender} = \frac{3L/R}{RTT + L/R} = \frac{.024}{30.008} = 0.0008$
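The utilization arithmetic is easy to reproduce. The sketch below is only a calculator for the formula above; the function name and the `packets_in_flight` parameter are illustrative, and the inputs are the slide's values (L = 8000 bits, R = 1 Gbps, RTT = 30 msec).

```python
def sender_utilization(packet_bits, link_bps, rtt_seconds, packets_in_flight=1):
    """Fraction of time the sender is busy: (n * L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps  # transmission delay L/R in seconds
    return packets_in_flight * d_trans / (rtt_seconds + d_trans)


stop_and_wait = sender_utilization(8000, 1e9, 0.030)
pipelined_3 = sender_utilization(8000, 1e9, 0.030, packets_in_flight=3)
print(f"stop-and-wait: {stop_and_wait:.5f}")
print(f"3 packets in flight: {pipelined_3:.5f}")
```

The ratio of the two results is exactly 3, which is the point of pipelining: more packets in flight per RTT, proportionally higher utilization.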
sender (extended FSM, single state "Wait"):
  initially:
      base=1
      nextseqnum=1
  timeout
      start_timer
      udt_send(sndpkt[base])
      udt_send(sndpkt[base+1])
      …
      udt_send(sndpkt[nextseqnum-1])
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
      base = getacknum(rcvpkt)+1
      if (base == nextseqnum)
          stop_timer
      else
          start_timer
receiver (extended FSM, single state "Wait"):
  initially:
      expectedseqnum=1
      sndpkt = make_pkt(expectedseqnum-1,ACK,chksum)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(expectedseqnum,ACK,chksum)
      udt_send(sndpkt)
      expectedseqnum++
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,not_expectedseqnum)
      udt_send(sndpkt)
  default
      udt_send(sndpkt)
[figure (b) oops!: with sequence numbers 0 1 2 3 0 1 2, the sender times out and retransmits pkt0; the receiver will accept packet with seq number 0]
Q: what relationship
[figure: echo exchange between two hosts]
• User types ‘C’: Seq=42, ACK=79, data = ‘C’
• host ACKs receipt of ‘C’, echoes back ‘C’: Seq=79, ACK=43, data = ‘C’
• host ACKs receipt of echoed ‘C’: Seq=43, ACK=80
[figure: SampleRTT and EstimatedRTT plotted as RTT (milliseconds) versus time (seconds)]
TCP round trip time, timeout
timeout interval: EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
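A small estimator makes the "safety margin" idea concrete. This is a sketch under stated assumptions: the DevRTT update is the formula above, while the EstimatedRTT smoothing weight (0.125), the margin of four deviations, and the choice to update DevRTT before EstimatedRTT follow common practice in RFC 6298 and are not stated in this section. The class and method names are illustrative.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT from a stream of SampleRTT values."""

    def __init__(self, estimated_rtt, dev_rtt=0.0, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.alpha = alpha  # assumed weight for EstimatedRTT smoothing
        self.beta = beta    # weight for DevRTT, 0.25 as in the text

    def update(self, sample_rtt):
        # Deviation first, measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # Assumed safety margin: four deviations on top of the estimate.
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(estimated_rtt=100.0)  # milliseconds
estimator.update(200.0)                        # one unusually slow sample
print(estimator.estimated_rtt, estimator.dev_rtt, estimator.timeout_interval())
```

A single outlier moves the estimate only slightly but widens the timeout noticeably, which is the intended effect of a larger margin under larger variation.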
if (y > SendBase) {
    SendBase = y
    /* SendBase-1: last cumulatively ACKed byte */
    if (there are currently not-yet-acked segments)
        start timer
    else
        stop timer
}
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
    switch(event)
    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
        retransmit not-yet-acknowledged segment with
            smallest sequence number (not GBN)
        start timer
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
            else
                stop timer
        } /*check here !!!*/
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack’ed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
TCP: retransmission scenarios
[figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A times out and resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100]
[figure: premature timeout. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; the Seq=92 timer expires before ACK=100 and ACK=120 arrive, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase = 100, then SendBase = 120]
TCP retransmission scenarios (more)
[figure: cumulative ACK scenario. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; ACK=100 is lost (X) but ACK=120 arrives before the timeout; SendBase = 120]
[figure: tinygram exchange; one side waiting for Packing, the other waiting for ACKing]
Check: TCP_NODELAY in setsockopt()
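For readers who want to try the option mentioned above, here is a minimal sketch of setting TCP_NODELAY through the standard socket API. The helper name is hypothetical. The option asks the stack to send small segments immediately rather than coalescing them (it disables Nagle's algorithm); whether that helps depends on the application's traffic pattern.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with TCP_NODELAY enabled (small writes sent at once)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# A non-zero value read back means the option is in effect.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(nodelay_enabled)
```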
TCP fast retransmit (Fast Recovery)
time-out period often relatively long:
• long delay before resending lost packet
detect lost segments via duplicate ACKs.
TCP fast retransmit
if sender receives N ACKs for same data (“triple duplicate ACKs”),
[figure: four ACK=100 segments arrive at the sender within the timeout interval; the sender resends Seq=100, 20 bytes of data]
[figure: 2-way handshake, no loss: choose x; req_conn(x); ESTAB; acc_conn(x); ESTAB]
[figure: 2-way handshake failure, half open connection! (no client!): choose x; req_conn(x); ESTAB; acc_conn(x); retransmit req_conn(x); connection x completes; client terminates; server forgets x; the delayed req_conn(x) arrives; ESTAB]
[figure: 2-way handshake failure, duplicate data accepted: choose x; req_conn(x); ESTAB; acc_conn(x); data(x+1), accept; retransmit req_conn(x); retransmit data(x+1); connection x completes; client terminates; server forgets x; the delayed req_conn(x) arrives; ESTAB; the delayed data(x+1) is accepted (Phantom!)]
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
initialize TCP variables:
• seq. #s
• buffers, flow control info (e.g. RcvWindow)
client: connection initiator
    Socket clientSocket = new Socket("hostname","port number");
server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
• specifies client initial seq #
• no data
Step 2: server host receives SYN, replies with SYNACK segment
• server allocates buffers
• specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment which may contain data
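The client and server roles above can be tried on a single machine. The sketch below uses Python rather than the Java calls shown, purely for illustration; `loopback_echo` and its variable names are hypothetical, and it assumes the loopback interface is available. The operating system performs the SYN, SYNACK, ACK exchange inside `create_connection`; the application only sees the established sockets.

```python
import socket


def loopback_echo(payload: bytes) -> bytes:
    """Open a TCP connection to ourselves, send payload, and echo it back."""
    # Server side: the "welcome" socket that listens for connection requests.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as welcome:
        welcome.settimeout(5.0)
        welcome.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        welcome.listen(1)
        port = welcome.getsockname()[1]

        # Client side: returns once the kernel has completed the handshake.
        with socket.create_connection(("127.0.0.1", port), timeout=5.0) as client:
            # Server side: a new socket dedicated to this client.
            connection, _client_addr = welcome.accept()
            with connection:
                connection.settimeout(5.0)
                client.sendall(payload)
                received = connection.recv(1024)
                connection.sendall(received)  # echo back
                return client.recv(1024)


print(loopback_echo(b"C"))
```

The single-threaded ordering works because the kernel queues the completed connection in the listen backlog until `accept()` is called.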
[figure: SYN / SYNACK / ACK exchange carrying client_isn+1 and server_isn+1, ACK = 1; annotations: half open, DoS]
[figure: TCP 3-way handshake FSM]
client:
    Socket clientSocket = new Socket("hostname","port number");
• client, listen -> SYN sent: Send SYN(seq=x)
• server, listen -> SYN rcvd: Recv SYN(x); create new socket for communication back to client; send SYNACK(seq=y,ACKnum=x+1)
• client, SYN sent -> ESTAB: Recv SYNACK(seq=y,ACKnum=x+1); send ACK(ACKnum=y+1)
• server, SYN rcvd -> ESTAB: Recv ACK(ACKnum=y+1)
TCP: closing a connection
client, server each close their side of connection
• send TCP segment with FIN bit = 1
respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
[figure: closing sequence: closes connection, sends FIN; ACK; FIN; ACK; timed wait; closed]
modification, can handle simultaneous FINs.
[figure: TCP server lifecycle; TCP client lifecycle]
[figure: Host A and Host B sending through a router; plots of out and delay versus in, up to R/2; annotations: router buffers available, no buffer space!]
Causes/costs of congestion: scenario 2
Idealization: known loss
packets can be lost, dropped at router due to full buffers
sender only resends if packet known to be lost
[figure: out versus in, both up to R/2: when sending at R/2, some packets are retransmissions but asymptotic goodput is still R/2 (why?); Host A, Host B, free buffer space!]
Causes/costs of congestion: scenario 2
Realistic: duplicates
packets can be lost, dropped at router due to full buffers
sender times out prematurely, sending two copies, both of which are delivered
[figure: out versus in, both up to R/2: when sending at R/2, some packets are retransmissions including duplicated that are delivered!; Host A sends a copy on timeout; Host B, free buffer space!]
Causes/costs of congestion: scenario 2
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
  • decreasing goodput
[figure: Host C and Host D; out versus in’, both axes up to C/2]
TCP Congestion Control: details
[figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed (“in-flight”); last byte sent; the in-flight span is cwnd]
sender limits transmission:
    LastByteSent - LastByteAcked ≤ cwnd
cwnd is dynamic, function of perceived network congestion
TCP sending rate:
roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
    $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
[figure: congestion window size versus time, with ticks at 8 Kbytes, 16 Kbytes, 24 Kbytes]
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd for every ACK received
summary: initial rate is
[figure: timeline marked in RTTs: two segments, then four segments]
Implementation:
• variable ssthresh
• on loss event, ssthresh is set to 1/2 of cwnd just before loss event
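To see how ssthresh shapes the window over time, the sketch below steps cwnd one RTT at a time. It is a toy model under explicit assumptions: cwnd is counted in whole MSS units, growth is doubling below ssthresh and +1 MSS per RTT at or above it, and every loss resets cwnd to 1 MSS. Only the halving of ssthresh comes from the text above; the other rules and all names are illustrative.

```python
def evolve_cwnd(rounds, loss_rounds, ssthresh=8, cwnd=1):
    """Return the cwnd (in MSS) used in each RTT round."""
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = max(cwnd // 2, 1)  # half of cwnd just before the loss
            cwnd = 1                      # assumed: restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # exponential phase, capped at ssthresh
        else:
            cwnd += 1                       # assumed linear growth, 1 MSS per RTT
    return history


trace = evolve_cwnd(rounds=10, loss_rounds={5}, ssthresh=8)
print(trace)
```

After the loss in round 5 the window had reached 10 MSS, so the new threshold is 5 MSS and the second ramp switches to linear growth much earlier than the first.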
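$\text{avg TCP thruput} = \frac{3}{4}\,\frac{W}{RTT}$ bytes/sec
[figure: window size oscillating between W/2 and W over time]
[figure: timeline with instants t0, t1, t2, t3, t4]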
[figure: source and destination stacks (application, TCP, network, link, physical); packet queue almost never empty, sometimes overflows packet (loss)]
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
[figure: Connection 1 throughput axis, up to R, with a starting point]
[figure: IP datagram with ECN field values ECN=00 and ECN=11]
Explicit Congestion Notification (RFC 3168)
CWR: Congestion Window Reduced
ECE: ECN-Echo

Flag name | Header | Location | Purpose
CWR       | TCP    | Bit 8    | Congestion window reduced
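$P = \min\{Q, K-1\}$
- where Q is the number of times the server idles if the object were of infinite size.
[figure: timing diagram, time at client and time at server: second window = 2S/R, third window = 4S/R, fourth window = 8S/R; complete transmission; object delivered]
$$\begin{aligned}
\text{delay} &= 2\,RTT + \frac{O}{R} + \sum_{p=1}^{P} \text{idleTime}_p \\
&= 2\,RTT + \frac{O}{R} + \sum_{k=1}^{P} \left[ \frac{S}{R} + RTT - 2^{k-1}\frac{S}{R} \right] \\
&= 2\,RTT + \frac{O}{R} + P\left[ RTT + \frac{S}{R} \right] - (2^{P}-1)\frac{S}{R}
\end{aligned}$$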
Server idles: P = min{K-1,Q} times
Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1,Q} = 2
[figure: timing diagram, time at client and time at server: third window = 4S/R, fourth window = 8S/R; complete transmission; object delivered]
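How do we calculate K ?
$$\begin{aligned}
K &= \min\{k : 2^0 S + 2^1 S + \dots + 2^{k-1} S \ge O\} \\
&= \min\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge O/S\} \\
&= \min\{k : 2^k - 1 \ge O/S\} \\
&= \min\{k : k \ge \log_2(O/S + 1)\} \\
&= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$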
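The window count and the delay expression can be evaluated numerically. This sketch takes Q as an input, since its formula is not given in this section; the function names and the concrete numbers (S = 1000 bits, R = 1000 bits/sec, RTT = 2 sec) are assumptions picked so that O/S = 15 and Q = 2 match the example.

```python
def num_windows(object_bits, segment_bits):
    """Smallest k with (2^k - 1) * S >= O, i.e. K = ceil(log2(O/S + 1))."""
    k = 0
    while (2 ** k - 1) * segment_bits < object_bits:
        k += 1
    return k


def slow_start_delay(object_bits, segment_bits, link_bps, rtt, q):
    """delay = 2 RTT + O/R + P (RTT + S/R) - (2^P - 1) S/R, with P = min(Q, K-1)."""
    k = num_windows(object_bits, segment_bits)
    p = min(q, k - 1)
    s_over_r = segment_bits / link_bps
    return (2 * rtt + object_bits / link_bps
            + p * (rtt + s_over_r) - (2 ** p - 1) * s_over_r)


K = num_windows(15_000, 1_000)                           # O/S = 15
delay = slow_start_delay(15_000, 1_000, 1_000, rtt=2, q=2)
print(K, delay)
```

With these numbers the two idle periods are 2 sec and 1 sec, so the total is 2·2 + 15 + 3 = 22 sec, which agrees with the closed form.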
[figure: connection establishment over IP: TCP handshake (transport layer) followed by TLS handshake (security) before data, versus a single QUIC handshake before data]
[figure: HTTP GETs over TLS encryption and RDT, where an error! affects the shared stream, versus HTTP GETs over per-stream QUIC encrypt and QUIC RDT]
Notice:
QUIC is application-layer protocol
QUIC congestion control is based on TCP NewReno [RFC 6582]
Chapter 3: summary
WELCOME to Internet
principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
instantiation, implementation in the Internet
• UDP
• TCP
next:
leaving the network “edge” (application, transport layers)
into the network “core”
two network layer chapters:
• data plane
• control plane