
Chapter 3 New

Chapter 3 of ENSF 462 covers the transport layer of networked systems, focusing on transport layer services such as multiplexing, demultiplexing, reliable data transfer, flow control, and congestion control. It discusses two primary Internet transport protocols: TCP, which is connection-oriented and reliable, and UDP, which is connectionless and offers a 'best-effort' service. The chapter also explains the actions taken by transport layer protocols at both the sender and receiver ends, including segment creation and demultiplexing based on header information.

Uploaded by

sravindith

ENSF 462 – Networked Systems

Chapter 3 – Transport Layer


Chapter Objectives

At the end of this chapter, you will be able to:


▪ understand principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
▪ learn about Internet transport layer protocols:
• UDP: connectionless transport
• TCP: connection-oriented reliable transport
• TCP congestion control



Transport layer: roadmap
• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Connection-oriented transport: TCP
• TCP congestion control

Transport services and protocols
▪ provide logical communication between application processes running on different hosts
▪ transport protocols actions in end systems:
• sender: breaks application messages into segments, passes to network layer
• receiver: reassembles segments into messages, passes to application layer
▪ two transport protocols available to Internet applications
• TCP, UDP
[Figure: end systems with full protocol stacks (application, transport, network, data link, physical) connected across mobile network, home network, enterprise network, local or regional ISP, national or global ISP, content provider network, datacenter network]

Transport vs. network layer services and protocols
▪ transport layer: communication between processes
• relies on, enhances, network layer services
▪ network layer: communication between hosts

household analogy:
12 kids in Ann’s house sending letters to 12 kids in Bill’s house:
▪ hosts = houses
▪ processes = kids
▪ app messages = letters in envelopes
▪ transport protocol = Ann and Bill who demux to in-house siblings
▪ network-layer protocol = postal service

Transport Layer Actions

Sender:
▪ is passed an application-layer message
▪ determines segment header fields values
▪ creates segment
▪ passes segment to IP

Receiver:
▪ receives segment from IP
▪ checks header values
▪ extracts application-layer message
▪ demultiplexes message up to application via socket
[Figure: sender and receiver protocol stacks (application, transport, network (IP), link, physical); the transport layer adds header Th to the app. msg at the sender and removes it at the receiver]

Two principal Internet transport protocols
▪ TCP: Transmission Control Protocol
• reliable, in-order delivery
• congestion control
• flow control
• connection setup
▪ UDP: User Datagram Protocol
• unreliable, unordered delivery
• no-frills extension of “best-effort” IP
▪ services not available:
• delay guarantees
• bandwidth guarantees

Chapter 3: Outline

• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Connection-oriented transport: TCP
• TCP congestion control

Multiplexing/demultiplexing
multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)
demultiplexing at receiver: use header info to deliver received segments to correct socket
[Figure: three hosts with processes P1 to P4 attached to sockets above the transport layer]

[Figure: multiplexing, from application down to transport]
[Figure: de-multiplexing, from transport up to application]

How demultiplexing works
▪ host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
▪ host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (32 bits wide):
source port # | dest port #
other header fields
application data (payload)

Connectionless demultiplexing
Recall:
▪ when creating UDP socket, must have host-local port #:
clientSocket = socket(AF_INET, SOCK_DGRAM)
clientSocket.bind(('', 19157))
▪ when creating datagram to send into UDP socket, must specify
• destination IP address
• destination port #
clientSocket.sendto(message,(serverName, serverPort))
▪ when receiving host receives UDP segment:
• checks destination port # in segment
• directs UDP segment to socket with that port #
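
The steps above can be tried out on a single machine. The following is a minimal sketch, assuming the loopback address 127.0.0.1 is usable and letting the OS pick the server port (port 0); the function name `udp_demux_demo` and the payloads are illustrative only. It shows two different client sockets, each with its own source port, being delivered to the same server socket because they share a destination port.

```python
import socket


def udp_demux_demo():
    """Send datagrams from two client sockets to one server socket on loopback."""
    # Server socket: binding to port 0 asks the OS for any free host-local port.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    server.settimeout(2.0)  # do not block forever if a datagram goes missing
    server_addr = server.getsockname()

    # Two independent client sockets; each gets its own source port on first send.
    client_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = []
    try:
        client_a.sendto(b"from A", server_addr)
        client_b.sendto(b"from B", server_addr)
        for _ in range(2):
            # recvfrom returns the payload and the (source IP, source port) pair.
            payload, source = server.recvfrom(2048)
            received.append((payload, source))
    finally:
        for sock in (server, client_a, client_b):
            sock.close()
    return received


if __name__ == "__main__":
    for payload, (src_ip, src_port) in udp_demux_demo():
        print(payload, "arrived from", src_ip, src_port)
```

Both datagrams arrive on the one server socket; the source address returned by `recvfrom` is the only thing that tells the two senders apart.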
Connectionless demultiplexing: an example

host running P3:
mySocket = socket(AF_INET, SOCK_DGRAM)
mySocket.bind((myaddr, 9157))

server running P1:
mySocket = socket(AF_INET, SOCK_DGRAM)
mySocket.bind((myaddr, 6428))

host running P4:
mySocket = socket(AF_INET, SOCK_DGRAM)
mySocket.bind((myaddr, 5775))

datagrams in the figure:
A: source port: 9157, dest port: 6428
B: source port: 6428, dest port: 9157
C: source port: ?, dest port: ?
D: source port: ?, dest port: ?

IP/UDP datagrams with same dest. port #, but different source IP addresses
and/or source port numbers will be directed to same socket at receiving host

Connection-oriented demultiplexing
▪ UDP socket is fully identified by 2-tuple:
• dest IP address
• dest port number
▪ TCP socket is identified by 4-tuple:
• source IP address
• source port number
• dest IP address
• dest port number
▪ demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket
▪ server may support many simultaneous TCP sockets:
• each socket identified by its own 4-tuple
• each socket associated with a different connecting client

Connection-oriented demultiplexing: example
hosts: host with IP address A, server with IP address B, host with IP address C
segments in the figure:
• source IP,port: A,9157; dest IP,port: B,80
• source IP,port: C,5775; dest IP,port: B,80
• source IP,port: C,9157; dest IP,port: B,80
• source IP,port: B,80; dest IP,port: A,9157
Three segments, all destined to IP address: B,
dest port: 80, are demultiplexed to different sockets

Summary
▪ Multiplexing, demultiplexing: based on segment, datagram
header field values
▪ UDP: demultiplexing using destination port number (only)
▪ TCP: demultiplexing using 4-tuple: source and destination
IP addresses, and port numbers
▪ Multiplexing/demultiplexing happen at all layers
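
The difference between the two lookups can be sketched with plain dictionaries. This is an illustration only, not how an operating system stores its sockets: the table contents, the socket labels, and the helper names (`segment`, `udp_key`, `tcp_key`, `demux`) are all hypothetical, and the addresses A, B, C and port numbers are borrowed from the examples earlier in the chapter.

```python
def segment(src_ip, src_port, dst_ip, dst_port):
    """Hypothetical stand-in for the header fields used in demultiplexing."""
    return {"src_ip": src_ip, "src_port": src_port,
            "dst_ip": dst_ip, "dst_port": dst_port}


def udp_key(seg):
    # UDP: only the destination side identifies the socket.
    return (seg["dst_ip"], seg["dst_port"])


def tcp_key(seg):
    # TCP: all four values identify the socket.
    return (seg["src_ip"], seg["src_port"], seg["dst_ip"], seg["dst_port"])


def demux(seg, table, key_fn):
    """Return the socket label the segment is delivered to, or None."""
    return table.get(key_fn(seg))


# One UDP socket bound to port 6428 on host B (label is illustrative).
udp_table = {("B", 6428): "udp_socket_1"}

# Three TCP connection sockets on server B, all on port 80.
tcp_table = {
    ("A", 9157, "B", 80): "tcp_socket_1",
    ("C", 5775, "B", 80): "tcp_socket_2",
    ("C", 9157, "B", 80): "tcp_socket_3",
}

if __name__ == "__main__":
    # Different senders, same UDP socket.
    print(demux(segment("A", 9157, "B", 6428), udp_table, udp_key))
    print(demux(segment("C", 5775, "B", 6428), udp_table, udp_key))
    # Same destination port 80, three different TCP sockets.
    for src in (("A", 9157), ("C", 5775), ("C", 9157)):
        print(demux(segment(src[0], src[1], "B", 80), tcp_table, tcp_key))
```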

Chapter 3: Outline

• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Connection-oriented transport: TCP
• TCP congestion control

UDP: User Datagram Protocol
▪ “no frills,” “bare bones” Internet transport protocol
▪ “best effort” service, UDP segments may be:
• lost
• delivered out-of-order to app
▪ connectionless:
• no handshaking between UDP sender, receiver
• each UDP segment handled independently of others

Why is there a UDP?
▪ no connection establishment (which can add RTT delay)
▪ simple: no connection state at sender, receiver
▪ small header size
▪ no congestion control
• UDP can blast away as fast as desired!
• can function in the face of congestion

UDP: User Datagram Protocol

▪ UDP use:
▪ streaming multimedia apps (loss tolerant, rate sensitive)
▪ DNS
▪ HTTP/3
▪ if reliable transfer needed over UDP (e.g., HTTP/3):
▪ add needed reliability at application layer
▪ add congestion control at application layer

Knowledge Check

• It is possible for two UDP segments to be sent from the same socket
with source port 5723 at a server to two different clients. True or False?

• It is possible for two TCP segments with source port 80 to be sent by
the sending host to different clients. True or False?



UDP: Transport Layer Actions
(example: SNMP client sending an SNMP msg to an SNMP server over UDP)

UDP sender actions:
▪ is passed an application-layer message
▪ determines UDP segment header fields values
▪ creates UDP segment
▪ passes segment to IP

UDP receiver actions:
▪ receives segment from IP
▪ checks UDP checksum header value
▪ extracts application-layer message
▪ demultiplexes message up to application via socket
[Figure: SNMP client and SNMP server protocol stacks (application, transport (UDP), network (IP), link, physical); the UDP header UDPh is added to the SNMP msg at the sender and removed at the receiver]

UDP segment header

UDP segment format (32 bits wide):
source port # | dest port #
length | checksum
application data (payload)

▪ length: in bytes of UDP segment, including header
▪ application data (payload): data to/from application layer
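
The four header fields are each 16 bits wide, so the header is 8 bytes long. The sketch below packs and unpacks that layout with Python's `struct` module, mirroring the sender action "creates UDP segment" and the receiver action "extracts application-layer message". It is an illustration under stated assumptions: the function names are made up for this example, the checksum is left at 0 as a placeholder rather than computed, and a real host would let the operating system build the header.

```python
import struct

# Network byte order ("!"), four unsigned 16-bit fields:
# source port, dest port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port, dst_port, payload, checksum=0):
    """Prefix the payload with an 8-byte UDP-style header."""
    # The length field counts the header plus the payload, in bytes.
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment):
    """Split a segment back into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(
        segment[:UDP_HEADER.size])
    payload = segment[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload


if __name__ == "__main__":
    seg = build_udp_segment(9157, 6428, b"hello")
    print(len(seg), parse_udp_segment(seg))
```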

UDP checksum
Goal: detect errors (i.e., flipped bits) in transmitted segment
                1st number   2nd number   sum
Transmitted:    5            6            11
Received:       4            6            11

receiver-computed checksum ≠ sender-computed checksum (as received)
(the received numbers add up to 10, but the sum that arrived is 11, so an error is detected)

Internet checksum
Goal: detect errors (i.e., flipped bits) in transmitted segment

sender:
▪ treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers
▪ checksum: addition (one’s complement sum) of segment content, then inverted bit by bit
▪ checksum value put into UDP checksum field

receiver:
▪ compute checksum of received segment
▪ check if computed checksum equals checksum field value:
• not equal - error detected
• equal - no error detected. But maybe errors nonetheless? More later ….

Internet checksum: an example

example: add two 16-bit integers


               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Note: when adding numbers, a carryout from the most significant bit needs to be
added to the result

Internet checksum: weak protection!

example: add two 16-bit integers, with the last two bits of each number flipped

               1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1    (last two bits were 1 0, now 0 1)
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0    (last two bits were 0 1, now 1 0)
wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

Even though numbers have changed (bit flips), no change in checksum!
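
The two worked examples can be checked with a few lines of code. This is a minimal sketch operating on a list of 16-bit integers rather than on real packet bytes; the function names are illustrative. It adds the words with end-around carry, inverts the result, and then repeats the calculation with the two flipped words to show that the checksum does not change.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wraparound of the carry
    return total


def internet_checksum(words):
    """Checksum = bitwise inverse of the one's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


# The two 16-bit integers from the example.
original = [0b1110011001100110, 0b1101010101010101]
# Same integers with the last two bits of each flipped.
flipped = [0b1110011001100101, 0b1101010101010110]

if __name__ == "__main__":
    print(format(ones_complement_sum(original), "016b"))
    print(format(internet_checksum(original), "016b"))
    print(internet_checksum(original) == internet_checksum(flipped))
```

Because the two bit flips cancel in the sum, a receiver comparing checksums would not notice the change.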

Summary: UDP
▪ “no frills” protocol:
• segments may be lost, delivered out of order
• best effort service: “send and hope for the best”
▪ UDP has its plusses:
• no setup/handshaking needed (no RTT incurred)
• can function when network service is compromised
• helps with reliability (checksum)
▪ build additional functionality on top of UDP in application layer
(e.g., HTTP/3)
Chapter 3: Outline

• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
• TCP congestion control

TCP: overview RFCs: 793, 1122, 2018, 5681, 7323
▪ point-to-point:
• one sender, one receiver
▪ reliable, in-order byte stream
▪ full duplex data:
• bi-directional data flow in same connection
• MSS: maximum segment size
▪ cumulative ACKs
▪ pipelining:
• TCP congestion and flow control set window size
▪ connection-oriented:
• handshaking (exchange of control messages) initializes sender, receiver state before data exchange
▪ flow controlled:
• sender will not overwhelm receiver

TCP segment structure

segment layout (32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | C E U A P R S F | receive window
checksum | Urg data pointer
options (variable length)
application data (variable length)

field notes:
▪ sequence number: segment seq #: counting bytes of data into bytestream (not segments!)
▪ acknowledgement number: ACK: seq # of next expected byte; A bit: this is an ACK
▪ head len: length (of TCP header)
▪ checksum: Internet checksum
▪ C, E: congestion notification
▪ RST, SYN, FIN: connection management
▪ receive window: flow control: # bytes receiver willing to accept
▪ options: TCP options
▪ application data: data sent by application into TCP socket

TCP sequence numbers, ACKs
Sequence numbers:
• byte stream “number” of first byte in segment’s data
Acknowledgements:
• seq # of next byte expected from other side
• cumulative ACK
Q: how receiver handles out-of-order segments
• A: TCP spec doesn’t say, - up to implementor
[Figure: outgoing segment from sender (sequence number field highlighted) and outgoing segment from receiver (acknowledgement number field and A bit highlighted); sender sequence number space with window size N, divided into sent ACKed, sent not-yet ACKed (“in-flight”), usable but not yet sent, and not usable]

TCP sequence numbers, ACKs
simple telnet scenario between Host A and Host B:
• User types ‘C’ at Host A. A sends: Seq=42, ACK=79, data = ‘C’
• Host B ACKs receipt of ‘C’, echoes back ‘C’. B sends: Seq=79, ACK=43, data = ‘C’
• Host A ACKs receipt of echoed ‘C’. A sends: Seq=43, ACK=80
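
The numbers in this exchange follow from one rule: the ACK number is the sequence number of the segment just received plus the number of data bytes it carried. A small sketch of that arithmetic, with segments modelled as plain dictionaries (a simplification chosen for this example, not a TCP implementation):

```python
def next_ack(seq, data):
    """ACK number = sequence number of the next byte expected from the peer."""
    return seq + len(data)


# Host A sends the typed character.
a_to_b = {"seq": 42, "ack": 79, "data": b"C"}

# Host B echoes it: its Seq is what A said it expects, its ACK covers A's byte.
b_to_a = {"seq": a_to_b["ack"],
          "ack": next_ack(a_to_b["seq"], a_to_b["data"]),
          "data": b"C"}

# Host A acknowledges the echo and carries no data of its own.
a_ack = {"seq": b_to_a["ack"],
         "ack": next_ack(b_to_a["seq"], b_to_a["data"]),
         "data": b""}

if __name__ == "__main__":
    for seg in (a_to_b, b_to_a, a_ack):
        print(seg)
```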

TCP round trip time, timeout
Q: how to set TCP timeout value?
▪ longer than RTT, but RTT varies!
▪ too short: premature timeout, unnecessary retransmissions
▪ too long: slow reaction to segment loss

Q: how to estimate RTT?
▪ SampleRTT: measured time from segment transmission until ACK receipt
• ignore retransmissions
▪ SampleRTT will vary, want estimated RTT “smoother”
• average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
▪ exponential weighted moving average (EWMA)
▪ influence of past sample decreases exponentially fast
▪ typical value: α = 0.125
[Figure: RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT, RTT (milliseconds) vs. time (seconds)]

TCP round trip time, timeout
▪ timeout interval: EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT: want a larger safety margin
TimeoutInterval = EstimatedRTT + 4*DevRTT
(EstimatedRTT is the estimated RTT; 4*DevRTT is the “safety margin”)

▪ DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
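
The three formulas can be combined into a single update step. The sketch below is one way to write it, assuming DevRTT is updated first, using the EstimatedRTT from before the new sample; the function name and the starting values in the usage example (100 ms, 5 ms, and a 120 ms sample) are made up for illustration.

```python
ALPHA = 0.125  # weight of the newest sample in EstimatedRTT
BETA = 0.25    # weight of the newest deviation in DevRTT


def update_rtt(estimated_rtt, dev_rtt, sample_rtt, alpha=ALPHA, beta=BETA):
    """Return the new (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # DevRTT first: it uses the EstimatedRTT from before this sample.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    # EWMA of the samples.
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    # Estimate plus a safety margin of four deviations.
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout


if __name__ == "__main__":
    # Illustrative values in milliseconds.
    print(update_rtt(100.0, 5.0, 120.0))
```

With these inputs DevRTT becomes 0.75*5 + 0.25*20 = 8.75, EstimatedRTT becomes 0.875*100 + 0.125*120 = 102.5, and the timeout is 102.5 + 4*8.75 = 137.5 ms.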

TCP: retransmission scenarios

lost ACK scenario:
• Host A sends Seq=92, 8 bytes of data
• Host B replies ACK=100, which is lost (X)
• Host A’s timeout expires; A resends Seq=92, 8 bytes of data
• Host B replies ACK=100 again

premature timeout:
• Host A (SendBase=92) sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
• Host B replies ACK=100 and ACK=120
• Host A’s timeout expires before the ACKs arrive; A resends Seq=92, 8 bytes of data
• as the ACKs arrive at A: SendBase=100, then SendBase=120
• Host B answers the duplicate by sending a cumulative ACK for 120; SendBase stays at 120

cumulative ACK covers for earlier lost ACK:
• Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
• Host B replies ACK=100, which is lost (X), and ACK=120, which arrives
• Host A continues with Seq=120, 15 bytes of data

TCP fast retransmit
if sender receives 3 additional ACKs for same data (“triple duplicate ACKs”), resend unACKed segment with smallest seq #
▪ likely that unACKed segment lost, so don’t wait for timeout

Receipt of three duplicate ACKs indicates 3 segments received after a missing segment, so a lost segment is likely. So retransmit!
[Figure: Host A and Host B; a segment from A is lost (X), and A retransmits Seq=100, 20 bytes of data, before its timeout expires]
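
The duplicate-ACK rule amounts to a counter at the sender. A minimal sketch follows, assuming cumulative ACKs and ignoring timers, window limits and the data itself; the class name and attributes are hypothetical, and the ACK values in the usage example are chosen to match a lost segment with Seq=100.

```python
class FastRetransmitSender:
    """Tracks SendBase and counts duplicate ACKs (sketch, no timers)."""

    def __init__(self, send_base):
        self.send_base = send_base  # smallest unACKed sequence number
        self.dup_acks = 0           # duplicates seen for send_base
        self.retransmitted = []     # seq numbers resent by fast retransmit

    def on_ack(self, ack_num):
        if ack_num > self.send_base:
            # New data acknowledged: slide SendBase, reset the counter.
            self.send_base = ack_num
            self.dup_acks = 0
        elif ack_num == self.send_base:
            # Same ACK again: one more segment arrived after the gap.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Third duplicate: resend the unACKed segment with the
                # smallest seq # without waiting for the timeout.
                self.retransmitted.append(self.send_base)


if __name__ == "__main__":
    sender = FastRetransmitSender(send_base=92)
    # First ACK=100 acknowledges Seq=92; the next three are duplicates.
    for ack in (100, 100, 100, 100):
        sender.on_ack(ack)
    print(sender.retransmitted)
```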

Chapter 3: Outline

• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
• TCP congestion control

TCP flow control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?

flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast

receive window: flow control: # bytes receiver willing to accept
[Figure: receiver protocol stack. Network layer delivering IP datagram payload from sender into TCP socket buffers; application process removing data from TCP socket buffers]

TCP flow control
▪ TCP receiver “advertises” free buffer space in rwnd field in TCP header
• RcvBuffer size set via socket options (typical default is 4096 bytes)
• many operating systems auto-adjust RcvBuffer
▪ sender limits amount of unACKed (“in-flight”) data to received rwnd
▪ guarantees receive buffer will not overflow
[Figure: TCP receiver-side buffering. RcvBuffer holds buffered data plus free buffer space (rwnd); TCP segment payloads arrive from below and data leaves to the application process]
[Figure: TCP segment format, with the receive window field marked as flow control: # bytes receiver willing to accept]
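
The bookkeeping behind rwnd can be sketched in a few lines. The slides do not name the receiver's counters, so `last_byte_rcvd` and `last_byte_read` below are assumed names for "last byte placed in the buffer" and "last byte the application has read"; likewise the sender-side check is an illustration of the rule "in-flight data must not exceed rwnd", not code from any real stack.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space = buffer size minus data still waiting to be read."""
    buffered = last_byte_rcvd - last_byte_read
    return rcv_buffer - buffered


def sender_can_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes more keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


if __name__ == "__main__":
    # Receiver: 4096-byte buffer, 2000 bytes received but not yet read.
    rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(rwnd)
    # Sender: 1500 bytes already in flight.
    print(sender_can_send(5000, 3500, rwnd, 500))
    print(sender_can_send(5000, 3500, rwnd, 1000))
```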

Knowledge Check

• Consider the TCP scenario below. Why is it that the receiver sends an ACK
that is one larger than the sequence number in the received datagram?
a. Because TCP sequence numbers
always increase by 1 with every new
segment, and the TCP receiver always
sends the sequence number of the
next expected segment
b. Because the sender-to-receiver segment
carries only one byte of data, and
after that segment is received, the
next expected byte of data is just the
next byte in the data stream.



Knowledge Check

• Consider the TCP Telnet scenario below (from Fig. 3.36 in text). What
timer-related action does the sender take on the receipt of ACK 120?
a. Leaves any currently-running timers running.
b. Cancels any running timers.
c. Restarts a timer for the segment with
sequence number 92.



Knowledge Check
• Suppose that TCP’s current estimated values for the round-trip time
(EstimatedRTT) and deviation in the RTT (DevRTT) are 300 msec and 13 msec,
respectively. Suppose that the next measured RTT is 330 msec. What are the new
values for DevRTT, EstimatedRTT, and TCP’s timeout value in msec? Use the values
of α = 0.125, β = 0.25. (Note that given a new measured RTT, you should first
compute DevRTT, then EstimatedRTT, and then (lastly) the timeout interval.)



Knowledge Check

• With TCP’s flow control mechanism, where the receiver tells the
sender how much free buffer space it has, and the sender always
limits the amount of unACKed, in-flight data to less than this amount,
it is not possible for the sender to send more data than the receiver
has room to buffer. True or False?



TCP connection management
before exchanging data, sender/receiver “handshake”:
▪ agree to establish connection (each knowing the other willing to establish connection)
▪ agree on connection parameters (e.g., starting seq #s)

state held by the application at each end once the handshake completes:
connection state: ESTAB
connection variables:
seq # client-to-server, server-to-client
rcvBuffer size at server, client

client: Socket clientSocket = new Socket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();

Agreeing to establish a connection
2-way handshake:
• one side says “Let’s talk”, the other answers “OK”; both are then ESTAB
• in protocol terms: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB

Q: will 2-way handshake always work in network?
▪ variable delays
▪ retransmitted messages (e.g. req_conn(x)) due to message loss
▪ message reordering
▪ can’t “see” other side

2-way handshake scenarios

No problem!
• client chooses x, sends req_conn(x); server enters ESTAB, sends acc_conn(x); client enters ESTAB
• client sends data(x+1); server accepts data(x+1), replies ACK(x+1)
• connection x completes

Problem: half open connection! (no client)
• client chooses x, sends req_conn(x); server enters ESTAB, sends acc_conn(x); client enters ESTAB
• meanwhile the client has retransmitted req_conn(x)
• connection x completes: client terminates, server forgets x
• the retransmitted req_conn(x) now arrives; server enters ESTAB and sends acc_conn(x)

Problem: dup data accepted!
• client chooses x, sends req_conn(x); server enters ESTAB, sends acc_conn(x); client enters ESTAB
• client sends data(x+1); server accepts data(x+1)
• meanwhile the client has retransmitted req_conn(x) and data(x+1)
• connection x completes: client terminates, server forgets x
• the retransmitted req_conn(x) arrives and the server enters ESTAB; the retransmitted data(x+1) arrives and the server accepts data(x+1) again

TCP 3-way handshake

server setup:
serverSocket = socket(AF_INET,SOCK_STREAM)
serverSocket.bind(('',serverPort))
serverSocket.listen(1)
connectionSocket, addr = serverSocket.accept()
server state: LISTEN

client setup:
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName,serverPort))

handshake:
• client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state: SYNSENT
• server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1); server state: SYN RCVD
• client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state: ESTAB
• server: received ACK(y) indicates client is live; server state: ESTAB
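
The sequence and ACK numbers exchanged in the handshake can be traced with a short simulation. This is a sketch of the message contents only, with segments as dictionaries and no real networking; the function name and dictionary keys are hypothetical, and the modulo reflects that TCP sequence numbers are 32-bit values.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(x, y):
    """Trace the three segments given client ISN x and server ISN y."""
    state = {"client": None, "server": "LISTEN"}

    # Step 1: client sends SYN carrying its initial sequence number.
    syn = {"SYN": 1, "seq": x}
    state["client"] = "SYNSENT"

    # Step 2: server replies with SYNACK, acknowledging x + 1.
    synack = {"SYN": 1, "seq": y, "ACK": 1,
              "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    state["server"] = "SYN RCVD"

    # Step 3: client checks the ACK number, then acknowledges y + 1.
    if synack["acknum"] != (x + 1) % SEQ_SPACE:
        raise ValueError("SYNACK does not acknowledge our SYN")
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    state["client"] = "ESTAB"

    # Server sees its own SYN acknowledged and is established too.
    if ack["acknum"] == (y + 1) % SEQ_SPACE:
        state["server"] = "ESTAB"
    return [syn, synack, ack], state


if __name__ == "__main__":
    messages, final_state = three_way_handshake(x=100, y=300)
    for msg in messages:
        print(msg)
    print(final_state)
```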
Closing a TCP connection
▪ client, server each can close their
side of connection
• send TCP segment with FIN bit = 1
▪ respond to received FIN with ACK
• on receiving FIN, ACK can be
combined with own FIN
▪ simultaneous FIN exchanges can
be handled

Chapter 3: Outline

• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Connection-oriented transport: TCP
• TCP congestion control

TCP congestion control: AIMD
• End-to-end congestion control
– the network layer provides no explicit support
– network congestion inferred based only on observed network behavior
• approach: senders can increase sending rate until packet loss
(congestion) occurs, then decrease sending rate on loss event
Additive Increase: increase sending rate by 1 segment every RTT until loss detected
Multiplicative Decrease: cut sending rate in half at each loss event
AIMD sawtooth behavior: probing for bandwidth
[Figure: TCP sender sending rate vs. time]

TCP AIMD: more
Multiplicative decrease detail: sending rate is
▪ Cut in half on loss detected by triple duplicate ACK
▪ Cut to 1 MSS (maximum segment size) when loss detected by timeout

Why AIMD?
▪ AIMD – a distributed, asynchronous algorithm – has been shown to:
• optimize congested flow rates network wide!
• have desirable stability properties

TCP congestion control: details
▪ TCP sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
▪ cwnd is dynamically adjusted in response to observed network congestion (implementing TCP congestion control)

TCP sending behavior:
▪ roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
TCP rate ≈ cwnd/RTT bytes/sec
[Figure: sender sequence number space, showing last byte ACKed, the cwnd-sized span of sent but not-yet ACKed (“in-flight”) data, last byte sent, and the part that is available but not used]

1. TCP slow start
▪ when connection begins, increase rate exponentially until first loss event:
• initially cwnd = 1 MSS
• Increase cwnd by 1 MSS for every ACK received
• double cwnd every RTT
▪ summary: initial rate is slow, but ramps up exponentially fast
- Discover available bandwidth fast: desirable to quickly ramp up to respectable rate
- When cwnd > threshold, move to congestion avoidance phase to slow down the sending rate
[Figure: Host A and Host B; the number of segments sent per RTT doubles over time]

2. Congestion avoidance
- When cwnd is above a threshold
- cwnd is incremented by one segment every RTT (= for every window of ACKs it receives) → linear increase
- cwnd continues to increase (linearly) until loss is detected
- TCP spends most of its time in this phase
[Figure: cwnd vs. RTTs; cwnd grows 1, 2, 4, 8 up to the threshold, then increases linearly]

3. Reaction to congestion
• At any time, when congestion occurs, decrease the window size.
• How does TCP recognize congestion? Two congestion indication mechanisms:
1. Three duplicate ACKs:
Duplicate ACKs mean the receiver got all packets up to the gap and is still receiving packets → the network is at least capable of delivering some segments. Could be due to temporary congestion
- Reduce cwnd but not aggressively
- Congestion threshold = cwnd/2 and new cwnd = threshold
- Stay in congestion avoidance phase
2. Timeout:
No response from receiver, more likely due to significant congestion → reduce cwnd aggressively
- Congestion threshold = cwnd/2 and new cwnd = 1 max. segment size (MSS)
- Go back to slow start phase

• Most of the time the window will be like a sawtooth
Additive Increase/Multiplicative Decrease (AIMD): cwnd increases by 1 MSS every RTT, cwnd decreases by a factor of two with every loss, and repeat
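
The three phases can be put together in a small round-by-round simulation. This is a simplified sketch, not a model of any real TCP stack: cwnd is counted in whole MSS and updated once per RTT, halving uses integer division, slow start is capped at the threshold, and loss events are supplied by hand through the hypothetical `loss_events` argument.

```python
def simulate_cwnd(rounds, ssthresh, loss_events):
    """Return cwnd (in MSS) at the start of each RTT.

    loss_events maps an RTT index to "dupack" (three duplicate ACKs)
    or "timeout"; all other RTTs are loss free.
    """
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        event = loss_events.get(rtt)
        if event == "timeout":
            # Aggressive reaction: halve the threshold, restart slow start.
            ssthresh = max(cwnd // 2, 1)
            cwnd = 1
        elif event == "dupack":
            # Milder reaction: halve the window, stay in congestion avoidance.
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every RTT, but do not overshoot the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one more segment per RTT.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(rounds=12, ssthresh=8,
                        loss_events={6: "dupack", 9: "timeout"}))
```

With these inputs the window grows 1, 2, 4, 8, then linearly to 11, drops to 5 on the triple duplicate ACK, climbs to 7, and falls back to 1 on the timeout, which is the sawtooth shape described above.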
TCP: from slow start to congestion avoidance
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
▪ variable ssthresh
▪ tracks the transition point between slow start and congestion avoidance
▪ on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP CUBIC
▪ Is there a better way than AIMD to “probe” for usable bandwidth?
▪ Insight/intuition:
• Wmax: sending rate at which congestion loss was detected
• congestion state of bottleneck link probably (?) hasn’t changed much
• after cutting rate/window in half on loss, initially ramp to Wmax faster, but then approach Wmax more slowly
[Figure: sending rate between Wmax/2 and Wmax over time, classic TCP vs. TCP CUBIC; TCP CUBIC has higher throughput in this example]

TCP and the congested “bottleneck link”
▪ TCP (classic, CUBIC) increase TCP’s sending rate until packet loss occurs at some router’s output: the bottleneck link
▪ understanding congestion: useful to focus on congested bottleneck link
• insight: increasing TCP sending rate will not increase end-end throughput with congested bottleneck
• insight: increasing TCP sending rate will increase measured RTT
Goal: “keep the end-end pipe just full, but not fuller”
[Figure: source and destination protocol stacks joined through the bottleneck link (almost always busy); packet queue almost never empty, sometimes overflows packet (loss)]

Delay-based TCP congestion control
Keeping sender-to-receiver pipe “just full enough, but no fuller”: keep
bottleneck link busy transmitting, but avoid high delays/buffering
measured throughput = (# bytes sent in last RTT interval) / RTTmeasured
Delay-based approach:
▪ RTTmin - minimum observed RTT (uncongested path)
▪ uncongested throughput with congestion window cwnd is cwnd/RTTmin
if measured throughput “very close” to uncongested throughput
    increase cwnd linearly /* since path not congested */
else if measured throughput “far below” uncongested throughput
    decrease cwnd linearly /* since path is congested */
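The delay-based rule above can be turned into a small function. The slide leaves “very close” and “far below” unquantified, so the thresholds `close` and `far`, the step size, and the name `adjust_cwnd` are hypothetical choices for illustration only; cwnd is measured in bytes here.

```python
def adjust_cwnd(cwnd, bytes_sent_last_rtt, rtt_measured, rtt_min,
                close=0.9, far=0.5, step=1000):
    """Return the new cwnd (bytes) after one RTT of measurement.

    close, far and step are assumed values, not taken from any standard.
    """
    measured = bytes_sent_last_rtt / rtt_measured  # measured throughput
    uncongested = cwnd / rtt_min                   # throughput if path were uncongested
    if measured >= close * uncongested:
        return cwnd + step                         # path not congested: increase linearly
    if measured <= far * uncongested:
        return max(cwnd - step, step)              # path congested: decrease linearly
    return cwnd                                    # in between: leave cwnd unchanged


grow = adjust_cwnd(10_000, 10_000, rtt_measured=0.10, rtt_min=0.10)
hold = adjust_cwnd(10_000, 10_000, rtt_measured=0.15, rtt_min=0.10)
shrink = adjust_cwnd(10_000, 10_000, rtt_measured=0.25, rtt_min=0.10)
```

As the measured RTT rises above RTTmin, the measured throughput falls below cwnd/RTTmin and the sender backs off before any packet is lost.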
Delay-based TCP congestion control

▪ congestion control without inducing/forcing loss


▪ maximizing throughput (“keeping the pipe just full…”) while keeping
delay low (“…but not fuller”)
▪ a number of deployed TCPs take a delay-based approach
▪ BBR deployed on Google’s (internal) backbone network

Explicit congestion notification (ECN)
TCP deployments often implement network-assisted congestion control:
▪ two bits in IP header (ToS field) marked by network router to indicate congestion
• policy to determine marking chosen by network operator
▪ congestion indication carried to destination
▪ destination sets ECE bit on ACK segment to notify sender of congestion
▪ involves both IP (IP header ECN bit marking) and TCP (TCP header C,E bit marking)
[Figure: source and destination protocol stacks (application, TCP, network, link, physical); a congested router changes the ECN bits of the IP datagram from ECN=10 to ECN=11; the destination returns a TCP ACK segment with ECE=1]
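The ECN exchange above can be sketched as two small functions, one for the router and one for the destination. The two-bit codepoint values follow RFC 3168 (10 and 01 mean ECN-capable transport, 11 means congestion experienced); the function names and the dictionary used to represent an ACK are hypothetical and chosen only for this illustration.

```python
# Two-bit ECN codepoints in the IP header (values per RFC 3168)
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_forward(ecn_bits, congested):
    """Router marks an ECN-capable datagram as CE when it is congested."""
    if congested and ecn_bits in (ECT_0, ECT_1):
        return CE
    # not congested, or the sender is not ECN-capable: leave the bits unchanged
    return ecn_bits


def receiver_ack(ecn_bits):
    """Destination echoes a CE mark back to the sender via the ECE bit."""
    return {"ACK": True, "ECE": ecn_bits == CE}


marked = router_forward(ECT_0, congested=True)  # datagram leaves the source as ECN=10
ack = receiver_ack(marked)
```

The marking decision itself (when a router counts as congested) is an operator policy, so it is passed in here as a plain boolean.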
TCP fairness
Fairness goal: if K TCP sessions share same bottleneck link of
bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Q: is TCP Fair?
Example: two competing TCP sessions:
▪ additive increase gives slope of 1, as throughput increases
▪ multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with Connection 1 throughput on the horizontal axis (up to R), the full bandwidth utilization line, and the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
Is TCP fair?
A: Yes, under idealized assumptions:
▪ same RTT
▪ fixed number of sessions only in congestion avoidance
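The convergence argument can be checked with a short simulation. This is a sketch under the idealized assumptions listed above (same RTT, two sessions, both always in congestion avoidance), plus the further assumption that both sessions detect loss in the same round; the function name and starting rates are made up for illustration.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease their window by a factor of 2
            x1, x2 = x1 / 2, x2 / 2
        else:
            # congestion avoidance: additive increase, slope of 1
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# start far from the equal-share line: 10 vs. 70 on a link of capacity 100
x1, x2 = aimd_two_flows(10.0, 70.0, capacity=100.0, rounds=500)
gap = abs(x1 - x2)
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates drift toward the equal bandwidth share line.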
Fairness: must all network apps be “fair”?
Fairness and UDP
▪ multimedia apps often do not use TCP
• do not want rate throttled by congestion control
▪ instead use UDP:
• send audio/video at constant rate, tolerate packet loss
▪ there is no “Internet police” policing use of congestion control
Fairness, parallel TCP connections
▪ application can open multiple parallel connections between two hosts
▪ web browsers do this, e.g., link of rate R with 9 existing connections:
• new app asks for 1 TCP, gets rate R/10
• new app asks for 11 TCPs, gets rate 11R/20, i.e., more than R/2
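The parallel-connection arithmetic can be verified directly. The sketch assumes the idealized fairness goal, that every TCP connection on the link gets an equal share; the function name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing, new_connections):
    """Fraction of link rate R obtained by an app that opens new_connections,
    assuming every TCP connection on the link gets an equal share."""
    return Fraction(new_connections, existing + new_connections)


one_connection = app_share(9, 1)       # 1 of 10 connections
eleven_connections = app_share(9, 11)  # 11 of 20 connections
```

Per-connection fairness therefore does not imply per-application fairness.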

Go-Back-N: sender
▪ sender: “window” of up to N consecutive transmitted but unACKed pkts
• k-bit seq # in pkt header
▪ cumulative ACK: ACK(n): ACKs all packets up to, including seq # n
• on receiving ACK(n): move window forward to begin at n+1
▪ timer for oldest in-flight packet
▪ timeout(n): retransmit packet n and all higher seq # packets in window
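The sender rules above can be sketched as a small class. This is an illustrative sketch only: the class and method names are hypothetical, sequence numbers are treated as unbounded integers rather than k-bit values, and timers and actual transmission are left out.

```python
class GBNSender:
    """Minimal Go-Back-N sender state (no real network I/O)."""

    def __init__(self, window_size):
        self.n = window_size
        self.base = 0       # oldest transmitted but unACKed seq #
        self.next_seq = 0   # next unused seq #

    def send(self):
        """Send one new packet if the window allows; return its seq # or None."""
        if self.next_seq >= self.base + self.n:
            return None     # window full
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n):
        """Cumulative ACK(n): move the window forward to begin at n+1."""
        if n >= self.base:
            self.base = n + 1
        # otherwise it is a duplicate ACK and is ignored

    def on_timeout(self):
        """Return the seq #s to retransmit: the oldest packet and all later ones."""
        return list(range(self.base, self.next_seq))


s = GBNSender(window_size=4)
first = [s.send() for _ in range(5)]  # fifth attempt is blocked by the window
s.on_ack(0)
s.on_ack(1)
more = [s.send(), s.send()]
s.on_ack(1)                           # duplicate ACK: no change
resend = s.on_timeout()
```

With N=4 and ACKs for packets 0 and 1 only, a timeout causes packets 2 through 5 to be sent again.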
Go-Back-N: receiver
▪ ACK-only: always send ACK for correctly-received packet so far, with
highest in-order seq #
• may generate duplicate ACKs
• need only remember rcv_base
▪ on receipt of out-of-order packet:
• can discard (don’t buffer) or buffer: an implementation decision
• re-ACK pkt with highest in-order seq #

[Figure: receiver view of sequence number space, with rcv_base marked; legend: received and ACKed; out-of-order: received but not ACKed; not received]
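The receiver side needs even less state. The sketch below takes the “discard” option for out-of-order packets and assumes unbounded sequence numbers; the class name, the `delivered` list, and returning `None` when nothing has been received in order yet are illustrative assumptions.

```python
class GBNReceiver:
    """Minimal Go-Back-N receiver that discards out-of-order packets."""

    def __init__(self):
        self.rcv_base = 0    # next in-order seq # expected
        self.delivered = []  # packets passed to the upper layer

    def on_packet(self, seq):
        """Return the seq # to ACK, or None if no packet is in order yet."""
        if seq == self.rcv_base:
            self.delivered.append(seq)
            self.rcv_base += 1
        # in every case, (re)ACK the highest in-order seq # received so far
        last_in_order = self.rcv_base - 1
        return last_in_order if last_in_order >= 0 else None


r = GBNReceiver()
# pkt2 is lost the first time, then packets 2..5 are retransmitted
acks = [r.on_packet(seq) for seq in [0, 1, 3, 4, 5, 2, 3, 4, 5]]
```

The duplicate ACKs for packet 1 are exactly the ones the receiver generates while it waits for the missing packet 2.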
Go-Back-N in action
sender window (N=4); pkt2 is lost in transit
sender:
▪ send pkt0
▪ send pkt1
▪ send pkt2 (lost)
▪ send pkt3
▪ (wait)
▪ rcv ack0, send pkt4
▪ rcv ack1, send pkt5
▪ ignore duplicate ACK
▪ pkt 2 timeout
▪ send pkt2
▪ send pkt3
▪ send pkt4
▪ send pkt5
receiver:
▪ receive pkt0, send ack0
▪ receive pkt1, send ack1
▪ receive pkt3, discard, (re)send ack1
▪ receive pkt4, discard, (re)send ack1
▪ receive pkt5, discard, (re)send ack1
▪ rcv pkt2, deliver, send ack2
▪ rcv pkt3, deliver, send ack3
▪ rcv pkt4, deliver, send ack4
▪ rcv pkt5, deliver, send ack5

Selective repeat: the approach
▪ pipelining: multiple packets in flight
▪ receiver individually ACKs all correctly received packets
• buffers packets, as needed, for in-order delivery to upper layer
▪ sender:
• maintains (conceptually) a timer for each unACKed pkt
• timeout: retransmits single unACKed packet associated with timeout
• maintains (conceptually) “window” over N consecutive seq #s
• limits pipelined, “in flight” packets to be within this window

Selective repeat: sender, receiver windows

Selective repeat: sender and receiver
sender
data from above:
▪ if next available seq # in window, send packet
timeout(n):
▪ resend packet n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
▪ mark packet n as received
▪ if n smallest unACKed packet, advance window base to next unACKed seq #
receiver
packet n in [rcvbase, rcvbase+N-1]
▪ send ACK(n)
▪ out-of-order: buffer
▪ in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
packet n in [rcvbase-N,rcvbase-1]
▪ ACK(n)
otherwise:
▪ ignore
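The receiver rules above map closely onto code. This is a minimal sketch assuming unbounded sequence numbers (no wraparound) and no packet corruption; the class name `SRReceiver` and its attributes are hypothetical.

```python
class SRReceiver:
    """Minimal selective repeat receiver."""

    def __init__(self, window_size):
        self.n = window_size
        self.rcv_base = 0
        self.buffer = set()  # out-of-order packets waiting for delivery
        self.delivered = []  # packets passed in order to the upper layer

    def on_packet(self, seq):
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcv_base <= seq < self.rcv_base + self.n:
            self.buffer.add(seq)
            # deliver this packet and any buffered in-order packets,
            # advancing the window to the next not-yet-received packet
            while self.rcv_base in self.buffer:
                self.buffer.remove(self.rcv_base)
                self.delivered.append(self.rcv_base)
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.n <= seq < self.rcv_base:
            return seq       # already delivered: ACK again
        return None          # otherwise: ignore


r = SRReceiver(window_size=4)
acks = [r.on_packet(seq) for seq in [0, 1, 3, 4, 5, 2]]
```

Re-ACKing packets below rcvbase matters because the sender's window cannot advance until it sees an ACK for its oldest unACKed packet.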

Selective Repeat in action
sender window (N=4); pkt2 is lost in transit
sender:
▪ send pkt0
▪ send pkt1
▪ send pkt2 (lost)
▪ send pkt3
▪ (wait)
▪ rcv ack0, send pkt4
▪ rcv ack1, send pkt5
▪ record ack3 arrived
▪ pkt 2 timeout
▪ send pkt2 (but not 3,4,5)
receiver:
▪ receive pkt0, send ack0
▪ receive pkt1, send ack1
▪ receive pkt3, buffer, send ack3
▪ receive pkt4, buffer, send ack4
▪ receive pkt5, buffer, send ack5
▪ rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?

Selective repeat: a dilemma!
example:
▪ seq #s: 0, 1, 2, 3 (base 4 counting)
▪ window size=3
[Figure: sender window and receiver window (after receipt), each over the sequence numbers 0 1 2 3 0 1 2]
(a) no problem: pkt0, pkt1, pkt2 are delivered and ACKed; pkt3 is lost (X); the sender then sends a new pkt0; receiver will accept packet with seq number 0
(b) oops!: pkt0, pkt1, pkt2 are delivered, but the ACKs are lost (X); timeout, sender retransmits pkt0; receiver will accept packet with seq number 0
Selective repeat: a dilemma!
example:
▪ seq #s: 0, 1, 2, 3 (base 4 counting)
▪ window size=3
[Figure: sender window and receiver window (after receipt), each over the sequence numbers 0 1 2 3 0 1 2]
(a) no problem: pkt0, pkt1, pkt2 are delivered and ACKed; pkt3 is lost (X); the sender then sends a new pkt0; receiver will accept packet with seq number 0
(b) oops!: pkt0, pkt1, pkt2 are delivered, but the ACKs are lost (X); timeout, sender retransmits pkt0; receiver will accept packet with seq number 0
▪ receiver can’t see sender side
▪ receiver behavior identical in both cases!
▪ something’s (very) wrong!
Q: what relationship is needed between sequence # size and window size to avoid problem in scenario (b)?
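One way to explore the question is to test whether a retransmitted old packet can land inside the receiver's new window. The sketch below models only the worst case from scenario (b), in which a full window is received but every ACK is lost; the function name is hypothetical. The commonly stated condition, that the window size must be at most half the sequence number space, is consistent with what the check reports.

```python
def is_ambiguous(seq_space, window):
    """True if an old, retransmitted packet can fall in the receiver's new window.

    Worst case: the receiver got a full window but every ACK was lost, so the
    sender may still retransmit any seq # in the old window while the receiver
    has already advanced to the next window (sequence numbers wrap around).
    """
    old_window = {s % seq_space for s in range(0, window)}
    new_window = {s % seq_space for s in range(window, 2 * window)}
    return bool(old_window & new_window)


slide_example = is_ambiguous(seq_space=4, window=3)  # the scenario on the slide
smaller_window = is_ambiguous(seq_space=4, window=2)
```

With 4 sequence numbers, a window of 3 lets seq # 0 mean either the old pkt0 or a new one, while a window of 2 keeps the old and new windows disjoint.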
Knowledge Check

• Which of the following statements about TCP’s Additive-increase-multiplicative-decrease (AIMD) algorithm are true?
a) AIMD is a network-assisted approach to congestion control.
b) AIMD is an end-end approach to congestion control.
c) AIMD always cuts the congestion window size, cwnd, in half whenever loss
is detected.
d) AIMD cuts the congestion window size, cwnd, in half whenever loss is
detected by a triple duplicate ACK.
e) AIMD cuts the congestion window size, cwnd, to 1 whenever a timeout
occurs.
f) AIMD uses the measured RTT delay to detect congestion.
g) AIMD uses observed packet loss to detect congestion.

Knowledge Check
• Suppose the initial value of the sequence number is 0 and
every segment sent to the receiver contains 100 bytes.
The delay between the sender and receiver is 5 time units,
and so the first segment arrives at the receiver at t = 6. The
segment sent at t=4 is lost, as is the ACK segment sent at t=7.
• What is the sender action at t = 11 upon receipt of the ACK?
a) Increase the congestion window size, move the window base
forward by 2, and send new segments, as available and as
allowed by the congestion window
b) Increase the congestion window size, move the window base
forward by 1, and send new segments, as available and as
allowed by the congestion window
c) Keep the congestion window size the same but send new
segments, as available and as allowed by the congestion window.
d) Do nothing.
e) Send an ACK to the ACK.
Knowledge Check
• Suppose the initial value of the sequence number is 0 and
every segment sent to the receiver contains 100 bytes.
The delay between the sender and receiver is 5 time units,
and so the first segment arrives at the receiver at t = 6. The
segment sent at t=4 is lost, as is the ACK segment sent at t=7.
• What is the sender action at t = 13 upon receipt of the ACK?
a) Increase the congestion window size, move the window base
forward by 2, and send new segments, as available and as
allowed by the congestion window
b) Increase the congestion window size, move the window base
forward by 1, and send new segments, as available and as
allowed by the congestion window
c) Keep the congestion window size the same but send new
segments, as available and as allowed by the congestion window.
d) Do nothing.
e) Send an ACK to the ACK.
Knowledge Check
• Suppose the initial value of the sequence number is 0 and
every segment sent to the receiver contains 100 bytes.
The delay between the sender and receiver is 5 time units,
and so the first segment arrives at the receiver at t = 6. The
segment sent at t=4 is lost, as is the ACK segment sent at t=7.
• What does the sender do at t=16? You can assume for this
question that no timeouts have occurred.
a) Set its cwnd value to 1, and retransmit the segment
with sequence number 300
b) Cut its value of cwnd in half, and retransmit the segment with
sequence number 300
c) Inform the upper layer that the connection is terminated, and
close the socket.
d) Do nothing except increment the number of duplicate ACKs
received by 1.

Knowledge Check
• Suppose the initial value of the sequence number is 0 and
every segment sent to the receiver contains 100 bytes.
The delay between the sender and receiver is 5 time units,
and so the first segment arrives at the receiver at t = 6. The
segment sent at t=4 is lost, as is the ACK segment sent at t=7.
• What does the sender do at t=17? You can assume for this
question that no timeouts have occurred.
a) Set its cwnd value to 1, and retransmit the segment
with sequence number 300
b) Cut its value of cwnd in half, and retransmit the segment with
sequence number 300
c) Inform the upper layer that the connection is terminated, and
close the socket.
d) Do nothing except increment the number of duplicate ACKs
received by 1.

Chapter 3: summary
▪ principles behind transport layer services:
• multiplexing, demultiplexing
• reliable data transfer
• flow control
• congestion control
▪ instantiation, implementation in the Internet
• UDP
• TCP
Up next:
▪ leaving the network “edge” (application, transport layers)
▪ into the network “core”
▪ two network-layer chapters:
• data plane
• control plane

Additional Chapter 3 slides

Explaining TCP sequence number
• Sending a stream of data (a file), consisting of 500,000 bytes
• MSS = 1,000 bytes, the first byte of the data stream is numbered 0
• ➔ TCP constructs 500 segments out of the data stream.
• The seq # for a segment is the byte-stream number of the first byte in the segment.
• ➔ The 1st segment gets assigned sequence number 0, the 2nd segment: seq # 1,000, the 3rd
segment: seq # 2,000, and so on.
• Each sequence number is inserted in the sequence number field in the header of the
appropriate TCP segment.
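The numbering in this example can be reproduced in a few lines. The function name `segment_seq_numbers` is hypothetical, and the sketch assumes the data stream divides into segments of at most MSS bytes with no sequence number wraparound.

```python
def segment_seq_numbers(total_bytes, mss, first_byte=0):
    """Return the seq # of each segment: the byte-stream number of its first byte."""
    return list(range(first_byte, first_byte + total_bytes, mss))


seqs = segment_seq_numbers(total_bytes=500_000, mss=1_000)
segment_count = len(seqs)
first_three = seqs[:3]
last = seqs[-1]
```

The last segment carries bytes 499,000 through 499,999, so its sequence number is 499,000.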
