Linux Network Stack - EN
(VM) Virtual Machine - paravirtual network card (KVM):
FRONTEND: VirtIO guest driver (virtio_net), guest interface e.g. [eth1], with its own TX/RX queues.
BACKEND: VirtIO backend (virtio_net in the hypervisor), or the QEMU EMULATOR providing a fully emulated network card (e.g. Intel 82540EM).
Pseudo functionality of the paravirtual device driver (depends on the implementation and driver): LRO, TSO, RX/TX checksum offload, VLAN, Multi-Queue, ...
Inside the VM the guest runs its own complete network stack: device driver, Ethernet, SKB, network scheduler (qdisc), NAPI, TCP/UDP/IP and sockets, so the same tunables apply there.

Applications enter the stack through system calls (SCI) and Netlink sockets; packets move through the stack as SKBs (sk_buff).

Socket buffer - TX/RX queues per socket, shown as the Recv-Q and Send-Q columns:
net.core.rmem_max, net.core.wmem_max
net.core.rmem_default, net.core.wmem_default

TCP and UDP memory and net options:
TCP window scaling (on/off) set in: net.ipv4.tcp_window_scaling
net.ipv4.tcp_mem, net.ipv4.udp_mem
net.ipv4.tcp_rmem, net.ipv4.udp_rmem_min
net.ipv4.tcp_wmem, net.ipv4.udp_wmem_min
Selected algorithm for congestion control set in: net.ipv4.tcp_congestion_control

Timers defined in:
FIN_WAIT_2: net.ipv4.tcp_fin_timeout
TIME_WAIT = 2 x FIN_WAIT_2; reuse of TIME_WAIT sockets: net.ipv4.tcp_tw_reuse

TCP queue & limits:
Limiting the total number of connections in the TCP opening state.
Port range for all connections (per unique IP address): net.ipv4.ip_local_port_range

Network statistics on this layer: /proc/softirqs, and /proc/net/netstat or /proc/net/snmp
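A minimal sketch of inspecting and changing the buffer and congestion-control sysctls listed above; the sizes are illustrative only, not recommendations, and bbr is only an example algorithm that may not be built into every kernel:

# read current maximums
sysctl net.core.rmem_max net.core.wmem_max
# illustrative sizes only - min / default / max in bytes for tcp_rmem and tcp_wmem
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
# congestion control: list what the kernel offers, then pick one (bbr only if available)
sysctl net.ipv4.tcp_available_congestion_control
sysctl -w net.ipv4.tcp_congestion_control=bbr

Changes made with sysctl -w are lost on reboot; put the same keys in /etc/sysctl.conf or /etc/sysctl.d/ to persist them.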
Network stack layer - Applications, Sockets, Netlink sockets, TCP, UDP, IP, SKB, buffers, GSO/GRO, RPS/XPS/RFS, congestion control, window scaling, timers, TCP queue & limits, TAP interfaces (RX/TX):

Receive Packet Steering (RPS) - for check and change: /sys/class/net/eth0/queues/
Transmit Packet Steering (XPS) - for check and change (queue number XX): /sys/class/net/eth0/queues/tx-XX/xps_cpus
Generic segmentation offload (GSO) - for enabling: ethtool -K eth0 gso on
Kernel 6.3+ BIG TCP (GRO and GSO): ip link set eth0 gro_max_size XY gso_max_size XY
Buffer for options saved for every packet (e.g. interface name, headers, errors, ...): net.core.optmem_max
Backlog buffer for all network interfaces: net.core.netdev_max_backlog
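A minimal sketch of steering receive and transmit work to specific CPUs and of raising the BIG TCP limits; the rx-0/tx-0 queue names, the CPU mask and the size value are illustrative assumptions:

# RPS: let CPUs 0-3 (mask f) handle packets arriving on receive queue rx-0 of eth0
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
# XPS: let CPUs 0-3 transmit on queue tx-0 of eth0
echo f > /sys/class/net/eth0/queues/tx-0/xps_cpus
# BIG TCP (kernel 6.3+): raise GRO/GSO aggregate size, requires driver support
ip link set dev eth0 gro_max_size 185000 gso_max_size 185000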
(LXC) container - isolated by Mount namespace, Cgroup namespace, User namespace, UTS namespace and Network namespace, with Netlink and eBPF available inside; CGROUPS limits: RAM/CPU/DISK/NET.
The container's network namespace is attached to a bridge through a VETH pair (VETH pair 2, named below), and each end of the pair has its own TX queue.
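A minimal sketch of that wiring with iproute2; all names (br0, vethA/vethB, ns1) are illustrative, not the ones from the diagram:

# create a bridge and a VETH pair
ip link add br0 type bridge
ip link add vethA type veth peer name vethB
# put one end of the pair into a new network namespace, attach the other end to the bridge
ip netns add ns1
ip link set vethB netns ns1
ip link set vethA master br0
# bring everything up
ip link set br0 up
ip link set vethA up
ip netns exec ns1 ip link set vethB up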
VETH pair 2: [veth3@if12] <-> [veth4@if13] (ID-2), one end inside the container's network namespace, the other on the bridge/TAP side.

Network scheduler (QDISC) and TC (traffic control), INGRESS/EGRESS:
Active scheduler for all network interfaces (sysctl): net.core.default_qdisc
For a specific interface, e.g. eth0 with the fq scheduler: tc qdisc add dev eth0 root fq
For the QDISC buffer - tc option memory_limit
Bonding/teaming component (e.g. LACP): ifenslave bond0 eth0 eth1
Bonding statistics are visible in: /proc/net/bonding/bondXY
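A minimal sketch of changing the scheduler and checking the result; fq and fq_codel are used purely as examples, and memory_limit is shown with fq_codel, which supports it:

# make fq the default qdisc for newly created interfaces
sysctl -w net.core.default_qdisc=fq
# put fq_codel on eth0 with a capped buffer (illustrative size), then inspect it
tc qdisc replace dev eth0 root fq_codel memory_limit 32mb
tc -s qdisc show dev eth0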
Device driver layer (e.g. ixgbe → eth0) and NAPI polling (depends on the device driver); the VM side is attached to the host through a TAP interface (e.g. [tap0]) with its own TX queue.

Number of packets a device can process in one NAPI poll (with LRO and GRO turned on this refers to aggregated/merged packets): net.core.dev_weight
Total number of packets that can be fetched at once, common to all network interfaces on the system: net.core.netdev_budget
Time in μs available for polling packets: net.core.netdev_budget_usecs
TXqueuelen (FIFO) buffer - possible to change its size (XX): ip link set eth0 txqueuelen XX
XDP (eXpress Data Path) runs in the low-layer device driver (if supported), which works with interrupt requests (IRQ).

For network statistics on this layer (for eth0) we can use commands like:
ip -s link, sar -n EDEV, and also ethtool -S eth0 for the low layer of the network,
or watch the files directly: /proc/net/dev, /proc/net/netstat or /proc/net/snmp.
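A minimal sketch of inspecting and adjusting the NAPI-related knobs above; the numbers are illustrative only:

# current values
sysctl net.core.dev_weight net.core.netdev_budget net.core.netdev_budget_usecs
# raise the per-poll budget (illustrative values)
sysctl -w net.core.netdev_budget=600
sysctl -w net.core.netdev_budget_usecs=8000
# enlarge the TX queue of eth0 and verify
ip link set eth0 txqueuelen 2000
ip -s link show eth0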
VETH tunnel / bridges: VETH pair 1 - [qvbxxx] or [veth1@if4] (ID-1) on the Linux bridge side (e.g. [br0] or [vmbr0]) and [qvoxxx] or [veth2@if5] (ID-1) on the Open vSwitch (OVS) bridge side (e.g. [br-int], [br-tun] or [vmbr0]); in an OpenStack deployment these appear with names like qvb65eeaf6e-7e / qvo65eeaf6e-7e. Each end has its own TX queue and is visible directly from Linux.

Network card hardware: RX/TX ring buffers (HW FIFO) accessed by the NIC directly.
Interrupt requests (IRQ) per-CPU affinity (for IRQ number XY) are set up in:
/proc/irq/XY/smp_affinity or with the irqbalance service (/etc/default/irqbalance)
It is possible to increase/decrease the ring buffers up to a certain hardware limit (for RX=YY and for TX=ZZ) with the command: ethtool -G eth0 rx YY tx ZZ
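A minimal sketch of pinning an IRQ and resizing the ring buffers; the IRQ number, CPU mask and ring sizes are illustrative assumptions that must fit the actual NIC:

# find the IRQ numbers used by eth0
grep eth0 /proc/interrupts
# pin IRQ 44 (illustrative) to CPUs 0-1 (mask 0x3)
echo 3 > /proc/irq/44/smp_affinity
# show the current and maximum ring buffer sizes, then enlarge them (illustrative sizes)
ethtool -g eth0
ethtool -G eth0 rx 4096 tx 4096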
Hardware accelerated functionality of network cards (e.g. eth0), if it exists:
Receive Side Scaling (RSS) - increasing the number of RX/TX queues (multiqueue) to XX/YY: ethtool -L eth0 rx XX tx YY
Hardware Receive Flow Steering (aRFS) - for enabling: ethtool -K eth0 ntuple on
Large receive offload (LRO) - for enabling: ethtool -K eth0 lro on
TCP segmentation offload (TSO) and TCP/UDP/IP RX/TX checksum offload (depend on the implementation).
Hardware support for VLAN (802.1Q) for TX and RX: ethtool -K eth0 rxvlan on txvlan on
Network card speed and duplex setting; e.g. 1000 Mbps (1 Gbps), full duplex: ethtool -s eth0 speed 1000 duplex full

OVS bridge layout (visible directly from Linux): br-int (OVS bridge, VLAN tag xy) and br-tun (OVS bridge) are connected through the internal patch interfaces patch-tun / patch-int; the bridges also expose internal VLAN interfaces (VLAN-Xy) and tunnel ports. Other OVS bridges (e.g. vmbr0) are visible the same way.
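A minimal sketch of checking which offloads the NIC supports and toggling a few of them; eth0, the chosen features and the queue count are illustrative and only take effect if the hardware supports them:

# list all offload features and their current state
ethtool -k eth0
# enable LRO and hardware VLAN acceleration
ethtool -K eth0 lro on rxvlan on txvlan on
# show and change the number of hardware queues (multiqueue / RSS)
ethtool -l eth0
ethtool -L eth0 combined 8
# force 1 Gbps full duplex
ethtool -s eth0 speed 1000 duplex full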
Physical network interfaces (e.g. eth0) are connected with TAP, VETH and other logical interfaces through a Linux bridge (e.g. br0, vmbr0) or an OVS bridge interface; all physical and logical Linux network interfaces can be connected to a Linux bridge.
Tunnel and encapsulation types: VLAN, VxLAN, GRE, ESP, ...
VLANs are used for network isolation, while VxLAN tunnels (in OVS) are usually used for interconnecting servers in a standard OpenStack implementation.
Ports on the tunnel bridge: a VLAN xy port (connection to: VM (NOVA) server Xy), a VxLAN x port (tunnel to: VM (NOVA) server Xy) and a VxLAN y port (tunnel to: NET (Neutron) server Xy); the internal VxLAN interfaces appear as e.g. vxlan-Xy or vxlan-c0a8c864.
tap1 - Linux TAP interface name; tapXYZ - OpenStack TAP interface name; attached by VLAN, VxLAN or a direct connection.

Author: Hrvoje Horvat, v.1.45
License:
OID: