Linux Network Stack - EN

The document outlines various networking concepts and configurations within a virtualized environment, focusing on components like VirtIO, network interfaces, and packet handling. It discusses system calls, socket buffers, and performance optimizations such as offloading techniques and congestion control settings. Additionally, it covers network statistics and management commands for monitoring and adjusting network performance in Linux systems.


Legend: RX = r = receive (read); TX = w = write (transmit).

Applications layer: open sockets and their buffers are visible with netstat -tunap and ss -anm (the Recv-Q and Send-Q columns show RX/TX queue usage); tuning is done through sysctl variables.

(VM) Virtual Machine with a paravirtual network card: the VirtIO frontend (the virtio_net driver) runs inside the VM, the backend runs in KVM (the hypervisor). Offload functionality depends on the implementation: LRO, TSO, RX/TX checksum offload, VLAN, Multi-Queue, ...

Applications reach the network stack through system calls (SCI) and Netlink sockets. Inside the kernel every packet travels in a socket buffer, SKB (sk_buff). The device driver (virtio_net) provides the pseudo functionality behind the interface, eg. → [ eth1 ].

Socket buffer (SKB) limits for receive (rmem) and transmit (wmem):
net.core.rmem_max, net.core.wmem_max
net.core.rmem_default, net.core.wmem_default

TCP Window scaling (on/off) set in: net.ipv4.tcp_window_scaling
TCP and UDP memory set in: net.ipv4.tcp_mem, net.ipv4.udp_mem
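
As a minimal sketch of working with the global socket buffer limits above (assuming a root shell; the 16 MiB value and the sysctl.d file name are illustrative, not values from the diagram):

    # read the current global receive/send buffer limits and defaults
    sysctl net.core.rmem_max net.core.wmem_max
    sysctl net.core.rmem_default net.core.wmem_default

    # raise the hard limits at runtime; 16777216 (16 MiB) is only an example value
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216

    # make the change persistent across reboots (hypothetical file name)
    echo "net.core.rmem_max = 16777216" >> /etc/sysctl.d/90-netbuf.conf
    echo "net.core.wmem_max = 16777216" >> /etc/sysctl.d/90-netbuf.conf
    sysctl --system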
(VM) Virtual Machine with a virtual (emulated) network card: the QEMU EMULATOR presents the card (eg. Intel 82540EM), handled in the guest by the low layer of the device driver (e1000e) → [ eth0 ]; whether checksumming is offloaded depends on the driver. Above it sit the Linux socket (TCP/UDP), the network stack with its scheduler (qdisc), NAPI and the Ethernet device driver.

Selected algorithm for congestion control set in: net.ipv4.tcp_congestion_control
Per-socket TCP/UDP receive buffers (rmem): net.ipv4.tcp_rmem, net.ipv4.udp_rmem_min
Per-socket TCP/UDP send buffers (wmem): net.ipv4.tcp_wmem, net.ipv4.udp_wmem_min

Timers defined in:
FIN_WAIT_2: net.ipv4.tcp_fin_timeout
TIME_WAIT = 2 x FIN_WAIT_2
Reuse of connections in TIME_WAIT state: net.ipv4.tcp_tw_reuse

Network statistics on this layer: /proc/softirqs, /proc/net/softnet_stat, /proc/net/dev and /proc/net/netstat or /proc/net/snmp.
Visible with the next commands: netstat -i, netstat -s and ss, and also with nstat and sar -n DEV.
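
A short sketch for checking and switching the congestion control algorithm mentioned above (the choice of bbr is only an illustration and usually requires the tcp_bbr module; cubic is the common default):

    # list the algorithms the running kernel offers and the one currently active
    sysctl net.ipv4.tcp_available_congestion_control
    sysctl net.ipv4.tcp_congestion_control

    # switch the default algorithm at runtime (root required)
    modprobe tcp_bbr                                   # only if bbr is not built in
    sysctl -w net.ipv4.tcp_congestion_control=bbr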

TCP SYN queue buffer - limiting the total number of connections in the TCP opening state (max number of half-open connections per port): net.ipv4.tcp_max_syn_backlog

Port range for all connections (per unique IP address): net.ipv4.ip_local_port_range
Routing between network interfaces defined here: net.ipv4.ip_forward
Enabling of non-local IP binding (eg. for the VRRP protocol): net.ipv4.ip_nonlocal_bind, ...

On the emulated path the packets also pass through Netfilter and the network stack's backlog and GSO buffers before reaching the UDP/TCP/IP layers and the applications.
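
An illustrative way to inspect and adjust the IPv4 settings listed above; the concrete values are assumptions for the sketch, not recommendations from the diagram:

    # check and enable routing (IP forwarding) between interfaces
    sysctl net.ipv4.ip_forward
    sysctl -w net.ipv4.ip_forward=1

    # inspect and widen the local port range used for outgoing connections
    sysctl net.ipv4.ip_local_port_range
    sysctl -w net.ipv4.ip_local_port_range="16384 60999"

    # raise the SYN backlog (half-open connection queue)
    sysctl -w net.ipv4.tcp_max_syn_backlog=4096

    # allow binding to a non-local IP, eg. for a VRRP virtual address
    sysctl -w net.ipv4.ip_nonlocal_bind=1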
Receive Packet Steering (RPS) - for check and change (queue number XX): /sys/class/net/eth0/queues/rx-XX/rps_cpus
Receive Flow Steering (RFS) works together with RPS on the receive side.
Transmit Packet Steering (XPS) - for check and change (queue number XX): /sys/class/net/eth0/queues/tx-XX/xps_cpus

Generic receive offload (GRO) - for enabling: ethtool -K eth0 gro on
Generic segmentation offload (GSO) - for enabling: ethtool -K eth0 gso on
Kernel 6.3+ BIG TCP (GRO and GSO): ip link set eth0 gro_max_size XY gso_max_size XY

Buffer for options saved for every packet (eg. interface name, headers, errors, ...): net.core.optmem_max
Backlog buffer for all network interfaces: net.core.netdev_max_backlog

Inside a container or VM the same stack layers repeat on top of its TAP/VETH interface: applications, system calls (SCI) and Netlink sockets, socket, TCP/UDP, IP, SKB, RPS/RFS, GRO/GSO, congestion control, window scaling, timers, TCP queues & limits, TXqueue.
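
A small sketch combining the GRO/GSO and RPS/XPS knobs above, assuming the interface is called eth0 and has queues rx-0/tx-0; the CPU mask f (CPUs 0-3) is purely illustrative:

    # show whether GRO and GSO are currently enabled
    ethtool -k eth0 | grep -E 'generic-(receive|segmentation)-offload'

    # enable GRO and GSO (root required)
    ethtool -K eth0 gro on gso on

    # steer receive processing of queue rx-0 to CPUs 0-3 (mask 0xf) with RPS
    echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

    # steer transmit completions of queue tx-0 to CPUs 0-3 with XPS
    echo f > /sys/class/net/eth0/queues/tx-0/xps_cpus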
Linux inside an (LXC) container: the container has its own network namespace, together with the mount, cgroup, user, UTS and process & PID namespaces, and CGROUPS limits for RAM/CPU/DISK/NET. It is connected to the host through a VETH pair (eg. eth0 or p0 inside the container, correlated as X@if12 → Y@if13); the correlation between the two ends can be checked with: cat /sys/class/net/veth3/if{index,link}

Netfilter is used for packet filtering (ebtables, arptables, iptables, nf_tables, conntrack, logging, ...); eBPF and the VLAN, VxLAN, VETH, MACVLAN, ... interface types hook into the same stack, and NetLink is used for configuring it.

Active scheduler for all network interfaces (sysctl): net.core.default_qdisc
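
A minimal sketch for correlating the two ends of a VETH pair; veth3 is the name used in the diagram, any other veth interface works the same way:

    # ifindex is this end of the pair, iflink is the ifindex of its peer
    cat /sys/class/net/veth3/ifindex
    cat /sys/class/net/veth3/iflink

    # the same correlation is visible in the interface name suffix, eg. veth3@if12
    ip -d link show veth3

    # list all veth interfaces present on the host
    ip link show type veth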
Bonding/teaming component (eg. LACP): ifenslave bond0 eth0 eth1; statistics are visible in: /proc/net/bonding/bondXY

TC (traffic control) attaches the network scheduler (QDISC) on INGRESS and EGRESS of each interface (VETH pairs such as [veth3@if12]/[veth4@if13], OpenStack ports such as qvb65eeaf6e-7e/qvo65eeaf6e-7e, TAP interfaces such as [tap0], ...), in front of the TX queue.
For a specific interface, eg. eth0 with the fq scheduler: tc qdisc add dev eth0 root fq
For the QDISC buffer - tc option memory_limit

NAPI polling:
Number of packets that can be buffered before NAPI retrieval - if we have LRO and GRO turned on then it refers to aggregated/merged packets: net.core.dev_weight
Total number of packets that can be fetched at once, common to all network interfaces on the system: net.core.netdev_budget
Time in μs for polling packets: net.core.netdev_budget_usecs

txqueuelen is the (FIFO) TX queue buffer of an interface; it is possible to change its size (XX): ip link set eth0 txqueuelen XX

Below this sits the low layer of the device driver (eg. ixgbe → eth0); the details depend on the device driver.
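
A sketch of the scheduler and NAPI tuning described above; picking fq and the concrete budget/queue values are assumptions for the example, not values from the diagram:

    # make fq the default scheduler and attach it to eth0 explicitly
    sysctl -w net.core.default_qdisc=fq
    tc qdisc add dev eth0 root fq
    tc -s qdisc show dev eth0

    # enlarge the per-interface (FIFO) TX queue to 10000 packets
    ip link set eth0 txqueuelen 10000

    # give NAPI more packets and more time per polling round
    sysctl -w net.core.netdev_budget=600
    sysctl -w net.core.netdev_budget_usecs=8000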

VETH pairs ([qvbxxx] or [veth1@if4], [qvoxxx] or [veth2@if5]) and TAP interfaces are plugged into a Linux bridge (eg. [br0] or [vmbr0]) or an Open vSwitch (OVS) bridge (eg. [br-int] or [br-tun]); all of them are visible directly from Linux.

XDP (eXpress Data Path) works with interrupt requests (IRQ) directly at the driver and ring-buffer level, if supported.

For network statistics on this layer (for eth0) we can use commands like: ip -s link, sar -n EDEV and also ethtool -S eth0 for the low layer of the network. Or watch them directly from the files: /proc/net/dev or /proc/net/netstat or /proc/net/snmp

Interrupt requests (IRQ) per CPU affinity (for IRQ number XY) are set up in: /proc/irq/XY/smp_affinity or with the Irqbalance service (/etc/default/irqbalance)

Network Interface (eg. Intel® i82599): the hardware RX/TX (FIFO) ring buffers can be increased/decreased up to a certain hardware limit (for RX=YY and for TX=ZZ) with the command: ethtool -G eth0 rx YY tx ZZ
The card can also compute the TCP/UDP/IP checksums (checksum offload).
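
A sketch of inspecting the hardware ring buffers and pinning an IRQ; the ring sizes and the IRQ number are illustrative and must be read from ethtool -g and /proc/interrupts on the real system:

    # show the current and maximum hardware RX/TX ring sizes
    ethtool -g eth0

    # grow the rings, staying within the hardware maximum reported above
    ethtool -G eth0 rx 4096 tx 4096

    # find the IRQ(s) used by eth0 and pin one of them to CPU1 (mask 0x2)
    grep eth0 /proc/interrupts
    IRQ=125                     # hypothetical IRQ number taken from the output above
    echo 2 > /proc/irq/$IRQ/smp_affinity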

Hardware accelerated functionality of network cards (eg. eth0), if it exists:
Receive Side Scaling (RSS) - increasing the number of RX/TX queues (multiqueue) to XX/YY: ethtool -L eth0 rx XX tx YY
Hardware Receive Flow Steering (aRFS) - for enabling: ethtool -K eth0 ntuple on
Large receive offload (LRO) - for enabling: ethtool -K eth0 lro on
TCP segmentation offload (TSO) - for enabling: ethtool -K eth0 tso on
Checksum offload - enabling for receive (RX) and transmit (TX): ethtool -K eth0 rx on tx on
Scatter-gather - for enabling: ethtool -K eth0 sg on
Hardware support for VLAN (802.1Q) for TX and RX: ethtool -K eth0 rxvlan on txvlan on

In an OpenStack implementation the OVS bridges br-int (with the internal patch interface patch-tun) and/or br-tun (with patch-int), together with vmbr0 and the VLAN tags, carry this traffic; all of them are visible directly from Linux.

----------------------------------------------------------------------
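
A compact sketch for the hardware offloads listed above, assuming the card is eth0 and actually supports them (ethtool -k marks unsupported features as [fixed]):

    # list every offload feature and whether it is on, off or [fixed]
    ethtool -k eth0

    # enable several hardware offloads in one call (root required)
    ethtool -K eth0 lro on tso on sg on rx on tx on rxvlan on txvlan on

    # enable aRFS/ntuple filtering and add more RX/TX queues (multiqueue)
    ethtool -K eth0 ntuple on
    ethtool -L eth0 combined 8      # or per direction: ethtool -L eth0 rx XX tx YY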

Network card speed and duplex setting; eg. 1000Mbps (1Gbps), full duplex: ethtool -s eth0 speed 1000 duplex full

On the OVS side, br-tun (OVS Bridge) connects to br-int through the internal patch interfaces (patch-int/patch-tun), and exposes internal VLAN interfaces (VLAN-Xy) and internal VxLAN interfaces (eg. vxlan-c0a8c864) as ports; the VLAN or VxLAN ports (VxLAN x, VxLAN y) connect or tunnel the traffic to the VM (NOVA) servers and the NET (Neutron) server.

Physical network interfaces (eg. eth0) are connected with TAP, VETH and other logical interfaces through a Linux bridge (eg. br0, vmbr0) or an OVS bridge interface. VLANs are used for network isolation, while VxLAN tunnels (in OVS; also GRE, ESP, ...) are usually used for interconnecting servers, as in a standard OpenStack implementation.

All physical and logical Linux network interfaces can be connected to the Linux firewall, since they pass through the Netfilter component. An MTU size exists and can be changed on any network interface in Linux.

TAP interface naming: tap1 is the Linux TAP interface name, tapXYZ the OpenStack TAP interface name; the connection towards the VM goes over VLAN or VxLAN or a direct connection.

Hrvoje Horvat, v.1.45

You might also like