When you run a DPDK application with the AF_XDP poll-mode driver, every packet destined for your application travels from the NIC through the kernel’s XDP hook and into userspace via an AF_XDP socket. That includes packets your application could forward without ever touching userspace — if only the kernel knew how.
In this post I describe how to write a BPF program that sits in front of DPDK’s AF_XDP datapath and offloads packet rewriting to the kernel. Matched flows get their IP/UDP headers rewritten and forwarded at XDP speed, while unmatched traffic passes through to DPDK as usual. The result: the kernel handles the fast path, and DPDK handles the control path.
I’ll use a UDP proxy as the running example — an application that bridges traffic between clients and servers by rewriting IP/UDP headers on every packet. The same technique applies to any DPDK AF_XDP application that forwards known flows with predictable header transformations.
The Problem
Consider a proxy that sits between clients and backend servers. For every active session, it rewrites IP addresses and UDP ports on both legs — essentially a specialized NAT. In a pure DPDK setup, every single packet flows through userspace: NIC → kernel XDP hook → AF_XDP socket → DPDK poll loop → rewrite → transmit.
For a simple header rewrite on a known flow, that round trip through userspace is unnecessary overhead. The kernel’s XDP hook can do the same rewrite at line rate, before the packet ever reaches the AF_XDP socket.
The goal is a hybrid datapath where:
- Known flows (established sessions) get rewritten and forwarded entirely in XDP
- Unknown flows (new connections, control traffic) pass through to DPDK for full processing
- Non-matching traffic (SSH, management) goes to the kernel stack as usual
Architecture Overview
The XDP program runs at the earliest point in the kernel’s receive path, before the normal networking stack and before AF_XDP delivery. It makes a per-packet decision: rewrite and forward in the kernel, redirect to DPDK, or pass to the network stack.
Three XDP actions do the heavy lifting:
- XDP_PASS — hand the packet to the kernel networking stack (for management traffic not destined to our application)
- XDP_TX — transmit back out the same interface the packet arrived on
- XDP_REDIRECT — forward to a different interface, or into an AF_XDP socket for DPDK
BPF Maps: The Shared State
BPF maps are the communication channel between the XDP program (kernel) and the DPDK application (userspace). We define four maps, each with a distinct role.
Rules Map — The Forwarding Table
The core data structure is a hash map keyed by a 5-tuple (interface index + source/destination IP and port). The value contains the rewrite target and per-flow packet counters:
/* 5-tuple match key. All fields in network byte order. */
struct flow_key {
__u32 ifindex; /* ingress interface */
__u32 srcip;
__u32 dstip;
__u16 srcport;
__u16 dstport;
};
/* Rewrite target + counters. */
struct flow_value {
struct flow_key rewrite; /* new headers */
__u64 packets;
__u64 bytes;
};
The map definition uses LIBBPF_PIN_BY_NAME so it persists across program reloads and is accessible from userspace via /sys/fs/bpf/xdp_fwd_rules:
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 524288);
__type(key, struct flow_key);
__type(value, struct flow_value);
__uint(pinning, LIBBPF_PIN_BY_NAME);
} xdp_fwd_rules SEC(".maps");
Each bidirectional flow creates two rules (ingress and egress), so the 524K entry limit supports up to 262K concurrent streams.
IP Allowlist — Steering Traffic to DPDK
Not every packet with a matching destination IP should be rewritten. New flows and control traffic need to reach DPDK for processing. The IP allowlist map controls which destination IPs get redirected to AF_XDP sockets when they don’t match a rewrite rule:
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1024);
__type(key, __u32); /* IPv4 address */
__type(value, struct ip_counter); /* packet counter */
__uint(pinning, LIBBPF_PIN_BY_NAME);
} xdp_local_ips SEC(".maps");
The userspace application populates this map with all IP addresses it serves. If a packet’s destination IP is in this map, it gets redirected to DPDK via AF_XDP. If not, it passes to the kernel stack — ensuring SSH, DNS, and other management traffic is unaffected.
AF_XDP Socket Map
The xsks_map is an XSKMAP type that DPDK’s AF_XDP PMD manages automatically. It maps RX queue indices to AF_XDP socket file descriptors:
struct {
__uint(type, BPF_MAP_TYPE_XSKMAP);
__uint(max_entries, 256);
__type(key, __u32); /* RX queue index */
__type(value, __u32); /* XSK file descriptor */
} xsks_map SEC(".maps");
We never write to this map from our application — DPDK populates it when it opens AF_XDP sockets during port initialization.
Per-CPU Stats
A PERCPU_ARRAY map tracks datapath counters without any locking:
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__uint(max_entries, XDP_STAT_MAX);
__type(key, __u32);
__type(value, __u64);
__uint(pinning, LIBBPF_PIN_BY_NAME);
} xdp_counters SEC(".maps");
Each CPU gets its own copy of the counters. Userspace sums across all CPUs when reading. The counter indices track each decision point in the datapath:
enum xdp_stat {
XDP_STAT_REWRITE_HIT, /* rule matched, packet rewritten */
XDP_STAT_REWRITE_MISS, /* no rule for this UDP flow */
XDP_STAT_FIB_DROP, /* FIB lookup failed */
XDP_STAT_HAIRPIN_TX, /* forwarded on same interface */
XDP_STAT_REDIRECT_OUT, /* forwarded to different interface */
XDP_STAT_AFXDP_REDIRECT, /* sent to DPDK via AF_XDP */
XDP_STAT_PASS_TO_KERNEL, /* passed to kernel stack */
XDP_STAT_MAX,
};
Sharing Structures Between Kernel and Userspace
One practical challenge: the BPF program uses kernel types (__u32, __u64), while the userspace C application uses standard types (uint32_t, uint64_t). A shared header bridges both worlds with conditional typedefs:
#ifdef __BPF__
#include <linux/types.h>
typedef __u32 fwd_u32;
typedef __u64 fwd_u64;
#else
#include <stdint.h>
typedef uint32_t fwd_u32;
typedef uint64_t fwd_u64;
#endif
The BPF Makefile passes -D__BPF__ when compiling the kernel program, while the userspace build gets the standard types. Both see identical struct layouts — critical for correct map access from either side.
The XDP Program
Entry Point
The XDP entry function xdp_main() implements the decision tree from the architecture diagram. It parses headers, attempts a rewrite, and falls back to AF_XDP redirection:
SEC("xdp")
int xdp_main(struct xdp_md *ctx)
{
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;
struct iphdr *iph = extract_ip_hdr(data, data_end);
if (!iph)
return XDP_PASS;
if (iph->protocol == IPPROTO_UDP &&
(void *)iph + iph->ihl * 4 + sizeof(struct udphdr) <= data_end)
{
struct udphdr *udp = (void *)iph + iph->ihl * 4;
if (lookup_and_rewrite(ctx->ingress_ifindex, iph, udp, data_end) != -1)
return resolve_and_forward(ctx, data, iph);
}
return steer_to_dpdk(iph, ctx);
}
The logic is intentionally flat: parse, try rewrite, fall back. Non-IPv4 packets pass to the kernel immediately. Non-UDP packets (like ICMP) skip the rewrite attempt and go straight to the AF_XDP redirection check.
Packet Parsing
BPF programs must prove to the verifier that every pointer dereference is within packet bounds. The extract_ip_hdr() helper does this for Ethernet + IPv4:
static __always_inline struct iphdr *extract_ip_hdr(void *data, void *data_end)
{
struct ethhdr *eth = data;
if ((void *)(eth + 1) > data_end)
return NULL;
if (eth->h_proto != bpf_htons(ETH_P_IP))
return NULL;
struct iphdr *iph = (void *)(eth + 1);
if ((void *)(iph + 1) > data_end)
return NULL;
return iph;
}
Each bounds check satisfies the verifier and also serves as a protocol filter — only IPv4 packets proceed.
Rewriting Headers
lookup_and_rewrite() is the core of the offload. It builds a 5-tuple key from the packet, looks up the rules map, and rewrites in place:
static __always_inline int lookup_and_rewrite(
__u32 ifindex, struct iphdr *ip, struct udphdr *udp, void *data_end)
{
struct flow_key key = {
.ifindex = ifindex,
.srcip = ip->saddr,
.dstip = ip->daddr,
.srcport = udp->source,
.dstport = udp->dest,
};
struct flow_value *value =
bpf_map_lookup_elem(&xdp_fwd_rules, &key);
if (!value)
return -1;
/* Rewrite IP and UDP headers */
ip->saddr = value->rewrite.srcip;
ip->daddr = value->rewrite.dstip;
udp->source = value->rewrite.srcport;
udp->dest = value->rewrite.dstport;
/* ... checksum fixup and counter updates ... */
return value->rewrite.ifindex;
}
The function returns the target egress interface index on a hit and -1 on a miss; xdp_main() treats any value other than -1 as the signal to hand the packet to resolve_and_forward().
Incremental Checksum Update
After rewriting IP and UDP headers, the checksums must be recalculated. Rather than recomputing from scratch, we use bpf_csum_diff() for an incremental update — only the changed fields contribute to the new checksum:
/* L3 checksum: IP addresses changed */
__u32 l3_old[2] = { key.srcip, key.dstip };
__u32 l3_new[2] = { value->rewrite.srcip, value->rewrite.dstip };
__s64 l3_diff = bpf_csum_diff(l3_old, 8, l3_new, 8, 0);
/* L4 checksum: ports changed (only if UDP checksum is non-zero) */
if (udp->check != 0)
{
__u32 l4_old = (key.dstport << 16) + key.srcport;
__u32 l4_new = (value->rewrite.dstport << 16) + value->rewrite.srcport;
__s64 l4_diff = bpf_csum_diff(&l4_old, 4, &l4_new, 4, l3_diff);
udp->check = fold_csum((0xFFFF & ~udp->check) + l4_diff);
}
ip->check = fold_csum((0xFFFF & ~ip->check) + l3_diff);
The UDP checksum is optional for IPv4 (a zero value means “not computed”), so we skip the fixup when the original checksum is zero. The fold_csum() helper folds a 64-bit intermediate sum back to 16 bits:
static __always_inline __u16 fold_csum(__u64 csum)
{
int i;
#pragma unroll
for (i = 0; i < 4; i++)
{
if (csum >> 16)
csum = (csum & 0xffff) + (csum >> 16);
}
return ~csum;
}
FIB Forwarding
After rewriting, the packet has new destination IP addresses but stale MAC addresses. The bpf_fib_lookup() helper queries the kernel’s routing table (FIB) to resolve the next-hop MAC addresses:
static __always_inline int resolve_and_forward(
struct xdp_md *ctx, void *data, struct iphdr *iph)
{
struct bpf_fib_lookup s = { 0 };
s.family = 2; /* AF_INET */
s.tos = iph->tos;
s.l4_protocol = iph->protocol;
s.tot_len = bpf_ntohs(iph->tot_len);
s.ifindex = ctx->ingress_ifindex;
s.ipv4_src = iph->saddr;
s.ipv4_dst = iph->daddr;
int ret = bpf_fib_lookup(ctx, &s, sizeof(s), 0);
if (ret != BPF_FIB_LKUP_RET_SUCCESS)
return XDP_DROP; /* no route or unresolved neighbor */
struct ethhdr *eth = data;
__builtin_memcpy(eth->h_dest, s.dmac, 6);
__builtin_memcpy(eth->h_source, s.smac, 6);
if (ctx->ingress_ifindex == s.ifindex)
return XDP_TX; /* hairpin: same interface */
return bpf_redirect(s.ifindex, 0); /* cross-interface forward */
}
One important detail: the flags parameter to bpf_fib_lookup() is set to 0 (not BPF_FIB_LOOKUP_OUTPUT). With BPF_FIB_LOOKUP_OUTPUT, the kernel constrains the lookup to the input interface, which prevents cross-interface forwarding. Without that flag, the FIB lookup can resolve routes through any interface — essential when ingress and egress use different NICs.
AF_XDP Redirection
When a packet doesn’t match a rewrite rule, it may still be destined for our application. The steer_to_dpdk() function checks the IP allowlist and redirects matching traffic to DPDK:
static __always_inline int steer_to_dpdk(
struct iphdr *iph, struct xdp_md *ctx)
{
struct ip_counter *val =
bpf_map_lookup_elem(&xdp_local_ips, &iph->daddr);
if (!val)
return XDP_PASS; /* not our IP, let the kernel handle it */
__sync_fetch_and_add(&val->packets, 1);
return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}
bpf_redirect_map() sends the packet to the AF_XDP socket for the appropriate RX queue. The third argument (XDP_PASS) is the fallback action if the queue has no socket — in that case the packet goes to the kernel stack.
Building the BPF Program
The BPF object file is compiled with clang targeting the BPF backend:
xdp_fwd.o: xdp_fwd.c xdp_maps.h xdp_common.h xdp_shared.h
clang -O2 -target bpf -D__BPF__ $(INCLUDES) -c $< -o $@ -g
Key compiler flags:
- -target bpf — emit BPF bytecode instead of native code
- -O2 — required for the BPF verifier to accept the program (without optimization, the code often contains constructs the verifier rejects)
- -D__BPF__ — activates kernel-side type definitions in the shared header
- -g — includes debug info for bpftool introspection
How DPDK Loads the XDP Program
An important detail: you don’t load the XDP program yourself. DPDK’s AF_XDP PMD handles this automatically.
When you pass --vdev=net_af_xdp0,iface=eth0,xdp_prog=xdp_fwd.o to your DPDK application’s EAL arguments, the AF_XDP PMD:
- Opens the specified .o file and loads the XDP program onto the network interface
- Creates AF_XDP sockets (one per RX queue) and populates the xsks_map with their file descriptors
- Attaches the XDP program to the interface so it runs on every incoming packet
From your application’s perspective, DPDK starts receiving packets through rte_eth_rx_burst() as usual — but now the XDP program is running in front of it, and any BPF maps defined with LIBBPF_PIN_BY_NAME are automatically pinned to /sys/fs/bpf/.
This means the deployment workflow is straightforward:
- Compile the BPF program: make (produces xdp_fwd.o)
- Start your DPDK application with the AF_XDP vdev argument pointing to the .o file
- DPDK loads the program, creates sockets, pins maps — all transparently
- Your application opens the pinned maps and manages rules at runtime
No ip link set dev eth0 xdp obj ..., no bpftool prog load, no manual socket creation. The AF_XDP PMD is a single integration point that wires together the XDP program, the AF_XDP sockets, and the DPDK poll loop.
Userspace Control Plane
The DPDK application manages the BPF maps at runtime — adding rules when flows are established, deleting them on teardown, and reading stats back. All of this happens through libbpf’s map manipulation API on pinned file descriptors.
Detecting AF_XDP Mode
When DPDK initializes its Ethernet ports, you can query each port’s driver name. If it reports net_af_xdp, you know an XDP program is loaded on the interface and the BPF maps are available:
static int detect_afxdp(uint16_t port_id)
{
struct rte_eth_dev_info dev_info;
if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
return 0;
return strcmp(dev_info.driver_name, "net_af_xdp") == 0;
}
This check gates all offload logic — when DPDK uses a hardware PMD (e.g., mlx5, ixgbe), the BPF maps don’t exist and there’s nothing to manage.
Accessing Pinned Maps
BPF maps declared with LIBBPF_PIN_BY_NAME are pinned to /sys/fs/bpf/<map_name>. From userspace, bpf_obj_get() opens a pinned map and returns a file descriptor you can use with all bpf_map_* functions:
#include <bpf/bpf.h>
int rules_fd = bpf_obj_get("/sys/fs/bpf/xdp_fwd_rules");
if (rules_fd < 0) {
fprintf(stderr, "Cannot open rules map: %s\n", strerror(errno));
return -1;
}
/* Use rules_fd with bpf_map_update_elem / bpf_map_lookup_elem / etc. */
close(rules_fd); /* close when done */
This is the bridge between kernel and userspace — the same hash table the XDP program reads at packet speed is directly writable from your DPDK application.
Installing a Rule
To offload a flow, build the match key and rewrite value, then insert into the rules map. Here’s an example that offloads a bidirectional UDP flow between a client (10.0.1.50:20000) and a server (10.0.2.1:40000), with the proxy listening on 10.0.0.100:30000:
#include <arpa/inet.h>
/* Ingress rule: client → proxy, rewrite to → server */
struct flow_key ingress_key = {
.ifindex = 3, /* rx interface */
.srcip = inet_addr("10.0.1.50"),
.dstip = inet_addr("10.0.0.100"),
.srcport = htons(20000),
.dstport = htons(30000),
};
struct flow_value ingress_val = {
.rewrite = {
.ifindex = 4, /* tx interface */
.srcip = inet_addr("10.0.0.100"), /* proxy */
.dstip = inet_addr("10.0.2.1"), /* server */
.srcport = htons(30000),
.dstport = htons(40000),
},
};
bpf_map_update_elem(rules_fd, &ingress_key, &ingress_val, BPF_ANY);
/* Egress rule: server → proxy, rewrite to → client */
struct flow_key egress_key = {
.ifindex = 4,
.srcip = inet_addr("10.0.2.1"),
.dstip = inet_addr("10.0.0.100"),
.srcport = htons(40000),
.dstport = htons(30000),
};
struct flow_value egress_val = {
.rewrite = {
.ifindex = 3,
.srcip = inet_addr("10.0.0.100"),
.dstip = inet_addr("10.0.1.50"),
.srcport = htons(30000),
.dstport = htons(20000),
},
};
bpf_map_update_elem(rules_fd, &egress_key, &egress_val, BPF_ANY);
Each bidirectional flow needs two rules — one per direction. When the XDP program matches the ingress key, it rewrites the packet and forwards it out interface 4 toward the server. The egress rule handles the return path.
Lazy Offload
A subtlety: you often can’t install rules immediately when a flow is created. If the client is behind NAT, you don’t know its real source IP/port until the first packet arrives. A practical pattern is to let the first packet pass through DPDK (via AF_XDP), learn the NAT-translated address from the packet headers, then install the XDP rule so all subsequent packets bypass userspace:
void on_first_packet(struct flow_entry *flow, struct rte_mbuf *pkt)
{
struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(pkt,
struct rte_ipv4_hdr *, sizeof(struct rte_ether_hdr));
struct rte_udp_hdr *udp = (struct rte_udp_hdr *)((char *)ip + sizeof(*ip));
/* Now we know the real client address (NAT resolved) */
flow->client_real_ip = ip->src_addr;
flow->client_real_port = udp->src_port;
/* Install both XDP offload rules */
install_xdp_rules(rules_fd, flow);
flow->offloaded = 1;
}
After this point, the XDP program handles the flow at kernel speed. DPDK only sees the first packet.
Deleting Rules
When a flow ends (session teardown, timeout, etc.), delete both rules from the map:
void teardown_flow(int rules_fd, struct flow_entry *flow)
{
bpf_map_delete_elem(rules_fd, &flow->ingress_key);
bpf_map_delete_elem(rules_fd, &flow->egress_key);
flow->offloaded = 0;
}
If NAT information changes mid-flow (the client’s address shifts), delete the old rules and install new ones with the updated addresses.
Reading Per-Flow Stats
Since the XDP program updates counters inside the rule values, userspace can read them back at any time with a simple lookup:
struct flow_value val;
if (bpf_map_lookup_elem(rules_fd, &ingress_key, &val) == 0)
{
printf("packets: %llu, bytes: %llu\n", val.packets, val.bytes);
}
A background timer (every few seconds) can iterate all offloaded flows and sync their BPF-side stats into whatever reporting structures your application uses.
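That sync loop can be sketched with bpf_map_get_next_key(), which walks every key currently installed in the map. Here sync_flow_stats() is a hypothetical hook into your application’s own reporting, and the struct definitions come from the shared header:

```c
/* Sketch: walk the pinned rules map and push per-flow counters into
 * the application's reporting. sync_flow_stats() is hypothetical. */
#include <stdint.h>
#include <bpf/bpf.h>
#include "xdp_shared.h" /* struct flow_key / struct flow_value */

void sync_flow_stats(const struct flow_key *key,
                     uint64_t packets, uint64_t bytes);

void sync_all_flow_stats(int rules_fd)
{
    struct flow_key cur, next;
    struct flow_value val;
    void *prev = NULL; /* NULL fetches the first key */

    while (bpf_map_get_next_key(rules_fd, prev, &next) == 0) {
        if (bpf_map_lookup_elem(rules_fd, &next, &val) == 0)
            sync_flow_stats(&next, val.packets, val.bytes);
        cur = next;
        prev = &cur;
    }
}
```

Because the XDP program may delete nothing on its own, a walk like this is also a natural place to age out flows whose counters have stopped moving.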
Reading Per-CPU Stats
The xdp_counters map is a PERCPU_ARRAY — each bpf_map_lookup_elem call returns an array of values, one per CPU. Sum them for the global total:
int stats_fd = bpf_obj_get("/sys/fs/bpf/xdp_counters");
int num_cpus = libbpf_num_possible_cpus(); /* possible CPUs, as libbpf sizes per-CPU values */
uint64_t percpu[num_cpus];
for (uint32_t key = 0; key < XDP_STAT_MAX; key++)
{
uint64_t total = 0;
if (bpf_map_lookup_elem(stats_fd, &key, percpu) == 0)
for (int i = 0; i < num_cpus; i++)
total += percpu[i];
printf("stat[%u] = %llu\n", key, total);
}
IP Allowlist Management
The xdp_local_ips map controls which destination IPs get steered to AF_XDP. Populate it with every IP your application listens on:
int ips_fd = bpf_obj_get("/sys/fs/bpf/xdp_local_ips");
/* Add an IP to the allowlist */
uint32_t ip = inet_addr("10.0.0.100");
struct ip_counter val = { .packets = 0 };
bpf_map_update_elem(ips_fd, &ip, &val, BPF_ANY);
/* Remove an IP */
bpf_map_delete_elem(ips_fd, &ip);
When your set of served IPs changes, iterate the existing map entries, delete any that are no longer needed, and add the new ones. This ensures packets to removed IPs fall back to the kernel stack immediately.
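A sketch of that reconciliation, assuming the ip_counter type from the shared header and a caller-supplied array of currently served IPs (network byte order); membership is a linear scan for brevity:

```c
/* Sketch: make the pinned allowlist match `serve` exactly. Stale keys
 * are collected first so deletion doesn't disturb the walk. */
#include <stdint.h>
#include <bpf/bpf.h>
#include "xdp_shared.h" /* struct ip_counter */

#define MAX_STALE 1024 /* matches the map's max_entries */

void reconcile_local_ips(int ips_fd, const uint32_t *serve, int n_serve)
{
    uint32_t cur, next, stale[MAX_STALE];
    int n_stale = 0;
    void *prev = NULL;

    /* Pass 1: collect entries that are no longer served. */
    while (bpf_map_get_next_key(ips_fd, prev, &next) == 0) {
        int keep = 0;
        for (int i = 0; i < n_serve; i++)
            if (serve[i] == next)
                keep = 1;
        if (!keep && n_stale < MAX_STALE)
            stale[n_stale++] = next;
        cur = next;
        prev = &cur;
    }

    /* Pass 2: delete stale entries, then (re-)add the served set. */
    for (int i = 0; i < n_stale; i++)
        bpf_map_delete_elem(ips_fd, &stale[i]);
    for (int i = 0; i < n_serve; i++) {
        struct ip_counter val = { 0 };
        bpf_map_update_elem(ips_fd, &serve[i], &val, BPF_ANY);
    }
}
```

Re-adding existing IPs with BPF_ANY is harmless here; it only resets their counters, which may even be desirable after a configuration change.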
Monitoring
The per-CPU stats map and per-rule counters enable real-time monitoring. Userspace sums the per-CPU values and formats a dashboard:
XDP datapath stats:
Rewrite hits 1284923
Rewrite misses 42
FIB drops 0
Hairpin TX 641200
Redirect out 643723
AF_XDP -> DPDK 38
Pass to kernel 157
Per-rule detail includes live PPS and BPS:
Rule Detail
----------------------------------------
Match (Key):
Interface: ifindex 3
Source: 10.0.1.50:20000
Destination: 10.0.0.100:30000
Rewrite (Value):
Interface: ifindex 4
Source: 10.0.2.1:40000
Destination: 10.0.1.50:20000
Counters:
Packets: 641200
Bytes: 51296000
PPS: 50
BPS: 40000
Summary
The hybrid XDP + AF_XDP architecture gives us the best of both worlds:
- Write a BPF program that does fast-path packet rewriting using hash-map lookups
- Define shared structures in a header compiled by both clang -target bpf and your C compiler
- Let DPDK load the program — the AF_XDP PMD handles XDP attachment, socket creation, and map pinning
- Pin BPF maps to /sys/fs/bpf/ so your userspace application can manage rules at runtime
- Use the IP allowlist to steer traffic: matched IPs go to AF_XDP (DPDK), everything else to the kernel
- Offload lazily — let the first packet through DPDK to resolve NAT, then install XDP rules
The key insight is that XDP and AF_XDP are not competing technologies — they compose naturally. AF_XDP gives your DPDK application a kernel-bypass receive path, and XDP gives you a programmable fast path in front of it. By combining both, the kernel handles the steady-state data plane while DPDK handles the exceptions.
References
- BPF and XDP Reference Guide — comprehensive overview of BPF program types, map types, and helper functions
- AF_XDP — scalable packet processing in the kernel — official kernel documentation for AF_XDP sockets
- XDP — eXpress Data Path — kernel documentation covering XDP actions, attachment modes, and driver support
- libbpf API documentation — reference for bpf_obj_get(), bpf_map_update_elem(), and other userspace BPF functions
- DPDK AF_XDP Poll Mode Driver — DPDK documentation for the net_af_xdp PMD, including the xdp_prog parameter and socket map handling
- bpf_fib_lookup() helper — man page for BPF helper functions including FIB lookup, checksum diff, and redirect
- XDP Tutorial — hands-on exercises for XDP programming, packet parsing, and map usage
- LIBBPF_PIN_BY_NAME and map pinning — explanation of BPF object lifetime and pinning to /sys/fs/bpf/