
Routing Forwarding Engines: Part 4 – Discarding, Punting, and DPDK

Russ White

For the penultimate post in this series on getting from the routing table to the forwarding table, let’s look at the diagram below and ask a question.

What happens if the device ingests a packet that the Forwarding Engine (FE) cannot process for some reason? Perhaps the packet requires deep packet inspection or contains an IP header like the IPv6 hop-by-hop header. There are two options: discarding and punting.

Source: Russ White
First, the device could discard the packet. Discarding packets is much more common than you might think; many high-end routers will discard packets the FE does not support. These devices, especially routers and switches, are designed to process packets consistently and quickly. Pulling packets off the FE data plane onto some processor for switching can introduce large delays in packet delivery, massive amounts of jitter, and out-of-order packet processing. At higher packet-switching speeds, the device’s general-purpose processor (or processors) cannot come close to the switching speed needed to support even a fraction of the traffic flowing through the data plane.

Second, the device could “punt” the packet to the processor. In fact, older devices, such as the Cisco AGS+, 2500, and 7200, did not have FEs. These devices had two switching paths:

  • Interrupt, or fast, switching. The processor is interrupted for each group of packets received and attempts to switch each entire packet during this interrupt.
  • Process switching. The processor performs a full context switch, loading a dedicated packet switching process that handles the packet when it is scheduled.

The term “punt” covers both cases: a packet handed off from the FE to the processor, and a packet handed off from the fast path to the process switching path. Essentially, any time a packet is “handed up the stack” for processing, it is punted.

As FEs have become feature-rich and packet processing speed has become more critical, punting has become rarer. However, not all devices are designed purely to forward packets from one interface to another.
Not every device has a full-blown FE, and most modern (kernel-based) operating systems, such as Linux, are not designed for fast switching, that is, switching a packet within a single interrupt of the main processor. For instance, a database service doesn’t forward packets, but it does consume and transmit a lot of them. How can these devices support higher packet forwarding rates?

These devices need to blend offloading some packet switching chores to whatever hardware they happen to have on board with high-speed packet processing on the main processor.

Data Plane Development Kit (DPDK)

The Data Plane Development Kit (DPDK) was developed to increase packet switching and processing speed in kernel-based operating systems such as Linux. What must a process running in user space do to process a packet?

  1. Move the pointer to the packet buffer off of the receive ring into a queue in kernel memory
  2. Because user-space applications cannot access kernel memory space, where the packet is stored, they have to copy the packet’s contents into a buffer mapped to the switching process’ memory space (with the support of the Memory Mapping Unit, or MMU)
  3. Switch control from the kernel process to the user space application
  4. The user space application must make whatever modifications are needed to the packet
  5. Copy the packet from user space back into the correct memory location in the kernel (with support from the MMU)
  6. Switch control from the user space application to the kernel process
  7. Continue running the kernel protocol stack thread to copy the packet to the correct output queue
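To make the cost of this path concrete, here is a minimal, hypothetical sketch in C of a user-space process touching a raw packet through the normal kernel path, using an AF_PACKET socket (this requires raw-socket privileges, and error handling is kept to a minimum). It is only an illustration of the step list above, not anyone’s production code: each recvfrom() and sendto() call is a context switch into the kernel, and each one copies the full packet between kernel memory and the process’s buffer.

```c
/* Sketch only: receive one raw frame, "process" it, and send it back out,
 * using the normal kernel path the numbered list above describes. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* The kernel queues received frames for this socket (steps 1-2). */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    unsigned char buf[2048];
    struct sockaddr_ll addr;
    socklen_t addrlen = sizeof(addr);

    /* Steps 2-3: recvfrom() context-switches into the kernel, which copies
     * the packet out of kernel memory into this process's buffer. */
    ssize_t len = recvfrom(fd, buf, sizeof(buf), 0,
                           (struct sockaddr *)&addr, &addrlen);
    if (len < 0) {
        perror("recvfrom");
        close(fd);
        return 1;
    }

    /* Step 4: modify the packet in user space (elided). */

    /* Steps 5-7: sendto() context-switches back into the kernel, which
     * copies the packet from the user buffer into a kernel buffer and
     * places it on the output queue (here, back out the same interface). */
    if (sendto(fd, buf, (size_t)len, 0,
               (struct sockaddr *)&addr, addrlen) < 0)
        perror("sendto");

    close(fd);
    return 0;
}
```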

This packet switching process is very context-switch and memory-copy intensive. In fact, the memory copies can, by themselves, take up a lot of time. DPDK solves this problem in an interesting way—it maps a common memory space between the kernel and a packet-processing application in user space.

Source: Russ White
A developer can write a piece of software that handles packets as needed, incorporating the DPDK library into their application. Applications switch packets in one of two modes: polling or interrupt-driven. Most DPDK applications choose to implement polling mode. To switch packets using this application:

  • Packets are received on the inbound interface
  • Pointers to the packet are moved from the interface’s receive ring onto a DPDK-managed ring buffer (or First In First Out [FIFO] queue)
  • The kernel schedules the switching application to run periodically on the processor
  • Once the application runs, it polls the queue, receiving information about all the packets waiting to be processed
  • The application can either process the packets directly on the ring buffer or consume the packets by copying them someplace else in the application’s memory
  • The application can also insert new packets—for instance, generated by a local database application—directly into the ring buffer
  • The application stops processing (or is moved off the processor by the scheduler)
  • The protocol stack, running in the kernel, moves the pointers to each processed packet buffer off of the ring buffer and onto the correct output queue
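As an illustration, here is a minimal sketch of what the polling loop in such an application might look like, using DPDK’s burst receive and transmit calls. Port and memory-pool initialization (rte_eth_dev_configure(), rte_eth_rx_queue_setup(), rte_eth_tx_queue_setup(), rte_eth_dev_start(), rte_pktmbuf_pool_create()) is elided, and the loop is simplified; this is a sketch of the pattern, not a complete application.

```c
/* Minimal DPDK polling-mode sketch: assumes port 0 has already been
 * configured and started, and that an mbuf pool exists. */
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void poll_port(uint16_t port_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the receive ring: pointers to waiting packets are pulled
         * off the DPDK-managed ring in a single burst, with no copy. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        for (uint16_t i = 0; i < nb_rx; i++) {
            /* Process each packet in place; the mbuf data stays in the
             * shared memory region mapped into this user-space process
             * (inspect or rewrite headers via rte_pktmbuf_mtod(), elided). */
        }

        /* Hand the (possibly modified) packets to the transmit ring,
         * again by moving pointers rather than copying payloads. */
        uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, bufs, nb_rx);

        /* Free any packets the transmit ring could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}

int main(int argc, char **argv)
{
    /* Initialize the DPDK Environment Abstraction Layer (hugepages,
     * memory, lcores). Port and pool setup are elided here. */
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    poll_port(0);
    return 0;
}
```

Because the packet buffers live in memory shared between the drivers and the application, the loop only moves pointers; the packet payloads themselves are never copied between kernel space and user space.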

Context switches are still required to process packets with DPDK, but packets are copied once off the wire and once back onto the wire. Any given packet remains in the same memory location and is accessible to both kernel and user space code for the entire time the packet is processed.

Some network processors can offload most of the kernel’s work into hardware, greatly increasing the processing speed. DPDK implementations can process up to around 15 million packets per second, reaching wire speed on some kinds of interfaces.

DPDK maps the packet buffers normally stored in kernel memory into the user application space on Linux kernels and allows some processing to be offloaded to hardware processors on network interface cards.

So why not move processing from user space into the kernel? In the last post in this series, we’ll look at a system that does just that—eBPF.
