Skip to content

Comments

Collect MTU from skb->_skb_refdst#388

Merged
brb merged 1 commit intocilium:mainfrom
jschwinger233:gray/mtu
Jul 7, 2024
Merged

Collect MTU from skb->_skb_refdst#388
brb merged 1 commit intocilium:mainfrom
jschwinger233:gray/mtu

Conversation

@jschwinger233
Copy link
Member

@jschwinger233 jschwinger233 commented Jun 14, 2024

According to source code, kernel uses MTU from skb->_skb_refdst. Let pwru collect MTU from there using the same logic.

It requires to cast skb->_skb_refdst to dst_entry*, then fetch dst_metric_raw(dst, RTAX_MTU) and dst->dev->mtu.

With this patch, pwru can output the correct MTU used by OS.

Case 1: Cilium could reduce the route MTU to 1423 inside a pod. This couldn't be detected by pwru until this patch because it only checked link MTU.

Case 2: Xfrm could reduce the route MTU to 1446 in ip_forward(). Pwru must inspect route MTU from skb->_skb_dstref to understand that, this patch makes it happen.

// https://elixir.bootlin.com/linux/v6.5/source/net/ipv4/ip_forward.c#L86
// net/ipv4/ip_forward.c
int ip_forward(struct sk_buff *skb)
{
[...]
	rt = skb_rtable(skb);
[...]
	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
	if (ip_exceeds_mtu(skb, mtu)) {
		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		SKB_DR_SET(reason, PKT_TOO_BIG);
		goto drop;
	}
[...]
}

// include/linux/skbuff.h
static inline struct rtable *skb_rtable(const struct sk_buff *skb)
{
	return (struct rtable *)skb_dst(skb);
}

// include/linux/skbuff.h
static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
[...]
	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

// include/net/ip.h
static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
						    bool forwarding)
{
[...]
	mtu = dst_metric_raw(dst, RTAX_MTU);
	if (mtu)
		goto out;

	mtu = READ_ONCE(dst->dev->mtu);
[...]

// include/net/dst.h
#define DST_METRICS_FLAGS		0x3UL
#define __DST_METRICS_PTR(Y)	\
	((u32 *)((Y) & ~DST_METRICS_FLAGS))
#define DST_METRICS_PTR(X)	__DST_METRICS_PTR((X)->_metrics)

}

@jschwinger233 jschwinger233 marked this pull request as ready for review June 14, 2024 10:49
@jschwinger233 jschwinger233 requested a review from a team as a code owner June 14, 2024 10:49
@jschwinger233 jschwinger233 requested review from brb and removed request for a team June 14, 2024 10:49
@jschwinger233 jschwinger233 marked this pull request as draft June 17, 2024 17:30
According to source code, kernel uses MTU from skb->_skb_refdst. Let
pwru collect MTU from there using the same logic.

It requires to cast skb->_skb_refdst to dst_entry*, then fetch
dst_metric_raw(dst, RTAX_MTU) and dst->dev->mtu.

```
// https://elixir.bootlin.com/linux/v6.5/source/net/ipv4/ip_forward.c#L86
// net/ipv4/ip_forward.c
int ip_forward(struct sk_buff *skb)
{
[...]
	rt = skb_rtable(skb);
[...]
	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
	if (ip_exceeds_mtu(skb, mtu)) {
		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		SKB_DR_SET(reason, PKT_TOO_BIG);
		goto drop;
	}
[...]
}

// include/linux/skbuff.h
static inline struct rtable *skb_rtable(const struct sk_buff *skb)
{
	return (struct rtable *)skb_dst(skb);
}

// include/linux/skbuff.h
static inline struct dst_entry *skb_dst(const struct sk_buff *skb)
{
[...]
	return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK);
}

// include/net/ip.h
static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
						    bool forwarding)
{
[...]
	mtu = dst_metric_raw(dst, RTAX_MTU);
	if (mtu)
		goto out;

	mtu = READ_ONCE(dst->dev->mtu);
[...]
}

// include/net/dst.h
	((u32 *)((Y) & ~DST_METRICS_FLAGS))
```

With this patch, pwru can output the correct MTU used by OS.

Case 1: Cilium could reduce the route MTU to 1423 inside a pod. This
can't be detected by pwru because it only checks link MTU.

Case 2: Xfrm could reduce the route MTU to 1446 in ip_forward(). Pwru
must inspect route MTU from skb->_skb_dstref to understand that.

Signed-off-by: gray <[email protected]>
@jschwinger233 jschwinger233 marked this pull request as ready for review June 18, 2024 05:28
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@brb brb merged commit b33c3bb into cilium:main Jul 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants