Skip to content

portmap: fix nftables backend#1116

Merged
squeed merged 1 commit intocontainernetworking:mainfrom
champtar:portmap-nftables-prerouting-hook
Nov 18, 2024
Merged

portmap: fix nftables backend#1116
squeed merged 1 commit intocontainernetworking:mainfrom
champtar:portmap-nftables-prerouting-hook

Conversation

@champtar
Copy link
Copy Markdown
Contributor

@champtar champtar commented Nov 5, 2024

We can't use dnat from the input hook,
depending on nftables (and kernel ?) version we get
"Error: Could not process rule: Operation not supported"
iptables backend also uses prerouting.

Also 'ip6 protocol tcp' is invalid, so rework / simplify the rules

Fixes 01a94e1
Fixes #1115

@champtar champtar force-pushed the portmap-nftables-prerouting-hook branch 2 times, most recently from 2e32a3f to 92965ed Compare November 5, 2024 21:41
@champtar champtar changed the title portmap: use prerouting hook in nft rules portmap: fix nftables backend Nov 5, 2024
We can't use dnat from the input hook,
depending on nftables (and kernel ?) version we get
"Error: Could not process rule: Operation not supported"
iptables backend also uses prerouting.

Also 'ip6 protocol tcp' is invalid, so rework / simplify the rules

Fixes 01a94e1

Signed-off-by: Etienne Champetier <[email protected]>
@champtar champtar force-pushed the portmap-nftables-prerouting-hook branch from 92965ed to 32d1489 Compare November 5, 2024 22:09
@champtar
Copy link
Copy Markdown
Contributor Author

champtar commented Nov 6, 2024

Tested with

{
  "type": "portmap",
  "capabilities": {"portMappings": true},
  "backend": "nftables",
  "conditionsV4": ["ip", "daddr", "!=", "{ 127.0.0.0/8, 198.19.254.254 }"]
},

@champtar champtar marked this pull request as ready for review November 6, 2024 01:43
Copy link
Copy Markdown
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Name: "prerouting",
Type: knftables.PtrTo(knftables.NATType),
Hook: knftables.PtrTo(knftables.InputHook),
Hook: knftables.PtrTo(knftables.PreroutingHook),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I know what I was confused about: in prerouting it hasn't yet decided where it's going to route the packet to, so you can't use oifname. I was thinking that meant you couldn't use fib daddr type either, but that's wrong; fib daddr type answers "what does the routing table say we should do with this packet?", not "what are we actually going to do with this packet?", so it doesn't depend on the routing decision.

"th dport", e.HostPort,
"dnat", ipX, "addr . port", "to", containerNet.IP, ".", e.ContainerPort,
e.Protocol, "dport", e.HostPort,
"dnat to", net.JoinHostPort(containerNet.IP.String(), strconv.Itoa(e.ContainerPort)),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(yeah, this rule was complicated because it's derived from a kube-proxy rule where we look up the "addr . protocol . port" in a map rather than just having a static rule)

@squeed squeed merged commit 9296c5f into containernetworking:main Nov 18, 2024
@champtar champtar deleted the portmap-nftables-prerouting-hook branch November 18, 2024 19:34
@agusdallalba
Copy link
Copy Markdown

agusdallalba commented Oct 17, 2025

This PR introduced a regression.

Let's say I want to forward incoming traffic to UDP port 53 to a pod with IP 2001:db8::53 on port 1053.

The cni_hostport table ends up looking like this:

table ip6 cni_hostport {
	comment "CNI portmap plugin"
	chain hostports {
		udp dport 53 dnat to [2001:db8::53]:1053 comment "d26f04dc7a080fb543e92051a5a5784923334f0713b72ae1c1349c1830a5e889"
	}

	chain hostip_hostports {
	}

	chain prerouting {
		type nat hook prerouting priority -100; policy accept;
		jump hostip_hostports
		jump hostports
	}

	chain output {
		type nat hook output priority -100; policy accept;
		jump hostip_hostports
		fib daddr type local jump hostports
	}

	chain masquerading {
		type nat hook postrouting priority 100; policy accept;
		ip6 saddr 2001:db8::53 ip6 daddr 2001:db8::53 masquerade comment "d26f04dc7a080fb543e92051a5a5784923334f0713b72ae1c1349c1830a5e889"
	}
}

The prerouting chain is not filtered in any way! All UDP 53 traffic passing through the node (including internal pod traffic) will be forwarded to 2001:db8::53:1053, not just traffic destined to the node.

In my case this broke DNS resolution inside the pods. If I had forwarded TCP port 8080 I would have broken internal services using that port.

@danwinship
Copy link
Copy Markdown
Contributor

This PR introduced a regression.

You should open a new issue rather than commenting on a year-old PR.

@champtar
Copy link
Copy Markdown
Contributor Author

I'll try to have a look next week / compare with iptables behavior

@agusdallalba
Copy link
Copy Markdown

Thanks for the reply!

I didn't open a new issue because I'm a bit burned trying to contribute to kubernetes core projects. Most of them demand a lot of work and then ignore it and make you keep on top of the issue/PR for months to years while appeasing the stale bot.

I'll look into the issue next week, I suspect all that's needed is to add the fib daddr type local condition to the prerouting chain.

Cheers!

@champtar
Copy link
Copy Markdown
Contributor Author

I'll look into the issue next week, I suspect all that's needed is to add the fib daddr type local condition to the prerouting chain.

Looking at a system running iptables mode (with iptables-nft), when I run nft list ruleset I get

# Warning: table ip nat is managed by iptables-nft, do not touch!
table ip nat {
...
    	chain OUTPUT {
		type nat hook output priority dstnat; policy accept;
		 counter packets 635 bytes 33142 jump KUBE-SERVICES
		fib daddr type local counter packets 374 bytes 19448 jump CNI-HOSTPORT-DNAT
	}

	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		 counter packets 3228 bytes 225010 jump KUBE-SERVICES
		fib daddr type local counter packets 3009 bytes 180540 jump CNI-HOSTPORT-DNAT
	}
...
# Warning: table ip6 nat is managed by iptables-nft, do not touch!
table ip6 nat {
...
	chain OUTPUT {
		type nat hook output priority dstnat; policy accept;
		 counter packets 0 bytes 0 jump KUBE-SERVICES
		fib daddr type local counter packets 0 bytes 0 jump CNI-HOSTPORT-DNAT
	}

	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		 counter packets 1 bytes 136 jump KUBE-SERVICES
		fib daddr type local counter packets 0 bytes 0 jump CNI-HOSTPORT-DNAT
	}

So yes we just need fib daddr type local

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

portmap nftables backend invalid rules (dnat from input hook + invalid ipv6 rules)

4 participants