fix GetOutboundIp auto-detection on ipv6-only systems #1356
On an IPv6-only node, the `daemon` would fail to enable the cache, and, similarly, `exec` would crash on start-up with `Error: unable to determine outbound IP address.`

The issue stems from `GetOutboundIp` assuming that 8.8.8.8 being reachable is equivalent to the node having internet access (which is of course not true for IPv6-only nodes). I changed this (in the first commit) to also try the IPv6 address of Google's public DNS, and took the opportunity to fall back to Cloudflare's as well.

A related issue (handled in the second commit) is that `artifactcache.StartHandler` was always passed `""` (forcing auto-detection) even if `cache.host` was set in the config. I don't claim to fully understand what the different components are, but forwarding `cache.host` to it seems to work?

Scaleway offers dirt-cheap (<1 EUR/month) IPv6-only nodes, and I wanted to use some for my runners.
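The multi-address fallback described above might look roughly like this (a sketch, not the runner's actual code; the helper name and candidate list are illustrative):

```go
package main

import (
	"fmt"
	"net"
)

// getOutboundIP "dials" a UDP socket towards each candidate address in turn
// and returns the local address the kernel selects. No packet is ever sent;
// the dial only consults the routing table, so the first candidate whose
// address family actually has a route wins.
func getOutboundIP(candidates []string) (net.IP, error) {
	var lastErr error
	for _, addr := range candidates {
		conn, err := net.Dial("udp", addr)
		if err != nil {
			lastErr = err // e.g. "network is unreachable" for an IPv4 target on an IPv6-only node
			continue
		}
		ip := conn.LocalAddr().(*net.UDPAddr).IP
		conn.Close()
		return ip, nil
	}
	return nil, fmt.Errorf("unable to determine outbound IP address: %w", lastErr)
}

func main() {
	ip, err := getOutboundIP([]string{
		"8.8.8.8:53",                // Google public DNS, IPv4
		"[2001:4860:4860::8888]:53", // Google public DNS, IPv6
		"1.1.1.1:53",                // Cloudflare, IPv4
		"[2606:4700:4700::1111]:53", // Cloudflare, IPv6
	})
	fmt.Println(ip, err)
}
```

On an IPv6-only node the first dial fails with a route error and the IPv6 candidate is used instead, which is the gap the first commit closes.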
Changed title from *fix GetOutbooundIp auto-detection on ipv6-only systems* to *fix GetOutboundIp auto-detection on ipv6-only systems*.

I think the IPv6-capable detection seems like a good change. The overall approach to detection is a bit too clever for my liking, but adding additional IPv6 addresses seems like an OK workaround for the gap you've identified. I don't think we can make a practical automated test for this, so this part of the change is good as-is.
The cache config change would be a regression of #1088. There are two different servers being run by the runner: a proxy that the actions connect to, and a cache server that the proxy connects to. You're changing the cache server, which isn't what `cfg.Cache.Host` is intended & documented to control.

@mfenniak thanks for the explanation, I reverted the second commit.
There's something about the overall architecture I don't get: why do any of these internal services need an external IP address rather than just using loopback? It seems to extend the attack surface unnecessarily to listen on public interfaces.
@cristicbz wrote in #1356 (comment):
It's complicated. Here's my understanding, which may be incomplete.
The first requirement is to create a proxy network port which can be accessed by a job container. A loopback address wouldn't work for this because `localhost` would reference the container, not the runner. The runner itself may be within a container, and therefore we need the container's IP address and not an IP from the host.

The "outbound" IP is not necessarily a public IP. It's the IP address of the interface which the default route is applied to; in an IPv4 network that's usually a non-routable RFC 1918 address. It isn't really important that it is used for internet access... it serves more the role of "this is a reasonable default IP address which I can access both from the runner and from the job containers".

The second requirement is to create a cache server network port which can be accessed by the runner (after the proxy is hit). This has different requirements depending on whether the cache server is internal or external. If it's external, it needs to be accessible from the cache clients (the proxies). If it's internal, it could probably be a loopback address.
This patch is failing CI as-is and needs a `make fmt` run on it.

Done. Sorry, I ran `go fmt`, which didn't complain; I didn't realize this had stricter linting requirements.

Because we're not actually doing anything with the UDP network connection -- it's UDP, which is stateless, and nothing is sent, and it's not even to a DNS port 🤣 -- in retrospect adding in Cloudflare's addresses is probably not meaningful. It's really just a hack to get the outbound IP which does nothing but resolve the route table. But I think it has no harm either, so all good. 👍
@mfenniak
I'm not set up to trivially check this for an IPv4 node, but for my little IPv6-only node at least, it does seem to resolve to the public IP address.

And I can confirm that if I `curl` the VPS's public IP address from my machine, I get a 404 Not Found from the services running on there. I don't know if you have an IPv4 node where you can check that assertion empirically, but this seems like a significant vulnerability if I'm right.
I can confirm that this seems to happen on a node with IPv4 as well; do you see something else if you try? I was expecting the runner not to listen on ANY ports on a public interface. I'm running:
- `forgejo-runner` built from `main`
- `config.yml`

Ok. This is clearly intended behaviour, I just don't understand how this is safe:

```go
listener, err := net.Listen("tcp", fmt.Sprintf(":%d", port)) // listen on all interfaces
```

(in both cacheproxy & artifactcache)

I would expect these services to listen on the network's gateway (i.e. `docker network inspect bridge -f "{{ (index .IPAM.Config 0).Gateway }}"`); of course this gets messier with ephemeral networks.

Ah, I see your concern. The cache handlers always bind themselves on all network interfaces. This doesn't relate to `GetOutboundIP` directly; that IP is just used as an address that can be used to access the network ports once they're open.

Within the designed scope of the APIs that are exposed for caching, they are fairly safe for exposure -- there should be no way to access cached data without a credential that is embedded into the `ACTIONS_CACHE_URL` env variable. As a workaround, if you want to maximize security against design flaws and vulnerabilities, you can consider a host-based firewall -- a default-DENY rule for incoming connections on your public interface should address the issue, but you'll have to test that it doesn't affect traffic from your containers (or local jobs, if you're using a host-based executor model).

It would be difficult to design a better auto-detection solution that would meet the runner's needs, so the only plausible direction to me to improve this would be to let administrators manually configure the bind address. I think adding manual configuration options like that would be reasonable. I wish it could be automatic instead, but I can't imagine a plausible way.
This discussion should probably continue in a separate issue if you'd like to file one, in order to make it more searchable for future users.
I believe #229 requests the option to set a bind address.