fix: use automatic IP detection instead of cache proxy host config value for internal cache server #1088

Merged
earl-warren merged 2 commits from drewcassidy/runner:fix-cache-server-address into main 2025-10-16 07:42:37 +00:00
Contributor

The description for the cache.host config value is

  # The IP or hostname (195.84.20.30 or example.com) to use when constructing
  # ACTIONS_CACHE_URL which is the URL of the cache proxy.

however the internal cache server was being started using that value, despite it clearly stating its for the proxy.

While the address used to contact the cache proxy from the job is usually the same as the address for the cache proxy to contact the server, that is not the case with custom /etc/hosts entries, like what is used for the host.containers.internal address, which is needed for contacting the cache proxy on rootless podman (see #1075).

Since the cache proxy and internal cache server are always running on the same host (or container), it's safe to use localhost for the server address instead.

  • bug fixes
    • PR: fix: use automatic IP detection instead of cache proxy host config value for internal cache server
The description for the `cache.host` config value is ``` # The IP or hostname (195.84.20.30 or example.com) to use when constructing # ACTIONS_CACHE_URL which is the URL of the cache proxy. ``` however the internal cache *server* was being started using that value, despite it clearly stating its for the *proxy*. While the address used to contact the cache proxy from the job is *usually* the same as the address for the cache proxy to contact the server, that is not the case with custom /etc/hosts entries, like what is used for the `host.containers.internal` address, which is needed for contacting the cache proxy on rootless podman (see #1075). Since the cache proxy and internal cache server are always running on the same host (or container), it's safe to use localhost for the server address instead. <!--start release-notes-assistant--> <!--URL:https://code.forgejo.org/forgejo/runner--> - bug fixes - [PR](https://code.forgejo.org/forgejo/runner/pulls/1088): <!--number 1088 --><!--line 0 --><!--description Zml4OiB1c2UgYXV0b21hdGljIElQIGRldGVjdGlvbiBpbnN0ZWFkIG9mIGNhY2hlIHByb3h5IGhvc3QgY29uZmlnIHZhbHVlIGZvciBpbnRlcm5hbCBjYWNoZSBzZXJ2ZXI=-->fix: use automatic IP detection instead of cache proxy host config value for internal cache server<!--description--> <!--end release-notes-assistant-->
Use localhost instead of the cache proxy address
All checks were successful
checks / validate pre-commit-hooks file (pull_request) Successful in 1m18s
checks / validate mocks (pull_request) Successful in 1m55s
checks / build and test (pull_request) Successful in 4m4s
checks / runner exec tests (pull_request) Successful in 45s
Integration tests for the release process / release-simulation (pull_request) Successful in 7m21s
checks / runner integration tests (pull_request) Successful in 9m16s
checks / integration tests (pull_request) Successful in 19m55s
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 6s
bfb2637600
cache proxy always runs on the same machine as the internal cache server
@ -118,3 +118,3 @@
cacheServer, err := artifactcache.StartHandler(
cfg.Cache.Dir,
cfg.Cache.Host,
"localhost",
Owner

are you sure this works for external cache server too?

are you sure this works for external cache server too?
Author
Contributor

this is the call to start the internal cache server. it's skipped when an external cache server is used

this is the call to *start* the internal cache server. it's skipped when an external cache server is used
Contributor

This is indeed inconsistent, good catch.

I think better in this case to leave it empty so that it behaves as if it was not set. Also"localhost" is not guaranteed to exist so it would be better to have 127.0.0.1, if it is set explicitly.

	cacheServer, err := artifactcache.StartHandler(
			cfg.Cache.Dir,
			"", // automatically detect

This should have been caught by a test but there currently are no test verifying those border cases. I tried to figure out how to test that but doing so would require groundwork that is out of proportion for this simple bug fix.

I really do not like this situation because it does nothing to improve how cache servers are tested.

Manual testing is the only way out. @drewcassidy could you confirm that this fix is manually tested to work for you and explain the steps you took to do that?

This is indeed inconsistent, good catch. I think better in this case to leave it empty so that it behaves as if it was not set. Also"localhost" is not guaranteed to exist so it would be better to have 127.0.0.1, if it is set explicitly. ```go cacheServer, err := artifactcache.StartHandler( cfg.Cache.Dir, "", // automatically detect ``` This should have been caught by a test but there currently are no test verifying those border cases. I tried to figure out how to test that but doing so would require groundwork that is out of proportion for this simple bug fix. I really do not like this situation because it does nothing to improve how cache servers are tested. Manual testing is the only way out. @drewcassidy could you confirm that this fix is manually tested to work for you and explain the steps you took to do that?
Owner

127.0.0.1 doesn't work on ipv6 only networking or does it?

`127.0.0.1` doesn't work on ipv6 only networking or does it?
Contributor

right, I did not think of that. Leaving it empty is best then.

right, I did not think of that. Leaving it empty is best then.
Author
Contributor

Right, was waffling myself between automatic and localhost, I figured localhost would at least not expose the cache server to the outside world if the user isnt running a firewall.

Sockets would be the better answer, but I dont think those work on Windows?

I'll change it to the empty string, test, and report back.

Right, was waffling myself between automatic and localhost, I figured localhost would at least not expose the cache server to the outside world if the user isnt running a firewall. Sockets would be the better answer, but I dont think those work on Windows? I'll change it to the empty string, test, and report back.
Contributor

Even if the cache is exposed publicly, it cannot be used without the secret (autogenerated or set in the config file) so it is not a concern.

Even if the cache is exposed publicly, it cannot be used without the secret (autogenerated or set in the config file) so it is not a concern.
Author
Contributor

Changed it to auto-detect. I tested by running a workflow that uses actions/cache with all config settings default except with cache.host set to host.containers.internal, and making sure the cache proxy doesnt error contacting the cache server.

Changed it to auto-detect. I tested by running a workflow that uses `actions/cache` with all config settings default except with `cache.host` set to `host.containers.internal`, and making sure the cache proxy doesnt error contacting the cache server.
Contributor

Thanks for doing the manual testing.

Thanks for doing the manual testing.
viceice marked this conversation as resolved
First-time contributor

This would technically be breaking change I think (though I can't imagine a valid situation where someone is relying on this behavior)? Note that despite using localhost for local cache server, we're still binding it to a port specified in configuration. This seems to make little sense, as we are now exposing ability to define port for a service that will only ever be accessible locally for sole purpose of cache proxy connecting to it.

This would technically be breaking change I think (though I can't imagine a valid situation where someone is relying on this behavior)? Note that despite using `localhost` for local cache server, we're still binding it to a port specified in configuration. This seems to make little sense, as we are now exposing ability to define port for a service that will only ever be accessible locally for sole purpose of cache proxy connecting to it.
fnetX requested review from Kwonunn 2025-10-14 15:23:28 +00:00
Author
Contributor

It still makes sense to have the port configurable. There could be other services (like other runner instances) on the same server with conflicting ports bound to localhost.

It still makes sense to have the port configurable. There could be other services (like other runner instances) on the same server with conflicting ports bound to localhost.
mfenniak approved these changes 2025-10-15 01:34:19 +00:00
mfenniak left a comment
Owner

This seems right to me, and consistent with the existing documentation. I would like another reviewer to agree with me, so I'll give it a day to see if anyone else does or does not. 🙂

This seems right to me, and consistent with the existing documentation. I would like another reviewer to agree with me, so I'll give it a day to see if anyone else does or does not. 🙂
Kwonunn approved these changes 2025-10-15 11:05:28 +00:00
Kwonunn left a comment
Member

this looks correct to me, though i haven't had a chance to test it.

this looks correct to me, though i haven't had a chance to test it.
auto-detect artifact cache IP instead of using localhost
All checks were successful
checks / validate mocks (pull_request) Successful in 59s
checks / validate pre-commit-hooks file (pull_request) Successful in 38s
checks / build and test (pull_request) Successful in 3m6s
checks / runner exec tests (pull_request) Successful in 25s
Integration tests for the release process / release-simulation (pull_request) Successful in 4m44s
issue-labels / release-notes (pull_request_target) Successful in 4s
checks / runner integration tests (pull_request) Successful in 5m25s
checks / integration tests (pull_request) Successful in 11m15s
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Successful in 5s
cascade / forgejo (pull_request_target) Successful in 35s
ca5f1437b6
localhost isnt guranteed to exist
drewcassidy changed title from fix: use localhost instead of cache proxy host for internal cache server to fix: use automatic IP detection instead of cache proxy host config value for internal cache server 2025-10-16 07:11:24 +00:00
viceice approved these changes 2025-10-16 07:16:29 +00:00
earl-warren deleted branch fix-cache-server-address 2025-10-16 07:42:37 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
6 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo/runner!1088
No description provided.