refactor: support multiple clients in Poller with independent fetch intervals #1311

Merged
mfenniak merged 2 commits from mfenniak/forgejo-runner:multiple-clients into main 2026-01-22 21:35:25 +00:00
Owner

Adds infrastructure support in the runner for fetching tasks from multiple different Forgejo instances, extending `Poller` to support an array of `Client` interfaces. Each instance can be polled at a separate fetch interval, but otherwise all jobs are executed in the same manner from a common pool of capacity.

This PR includes no user-facing functionality that exploits this capability.

Author
Owner

@aahlenst I'm not sure if you've proceeded with any work on multiple runners after we discussed a potential CLI and config file interface for them. I thought I'd take a stab at the backend portion of polling the runners, and we could coordinate next steps from here assuming it isn't stepping on any work you've already started.

mfenniak force-pushed multiple-clients from c5791d3c02
Some checks failed
issue-labels / release-notes (pull_request_target) Successful in 4s
checks / Build Forgejo Runner (pull_request) Failing after 32s
checks / validate pre-commit-hooks file (pull_request) Successful in 34s
checks / integration tests (docker-latest) (pull_request) Has been skipped
checks / Build unsupported platforms (pull_request) Has been skipped
checks / runner exec tests (pull_request) Has been skipped
checks / integration tests (docker-stable) (pull_request) Has been skipped
checks / validate mocks (pull_request) Successful in 39s
to c430a675bb
All checks were successful
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
cascade / forgejo (pull_request_target) Has been skipped
checks / validate mocks (pull_request) Successful in 50s
checks / validate pre-commit-hooks file (pull_request) Successful in 52s
checks / Build Forgejo Runner (pull_request) Successful in 1m0s
issue-labels / release-notes (pull_request_target) Successful in 6s
checks / runner exec tests (pull_request) Successful in 58s
checks / Build unsupported platforms (pull_request) Successful in 1m16s
checks / integration tests (docker-latest) (pull_request) Successful in 10m41s
checks / integration tests (docker-stable) (pull_request) Successful in 12m40s
2026-01-20 04:00:18 +00:00
Member

@mfenniak wrote in #1311 (comment):

@aahlenst I'm not sure if you've proceeded with any work on multiple runners after we discussed a potential CLI and config file interface for them. I thought I'd take a stab at the backend portion of polling the runners, and we could coordinate next steps from here assuming it isn't stepping on any work you've already started.

I am working on it. The results so far: a couple of tests, some refactoring, and the realization that my current plan is not good. I'll post my current thinking in the associated feature request soon-ish. I'll also look at this PR in detail.

@ -74,6 +75,7 @@ func ping(cfg *config.Config, reg *config.Registration) error {
"",
"",
ver.Version(),
time.Second,
Member

I'd prefer a constant with a descriptive name, like DefaultFetchInterval.

Is cfg.Runner.FetchInterval not an option here?

Author
Owner

Sure, cfg.Runner.FetchInterval can be used here. The value provided doesn't matter today as it's not actually used in ping, but there's no reason we can't provide the right config value.

mfenniak marked this conversation as resolved
@ -147,8 +168,8 @@ func (p *poller) fetchTasks(ctx context.Context, availableCapacity int64) ([]*ru
defer cancel()
// Load the version value that was in the cache when the request was sent.
Member

I'm unable to figure out what cache the comment is talking about.

Author
Owner

Not new in this PR, but I've removed it.

mfenniak marked this conversation as resolved
@ -100,2 +88,2 @@
}
}
for i := range p.clients {
wg.Go(func() {
Member

If I understand the code correctly, `pollForClient()` is called concurrently for each client. To prevent the invocations from overloading the runner, fetching tasks is protected by a mutex. The remainder of `pollForClient()` only spawns a goroutine for each job, which seems cheap. So while it's implemented correctly, it seems overly complicated because the expensive `fetchTasks()` is in the critical section. Doing it sequentially would be sufficient and make testing much easier.
Author
Owner

I think that this analysis misses the implementation of the fetch interval. Most of the time in this routine is actually spent on `limiter.Wait(p.pollingCtx)`, which needs to occur independently for each client in order to support separate fetch intervals for each client. A sequential approach, rather than a concurrent one, would be possible if we could "select" against a list of rate limits and suspend the process until the next rate limit expires, but that isn't possible with the rate-limiting library that is in use.
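A minimal sketch of the pattern being described (simplified: a `time.Ticker` stands in for the runner's `x/time/rate` limiter, and the signature is reduced from the PR's `pollForClient`): each client goroutine paces itself independently, while a channel-based mutex serializes the expensive fetch step.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// pollForClient paces itself on its own interval (the ticker stands in
// for limiter.Wait), then enters a shared critical section so concurrent
// clients cannot fetch at the same time.
func pollForClient(interval time.Duration, fetchMutex chan struct{}, fetches *atomic.Int64, rounds int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < rounds; i++ {
		<-ticker.C               // wait out this client's own fetch interval
		fetchMutex <- struct{}{} // enter critical section
		fetches.Add(1)           // stand-in for the expensive fetchTasks()
		<-fetchMutex             // leave critical section
	}
}

// run starts one goroutine per client, each with a different interval,
// and returns the total number of (serialized) fetches performed.
func run() int64 {
	var fetches atomic.Int64
	fetchMutex := make(chan struct{}, 1)
	var wg sync.WaitGroup
	for _, interval := range []time.Duration{5 * time.Millisecond, 10 * time.Millisecond} {
		wg.Add(1)
		go func(d time.Duration) {
			defer wg.Done()
			pollForClient(d, fetchMutex, &fetches, 3)
		}(interval)
	}
	wg.Wait()
	return fetches.Load()
}

func main() { _ = run() }
```

The key point is that the per-client wait happens outside the critical section, so clients with different intervals never block each other's pacing, only the fetch itself.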
Member

Ah. I missed that the rate limiter is used for waiting until the interval has elapsed. Then, it makes sense.

in order to support separate fetch intervals for each client

Right now, there's only one global setting. To have separate fetch intervals for each client, we would have to add it to each connection. Which would make the connection configuration even more awkward.

Author
Owner

Yes, there's only one global setting. But even if we maintain only one global setting, the runner has logic to automatically "tune" the fetch interval on Codeberg. (This is an ugly hack, but I don't want to be responsible for removing it. 😉)

func (c *Config) Tune(instanceURL string) {
	if instanceURL == "https://codeberg.org" {
		if c.Runner.FetchInterval < 30*time.Second {
			log.Info("The runner is configured to be used by a public instance, fetch interval is set to 30 seconds.")
			c.Runner.FetchInterval = 30 * time.Second
		}
	}
}

I think a likely place we'd end up is a global config, but any Codeberg clients are automatically overridden to minimum 30 seconds.

aahlenst marked this conversation as resolved
@ -108,6 +98,37 @@ func (p *poller) Poll() {
close(p.done)
}
func (p *poller) pollForClient(limiter *rate.Limiter, client client.Client, capacity int64, fetchMutex chan any, taskVersions, inProgressTasks *atomic.Int64, wg *sync.WaitGroup) {
Member

I'm wondering whether some of the arguments should be struct fields because they are part of poller's internal state or derived from it: capacity, fetchMutex, inProgressTasks, and wg.

Author
Owner

This is tricky to change in a way that won't trigger the Go data race detector -- we can't read and mutate the `poller` struct from multiple different goroutines. Using local storage in the `Poll` method that is referenced in this way avoids this risk. I think it could be a worthwhile change but should be isolated from this PR, so that here we're adding functionality, there we're refactoring with no intent of functional change.
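A compressed illustration of the point (hypothetical simplified types, not the PR's code): `Poll` keeps the mutable coordination state in locals shared with goroutines by pointer, so no goroutine ever reads and writes `poller` struct fields concurrently.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// poller carries only immutable configuration on the struct.
type poller struct{ clientCount int }

// Poll keeps mutable state (the counter, the WaitGroup) as locals,
// shared with the per-client goroutines via pointer capture, so the
// race detector never sees concurrent struct-field mutation.
func (p *poller) Poll() int64 {
	var inProgressTasks atomic.Int64 // local, not a struct field
	var wg sync.WaitGroup
	for i := 0; i < p.clientCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inProgressTasks.Add(1) // safe: atomic local, captured by reference
		}()
	}
	wg.Wait()
	return inProgressTasks.Load()
}

func main() {}
```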
aahlenst marked this conversation as resolved
@ -488,0 +528,4 @@
})
// invocations of `fetchTasks` are rate limited per configuration
t.Run("fetchTasks rate limited separate intervals", func(t *testing.T) {
Member

Sounds like the comment should be the test name 😁

mfenniak marked this conversation as resolved
mfenniak force-pushed multiple-clients from c430a675bb
to 49c1087101
Some checks failed
checks / Build Forgejo Runner (pull_request) Successful in 34s
checks / validate mocks (pull_request) Successful in 40s
checks / validate pre-commit-hooks file (pull_request) Successful in 41s
checks / Build unsupported platforms (pull_request) Successful in 32s
checks / runner exec tests (pull_request) Successful in 33s
example / docker-build-push-action-in-lxc (pull_request) Failing after 1m29s
Integration tests for the release process / release-simulation (pull_request) Failing after 3m28s
checks / integration tests (docker-latest) (pull_request) Successful in 8m59s
checks / integration tests (docker-stable) (pull_request) Successful in 9m34s
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 7s
2026-01-22 20:08:01 +00:00
aahlenst approved these changes 2026-01-22 21:29:58 +00:00
mfenniak deleted branch multiple-clients 2026-01-22 21:35:25 +00:00