refactor: support multiple clients in Poller with independent fetch intervals #1311

Merged
mfenniak merged 2 commits from mfenniak/forgejo-runner:multiple-clients into main 2026-01-22 21:35:25 +00:00
Owner

Adds infrastructure support in the runner for fetching tasks from multiple different Forgejo instances, extending `Poller` to support an array of `Client` interfaces. Each instance can be polled at a separate fetch interval, but otherwise all jobs are executed in the same manner from a common pool of capacity.

This PR includes no user-facing functionality that exploits this capability.

Author
Owner

@aahlenst I'm not sure if you've proceeded with any work on multiple runners after we discussed a potential CLI and config file interface for them. I thought I'd take a stab at the backend portion of polling the runners, and we could coordinate next steps from here assuming it isn't stepping on any work you've already started.

mfenniak force-pushed multiple-clients from c5791d3c02
Some checks failed
issue-labels / release-notes (pull_request_target) Successful in 4s
checks / Build Forgejo Runner (pull_request) Failing after 32s
checks / validate pre-commit-hooks file (pull_request) Successful in 34s
checks / integration tests (docker-latest) (pull_request) Has been skipped
checks / Build unsupported platforms (pull_request) Has been skipped
checks / runner exec tests (pull_request) Has been skipped
checks / integration tests (docker-stable) (pull_request) Has been skipped
checks / validate mocks (pull_request) Successful in 39s
to c430a675bb
All checks were successful
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
cascade / forgejo (pull_request_target) Has been skipped
checks / validate mocks (pull_request) Successful in 50s
checks / validate pre-commit-hooks file (pull_request) Successful in 52s
checks / Build Forgejo Runner (pull_request) Successful in 1m0s
issue-labels / release-notes (pull_request_target) Successful in 6s
checks / runner exec tests (pull_request) Successful in 58s
checks / Build unsupported platforms (pull_request) Successful in 1m16s
checks / integration tests (docker-latest) (pull_request) Successful in 10m41s
checks / integration tests (docker-stable) (pull_request) Successful in 12m40s
2026-01-20 04:00:18 +00:00
Member

@mfenniak wrote in #1311 (comment):

@aahlenst I'm not sure if you've proceeded with any work on multiple runners after we discussed a potential CLI and config file interface for them. I thought I'd take a stab at the backend portion of polling the runners, and we could coordinate next steps from here assuming it isn't stepping on any work you've already started.

I am working on it. The results so far: a couple of tests, some refactoring, and the realization that my current plan is not good. I'll post my current thinking in the associated feature request soon-ish. I'll also look at this PR in detail.

@ -74,6 +75,7 @@ func ping(cfg *config.Config, reg *config.Registration) error {
"",
"",
ver.Version(),
time.Second,
Member

I'd prefer a constant with a descriptive name, like DefaultFetchInterval.

Is cfg.Runner.FetchInterval not an option here?

Author
Owner

Sure, cfg.Runner.FetchInterval can be used here. The value provided doesn't matter today as it's not actually used in ping, but there's no reason we can't provide the right config value.

mfenniak marked this conversation as resolved
@ -147,8 +168,8 @@ func (p *poller) fetchTasks(ctx context.Context, availableCapacity int64) ([]*ru
defer cancel()
// Load the version value that was in the cache when the request was sent.
Member

I'm unable to figure out what cache the comment is talking about.

Author
Owner

Not new in this PR, but I've removed it.

mfenniak marked this conversation as resolved
@ -100,2 +88,2 @@
}
}
for i := range p.clients {
wg.Go(func() {
Member

If I understand the code correctly, `pollForClient()` is called concurrently for each client. To prevent the invocations from overloading the runner, fetching tasks is protected by a mutex. The remainder of `pollForClient()` only spawns a goroutine for each job, which seems cheap. So while it's implemented correctly, it seems overly complicated because the expensive `fetchTasks()` is in the critical section. Doing it sequentially would be sufficient and make testing much easier.
Author
Owner

I think that this analysis misses the implementation of the fetch interval. Most of the time in this routine is actually spent on `limiter.Wait(p.pollingCtx)`, which needs to occur independently for each client in order to support separate fetch intervals for each client. A sequential approach, rather than a concurrent one, would be possible if we could "select" against a list of rate limits and suspend the process until the next rate limit expires, but that isn't possible with the rate-limiting library that is in use.
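A minimal sketch of the pattern being described (simplified: a `time.Ticker` stands in for the runner's `x/time/rate` limiter, and the signature is reduced from the PR's `pollForClient`): each client goroutine paces itself independently, while a channel-based mutex serializes the expensive fetch step.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// pollForClient paces itself on its own interval (the ticker stands in
// for limiter.Wait), then enters a shared critical section so concurrent
// clients cannot fetch at the same time.
func pollForClient(interval time.Duration, fetchMutex chan struct{}, fetches *atomic.Int64, rounds int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < rounds; i++ {
		<-ticker.C               // wait out this client's own fetch interval
		fetchMutex <- struct{}{} // enter critical section
		fetches.Add(1)           // stand-in for the expensive fetchTasks()
		<-fetchMutex             // leave critical section
	}
}

// run starts one goroutine per client, each with a different interval,
// and returns the total number of (serialized) fetches performed.
func run() int64 {
	var fetches atomic.Int64
	fetchMutex := make(chan struct{}, 1)
	var wg sync.WaitGroup
	for _, interval := range []time.Duration{5 * time.Millisecond, 10 * time.Millisecond} {
		wg.Add(1)
		go func(d time.Duration) {
			defer wg.Done()
			pollForClient(d, fetchMutex, &fetches, 3)
		}(interval)
	}
	wg.Wait()
	return fetches.Load()
}

func main() { _ = run() }
```

The key point is that the per-client wait happens outside the critical section, so clients with different intervals never block each other's pacing, only the fetch itself.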
Member

Ah. I missed that the rate limiter is used for waiting until the interval has elapsed. Then, it makes sense.

in order to support separate fetch intervals for each client

Right now, there's only one global setting. To have separate fetch intervals for each client, we would have to add it to each connection. Which would make the connection configuration even more awkward.

Author
Owner

Yes, there's only one global setting. But even if we maintain only one global setting, the runner has logic to automatically "tune" the fetch interval on Codeberg. (This is an ugly hack, but I don't want to be responsible for removing it. 😉)

func (c *Config) Tune(instanceURL string) {
	if instanceURL == "https://codeberg.org" {
		if c.Runner.FetchInterval < 30*time.Second {
			log.Info("The runner is configured to be used by a public instance, fetch interval is set to 30 seconds.")
			c.Runner.FetchInterval = 30 * time.Second
		}
	}
}

I think a likely place we'd end up is a global config, but any Codeberg clients are automatically overridden to minimum 30 seconds.

aahlenst marked this conversation as resolved
@ -108,6 +98,37 @@ func (p *poller) Poll() {
close(p.done)
}
func (p *poller) pollForClient(limiter *rate.Limiter, client client.Client, capacity int64, fetchMutex chan any, taskVersions, inProgressTasks *atomic.Int64, wg *sync.WaitGroup) {
Member

I'm wondering whether some of the arguments should be struct fields because they are part of poller's internal state or derived from it: capacity, fetchMutex, inProgressTasks, and wg.

Author
Owner

This is tricky to change in a way that won't trigger the Go data race detector -- we can't read and mutate the `poller` struct from multiple different goroutines. Using local storage in the `Poll` method that is referenced in this way avoids this risk. I think it could be a worthwhile change but should be isolated from this PR, so that here we're adding functionality, there we're refactoring with no intent of functional change.
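A compressed illustration of the point (hypothetical simplified types, not the PR's code): `Poll` keeps the mutable coordination state in locals shared with goroutines by pointer, so no goroutine ever reads and writes `poller` struct fields concurrently.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// poller carries only immutable configuration on the struct.
type poller struct{ clientCount int }

// Poll keeps mutable state (the counter, the WaitGroup) as locals,
// shared with the per-client goroutines via pointer capture, so the
// race detector never sees concurrent struct-field mutation.
func (p *poller) Poll() int64 {
	var inProgressTasks atomic.Int64 // local, not a struct field
	var wg sync.WaitGroup
	for i := 0; i < p.clientCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inProgressTasks.Add(1) // safe: atomic local, captured by reference
		}()
	}
	wg.Wait()
	return inProgressTasks.Load()
}

func main() {}
```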
aahlenst marked this conversation as resolved
@ -488,0 +528,4 @@
})
// invocations of `fetchTasks` are rate limited per configuration
t.Run("fetchTasks rate limited separate intervals", func(t *testing.T) {
Member

Sounds like the comment should be the test name 😁

mfenniak marked this conversation as resolved
mfenniak force-pushed multiple-clients from c430a675bb
to 49c1087101
Some checks failed
checks / Build Forgejo Runner (pull_request) Successful in 34s
checks / validate mocks (pull_request) Successful in 40s
checks / validate pre-commit-hooks file (pull_request) Successful in 41s
checks / Build unsupported platforms (pull_request) Successful in 32s
checks / runner exec tests (pull_request) Successful in 33s
example / docker-build-push-action-in-lxc (pull_request) Failing after 1m29s
Integration tests for the release process / release-simulation (pull_request) Failing after 3m28s
checks / integration tests (docker-latest) (pull_request) Successful in 8m59s
checks / integration tests (docker-stable) (pull_request) Successful in 9m34s
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 7s
2026-01-22 20:08:01 +00:00
aahlenst approved these changes 2026-01-22 21:29:58 +00:00
mfenniak deleted branch multiple-clients 2026-01-22 21:35:25 +00:00