feat: implement "concurrency" block in Forgejo Actions at the workflow level #9434
No reviewers
Labels
No labels
arch
riscv64
backport/v1.19
backport/v1.20
backport/v1.21/forgejo
backport/v10.0/forgejo
backport/v11.0/forgejo
backport/v12.0/forgejo
backport/v13.0/forgejo
backport/v14.0/forgejo
backport/v7.0/forgejo
backport/v8.0/forgejo
backport/v9.0/forgejo
breaking
bug
bug
confirmed
bug
duplicate
bug
needs-more-info
bug
new-report
bug
reported-upstream
code/actions
code/api
code/auth
code/auth/faidp
code/auth/farp
code/email
code/federation
code/git
code/migrations
code/packages
code/wiki
database
MySQL
database
PostgreSQL
database
SQLite
dependency-upgrade
dependency
certmagic
dependency
chart.js
dependency
Chi
dependency
Chroma
dependency
citation.js
dependency
codespell
dependency
css-loader
dependency
devcontainers
dependency
dropzone
dependency
editorconfig-checker
dependency
elasticsearch
dependency
enmime
dependency
F3
dependency
ForgeFed
dependency
garage
dependency
Git
dependency
git-backporting
dependency
Gitea
dependency
gitignore
dependency
go-ap
dependency
go-enry
dependency
go-gitlab
dependency
Go-org
dependency
go-rpmutils
dependency
go-sql-driver mysql
dependency
go-swagger
dependency
go-version
dependency
go-webauthn
dependency
gocron
dependency
Golang
dependency
goldmark
dependency
goquery
dependency
Goth
dependency
grpc-go
dependency
happy-dom
dependency
Helm
dependency
image-spec
dependency
jsonschema
dependency
KaTeX
dependency
lint
dependency
MariaDB
dependency
Mermaid
dependency
minio-go
dependency
misspell
dependency
Monaco
dependency
PDFobject
dependency
playwright
dependency
postcss
dependency
postcss-plugins
dependency
pprof
dependency
prometheus client_golang
dependency
protobuf
dependency
relative-time-element
dependency
renovate
dependency
reply
dependency
ssh
dependency
swagger-ui
dependency
tailwind
dependency
temporal-polyfill
dependency
terminal-to-html
dependency
tests-only
dependency
text-expander-element
dependency
urfave
dependency
vfsgen
dependency
vite
dependency
Woodpecker CI
dependency
x tools
dependency
XORM
Discussion
duplicate
enhancement/feature
forgejo/accessibility
forgejo/branding
forgejo/ci
forgejo/commit-graph
forgejo/documentation
forgejo/furnace cleanup
forgejo/i18n
forgejo/interop
forgejo/moderation
forgejo/privacy
forgejo/release
forgejo/scaling
forgejo/security
forgejo/ui
Gain
High
Gain
Nice to have
Gain
Undefined
Gain
Very High
good first issue
i18n/backport-stable
impact
large
impact
medium
impact
small
impact
unknown
Incompatible license
issue
closed
issue
do-not-exist-yet
issue
open
manual test
Manually tested during feature freeze
OS
FreeBSD
OS
Linux
OS
macOS
OS
Windows
problem
QA
regression
release blocker
Release Cycle
Feature Freeze
release-blocker
v7.0
release-blocker
v7.0.1
release-blocker
v7.0.2
release-blocker
v7.0.3
release-blocker
v7.0.4
release-blocker
v8.0.0
release-blocker/v9.0.0
run-all-playwright-tests
run-end-to-end-tests
test
manual
test
needed
test
needs-help
test
not-needed
test
present
untested
User research - time-tracker
valuable code
worth a release-note
User research - Accessibility
User research - Blocked
User research - Community
User research - Config (instance)
User research - Errors
User research - Filters
User research - Future backlog
User research - Git workflow
User research - Labels
User research - Moderation
User research - Needs input
User research - Notifications/Dashboard
User research - Rendering
User research - Repo creation
User research - Repo units
User research - Security
User research - Settings (in-app)
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo/forgejo!9434
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "mfenniak/forgejo:workflow-level-concurrency"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Currently references a pre-release version of
code.forgejo.org/forgejo/runner/v11, pending release of https://code.forgejo.org/forgejo/runner/pulls/1026.Fixes #5914.
This PR is quite large, but it can be reviewed commit-by-commit in relatively small, logical chunks.
Adds support for workflows with a
concurrencyblock, and submembersgroupandcancel-in-progress. For example:The concurrency block effectively ends up with four supported behaviors that users will want to choose from:
cancel-in-progressvalue is set tofalseand nogroupis provided, then the previously described Forgejo behavior will be disabled and an unlimited number of workflows can be executed simultaneously (to the maximum supported by the Forgejo Runner capacity).groupis provided andcancel-in-progress: falseis set, then every new action run with in the same repository with the same group value will be queued behind previous workflow runs, allowing only one workflow to execute at a time in the group, but allowing all workflows to finish naturally.groupis provided andcancel-in-progress: trueis set, then every new action run with in the same repository with the same group value will cause previously queued or running runs to be cancelled, allowing only one workflow to execute at a time in the group, but preferring execution of the most recent workflow.Both the
groupandcancel-in-progressvalues can access values from thegithub,inputsandvarscontext for dynamic behavior.Checklist
The contributor guide contains information that will be helpful to first time contributors. There also are a few conditions for merging Pull Requests in Forgejo repositories. You are also welcome to join the Forgejo development chatroom.
Tests
*_test.gofor unit tests.tests/integrationdirectory if it involves interactions with a live Forgejo server.web_src/js/*.test.jsif it can be unit tested.tests/e2e/*.test.e2e.jsif it requires interactions with a live Forgejo server (see also the developer guide for JavaScript testing).Documentation
Release notes
release-notes/<pull request number>.mdto be be used for the release notes instead of the title.Release notes
50a4c2e428211abc077e211abc077e8e2fb30e2bconcurrency#1513@mfenniak wrote in #9434 (comment):
I have not observed that it actually works on pull requests. Do you have a reproducer where a push to a pull request cancels the previous run?
https://code.forgejo.org/forgejo/end-to-end/pulls/135/files is a tentative reproducer from some time ago but it would need to be updated.
@earl-warren wrote in #9434 (comment):
This statement reflects the existing code as written, but no, I haven't observed it. I suspect that if you've observed the absence of it, it's likely because of the cancel check is specific to the synchronized "subevent" (for lack of a better term). That may not be the event type when
on: pull_requestis used in the workflow...I can take a closer look tomorrow. Since this behavior was just copied from the existing implementation, I didn't dig into it in detail, I was just concerned with not changing anything that existed in that path. 🤔
Very nicely done and an easy to read series of commits. I also took a look at your todo list mfenniak/forgejo#1 out of curiosity but this PR stands on its own. I'll review again once the matching runner PR is merged.
8e2fb30e2bf94c345349@earl-warren wrote in #9434 (comment):
Regarding the TODO list, the only item unsatisfied is that I'd love to performance test the concurrency group task picking database queries. (Query in question: mfenniak/forgejo#1 (comment))
I'm not very worried about the changes because:
action_run_jobrecords withtask_id=0andstatus=StatusWaiting, and the additional complicated subqueries on concurrency groups will only apply to that small set of waiting tasks.Despite that I had thought it might be good to test the performance with a large-scale
action_runandaction_run_jobtable like Codeberg's... but then there's a complication that it requires the newconcurrency_groupandconcurrency_typefields... and I'd have to synthetically populate those in order to create a test dataset. Once you get to doing performance testing with synthetically generated data, the value of that testing varies wildly depending on how lucky you are in guessing the data that might generate problems.I'm reaching two possible conclusions:
action_runandaction_run_job(which could haveevent_payloadandworkflow_payloadredacted for its sensitive data), load it into a test DB, perform schema changes and synthetic data generation, and hope to generate a reasonable test case.and I'm leaning towards it's not worth it.
@earl-warren wrote in #9434 (comment):
I've bumped the forgejo->runner reference up to the latest
maincommit in this branch now, and there will be a few CI tweaks to resolve. A follow-up review should hold until I get a nice green build. 🙂f94c345349f2cadf5409f2cadf54094cc122047a4cc122047a1d2ef03cf8WIP: feat: implement "concurrency" block in Forgejo Actions at the workflow levelto feat: implement "concurrency" block in Forgejo Actions at the workflow levelWhere does that come from?
The following is a preview of the release notes for this pull request, as they will appear in the upcoming release. They are derived from the content of the `release-notes/9434.md` file, if it exists, or the title of the pull request. They were also added at the bottom of the description of this pull request for easier reference.This message and the release notes originate from a call to the release-notes-assistant.
Release notes
@earl-warren wrote in #9434 (comment):
You are right to raise a concern here. The existing behavior...
First attempts to
CancelPreviousJobsif the run'sEventisHookEventPullRequestSync:// cancel running jobs if the event is push or pull_request_syncif run.Event == webhook_module.HookEventPush ||run.Event == webhook_module.HookEventPullRequestSync {if err := CancelPreviousJobs(ctx,run.RepoID,run.Ref,run.WorkflowID,run.Event,); err != nil {log.Error("CancelPreviousJobs: %v", err)}}And then attempts to find the runs to cancel based upon querying
TriggerEvent, notEvent.// CancelPreviousJobs cancels all previous jobs of the same repository, reference, workflow, and event.// It's useful when a new run is triggered, and all previous runs needn't be continued anymore.func CancelPreviousJobs(ctx context.Context, repoID int64, ref, workflowID string, event webhook_module.HookEventType) error {// Find all runs in the specified repository, reference, and workflow with non-final statusruns, total, err := db.FindAndCount[actions_model.ActionRun](ctx, actions_model.FindRunOptions{RepoID: repoID,Ref: ref,WorkflowID: workflowID,TriggerEvent: event,Status: []actions_model.Status{actions_model.StatusRunning, actions_model.StatusWaiting, actions_model.StatusBlocked},})When a push occurs to a branch with an open PR, the
EventisHookEventPullRequestSync, causingCancelPreviousJobsto be invoked. But theTriggerEventin the database for previous runs is actuallyHookEventPullRequest:And so the existing behavior is that the run generated by opening the pull request is the only one that will ever be cancelled by future pushes. Once that run is complete, subsequent pushes to the branch will not effect each other.
This seems like it is clearly a bug...
The "default" behavior in this PR works as I would expect, and as it seems was intended in the original implementation: push and pull request synchronize events cancel all previous runs on the same repository, branch, event, and workflow.
If we're OK with the change: Is it possible that we could document this as a "breaking bug fix" in the release notes on this PR? I'm not sure if that's possible with the release notes assistant to have two entries, one for a feature, and one for a breaking bug fix?
Otherwise: I could restore the broken behavior... or remove any reference to
HookEventPullRequestSyncand make it the cancel only apply toPush.@mfenniak wrote in #9434 (comment):
The absence of performance tests make me nervous on occasions. But given your analysis of what actually happens, I think it will be fine and performance testing would be overkill. That said, it would be fine by me to trust you with the requested data from code.forgejo.org if you are motivated to run tests.
@mfenniak wrote in #9434 (comment):
It is possible to have multiple lines when using a
release-notes/*.mdfile, which can be used for instance when upgrading a dependency that has multiple user facing changes worth a release note.feat: mermaid: [Add support for iconify icons](https://github.com/mermaid-js/mermaid/pull/5793).feat: mermaid: [Allow multi-line relationship labels](https://github.com/mermaid-js/mermaid/pull/5711).feat: mermaid: [Adds architecture diagrams which allows users to show relations between services](https://github.com/mermaid-js/mermaid/pull/5452).IMHO it is not worth restoring the broken behavior. Nice sleuth work.
A minor change request after a first review pass. I'll need to spend time on a thorough review now that I get the overall logic.
@ -11,3 +11,3 @@code.forgejo.org/forgejo/levelqueue v1.0.0code.forgejo.org/forgejo/reply v1.0.2code.forgejo.org/forgejo/runner/v11 v11.1.1code.forgejo.org/forgejo/runner/v11 v11.1.2-0.20250927055528-46a955d1ffecNow updated to v11.1.2.
aeb0de1269..5669e49025@ -242,0 +295,4 @@// computationally expensive on the database. To manage the risk that this might have on large-scale deployments// When this feature is initially released, it can be disabled in the ini file by setting// `CONCURRENCY_GROUP_QUEUE_ENABLED = false` in the `[actions]` section. If disabled, then actions with a// concurrency group and `cancel-in-progress: false` will run simultaneously rather than being queued.If I understand correctly, the idea behind
CONCURRENCY_GROUP_QUEUE_ENABLEDisIf I interpreted that correctly, it should be documented in
custom/conf/app.example.iniCONCURRENCY_GROUP_QUEUE_ENABLEDwas only intended to manage the performance risks from the database query change. It doesn't disable the entire capability ofconcurrency-- just "group queue" functionality. The cancellation capabilities still work.Documentation added covering the existing implementation: !9434 (commit
c52860cc3d)"... to make it so existing workflows do not suddenly change their behavior" is not the intended use of the setting, however, it's an interesting thought. That need wouldn't be satisfied by the current setting because of the limited scope -- for example, an existing workflow with
cancel-in-progress: truewould be valid and run, and have its behavior change with this feature regardless of the setting.I'm thinking about it... and I don't think it's necessary to create a setting that disables the entire feature. If someone has an existing
concurrencyblock that suddenly becomes functional and they don't want it, they can just remove it to get the old behavior. It isn't breaking without any solution available.@ -227,0 +228,4 @@// Reaching a finalized result for a task can cause other tasks in the same concurrency group to become// unblocked. Increasing task version here allows all applicable runners to requery to the DB for that state.// Because it is only useful for that condition, and it has system performance risks, only enable it when// concurrency group queuing is enabled.Note to self: here is a comment explaining the reason for having
ConcurrencyGroupQueueEnabled@ -120,2 +120,4 @@// v39 -> v40NewMigration("Add index for release sha1", AddIndexForReleaseSha1),// v40 -> v41NewMigration("Add action_run concurrency fields", AddActionRunConcurrency),I don't get why the upgrade test fail. 🤔
I think it has nothing to do with this PR.
cascading-pr updated at https://code.forgejo.org/forgejo/end-to-end/pulls/1072
The upgrade failure of the cascade is because the tag for v14 dev is not in this branch, it can be ignored. What matters really is that the actions tests run fine and show it does not break anything.
The upgrade test will run fine once this PR is rebased.
Unfinished thoughts.
Tthe query
getConcurrencyConditionis both complex and potentially expensive. It feels like something is missing that would make it more efficient and simpler.In the current state of the codebase
CreateTaskForRunnerdraws the first available row from theaction_run_jobtable and it sits in theaction_tasktable until it is picked by a runner with the appropriate label.With this PR the
getConcurrencyConditionon theaction_runrow will decide whichaction_run_jobis eligible based on the concurrency logic.Since the concurrency rules are at the workflow level, could they be implemented so that no
action_run_jobis created rather than at a lower level where it regulates how anaction_taskis created from anaction_run_job?1d2ef03cf828ae1c339f@earl-warren wrote in #9434 (comment):
Just a minor correction:
action_taskis created after anaction_run_jobis picked for a runner. Before jobs are picked, there are justaction_run(a workflow) andaction_run_job(a job).It's an interesting thought, but I think we'd run into two problems:
action_runandaction_run_job, then at this later time you would need to parse the workflow and generate the N jobs from the run into theaction_run_jobto make them runnable, and you'd end up with some code duplication at this time.If
CreateTaskForRunnerwas invoked for every fetch, I think the best argument for what you're suggesting is that fetches happen more frequently than tasks finishing, and so the expensive query would happen less frequently. But with the task version layer in the API that preventsCreateTaskForRunnerfrom being invoked for most fetches, it isn't invoked too often.I think it's a fair risk to be concerned about the complexity and performance of the queuing database query -- that is why I created the targeted enable/disable flag. But... I think it might be a premature to be re-architecting based upon just the worry.
As Forgejo is deployed on larger systems and runs into scalability problems, I'm confident that the current implementation with CreateTaskForRunner pulling all the pending tasks and check them against the runner's labels in the application server will be a performance problem before the concurrency querying is. 😬
c52860cc3d722cb9a9ed@mfenniak wrote in #9434 (comment):
I stand corrected 👍
@mfenniak wrote in #9434 (comment):
But you could also do nothing different than now when a task completes. You would only do something different when a run completes. Only then would you look for other runs eligible according to the concurrency rules. And once you find one, you call notify() on it and it spawns
action_run_jobviaactions_model.InsertRunThe existing logic of
CreateTaskForRunnerwould not need to be changed, it would continue to pick jobs the way it currently does. Only it would not see anyaction_run_jobfor runs that are queued because of concurrency rules.Maybe we can chat about that later, the code logic is not as clear in my head as it is in yours and I may be dreaming too much about this "let's do this at a higher level" idea. 😊
@mfenniak wrote in #9434 (comment):
I agree. My motivation here is to simplify rather than refactor. If a refactor is needed, that would defeat the idea I'm proposing.
722cb9a9edaeb0de1269aeb0de12695669e49025This is one of my blocker to use the runner for more jobs on out own instance 😏