fix: get new session from enginegroup instead of masterengine #10140

Merged
Gusted merged 2 commits from gusted/forgejo-getengine-defaultcontext into forgejo 2025-11-17 14:42:58 +01:00 AGit
Owner

Within Codeberg we are looking into distributing the database queries, we tried forgejo/forgejo!7212 on several occasions but never got it to work.

After a long debugging session in a staging environment I was able to find two bugs that made it impossible for this feature to work: forgejo/docs!1587 which resulted in replica engines never being configured and used if you followed the documentation. The other bug is what this patch intends to fix. In order to do some database operation, you need to the database engine - it will first look if one is set for the context (only useful for transactions) and otherwise create a new session of the engine from the master engine x. The problem is that x is explicitly set to be the master engine and not the engine group (that includes the replica engines) - Unless the code uses DefaultContext, which is almost nowhere used after some great refactoring in Gitea to use the passed context, it did not use the replica engines.

Get engine from the DefaultContext (which is set to the enginegroup) and create a new session from that.

Lines 220 to 231 in 20f8572
// SetDefaultEngine sets the default engine for db.
func SetDefaultEngine(ctx context.Context, eng Engine) {
masterEngine, err := GetMasterEngine(eng)
if err == nil {
x = masterEngine
}
DefaultContext = &Context{
Context: ctx,
e: eng,
}
}

And SetDefaultEngine is called from

SetDefaultEngine(ctx, eng)

Where eng is the engine group.

No test (no clue where to begin even). Tested in staging environment.

Test

  1. Configure database replicas.
  2. Start Forgejo.
  3. Verify Forgejo loads.
  4. Stop the database replicas.
  5. Verify Forgejo shows 500 errors.

Release notes

  • Bug fixes
    • PR: get new session from enginegroup instead of masterengine
Within Codeberg we are looking into distributing the database queries, we tried forgejo/forgejo!7212 on several occasions but never got it to work. After a long debugging session in a staging environment I was able to find two bugs that made it impossible for this feature to work: forgejo/docs!1587 which resulted in replica engines never being configured and used if you followed the documentation. The other bug is what this patch intends to fix. In order to do some database operation, you need to the database engine - it will first look if one is set for the context (only useful for transactions) and otherwise create a new session of the engine from the master engine `x`. The problem is that `x` is explicitly set to be the master engine and not the engine group (that includes the replica engines) - Unless the code uses `DefaultContext`, which is almost nowhere used after some great refactoring in Gitea to use the passed context, it did not use the replica engines. Get engine from the `DefaultContext` (which is set to the enginegroup) and create a new session from that. https://codeberg.org/forgejo/forgejo/src/commit/20f8572b923c83258a53f1629782953000dd0e75/models/db/engine.go#L220-L231 And `SetDefaultEngine` is called from https://codeberg.org/forgejo/forgejo/src/commit/20f8572b923c83258a53f1629782953000dd0e75/models/db/engine.go#L212 Where `eng` is the engine group. No test (no clue where to begin even). Tested in staging environment. ## Test 1. Configure database replicas. 2. Start Forgejo. 3. Verify Forgejo loads. 4. Stop the database replicas. 5. Verify Forgejo shows 500 errors. <!--start release-notes-assistant--> ## Release notes <!--URL:https://codeberg.org/forgejo/forgejo--> - Bug fixes - [PR](https://codeberg.org/forgejo/forgejo/pulls/10140): <!--number 10140 --><!--line 0 --><!--description Z2V0IG5ldyBzZXNzaW9uIGZyb20gZW5naW5lZ3JvdXAgaW5zdGVhZCBvZiBtYXN0ZXJlbmdpbmU=-->get new session from enginegroup instead of masterengine<!--description--> <!--end release-notes-assistant-->
fix: get new session from enginegroup instead of masterengine
Some checks failed
testing / frontend-checks (pull_request) Successful in 50s
testing / backend-checks (pull_request) Successful in 3m0s
issue-labels / backporting (pull_request_target) Has been skipped
issue-labels / cascade (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Has been skipped
requirements / merge-conditions (pull_request) Successful in 3s
testing / test-pgsql (pull_request) Failing after 2m55s
testing / test-sqlite (pull_request) Failing after 3m2s
testing / test-unit (pull_request) Successful in 6m36s
testing / test-remote-cacher (redis) (pull_request) Successful in 2m14s
testing / test-remote-cacher (valkey) (pull_request) Successful in 2m14s
testing / test-remote-cacher (garnet) (pull_request) Successful in 2m13s
testing / test-remote-cacher (redict) (pull_request) Successful in 2m13s
testing / test-e2e (pull_request) Successful in 16m35s
testing / test-mysql (pull_request) Successful in 22m8s
testing / security-check (pull_request) Has been skipped
8015b94dc8
Within Codeberg we are looking into distributing the database queries,
we tried forgejo/forgejo!7212 on several occasions but never got it to
work.

After a long debugging session in a staging environment I was able
to find two bugs that made it impossible for this feature to work:
forgejo/docs!1587 which resulted in replica engines never being
configured and used if you followed the documentation. The other bug is
what this patch intends to fix. In order to do some database operation,
you need to the database engine - it will first look if one is set for
the context (only useful for transactions) and otherwise create a new
session of the engine from the master engine `x`. The problem is that
`x` is explictly set to be the master engine and not the engine
group (that includes the replica engines) - Unless the code uses
`DefaultContext`, which is almost nowhere used after some great
refactoring in Gitea to use the passed context, it did not use the
replica engines.

Get engine from the `DefaultContext` (which is set to the enginegroup)
and create a new session from that.

No test (no clue where to begin even). Tested in staging environment.
Author
Owner

@pat-s FYI

@pat-s FYI
chore: don't return enginegroup when no replica is set
Some checks failed
testing / frontend-checks (pull_request) Successful in 54s
testing / backend-checks (pull_request) Successful in 3m19s
testing / test-unit (pull_request) Successful in 6m3s
testing / test-remote-cacher (redis) (pull_request) Successful in 2m20s
testing / test-remote-cacher (valkey) (pull_request) Successful in 2m19s
testing / test-remote-cacher (garnet) (pull_request) Successful in 2m21s
testing / test-remote-cacher (redict) (pull_request) Successful in 2m21s
testing / test-e2e (pull_request) Failing after 19m44s
testing / test-mysql (pull_request) Successful in 32m28s
testing / test-sqlite (pull_request) Successful in 39m9s
testing / test-pgsql (pull_request) Successful in 45m20s
testing / security-check (pull_request) Successful in 1m55s
issue-labels / cascade (pull_request_target) Has been skipped
requirements / merge-conditions (pull_request) Successful in 1s
issue-labels / release-notes (pull_request_target) Successful in 34s
milestone / set (pull_request_target) Successful in 3s
issue-labels / backporting (pull_request_target) Successful in 25s
76fc2344f7
Don't always return a enginegroup, if there's no replica just return the
main engine.
mfenniak approved these changes 2025-11-17 02:46:02 +01:00
mfenniak left a comment
Member

In the scope of this change to Forgejo, everything looks good to me.

My other thoughts here are out-of-scope of this PR.

With respect to your testing... I feel this is obvious but I want to point it out anyway... "5. Verify Forgejo shows 500 errors." is great for proving that this change works but a terrible behaviour for an application accessing a database cluster. 🤣 I see that xorm doesn't do any kind of health checking on the replica set to maintain access to just the online replicas. You might want to be cautious as this will make the reliability of your system worse -- for example, if you're going from (1 primary) to (1 primary + 2 replicas), the application becomes quite broken at one database node outage. I'm just curious as I'm always interested in scaling -- is there a mechanism in mind to handle this? Or is it a future requirement?

In the scope of this change to Forgejo, everything looks good to me. My other thoughts here are out-of-scope of this PR. With respect to your testing... I feel this is obvious but I want to point it out anyway... "5. Verify Forgejo shows 500 errors." is great for proving that this change works but a terrible behaviour for an application accessing a database cluster. 🤣 I see that xorm doesn't do any kind of health checking on the replica set to maintain access to just the online replicas. You might want to be cautious as this will make the reliability of your system worse -- for example, if you're going from (1 primary) to (1 primary + 2 replicas), the application becomes quite broken at one database node outage. I'm just curious as I'm always interested in scaling -- is there a mechanism in mind to handle this? Or is it a future requirement?
Author
Owner

@mfenniak wrote in #10140 (comment):

I see that xorm doesn't do any kind of health checking on the replica set to maintain access to just the online replicas. You might want to be cautious as this will make the reliability of your system worse -- for example, if you're going from (1 primary) to (1 primary + 2 replicas), the application becomes quite broken at one database node outage. I'm just curious as I'm always interested in scaling -- is there a mechanism in mind to handle this? Or is it a future requirement?

It's very likely I'll cook up another solution for Codeberg that doesn't involve asking XORM to do load-balancing (ProxySQL or HAProxy, but everything at database HA/scaling feels and looks enterprise), after seeing how this works in XORM (and getting it to work to verify this exact concern) we decided to not even test it on production.

@mfenniak wrote in https://codeberg.org/forgejo/forgejo/pulls/10140#issuecomment-8311652: > I see that xorm doesn't do any kind of health checking on the replica set to maintain access to just the online replicas. You might want to be cautious as this will make the reliability of your system worse -- for example, if you're going from (1 primary) to (1 primary + 2 replicas), the application becomes quite broken at one database node outage. I'm just curious as I'm always interested in scaling -- is there a mechanism in mind to handle this? Or is it a future requirement? It's very likely I'll cook up another solution for Codeberg that doesn't involve asking XORM to do load-balancing (ProxySQL or HAProxy, but everything at database HA/scaling feels and looks enterprise), after seeing how this works in XORM (and getting it to work to verify this exact concern) we decided to not even test it on production.
Where does that come from? The following is a preview of the release notes for this pull request, as they will appear in the upcoming release. They are derived from the content of the `release-notes/10140.md` file, if it exists, or the title of the pull request. They were also added at the bottom of the description of this pull request for easier reference.

This message and the release notes originate from a call to the release-notes-assistant.

@@ -19,3 +19,11 @@
 3. Verify Forgejo loads.
 4. Stop the database replicas.
-5. Verify Forgejo shows 500 errors.
\ No newline at end of file
+5. Verify Forgejo shows 500 errors.
+
+<!--start release-notes-assistant-->
+
+## Release notes
+<!--URL:https://codeberg.org/forgejo/forgejo-->
+- Bug fixes
+  - [PR](https://codeberg.org/forgejo/forgejo/pulls/10140): <!--number 10140 --><!--line 0 --><!--description Z2V0IG5ldyBzZXNzaW9uIGZyb20gZW5naW5lZ3JvdXAgaW5zdGVhZCBvZiBtYXN0ZXJlbmdpbmU=-->get new session from enginegroup instead of masterengine<!--description-->
+<!--end release-notes-assistant-->

Release notes

  • Bug fixes
    • PR: get new session from enginegroup instead of masterengine
<details> <summary>Where does that come from?</summary> The following is a preview of the release notes for this pull request, as they will appear in the upcoming release. They are derived from the content of the `release-notes/10140.md` file, if it exists, or the title of the pull request. They were also added at the bottom of the description of this pull request for easier reference. This message and the release notes originate from a call to the [release-notes-assistant](https://code.forgejo.org/forgejo/release-notes-assistant). ```diff @@ -19,3 +19,11 @@ 3. Verify Forgejo loads. 4. Stop the database replicas. -5. Verify Forgejo shows 500 errors. \ No newline at end of file +5. Verify Forgejo shows 500 errors. + +<!--start release-notes-assistant--> + +## Release notes +<!--URL:https://codeberg.org/forgejo/forgejo--> +- Bug fixes + - [PR](https://codeberg.org/forgejo/forgejo/pulls/10140): <!--number 10140 --><!--line 0 --><!--description Z2V0IG5ldyBzZXNzaW9uIGZyb20gZW5naW5lZ3JvdXAgaW5zdGVhZCBvZiBtYXN0ZXJlbmdpbmU=-->get new session from enginegroup instead of masterengine<!--description--> +<!--end release-notes-assistant--> ``` </details> <!--start release-notes-assistant--> ## Release notes <!--URL:https://codeberg.org/forgejo/forgejo--> - Bug fixes - [PR](https://codeberg.org/forgejo/forgejo/pulls/10140): <!--number 10140 --><!--line 0 --><!--description Z2V0IG5ldyBzZXNzaW9uIGZyb20gZW5naW5lZ3JvdXAgaW5zdGVhZCBvZiBtYXN0ZXJlbmdpbmU=-->get new session from enginegroup instead of masterengine<!--description--> <!--end release-notes-assistant-->
Gusted merged commit afbd05c398 into forgejo 2025-11-17 14:42:58 +01:00
jasewolf referenced this pull request from a commit 2025-11-29 04:04:07 +01:00
Sign in to join this conversation.
No reviewers
No labels
arch
riscv64
backport/v1.19
backport/v1.20
backport/v1.21/forgejo
backport/v10.0/forgejo
backport/v11.0/forgejo
backport/v12.0/forgejo
backport/v13.0/forgejo
backport/v14.0/forgejo
backport/v7.0/forgejo
backport/v8.0/forgejo
backport/v9.0/forgejo
breaking
bug
bug
confirmed
bug
duplicate
bug
needs-more-info
bug
new-report
bug
reported-upstream
code/actions
code/api
code/auth
code/auth/faidp
code/auth/farp
code/email
code/federation
code/git
code/migrations
code/packages
code/wiki
database
MySQL
database
PostgreSQL
database
SQLite
dependency-upgrade
dependency
certmagic
dependency
chart.js
dependency
Chi
dependency
Chroma
dependency
citation.js
dependency
codespell
dependency
css-loader
dependency
devcontainers
dependency
dropzone
dependency
editorconfig-checker
dependency
elasticsearch
dependency
enmime
dependency
F3
dependency
ForgeFed
dependency
garage
dependency
Git
dependency
git-backporting
dependency
Gitea
dependency
gitignore
dependency
go-ap
dependency
go-enry
dependency
go-gitlab
dependency
Go-org
dependency
go-rpmutils
dependency
go-sql-driver mysql
dependency
go-swagger
dependency
go-version
dependency
go-webauthn
dependency
gocron
dependency
Golang
dependency
goldmark
dependency
goquery
dependency
Goth
dependency
grpc-go
dependency
happy-dom
dependency
Helm
dependency
image-spec
dependency
jsonschema
dependency
KaTeX
dependency
lint
dependency
MariaDB
dependency
Mermaid
dependency
minio-go
dependency
misspell
dependency
Monaco
dependency
PDFobject
dependency
playwright
dependency
postcss
dependency
postcss-plugins
dependency
pprof
dependency
prometheus client_golang
dependency
protobuf
dependency
relative-time-element
dependency
renovate
dependency
reply
dependency
ssh
dependency
swagger-ui
dependency
tailwind
dependency
temporal-polyfill
dependency
terminal-to-html
dependency
tests-only
dependency
text-expander-element
dependency
urfave
dependency
vfsgen
dependency
vite
dependency
Woodpecker CI
dependency
x tools
dependency
XORM
Discussion
duplicate
enhancement/feature
forgejo/accessibility
forgejo/branding
forgejo/ci
forgejo/commit-graph
forgejo/documentation
forgejo/furnace cleanup
forgejo/i18n
forgejo/interop
forgejo/moderation
forgejo/privacy
forgejo/release
forgejo/scaling
forgejo/security
forgejo/ui
Gain
High
Gain
Nice to have
Gain
Undefined
Gain
Very High
good first issue
i18n/backport-stable
impact
large
impact
medium
impact
small
impact
unknown
Incompatible license
issue
closed
issue
do-not-exist-yet
issue
open
manual test
Manually tested during feature freeze
OS
FreeBSD
OS
Linux
OS
macOS
OS
Windows
problem
QA
regression
release blocker
Release Cycle
Feature Freeze
release-blocker
v7.0
release-blocker
v7.0.1
release-blocker
v7.0.2
release-blocker
v7.0.3
release-blocker
v7.0.4
release-blocker
v8.0.0
release-blocker/v9.0.0
run-all-playwright-tests
run-end-to-end-tests
test
manual
test
needed
test
needs-help
test
not-needed
test
present
untested
User research - time-tracker
valuable code
worth a release-note
User research - Accessibility
User research - Blocked
User research - Community
User research - Config (instance)
User research - Errors
User research - Filters
User research - Future backlog
User research - Git workflow
User research - Labels
User research - Moderation
User research - Needs input
User research - Notifications/Dashboard
User research - Rendering
User research - Repo creation
User research - Repo units
User research - Security
User research - Settings (in-app)
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo/forgejo!10140
No description provided.