-
-
Notifications
You must be signed in to change notification settings - Fork 6.3k
Limit repo size #21820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Limit repo size #21820
Conversation
…or later see what is still needed
|
Does it count also LFS? (currently repo size is git repo size + LFS object size) Imho we should finally split to have two repo sizes in db (git repo size and LFS object size) and both should have different limits settable. |
|
@lafriks it does count the LFS now and the repo limit applies to both, which I think is correct. |
6f61259 to
75585de
Compare
|
LFS size is not counted. And LFS is tricky because we may end up with LFS files uploaded and the pointer push is denied because of the limit or (worse) the pointer push is allowed and the file upload fails afterwards because of the limit. |
|
Ok will look into this further. Thank you! |
|
Yes that's why my suggestion was for those to be different limits and splitting also how sizes are saved in database for repo |
|
Hi @techknowlogick @kdumontnu I have fixed the opencollective link to make it easy to donate. I've moved the funds I was able to gather 5$ - there :) Here is the new link for this activity: https://opencollective.com/oss-code-ge/projects/gitea-limit-repository-size |
|
Hi @DmitryFrolovTri |
|
@DmitryFrolovTri I've heard back from OC, and because you are not using them as a fiscal sponsor that's why I couldn't make the transfer. So I will just mention it here that upon completion of this PR/issue we will pay out $500 from our collective for the bounty (this amount was chosen to limit tax burden to whoever gets paid out), and @sapk for your work so far, if you are interested, we can pay you for your work so far (reach out to me via email and I can get you sorted). |
|
@techknowlogick I've since then re-registered and the link above is with the opencollective (OC host) now. |
…ed in XX YY where YY is the MiB MB etc
|
@lunny @techknowlogick @lafriks please take a look. I have updated the task with new screenshots and all things done so far. For now I have decided not to do major refactor to adapt to forgejo or similair logic but we definitely need to go there. Potentially start with limits stored in another table and go from there. Open to suggestions but knowing how long I sat on this I would prefer to refactor later. The feature works with the current gitea. Let's decide whether we are good to let it in experimental or you want some changes, please take a notice that there are UI screens and there is a save of those variables from UI. So I would love to ship this somehow so can go on with other things. Please review. |
Signed-off-by: silverwind <[email protected]>
Signed-off-by: silverwind <[email protected]>
silverwind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get it in.
| func AddSizeLimitOnRepo(x *xorm.Engine) error { | ||
| type Repository struct { | ||
| ID int64 `xorm:"pk autoincr"` | ||
| SizeLimit int64 `xorm:"NOT NULL DEFAULT 0"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More and more columns added to this table. Whether it's better to have another table to store these limitations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size limit is related to the repository. It would make sense to make it a separate struct if it could be reused, eg. for the global limits. Otherwise what would be the point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend a new table to store them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for this recommendation, though?
The repository table as tens of columns by now, and all of them are applicable to all repositories.
It seems that some of these values are inherent and some calculated. eg. number of starts or size need to be updated by some bookkeeping
It might be useful to split out the calculated values but the values added here are inherent part of the repository configuration.
If this was followed from the start and every couple repository configuration options were in a separate table how many tables would be there by now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above:
- The repository table has already accumulated an increasing number of columns. At the same time, we maintain several dedicated tables for configuration data—such as
repo_unit,system_setting, anduser_setting. The proposed column is not a core attribute of a repository; it exists purely as a constraint or limitation. Given this, there is no strong justification for further bloating the repository table. - Most repositories will inherit the global configuration without any overrides. Storing an additional column for every repository in this case would largely be wasted space.
- Adding a new column to an existing, heavily used repository table is more time-consuming and risky than creating a new table during Gitea’s startup or migration process.
|
|
||
| // FileSize calculates the file size and generate user-friendly string. | ||
| func FileSize(s int64) string { | ||
| if s == -1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why -1 is possible and we need to handle it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lunny It is used for display values in UI. -1 is used in config to represent a feature toggle. I was asked to remove the the master switch from the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this logic be placed outside of this package? Since this is a tool/utility package, it shouldn’t be coupled with business logic.
| // Get FileSize bytes value from String. | ||
| func GetFileSize(s string) (int64, error) { | ||
| s = strings.TrimSpace(s) | ||
| if s == "-1" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above
| ) | ||
|
|
||
| // CountObjects returns the results of git count-objects on the repoPath | ||
| func CountObjects(ctx context.Context, repoPath string) (*CountObject, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to move all the functions to modules/gitrepo and used gitrepo.Repository instead of repoPath.
| return | ||
| } | ||
|
|
||
| err = setting.SaveGlobalRepositorySetting(repoSizeLimit, lfsSizeLimit, form.LFSSizeInRepoSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will be allowed. It will cause many problems. Many users use HA mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lunny Pllease advise what is your view on this:
- Keep how done currently, remove the save, make this a runtime only configuration
- Modify the UI in some way (which), remove the save, make this a runtime only configuration
- Make this a read only/display only configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, there are two possible approaches in the codebase:
- 1 Define it in the configuration file and do not allow it to be modified through the UI.
- 2 Store it in the database and make it configurable via the UI.
Thank you for bringing this up. I’ve been reviewing the PR, and unfortunately, from my perspective, there is still a fair amount of work needed to make this feature more robust. |
That's correct, file deletion increases repo size. Edit: I see, the problem is not that deletion is rejected but that the error message is not correct. |
The goal of this PR is to define a repo limit size using "no-worsen" approach (allow things that do not increase or decrease the size on disk eventually) I think org and user level restriction could come later.
Thanks to @sapk as this is a continuation of his work that was started in #7833
Would address (except for LFS): #3658
**Screenshots:**
global limitindividual repo size limit for admin

app.ini with settings
how currently push is rejected if limit is reached

TODO: (1)
the config parameter - ENABLE_SIZE_LIMIT = true/false
TOFIX: (2)
Deletion of file from UI trigger 500 when repo is over. -> TODO catch this specific error.
2019/08/16 05:23:58 ...uters/repo/editor.go:432:DeleteFilePost() [E] DeleteRepoFile: git push: remote: Gitea: new repo size is over limitation 10000 To /home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git ! [remote rejected] d9629b41f9c58da756cf806aabf5811b1ff45b50 -> master (pre-receive hook declined) error: impossible de pousser des références vers '/home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git'Creation of branch from UI trigger 500 when repo is over. -> TODO catch this specific error.
2019/08/16 05:28:42 ...uters/repo/branch.go:287:CreateBranch() [E] CreateNewBranch: Push: exit status 1 - remote: Gitea: new repo size is over limitation 10000 To /home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git ! [remote rejected] test -> test (pre-receive hook declined) error: impossible de pousser des références vers '/home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git'Console operations when failing doesn't provide a correct error message - now it does
Unable to push anything that decreases size of repo and a relevant test scenario - now is possible
Documentation update
Add a delete test - gitea should accept any action (tag, branch, commit, file) that should reduce the repo in size. Since delete doesn't remove any data, we hope for git gc to reduce size after delete eventually. Ideally we should also run git gc and confirm size release.
Added usage of
git verify-pack -vto speed up size assessment, added usage ofgit batch-check;git cat-file -snow uses workers. If pack files are present then compressed size is used as it is closer to real disk usage on both repo analysis and push analysis.Update algorithm to calculate on incoming push lfs object sizes and exclude their size from the push size (LFS objects are included into push size by git by default)
Rework Size Checking enablement to use only REPO_SIZE_LIMIT from config -1 - not enabled, ignore any size limits, 0 - no global size limit, but respect the limit of repo, any other value - global size limit is set, but respect the limit set on repo individually. (After LFSSize and GitSize split update the code of size checking to exclude the LFS for now DmitryFrolovTri/gitea-limit-repo-size#18)
Remove any mention of the EnableSizeLimit parameter. (Remove any mention of the EnableSizeLimit parameter. DmitryFrolovTri/gitea-limit-repo-size#20)
Update code to use GitSize value from repo instead of Size. (Update code to use GitSize value from repo instead of Size. DmitryFrolovTri/gitea-limit-repo-size#21)
Optional: Update the repo size field in UI should be with human editable text (The Size limit field in the repository overview should use the FileSize and there should be no scroll for numbers DmitryFrolovTri/gitea-limit-repo-size#19)
Add ability to count LFS Size towards Repo Size for those who store LFS on the same disk as the repo itself.
Note: If functionality to control size is enabled and a push triggers a size check (it's size would breach the limit) then push will be accepted only if total size of not referenced objects (removed size) is over or equal to total size of newly added objects in push. Since git controls when not referenced objects are purged and it's not fast this condition could last for a while. Administrator of instance could speed it up via following steps:
<path_to_gitea_server_folder>/data/gitea-repositories/<user>/<repository>:This would free up all not referenced objects and update repository size in UI. On next push (if push size is smaller then limit) adding new objects will be allowed.
NEXT PR (4)
EVEN further PR (5)
Related: #3658 #7833