fix: eliminate log data loss caused by LXC & PTY buffer #1287
No reviewers
Labels
No labels
FreeBSD
Kind/Breaking
Kind/Bug
Kind/Chore
Kind/DependencyUpdate
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
Windows
linux-powerpc64le
linux-riscv64
linux-s390x
run-end-to-end-tests
run-forgejo-tests
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo/runner!1287
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "mfenniak/forgejo-runner:experiment-nsenter"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This PR is based upon #1286 which adds an environmental sanity check -- that test should pass as it's own PR first to ensure that this PR is not causing any behavioural regressions that could be detected by it.
Alternative fix for #1258. This PR replaces the use of
lxc-attachwithnsenterto get into the LXC container and execute workflow commands.nsenterdoes not have the short-write behaviour that was observed withlxc-attachwhere a failure to write stdout data to the pty would be ignored --nsentersimplyexecvporexeclthe child process and allows it to inherit its FDs.The in-memory buffer which was intended to pull data out of the PTY buffer as fast as possible is removed in this PR, restoring normal backpressure on stdout.
I'm confident in this change with respect to the simpler behaviour of stdout, and therefore the integrity of the logs.
However, my confidence is less strong on "is
nsenterdoing the same things aslxc-attachto isolate the process", as that question seems to require a lot of knowledge to fully answer that I don't have. I've researched and built the #1286 test case around what I could discover, but hopefully a reviewer will have some confidence to say that it covers the right test subjects. 🙂I don't know anything about LXC. So I refrain from reviewing this PR. Nonetheless, thanks a lot for figuring this out. It's maddening when I try to fix CI problems and the logs are missing.
1fc6171c6205f32612d805f32612d8db49823df7db49823df720e559cccd20e559cccd771f35dc32cascading-pr updated at actions/setup-forgejo#829
As nobody with LXC expertise has shown up so far: the implementation looks good, the previously added test looks good, too. So I'm willing to hit that "Approve" button. If it turns out that there's a problem, we can still revert.
In any case, it might make sense to report our problem to the LXC project. They might know a better workaround, greenlight our workaround, and perhaps even enhance
lxc-attach.I've expanded my scope looking for help from the Forgejo Dev chat, to Forgejo Chat, and now to the LXC forum (https://discuss.linuxcontainers.org/t/fixing-forgejo-runners-lxc-logging/25918). I'll let this sit for another day to see if these efforts provide any feedback.
@aahlenst wrote in #1287 (comment):
I'll take you up on that offer if you're still good with it. Upstream discussion with LXC (link in last comment) is promising in terms of collaborating on a fix, but in the near term I think this change will help close the logging gap.
@mfenniak wrote in #1287 (comment):
I've been following it and it's great to see because both projects will be better off afterwards 🤞
Dear friends,
I'm developer from LXC project :)
Mathieu asked me to look at this PR and leave some comment.
This particular PR looks good if it solves the problem, but please, don't keep this as a permanent solution (because it is really easy to mess up with the containers and the way you enter them from a security standpoint and not only) and make sure that you switch back to
lxc-attachand help us to validate next LXC release to ensure that the problem went away.PR with the fix https://github.com/lxc/lxc/pull/4633
Kind regards,
Alex
@mihalicyn Thank you for your time taking a look at this! 🙂 It is definitely intended as a temporary solution, and I'll be happy to help validate the upstream fix.
mfenniak referenced this pull request2026-01-22 17:27:24 +00:00
@mihalicyn What would be the best option to get the fixed lxc version into our lxc templates? it shouldn't need too much additional time
https://code.forgejo.org/forgejo/lxc-helpers
we use these lxc helper scripts to create the container and templates. we need to support bookworm and trixie, older releases could be dropped if needed
PATHmodifications are lost in LXC executions #1326