fix: eliminate log data loss caused by LXC & PTY buffer #1287

Merged
mfenniak merged 2 commits from mfenniak/forgejo-runner:experiment-nsenter into main 2026-01-20 21:39:59 +00:00
Owner

This PR is based upon #1286 which adds an environmental sanity check -- that test should pass as it's own PR first to ensure that this PR is not causing any behavioural regressions that could be detected by it.

Alternative fix for #1258. This PR replaces the use of lxc-attach with nsenter to get into the LXC container and execute workflow commands. nsenter does not have the short-write behaviour that was observed with lxc-attach where a failure to write stdout data to the pty would be ignored -- nsenter simply execvp or execl the child process and allows it to inherit its FDs.

The in-memory buffer which was intended to pull data out of the PTY buffer as fast as possible is removed in this PR, restoring normal backpressure on stdout.

  • bug fixes
    • PR: fix: eliminate log data loss caused by LXC & PTY buffer
This PR is based upon #1286 which adds an environmental sanity check -- that test should pass as it's own PR first to ensure that this PR is not causing any behavioural regressions that could be detected by it. Alternative fix for #1258. This PR replaces the use of `lxc-attach` with `nsenter` to get into the LXC container and execute workflow commands. `nsenter` does not have the short-write behaviour that was observed with `lxc-attach` where a failure to write stdout data to the pty would be ignored -- `nsenter` simply [`execvp`](https://github.com/util-linux/util-linux/blob/6766bbf17ad618a6a7d35d4cffe5134da2ffe077/sys-utils/nsenter.c#L853) or [`execl`](https://github.com/util-linux/util-linux/blob/6766bbf17ad618a6a7d35d4cffe5134da2ffe077/lib/exec_shell.c#L42C2-L42C7) the child process and allows it to inherit its FDs. The in-memory buffer which was intended to pull data out of the PTY buffer as fast as possible is removed in this PR, restoring normal backpressure on stdout. <!--start release-notes-assistant--> <!--URL:https://code.forgejo.org/forgejo/runner--> - bug fixes - [PR](https://code.forgejo.org/forgejo/runner/pulls/1287): <!--number 1287 --><!--line 0 --><!--description Zml4OiBlbGltaW5hdGUgbG9nIGRhdGEgbG9zcyBjYXVzZWQgYnkgTFhDICYgUFRZIGJ1ZmZlcg==-->fix: eliminate log data loss caused by LXC & PTY buffer<!--description--> <!--end release-notes-assistant-->
Author
Owner

I'm confident in this change with respect to the simpler behaviour of stdout, and therefore the integrity of the logs.

However, my confidence is less strong on "is nsenter doing the same things as lxc-attach to isolate the process", as that question seems to require a lot of knowledge to fully answer that I don't have. I've researched and built the #1286 test case around what I could discover, but hopefully a reviewer will have some confidence to say that it covers the right test subjects. 🙂

I'm confident in this change with respect to the simpler behaviour of stdout, and therefore the integrity of the logs. However, my confidence is less strong on "is `nsenter` doing the same things as `lxc-attach` to isolate the process", as that question seems to require a lot of knowledge to fully answer that I don't have. I've researched and built the #1286 test case around what I could discover, but hopefully a reviewer will have some confidence to say that it covers the right test subjects. 🙂
Member

I don't know anything about LXC. So I refrain from reviewing this PR. Nonetheless, thanks a lot for figuring this out. It's maddening when I try to fix CI problems and the logs are missing.

I don't know anything about LXC. So I refrain from reviewing this PR. Nonetheless, thanks a lot for figuring this out. It's maddening when I try to fix CI problems and the logs are missing.
mfenniak force-pushed experiment-nsenter from 1fc6171c62
Some checks failed
checks / validate mocks (pull_request) Successful in 44s
issue-labels / release-notes (pull_request_target) Successful in 5s
checks / validate pre-commit-hooks file (pull_request) Successful in 49s
checks / build and test (pull_request) Successful in 1m51s
checks / runner exec tests (pull_request) Successful in 29s
/ example-docker-compose (pull_request) Successful in 2m28s
Integration tests for the release process / release-simulation (pull_request) Successful in 4m43s
checks / runner integration tests (pull_request) Failing after 4m29s
/ example-lxc-systemd (pull_request) Successful in 6m57s
checks / integration tests (docker-latest) (pull_request) Successful in 8m47s
checks / integration tests (docker-stable) (pull_request) Successful in 10m55s
to 05f32612d8
Some checks failed
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 4s
checks / build and test (pull_request) Successful in 38s
/ example-docker-compose (pull_request) Successful in 2m38s
checks / validate mocks (pull_request) Successful in 32s
Integration tests for the release process / release-simulation (pull_request) Successful in 4m21s
checks / validate pre-commit-hooks file (pull_request) Successful in 37s
checks / runner exec tests (pull_request) Successful in 35s
/ example-lxc-systemd (pull_request) Successful in 6m49s
checks / runner integration tests (pull_request) Failing after 4m22s
checks / integration tests (docker-latest) (pull_request) Successful in 9m12s
checks / integration tests (docker-stable) (pull_request) Successful in 10m43s
2026-01-13 21:48:15 +00:00
Compare
mfenniak force-pushed experiment-nsenter from 05f32612d8
Some checks failed
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 4s
checks / build and test (pull_request) Successful in 38s
/ example-docker-compose (pull_request) Successful in 2m38s
checks / validate mocks (pull_request) Successful in 32s
Integration tests for the release process / release-simulation (pull_request) Successful in 4m21s
checks / validate pre-commit-hooks file (pull_request) Successful in 37s
checks / runner exec tests (pull_request) Successful in 35s
/ example-lxc-systemd (pull_request) Successful in 6m49s
checks / runner integration tests (pull_request) Failing after 4m22s
checks / integration tests (docker-latest) (pull_request) Successful in 9m12s
checks / integration tests (docker-stable) (pull_request) Successful in 10m43s
to db49823df7
All checks were successful
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
checks / build and test (pull_request) Successful in 44s
checks / validate mocks (pull_request) Successful in 28s
checks / validate pre-commit-hooks file (pull_request) Successful in 35s
checks / runner exec tests (pull_request) Successful in 33s
/ example-lxc-systemd (pull_request) Successful in 6m53s
checks / integration tests (docker-latest) (pull_request) Successful in 10m17s
checks / runner integration tests (pull_request) Successful in 4m58s
checks / integration tests (docker-stable) (pull_request) Successful in 11m39s
issue-labels / release-notes (pull_request_target) Successful in 4s
/ example-docker-compose (pull_request) Successful in 1m55s
Integration tests for the release process / release-simulation (pull_request) Successful in 5m44s
2026-01-13 22:00:46 +00:00
Compare
mfenniak force-pushed experiment-nsenter from db49823df7
All checks were successful
cascade / forgejo (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
checks / build and test (pull_request) Successful in 44s
checks / validate mocks (pull_request) Successful in 28s
checks / validate pre-commit-hooks file (pull_request) Successful in 35s
checks / runner exec tests (pull_request) Successful in 33s
/ example-lxc-systemd (pull_request) Successful in 6m53s
checks / integration tests (docker-latest) (pull_request) Successful in 10m17s
checks / runner integration tests (pull_request) Successful in 4m58s
checks / integration tests (docker-stable) (pull_request) Successful in 11m39s
issue-labels / release-notes (pull_request_target) Successful in 4s
/ example-docker-compose (pull_request) Successful in 1m55s
Integration tests for the release process / release-simulation (pull_request) Successful in 5m44s
to 20e559cccd
All checks were successful
cascade / forgejo (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 8s
checks / Build Forgejo Runner (pull_request) Successful in 50s
checks / validate mocks (pull_request) Successful in 53s
checks / validate pre-commit-hooks file (pull_request) Successful in 41s
checks / runner exec tests (pull_request) Successful in 52s
checks / integration tests (docker-latest) (pull_request) Successful in 16m36s
checks / integration tests (docker-stable) (pull_request) Successful in 19m25s
2026-01-13 22:40:55 +00:00
Compare
mfenniak force-pushed experiment-nsenter from 20e559cccd
All checks were successful
cascade / forgejo (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Has been skipped
cascade / debug (pull_request_target) Has been skipped
issue-labels / release-notes (pull_request_target) Successful in 8s
checks / Build Forgejo Runner (pull_request) Successful in 50s
checks / validate mocks (pull_request) Successful in 53s
checks / validate pre-commit-hooks file (pull_request) Successful in 41s
checks / runner exec tests (pull_request) Successful in 52s
checks / integration tests (docker-latest) (pull_request) Successful in 16m36s
checks / integration tests (docker-stable) (pull_request) Successful in 19m25s
to 771f35dc32
All checks were successful
checks / Build Forgejo Runner (pull_request) Successful in 23s
checks / validate mocks (pull_request) Successful in 22s
checks / validate pre-commit-hooks file (pull_request) Successful in 35s
checks / runner exec tests (pull_request) Successful in 33s
checks / integration tests (docker-latest) (pull_request) Successful in 14m53s
checks / integration tests (docker-stable) (pull_request) Successful in 17m3s
issue-labels / release-notes (pull_request_target) Successful in 7s
cascade / debug (pull_request_target) Has been skipped
cascade / end-to-end (pull_request_target) Successful in 7s
cascade / forgejo (pull_request_target) Successful in 23s
2026-01-14 02:43:56 +00:00
Compare
Contributor

cascading-pr updated at actions/setup-forgejo#829

cascading-pr updated at https://code.forgejo.org/actions/setup-forgejo/pulls/829
Member

As nobody with LXC expertise has shown up so far: the implementation looks good, the previously added test looks good, too. So I'm willing to hit that "Approve" button. If it turns out that there's a problem, we can still revert.

In any case, it might make sense to report our problem to the LXC project. They might know a better workaround, greenlight our workaround, and perhaps even enhance lxc-attach.

As nobody with LXC expertise has shown up so far: the implementation looks good, the previously added test looks good, too. So I'm willing to hit that "Approve" button. If it turns out that there's a problem, we can still revert. In any case, it might make sense to report our problem to the LXC project. They might know a better workaround, greenlight our workaround, and perhaps even enhance `lxc-attach`.
Author
Owner

I've expanded my scope looking for help from the Forgejo Dev chat, to Forgejo Chat, and now to the LXC forum (https://discuss.linuxcontainers.org/t/fixing-forgejo-runners-lxc-logging/25918). I'll let this sit for another day to see if these efforts provide any feedback.

I've expanded my scope looking for help from the Forgejo Dev chat, to Forgejo Chat, and now to the LXC forum (https://discuss.linuxcontainers.org/t/fixing-forgejo-runners-lxc-logging/25918). I'll let this sit for another day to see if these efforts provide any feedback.
Author
Owner

@aahlenst wrote in #1287 (comment):

As nobody with LXC expertise has shown up so far: the implementation looks good, the previously added test looks good, too. So I'm willing to hit that "Approve" button. If it turns out that there's a problem, we can still revert.

I'll take you up on that offer if you're still good with it. Upstream discussion with LXC (link in last comment) is promising in terms of collaborating on a fix, but in the near term I think this change will help close the logging gap.

@aahlenst wrote in https://code.forgejo.org/forgejo/runner/pulls/1287#issuecomment-73767: > As nobody with LXC expertise has shown up so far: the implementation looks good, the previously added test looks good, too. So I'm willing to hit that "Approve" button. If it turns out that there's a problem, we can still revert. I'll take you up on that offer if you're still good with it. Upstream discussion with LXC (link in last comment) is promising in terms of collaborating on a fix, but in the near term I think this change will help close the logging gap.
aahlenst approved these changes 2026-01-20 11:31:16 +00:00
Member

@mfenniak wrote in #1287 (comment):

Upstream discussion with LXC (link in last comment) is promising in terms of collaborating on a fix.

I've been following it and it's great to see because both projects will be better off afterwards 🤞

@mfenniak wrote in https://code.forgejo.org/forgejo/runner/pulls/1287#issuecomment-74426: > Upstream discussion with LXC (link in last comment) is promising in terms of collaborating on a fix. I've been following it and it's great to see because both projects will be better off afterwards 🤞
mfenniak deleted branch experiment-nsenter 2026-01-20 21:40:00 +00:00
First-time contributor

Dear friends,

I'm developer from LXC project :)

Mathieu asked me to look at this PR and leave some comment.

This particular PR looks good if it solves the problem, but please, don't keep this as a permanent solution (because it is really easy to mess up with the containers and the way you enter them from a security standpoint and not only) and make sure that you switch back to lxc-attach and help us to validate next LXC release to ensure that the problem went away.

PR with the fix https://github.com/lxc/lxc/pull/4633

Kind regards,
Alex

Dear friends, I'm developer from LXC project :) Mathieu [asked me](https://discuss.linuxcontainers.org/t/fixing-forgejo-runners-lxc-logging/25918/4?u=amikhalitsyn) to look at this PR and leave some comment. This particular PR looks good if it solves the problem, but please, don't keep this as a permanent solution (because it is really easy to mess up with the containers and the way you enter them from a security standpoint and not only) and make sure that you switch back to `lxc-attach` and help us to validate next LXC release to ensure that the problem went away. PR with the fix https://github.com/lxc/lxc/pull/4633 Kind regards, Alex
Author
Owner

@mihalicyn Thank you for your time taking a look at this! 🙂 It is definitely intended as a temporary solution, and I'll be happy to help validate the upstream fix.

@mihalicyn Thank you for your time taking a look at this! 🙂 It is definitely intended as a temporary solution, and I'll be happy to help validate the upstream fix.
Owner

@mihalicyn What would be the best option to get the fixed lxc version into our lxc templates? it shouldn't need too much additional time

@mihalicyn What would be the best option to get the fixed lxc version into our lxc templates? it shouldn't need too much additional time
Owner

https://code.forgejo.org/forgejo/lxc-helpers

we use these lxc helper scripts to create the container and templates. we need to support bookworm and trixie, older releases could be dropped if needed

https://code.forgejo.org/forgejo/lxc-helpers we use these lxc helper scripts to create the container and templates. we need to support bookworm and trixie, older releases could be dropped if needed
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
5 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo/runner!1287
No description provided.