Skip to content

lxc-attach: fix data corruption during heavy IO on PTS#4633

Merged
stgraber merged 2 commits intolxc:mainfrom
mihalicyn:fix_terminal_data_corruption
Jan 22, 2026
Merged

lxc-attach: fix data corruption during heavy IO on PTS#4633
stgraber merged 2 commits intolxc:mainfrom
mihalicyn:fix_terminal_data_corruption

Conversation

@mihalicyn
Copy link
Member

@mihalicyn mihalicyn commented Jan 22, 2026

Reworked version of #4597 with testcase.

Authorship of @DreamConnected is preserved, of course.

Fixes: #4546

…o handlers

Previously, lxc_write_nointr could return without writing all data
when write() returned EAGAIN/EWOULDBLOCK due to buffer full conditions.

This change:
- Implements a loop to continue writing until all data is sent
- Handles EINTR, EAGAIN, and EWOULDBLOCK errors appropriately
- Uses poll() to wait for fd to become ready when blocked
- Maintains backward compatibility while fixing partial write issues

Signed-off-by: DreamConnected <[email protected]>
[ alex ]
- introduce a separate helper lxc_write_all and use it only in ptx/peer
  io handlers
- cleanup the code a bit
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
@mihalicyn mihalicyn force-pushed the fix_terminal_data_corruption branch from 13fb6c6 to b4a57a1 Compare January 22, 2026 12:37
@DreamConnected
Copy link
Contributor

It seems that the cmp has detected that the files are different and there are no short write in the log yet.

@mihalicyn mihalicyn force-pushed the fix_terminal_data_corruption branch 2 times, most recently from c7e77d3 to cd80055 Compare January 22, 2026 13:15
@mihalicyn mihalicyn force-pushed the fix_terminal_data_corruption branch from cd80055 to 97bd769 Compare January 22, 2026 13:50
@mihalicyn
Copy link
Member Author

It seems that the cmp has detected that the files are different and there are no short write in the log yet.

yep, the reason was that depending on at which point of time dd if=/dev/urandom bs=$BS count=1 status=none | hexdump | tee large-data.txt command reaches busybox, it will change (or not change) current directory and file can land to /root or to /. Solution is to use an absolute path /root/large-data.txt.

@mihalicyn
Copy link
Member Author

Okay, now everything looks green :)

@mihalicyn
Copy link
Member Author

mihalicyn commented Jan 22, 2026

Just for the history. In a new test case I mentioned a bug in Busybox this is a reproducer (of course, not involving LXC anyhow):

#!/bin/sh
set -e

BS=1000000

out=$(mktemp /tmp/out_XXXX)
( sleep 0.25; echo "echo DATASTART ; dd if=/dev/urandom bs=$BS count=1 status=none | hexdump | tee large-data.txt ; echo DATAEND"; ) | \
	script -q -e -c "busybox sh" | \
	sed -n '/DATASTART/,/DATAEND/{/DATASTART/d;/DATAEND/d;s/[\r\n]*$//;p}' > $out


[ $(stat -c%s $out) -gt $BS ] || ( echo FAILSIZE && false )
cmp -s large-data.txt $out || (  echo FAILDATA && false )
rm -f large-data.txt $out

this will stuck forever. If we add sleep 1 after echo "echo DATASTART ; dd if=/dev/urandom bs=$BS count=1 status=none | hexdump | tee large-data.txt ; echo DATAEND"; it brings this back to life. It is something I can look into once time permits.

@stgraber stgraber merged commit ad528f6 into lxc:main Jan 22, 2026
18 checks passed
Copy link

@viceice viceice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you see a possibility to get this fix into current debian stable?

@stgraber
Copy link
Member

That'd be one for @gibmat as he's the packager for Debian.
On our end, this will make it into LXC 6.0.6 when we publish that bugfix release over the coming few weeks, but for Debian stable, that'd then need to be cherry-picked in their package.

@gibmat
Copy link
Member

gibmat commented Jan 22, 2026

do you see a possibility to get this fix into current debian stable?

Staged at https://salsa.debian.org/lxc-team/lxc/-/commit/4615978f082ebe26dc4908f1bd235d48a341ba83 for the 13.4 stable point release, planned for March 14.

arthurzam pushed a commit to gentoo-golang-dist/forgejo-runner that referenced this pull request Jan 28, 2026
Replacing `lxc-attach` with `nsenter` continues to cause problems, currently #1341, but previously #1326 and #1336.

As we have an upstream fix in lxc/lxc#4633, I don't think I can justify any further effort in fixing the `nsenter` behaviour.  This PR reverts the change and restores the 100 MiB in-memory buffer -- that buffer can & should be removed when 4633 is broadly available.

All tests added in #1326 and #1336 are retained.

<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- bug fixes
  - [PR](https://code.forgejo.org/forgejo/runner/pulls/1317): <!--number 1317 --><!--line 0 --><!--description Zml4OiByZXZlcnQgc3dpdGNoIGZyb20gbHhjLWF0dGFjaCB0byBuc2VudGVy-->fix: revert switch from lxc-attach to nsenter<!--description-->
<!--end release-notes-assistant-->

Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/1317
Reviewed-by: Andreas Ahlenstorf <[email protected]>
Co-authored-by: Mathieu Fenniak <[email protected]>
Co-committed-by: Mathieu Fenniak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

lxc-attach drops output with lots of console output

6 participants