[staging-next] coreutils: Disable SEEK_HOLE due to corruption#143097
vcunat merged 1 commit into NixOS:staging-next
vcunat
left a comment
Nit: over the longer term it would be good to add some explanation around the patch, but it's possible that we'll soon have an update thanks to upstream.
|
I will amend the commit with the bug id in the coreutils bugtracker once the bug is opened |
|
I'm happy to run a darwin build tonight to check if the issue is resolved. |
|
On chat I understood that @r-burns did that (bisecting coreutils and building |
9f833dd to
538fcee
|
@happysalada are you using zfs on darwin by any chance? |
To my knowledge we could also reproduce (consistently even) with tmpfs and ext4. Edit: On Linux |
|
Well, for now we need this PR as mitigation, regardless of whose bug it is or how exactly it happens. |
|
Not using zfs, just a regular darwin system with apfs. |
|
I was wondering why my build server was having issues. I've disabled auto-optimise for now, but if I want to re-enable it I would have to destroy that pool.
We can't easily undef SEEK_HOLE?
Yes, we could instead undef/redef it somewhere under all includes. (The risk of includes inside the file should be low.)
But why? It will lead to the exact same result but require me to redo the patch, amend the commit, and retest it for no benefit at all
538fcee to
da8fb80
See openzfs/zfs#11900 as an example. This only happens on Coreutils 9.0. Reported here: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51433
da8fb80 to
924ccbf
That would be very surprising. A minimal reproducer would be exceptionally useful |
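As a starting point for such a reproducer, here is a minimal sketch in Python. It rests on an assumption: that the corruption shows up when a freshly written, not yet synced file is copied right away with `cp`. Whether this actually triggers the problem on any given filesystem is untested here. The helper names `sha256_of` and `copy_matches` are hypothetical, not taken from the PR.

```python
import hashlib
import os
import subprocess
import tempfile


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches(directory, size=4 << 20, cp="cp"):
    """Write random data, copy it immediately with cp, and compare checksums.

    Assumption: the copy happens before the data has been synced to disk,
    which is the window in which a wrong hole report could matter.
    """
    src = os.path.join(directory, "src.bin")
    dst = os.path.join(directory, "dst.bin")
    with open(src, "wb") as handle:
        handle.write(os.urandom(size))
    # Deliberately no fsync here: the copy should race with writeback.
    subprocess.run([cp, src, dst], check=True)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Run inside a directory on the filesystem under suspicion.
    with tempfile.TemporaryDirectory(dir=".") as workdir:
        print("copy intact:", copy_matches(workdir))
```

Running it in a loop from a directory on the suspect filesystem, with the `cp` argument pointing at a coreutils 9.0 binary, would be the obvious way to use it.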
|
My guess is that coreutils is the victim of a kernel or filesystem bug here. If so, that bug really needs fixing as other programs use lseek+SEEK_DATA too. See my comment in GNU coreutils Bug#51433. |
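For readers unfamiliar with the lseek+SEEK_DATA pattern mentioned above, here is a minimal sketch of how a program can ask the filesystem which byte ranges of a file hold data. It uses Python's `os.lseek` with `os.SEEK_DATA` and `os.SEEK_HOLE`, which are only available on platforms that support them. The function name `data_extents` is hypothetical, and this is not the code coreutils uses. A copy tool that trusts these ranges skips everything outside them, so a wrong answer from the filesystem turns into zeros in the copy.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return the (start, end) byte ranges that the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # nothing but a hole between offset and end of file
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:
                break  # defensive: avoid looping on an unexpected answer
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        sparse = os.path.join(workdir, "sparse.bin")
        with open(sparse, "wb") as handle:
            handle.seek(1 << 20)  # leave a 1 MiB hole at the start
            handle.write(b"payload")
        print(data_extents(sparse))
```

On a filesystem without hole support the whole file is reported as a single data range, which is the safe fallback behaviour.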
|
Ok, jx and alacritty build fine on this PR, all is good with darwin! |
After further thought and investigation with the help of others, I strongly believe this to be a misattribution. Identifying the root cause in the case of peertube was difficult because of the unclear error message and strange reproducibility. I believe we downloaded dependencies of peertube from the binary cache (which likely has ZFS-backed builders), one of which was esbuild, the actual broken derivation. So this issue is almost certainly not actually on ext4/tmpfs. I'm sorry for causing confusion. |
|
@mohe2015 Thanks for all the info and testing. We'll focus on handling ZFS in upstream coreutils |
|
Apparently this PR broke one test on I reproduced the issue on the community box and also retried the two unprefixed builds and x86_64-linux builds, and all seems reproducible with failure only in this single combination. So far I have no idea why it happens and why |
|
Without looking at the code I assume that test would also need to be skipped like the other test. |
|
Currently testing around this issue but it takes a lot of time because my aarch64 hardware isn't the fastest |
|
Can't look at the code right now, but maybe it's reproducible even on a slow HDD on x86 |
|
It built for me on aarch64-linux. So it's probably just a flakey test. Should I disable the test or should we just restart the build on Hydra, @vcunat ? |
|
It seemed reproducible to me. BTW, I only have https://github.com/nix-community/aarch64-build-box#want-access |
|
Either way, the |