Skip to content

stdenv substituteAll: fix handling of variables#1

Closed
vcunat wants to merge 2 commits intoProfpatsch:setup-bash-namesfrom
vcunat:p/14907
Closed

stdenv substituteAll: fix handling of variables#1
vcunat wants to merge 2 commits intoProfpatsch:setup-bash-namesfrom
vcunat:p/14907

Conversation

@vcunat
Copy link

@vcunat vcunat commented Apr 23, 2016

The problem was uncovered by the grandparent commit. env doesn't escape newlines in contents of variables, and we use those e.g. for phases, so we would consider lots of crap. Fortunately set command behaves better in this respect.

Profpatsch and others added 2 commits April 23, 2016 18:03
The filtering of environment variables that start with an uppercase
letter is documented in the manual.
The problem was uncovered by the grandparent commit.
`env` doesn't escape newlines in contents of variables,
and we use those e.g. for phases, so we would consider
lots of crap. Fortunately `set` command behaves better
in this respect.
@@ -445,7 +445,7 @@ substituteAll() {

# Select all environment variables that start with a lowercase character.
# Will not work with nix attribute names (and thus env variables) containing '\n'.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aware of that, see this comment. But in this case the \n was probably in the value, not the variable name. Wasn’t aware that set does something similar, though. It’s all a big mountain of 💩. @aszlig mentioned something about the regex being flaky as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the comment... and that it only referred to names. I don't know of any occurrence with a newline in variable name, but we do have lots of newlines in values, as we pass phases as variables.

Personally, I find it rather hard to program fully correctly in bash. Everything is just text even if it means something else, and unwanted expansions get into my way all the time. (Still, on typical inputs the scripts work somehow.)

Profpatsch pushed a commit that referenced this pull request Oct 14, 2016
@Profpatsch Profpatsch closed this Apr 11, 2017
@vcunat vcunat deleted the p/14907 branch July 16, 2017 09:41
Profpatsch pushed a commit that referenced this pull request Jan 3, 2018
kde-applications: 17.08.3 -> 17.12.0
Profpatsch pushed a commit that referenced this pull request Feb 19, 2018
nixos/k8s: Enable Node authorizer and NodeRestriction by default
Profpatsch pushed a commit that referenced this pull request May 28, 2018
This fix is required for the raspherry pi 3 with glibc 2.27,
otherwise the kernel panics in initrd with:
```
<<< NixOS Stage 1 >>>

loading module dm_mod...
running udev...
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
Gstarting device mapper and LVM...
[    1.969164] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b
[    1.969164]
[    1.978476] CPU: 0 PID: 1 Comm: init Not tainted 4.16.8 #1-NixOS
[    1.984580] Hardware name: Raspberry Pi 3 Model B (DT)
[    1.989801] Call trace:
[    1.992301]  dump_backtrace+0x0/0x1c8
[    1.996025]  show_stack+0x24/0x30
[    1.999396]  dump_stack+0x9c/0xc0
[    2.002766]  panic+0x124/0x294
[    2.005872]  complete_and_exit+0x0/0x30
[    2.009771]  do_group_exit+0x40/0xa8
[    2.013406]  get_signal+0x280/0x5b0
[    2.016954]  do_signal+0x88/0x240
[    2.020325]  do_notify_resume+0xd8/0x130
[    2.024311]  work_pending+0x8/0x10
[    2.027774] SMP: stopping secondary CPUs
[    2.031763] Kernel Offset: disabled
[    2.035308] CPU features: 0x0802004
[    2.038850] Memory Limit: none
[    2.041963] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x0000000b
[    2.041963]
[    2.865264] random: crng init done
```
Suse has done the same to circumvent crashes with hostname resolving in
glibc 2.27 on aarch64.
Profpatsch pushed a commit that referenced this pull request Jun 23, 2018
Profpatsch pushed a commit that referenced this pull request Jul 22, 2018
This adds some initial values for .dir-locals.el. Mainly this is
useful for using bug-reference-mode.

So if you have bug-reference-mode enabled -

> M-x bug-reference-mode

You will see as clickable text like this:

  Fixes NixOS#15

  (NixOS#12)

  Closed NixOS#1252

  issue #1
Profpatsch pushed a commit that referenced this pull request Aug 28, 2018
Profpatsch pushed a commit that referenced this pull request Nov 29, 2019
Profpatsch pushed a commit that referenced this pull request Nov 29, 2019
roundcube: tests - add space to db password, check setup script worked
Profpatsch pushed a commit that referenced this pull request Mar 11, 2022
This effectively fixes the majority of all VM tests which were broken
because `/dev/vda` (or any other block device) wasn't mountable:

      machine # mounting /dev/vda on /...
      machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device[    2.820976] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
      machine # [    2.821757] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS
      machine # [    2.821757] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      machine # [    2.821757] Call Trace:
      machine # [    2.821757]  dump_stack+0x6b/0x83
      machine # [    2.821757]  panic+0x101/0x2c8
      machine # [    2.821757]  do_exit.cold+0x14/0xb3
      machine # [    2.821757]  do_group_exit+0x33/0xa0
      machine # [    2.821757]  __x64_sys_exit_group+0x14/0x20
      machine # [    2.821757]  do_syscall_64+0x33/0x40
      machine # [    2.821757]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      machine # [    2.821757] RIP: 0033:0x7f67ec2800f6
      machine # [    2.821757] Code: 00 4c 8b 0d 2c 5d 11 00 eb 19 66 2e 0f 1f 84 00 00 00 00 00 89 d7 89 f0 0f 05 48 3d 00 f0 ff ff 77 22 f4 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff ff 76 e2 f7 d8 64 41 89 01 eb da 66 2e 0f 1f 84 00
      machine # [    2.821757] RSP: 002b:00007fff8f5a71d8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      machine # [    2.821757] RAX: ffffffffffffffda RBX: 0000000000699704 RCX: 00007f67ec2800f6
      machine # [    2.821757] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
      machine # [    2.821757] RBP: 0000000000000004 R08: 00000000000000e7 R09: ffffffffffffff80
      machine # [    2.821757] R10: 00007f67ec33f3e0 R11: 0000000000000202 R12: 000000000000000b
      machine # [    2.821757] R13: 00007fff8f5a75a8 R14: 0000000000000000 R15: 00000000004fc198
      machine # [    2.821757] Kernel Offset: 0x31e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      machine # [    2.821757] Rebooting in 1 seconds..

This happened because the kernel failed to load modules such as `ext4`
from `boot.initrd.availableKernelModules`[1] on e.g. a `mount(2)` syscall.

The problem is that `kmod` isn't linked against `libpthread.so.0`
anymore because it got merged into `libc.so.6` (however, the .so still
exists), but still needs it:

      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # writev(2, [{iov_base="/nix/store/kdc9n48ksdc1a8y8w512w"..., iov_len=69}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="libpthread.so.0", iov_len=15}, {iov_base=": ", iov_len=2}, {iov_base="cy
      machine # ) = 184
      machine # exit_group(127)                         = ?
      machine # +++ exited with 127 +++
      machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device
      machine # [   19.167180] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
      machine # [   19.167711] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS

This is not a problem

* inside stage-1 because `LD_LIBRARY_PATH` points to `$out/lib` of
  extra-utils where `libpthread.so.6` also exists.
* on a running system because `${pkgs.glibc}/lib` is part of kmod's
  rpath.

However this is a problem inside the kernel which calls `modprobe` (in
our case `kmod`) to load modules and doesn't know about
`LD_LIBRARY_PATH`. Also, the rpath-reference was nuked.

To work around this, the kernel's `modprobe`
(i.e. `/proc/sys/kernel/modprobe`) now points to a wrapper which
explicitly declares `LD_LIBRARY_PATH`. We can't use `makeWrapper` here
because `modprobe` itself must not be renamed. Otherwise, `kmod` (which
is the link-target of `modprobe`) won't work because it expects
`argv[0] == "modprobe"` to perform modprobe's tasks.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-boot.initrd.availableKernelModules
Profpatsch pushed a commit that referenced this pull request Jan 7, 2025
nixosTests.cryptpad started failing recently.

Investigating the issue shows that seccomp has become problematic during
the init phase, (e.g. this can be reproduced by removing the customize
directory in /var/lib/cryptpad):

machine # [   10.774365] systemd-coredump[864]: Process 756 (node) of user 65513 dumped core.
machine #
machine # Module libgcc_s.so.1 without build-id.
machine # Module libstdc++.so.6 without build-id.
machine # Module libicudata.so.74 without build-id.
machine # Module libicuuc.so.74 without build-id.
machine # Module libicui18n.so.74 without build-id.
machine # Module libz.so.1 without build-id.
machine # Module node without build-id.
machine # Stack trace of thread 756:
machine # #0  0x00007ff951974dcb fchown (libc.so.6 + 0x107dcb)
machine # #1  0x00007ff95490d0c0 uv__fs_copyfile (libuv.so.1 + 0x150c0)
machine # #2  0x00007ff95490d89a uv__fs_work (libuv.so.1 + 0x1589a)
machine # #3  0x00007ff954910c76 uv_fs_copyfile (libuv.so.1 + 0x18c76)
machine # NixOS#4  0x0000000000eb8a39 _ZN4node2fsL8CopyFileERKN2v820FunctionCallbackInfoINS1_5ValueEEE (node + 0xab8a39)
machine # NixOS#5  0x0000000001cda5e2 Builtins_CallApiCallbackGeneric (node + 0x18da5e2)
[...]
machine # [   10.877468] cryptpad[685]: /nix/store/h4yhhxpfm03c5rgz91q7jrvknh596ly2-cryptpad-2024.12.0/bin/cryptpad: line 3:   756 Bad system call         (core dumped) "/nix/store/fkyp1bm5gll9adnfcj92snyym524mdrj-nodejs-22.11.0/bin/node" "/nix/store/h4yhhxpfm03c5rgz91q7jrvknh596ly2-cryptpad-2024.12.0/lib/node_modules/cryptpad/scripts/build.js"

nodejs 20.18 rightly did not require chown when the source and
destination are the same owner (heck, the script does not run as
root so even if it is not blocked there is no way it'd work with a
different owner...)

For now just allow chown calls again, this is not worth wasting more
time.

Fixes NixOS#370717
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants