Skip to content

separateDebugInfo: allow any reasonable length of build ID#146275

Closed
oxalica wants to merge 1 commit intoNixOS:stagingfrom
oxalica:fix/separate-debug
Closed

separateDebugInfo: allow any reasonable length of build ID#146275
oxalica wants to merge 1 commit intoNixOS:stagingfrom
oxalica:fix/separate-debug

Conversation

@oxalica
Copy link
Contributor

@oxalica oxalica commented Nov 16, 2021

Motivation for this change

LLVM stdenv produces 16byte build ID, which is different than GCC's 40byte one. They both work well in gdb. Currently we create a directory layer for first 2byte, so any length greater or equal than 3 should be fine.

I'm working on keeping debug info for firefox, and ran into this issue. This PR should unblock that attempt.

Tested: glibc built and produced debug output properly.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 21.11 Release Notes (or backporting 21.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

LLVM stdenv produces 16byte build ID, which is different than GCC's
40byte one. They both work well in gdb. Currently we create a directory
layer for first 2byte, so any length greater or equal than 3 should be
fine.
@oxalica oxalica requested a review from Ericson2314 as a code owner November 16, 2021 17:02
@ofborg ofborg bot added 10.rebuild-linux-stdenv This PR causes stdenv to rebuild on Linux and must target a staging branch. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 5001+ This PR causes many rebuilds on Linux and must target the staging branches. labels Nov 16, 2021
@mweinelt
Copy link
Member

@edolstra @Ericson2314 Can one of you review this change? We're carrying a large-ish workaround in firefox to work around this issue.

Copy link
Member

@vcunat vcunat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this syntax right? My bash (ran by hand) does not agree.

bash: 3: No such file or directory

With using -le this sounds OK. It's unclear to me why originally the condition was written this way (the commit ec5b66e does not indicate), but perhaps there really wasn't any reason beyond what's mentioned here.

@edolstra
Copy link
Member

I don't think the rest of our infrastructure (index-debuginfo.cc, dwarffs) handles non-standard build IDs, so this change is not enough to support those.

@edolstra
Copy link
Member

edolstra commented Mar 21, 2022

BTW, 64 bits seems rather insufficient as a unique identifier... I wonder why LLVM went with such short IDs?

Given the birthday paradox, this means we'll only need a few billion binaries (hardly unimaginable given the size of the binary cache) until we can expect collisions.

@vcunat
Copy link
Member

vcunat commented Mar 21, 2022

I don't expect that it's so bad. Existence of a collision shouldn't cause issues by itself, if the code is written reasonably. Probability of a collision for a hash that you need should still be fairly low, I think.

@edolstra
Copy link
Member

Existence of a collision shouldn't cause issues by itself, if the code is written reasonably.

The hash is used as a key to look up debug info, so a collision means that we get the wrong debug info.

@vcunat
Copy link
Member

vcunat commented Mar 21, 2022

Yes, of course. But existence of a collision in a huge repo only breaks info on (some of) the colliding files, nowhere else. There will still be very very few of the collisions.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 21, 2022
# Extract the Build ID. FIXME: there's probably a cleaner way.
local id="$($READELF -n "$i" | sed 's/.*Build ID: \([0-9a-f]*\).*/\1/; t; d')"
if [ "${#id}" != 40 ]; then
if [ "${#id}" < 3 ]; then
Copy link
Member

@SuperSandro2000 SuperSandro2000 Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if [ "${#id}" < 3 ]; then
if [ "${#id}" -lt 16 ]; then

if llvm produces 16 bytes then we should use that as a lower limit. Otherwise collisions will happen all the time.

Copy link
Contributor

@trofi trofi Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< is a lexicographical compare. You probably want -lt for numeric comparison: if [ "${#id}" -lt 3 ]; then

@stale stale bot removed 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md labels Feb 22, 2023
@symphorien
Copy link
Member

I'd like to test this, how can I get an affected llvm stdenv ? I tried compiling sl with llvmPackages.stdenv but it still has long build ids, so presumably it's still using a partly gnu toolchain.

@rrbutani
Copy link
Contributor

rrbutani commented Feb 25, 2023

Perhaps not a solution but: you can get lld to produce longer build-ids with --build-id=md5, --build-id=uuid, --build-id=sha1:

[nix-shell:~]$ nix-shell --expr "with (import <nixpkgs> {}); (mkShell.override { stdenv = pkgsLLVM.stdenv; }) {}" --run '$CC -Wl,--build-id=fast -xc - <<<"int main() { }" -o test; $READELF -n test | grep Build'
    Build ID: 8903a1c09635bcb6

[nix-shell:~]$ nix-shell --expr "with (import <nixpkgs> {}); (mkShell.override { stdenv = pkgsLLVM.stdenv; }) {}" --run '$CC -Wl,--build-id=md5 -xc - <<<"int main() { }" -o test; $READELF -n test | grep Build'
    Build ID: f10ec65cd2789197314b9025851c6b18

[nix-shell:~]$ nix-shell --expr "with (import <nixpkgs> {}); (mkShell.override { stdenv = pkgsLLVM.stdenv; }) {}" --run '$CC -Wl,--build-id=sha1 -xc - <<<"int main() { }" -o test; $READELF -n test | grep Build'
    Build ID: f0793470c0198f9f167186a1a0e15b7c60cb2e68

[nix-shell:~]$ nix-shell --expr "with (import <nixpkgs> {}); (mkShell.override { stdenv = pkgsLLVM.stdenv; }) {}" --run '$CC -Wl,--build-id=uuid -xc - <<<"int main() { }" -o test; $READELF -n test | grep Build'
    Build ID: 8a00108ec7d22857a47b23eba1f481ee

(I haven't done any benchmarking but I believe fast uses xxhash so there's presumably some perf cost to using sha1, md5 instead of it; uuid is just random so we probably don't want to use it for the sake of reproducibility)


Maybe we can change this:

# Only add --build-id if this is a final link. FIXME: should build gcc
# with --enable-linker-build-id instead?
if [ "$NIX_SET_BUILD_ID_@suffixSalt@" = 1 ] && ! (( "$relocatable" )); then
extraAfter+=(--build-id)
fi

to use --build-id=sha1 (seems to be what GNU ld defaults to; it also accepts --build-id=sha1)


lld and GNU ld also let you specify an ID to use manually with --build-id=0x..... (with the requirement that you have an even number of digits), maybe we could use the derivation's out base hash here? Downside is that all binaries in a derivation will have the same build-id which seems unacceptable.

@symphorien
Copy link
Member

(thanks to @rrbutani I could obtain such an executable with short build id as follows:

with import ./. { };
(sl.override { stdenv = pkgsLLVM.stdenv; }).overrideAttrs (old:
{
  separateDebugInfo = true;
  nativeBuildInputs = (old.nativeBuildInputs or [ ]) ++ [ pkgs/build-support/setup-hooks/separate-debug-info.sh ];
})

)

About tooling: nixseparatedebuginfod can handle such short buildids.

About collisions: I agree with @vcunat that even if hydra may one day contain collisions, the probability that the debuginfo you need is affected is quite low. To give concrete numbers I have currently 18618 distinct build ids on my system, and if I'm doing the math right, the probability that I'm affected if I asked nixseparatedebuginfod to download the debuginfo for them all is 1-product(1-k/(2^64), k, 1, 18618) so 9.39592755809e-12. This is low enough for me. For this reason we can accept such short build ids.

But it would be nice if we can have lld produce the same length of build ids as ld as @rrbutani suggests is possible.

Concretely I am in favor of this change, modulo the bash syntax for comparison, and the fact that we still disable separatedebuginfo for the llvm stdenv (

separateDebugInfo' = separateDebugInfo && stdenv.hostPlatform.isLinux && !(stdenv.hostPlatform.useLLVM or false);
, discovered by @rrbutani ) so except for the case of firefox which calls clang itself we kind of never get those buildids.

We can change the kind of buildids lld produces later if deemed desirable.

@rrbutani rrbutani mentioned this pull request Feb 25, 2023
12 tasks
@oxalica
Copy link
Contributor Author

oxalica commented Feb 25, 2023

I personally prefer the alternative approach #218301 to make ld.ldd produce SHA1. This fits more with our existing infrastructures, and looks cleaner to me.

@oxalica oxalica closed this Feb 25, 2023
@oxalica oxalica deleted the fix/separate-debug branch February 25, 2023 23:57
mweinelt pushed a commit that referenced this pull request Mar 4, 2023
#146275 has more discussion on this; the abridged version
is that `lld` defaults to using `--build-id=fast` while GNU `ld` defaults
to `--build-id=sha1`. These differ in length and so
`separate-debug-info.sh`, as of this writing, errors on `lld`'s shorter
`--build-id=fast`-generated hashes.

`lld` offers the following `build-id` styles:
  - UUID (random; fast but bad for reproducibility)
  - fast (xxhash; fast but shorter hashes)
  - user provided hexstring
  - SHA1
  - MD5

GNU `ld` supports the latter three options, `mold` supports all of these
plus SHA256.

UUID is out because it's not reproducible, fast isn't supported by GNU
`ld`

Using a nix provided (sourced from the output base hash) hash as the
`build-id` seems tempting but would require a little extra work
(we have to include some characteristic of the binary being hashed
so that binaries within a derivation still have unique hashes; it
seems easy to get this wrong; i.e. a path based approach would make
two otherwise identical binaries that reside at different paths have
different `build-id` hashes)

That leaves SHA1 and MD5 and GNU `ld` already defaults to the former.

This commit adds `$NIX_BUILD_ID_STYLE` as an escape hatch, in case any
packages have strong opinions about which hash to use.

----

Note that if/when #146275 goes through, this change can be
reverted if linker speed is a priority.
mweinelt pushed a commit that referenced this pull request Mar 4, 2023
This was disabled here: b86e62d#diff-282a02cc3871874f16401347d8fadc90d59d7ab11f6a99eaa5173c3867e1a160

h/t to @teh: b86e62d#commitcomment-77916294
for pointing out that the failure that @matthewbauer was
seeing was caused by the `separate-debug-info.sh` `build-id` length
requirement that #146275 will relax

`lld` has had `--build-id` support dating back to LLVM4: https://reviews.llvm.org/D18091

This predates every `llvmPackages_` version currently in nixpkgs (and
certainly every version actually still used in `useLLVM` stdenvs) so
with the previous commit (asking `ld` for sufficiently long SHA1 hashes)
I think we can safely enable `separateDebugInfo` when using LLVM
bintools.
@rrbutani rrbutani added the 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related label May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 501+ This PR causes many rebuilds on Linux and should normally target the staging branches. 10.rebuild-linux: 5001+ This PR causes many rebuilds on Linux and must target the staging branches. 10.rebuild-linux-stdenv This PR causes stdenv to rebuild on Linux and must target a staging branch.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants