runtime build may not be reproducible on Mac #609

@mvdan

We've been seeing gogarble.txt failures in CI on Mac for a couple of weeks now. They happen with Go 1.19.x.

I was finally able to grab the two binaries which should be equal but are not, attached below. Here is the start of their diffoscope with LLVM installed:

--- file1-3602764619
+++ file2-1889520802
├── strings -a -n 8 {}
│ @@ -13,15 +13,14 @@
│  __noptrdata
│  __noptrbss
│  __LINKEDIT
│  /usr/lib/dyld
│  /usr/lib/libSystem.B.dylib
│  /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
│  /System/Library/Frameworks/Security.framework/Versions/A/Security
│ -/usr/lib/libobjc.A.dylib
│  UUUUUUUUH!
│  33333333H!
│  D$pH9P@w
│  t*H9HPt$
│  debugCal
│  debugCal
│  debugCalH9
│ @@ -33143,15 +33142,14 @@
│  _getrlimit
│  _getsockname
│  _getsockopt
│  _mach_absolute_time
│  _mach_timebase_info
│  _madvise
│  _nanosleep
│ -_objc_msgSend
│  _pthread_attr_destroy
│  _pthread_attr_getstacksize
│  _pthread_attr_init
│  _pthread_attr_setdetachstate
│  _pthread_cond_broadcast
│  _pthread_cond_init
│  _pthread_cond_signal
├── llvm-readobj --file-headers {}
│ @@ -3,16 +3,16 @@
│  Arch: x86_64
│  AddressSize: 64bit
│  MachHeader {
│    Magic: Magic64 (0xFEEDFACF)
│    CpuType: X86-64 (0x1000007)
│    CpuSubType: CPU_SUBTYPE_X86_64_ALL (0x3)
│    FileType: Executable (0x2)
│ -  NumOfLoadCommands: 18
│ -  SizeOfLoadCommands: 2432
│ +  NumOfLoadCommands: 17
│ +  SizeOfLoadCommands: 2376
│    Flags [ (0x85)
│      MH_DYLDLINK (0x4)
│      MH_NOUNDEFS (0x1)
│      MH_TWOLEVEL (0x80)
│    ]
│    Reserved: 0x0
│  }
├── llvm-readobj --needed-libs {}
│ @@ -2,9 +2,8 @@
│  Format: Mach-O 64-bit x86-64
│  Arch: x86_64
│  AddressSize: 64bit
│  NeededLibraries [
│    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
│    /System/Library/Frameworks/Security.framework/Versions/A/Security
│    /usr/lib/libSystem.B.dylib
│ -  /usr/lib/libobjc.A.dylib
│  ]
├── llvm-readobj --symbols {}
│ @@ -1,66975 +1,66975 @@
│  
│  Format: Mach-O 64-bit x86-64
│  Arch: x86_64
│  AddressSize: 64bit
│  Symbols [
│    Symbol {
│ -    Name: _runtime.text (2024)
│ +    Name: _runtime.text (2010)
│      Type: Section (0xE)
│      Section: __text (0x1)
│      RefType: UndefinedNonLazy (0x0)
│      Flags [ (0x0)
│      ]
│      Value: 0x1000018C0
│    }
│    Symbol {
│ -    Name: _internal/cpu.Initialize (2038)
│ +    Name: _internal/cpu.Initialize (2024)
│      Type: Section (0xE)
│      Section: __text (0x1)
│      RefType: UndefinedNonLazy (0x0)
│      Flags [ (0x0)
│      ]
│      Value: 0x1000018C0
│    }
│    Symbol {
│ -    Name: _internal/cpu.processOptions (2063)
│ +    Name: _internal/cpu.processOptions (2049)
│      Type: Section (0xE)
│      Section: __text (0x1)
│      RefType: UndefinedNonLazy (0x0)
│      Flags [ (0x0)

It looks like, for some reason, the original build dynamically links against libobjc (/usr/lib/libobjc.A.dylib), but the rebuild does not. I'm not sure if _objc_msgSend is relevant here (it would presumably come from a linker directive in Go's standard library, in the style of //go:cgo_import_dynamic libc_sendmsg sendmsg "libc.so"), but it appears to be the only other string difference.

Here is the last build log plus the diffed files:

Some observations:

  • This is a problem in Go 1.19.2 (per the log above), so if Go is to blame, it is not a recent problem.
  • I don't see anything about darwin, the toolchain, or libc in the released 1.19.3 or the upcoming 1.19.4.
  • So far, we haven't reproduced outside of GitHub Actions. It could be an issue with how they installed Go or the C/libc toolchain. It would be useful to see if we can reproduce the error on other Mac machines.
  • This issue only pops up on Mac. We've been able to reproduce four times on CI, but not on any other OS. So this at least tells me that the bug is rather unlikely to be in garble itself - we only treat Mac as special in one single GOOS != "darwin" line, and it seems fairly innocent.
  • This started happening on about a quarter of all Mac builds on CI recently. I struggle to imagine which recent change in garble could have caused this. The first failure was 13 days ago, in a comment on "fix garble with newer Go tip versions" #601, so in principle any one of the handful of changes in garble over the last month could be responsible.

My current best guess is that we broke reproducibility on Darwin with 7c28663. The syscall package is quite special as it does a lot of linker directive magic, like the //go:cgo_import_dynamic I showed above. We might be obfuscating those names, but we're definitely not touching those directives - we only update the ones we explicitly support, like //go:linkname.
