Releases: go-webgpu/goffi

v0.5.3 — FreeBSD ARM64 Support

28 May 10:39
542dbd2

What's Changed

Fixed

  • FreeBSD ARM64 build failure: added freebsd to the build tags in 4 files. FreeBSD ARM64 uses the same AAPCS64 calling convention as Linux ARM64, so only build tag additions were needed, with no code changes (#52)
  • FreeBSD ARM64 callback trampoline: ffi/callback_arm64.s was missing the freebsd build tag while the Go side already included it, causing NewCallback() to silently return invalid addresses on FreeBSD ARM64
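
To illustrate how a missing OS tag excludes a file from a build, the sketch below evaluates two build constraint lines for freebsd/arm64 using the standard go/build/constraint package. The constraint lines and the fileIncluded helper are hypothetical; the actual tags in goffi's files may be written differently.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// fileIncluded reports whether a file carrying the given //go:build line
// would be compiled for the given GOOS/GOARCH pair. Only the OS and
// architecture tags are modeled here (hypothetical helper).
func fileIncluded(line, goos, goarch string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	included := expr.Eval(func(tag string) bool {
		return tag == goos || tag == goarch
	})
	return included, nil
}

func main() {
	// Hypothetical constraint lines; the real ones in goffi may differ.
	before := "//go:build (linux || darwin) && arm64"
	after := "//go:build (linux || darwin || freebsd) && arm64"

	for _, line := range []string{before, after} {
		ok, err := fileIncluded(line, "freebsd", "arm64")
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-52s freebsd/arm64 included: %v\n", line, ok)
	}
}
```

When an assembly file and its Go declaration carry different constraints, one side can be compiled without the other, which is the kind of mismatch the second fix describes.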

Changed

  • Platform count: 7 → 8 targets (added FreeBSD ARM64)
  • CI cross-compilation check now validates all 8 platforms

Platform Support (8 targets)

Platform | Arch  | ABI      | Status
Linux    | amd64 | System V | Tested
Linux    | arm64 | AAPCS64  | Cross-compile verified
Windows  | amd64 | Win64    | Tested
Windows  | arm64 | AAPCS64  | Tested (Snapdragon X)
macOS    | amd64 | System V | Tested
macOS    | arm64 | AAPCS64  | Tested (M3 Pro)
FreeBSD  | amd64 | System V | Cross-compile verified
FreeBSD  | arm64 | AAPCS64  | Cross-compile verified ← NEW

Thanks to @unxed for reporting #52!

v0.5.2 — Variadic Functions + go vet Clean

25 May 08:37
8dfbc56

What's New

Variadic Function Support (closes #47)

PrepareVariadicCallInterface(cif, convention, nfixedargs, returnType, argTypes) enables calling C variadic functions (printf, sprintf, XCreateIC, custom variadic APIs).

On Apple ARM64 (M1/M2/M3/M4), variadic arguments are forced onto the stack at the fixed/variadic boundary per Apple's ABI, matching the behavior of clang and libffi's ffi_prep_cif_var(). On all other platforms the call is identical to PrepareCallInterface.

var cif types.CallInterface
ffi.PrepareVariadicCallInterface(&cif, types.DefaultCall, 1, // nfixedargs
    types.SInt64TypeDescriptor,
    []*types.TypeDescriptor{types.SInt64TypeDescriptor, types.SInt64TypeDescriptor, types.SInt64TypeDescriptor})
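
The placement rule described above can be modeled in a few lines. This is a simplified sketch, not goffi's implementation: it handles only integer arguments, assumes the 8 general-purpose argument registers of AAPCS64, and the placeArgs helper and placement type are hypothetical names.

```go
package main

import "fmt"

// placement says where one argument is passed.
type placement string

const (
	inRegister placement = "register"
	onStack    placement = "stack"
)

// placeArgs models where each of nargs integer arguments lands when the
// first nfixed are fixed and the rest are variadic. It assumes 8 GP
// argument registers (X0-X7) and ignores floats and structs.
func placeArgs(nargs, nfixed int, appleARM64 bool) []placement {
	out := make([]placement, nargs)
	for i := range out {
		switch {
		case appleARM64 && i >= nfixed:
			// Apple's ABI: everything past the fixed/variadic boundary
			// goes to the stack, even if registers are still free.
			out[i] = onStack
		case i < 8:
			out[i] = inRegister
		default:
			out[i] = onStack
		}
	}
	return out
}

func main() {
	// Same shape as the call above: 3 arguments, 1 of them fixed.
	fmt.Println("apple arm64:", placeArgs(3, 1, true))
	fmt.Println("elsewhere:  ", placeArgs(3, 1, false))
}
```

With three arguments and nfixedargs set to 1, the model puts the first argument in a register and the other two on the stack on Apple ARM64, while the non-Apple case keeps all three in registers.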

Also in this release

  • go vet clean — fixed dl_unix.go unsafe.Pointer warnings, syscall_linux_stub.s return signature
  • cmd/variadic-test — standalone verification binary for Apple Silicon: go run github.com/go-webgpu/goffi/cmd/variadic-test
  • E2E variadic tests with gcc-compiled C test functions

Full Changelog

See CHANGELOG.md

v0.5.1 — Struct ABI + CGO_ENABLED=1 + Race Detector

13 May 20:29
e45f47e

What's New

CGO_ENABLED=1 Support (PR #37 by @jiyeyuran)

goffi now works under both CGO_ENABLED=0 and CGO_ENABLED=1. This enables the race detector (go test -race), allows coexistence with CGO libraries (gocv, database drivers), and provides an alternative workaround for the #22 duplicate symbol conflict.

Struct By-Value Argument Passing (PR #39, closes #33)

Previously, structs passed as arguments were sent as raw pointers instead of their bytes. goffi now correctly handles all three size classes per the System V AMD64 ABI: ≤8B (INTEGER/SSE), 9-16B (two eightbytes), >16B (MEMORY class). The Windows Win64 ABI path is also fixed.

9-16B Struct Return via XMM Registers (PR #45, P1)

Structs like {float64, float64} (NSPoint, NSSize, CGPoint, CGSize) now correctly return via XMM0:XMM1. Previously misclassified as sret, producing corrupted values on macOS Intel. Four return modes: RAX:RDX, RAX:XMM0, XMM0:RAX, XMM0:XMM1.
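
The four return modes follow from classifying each eightbyte of the struct as INTEGER or SSE. The sketch below is a simplified model under stated assumptions: fields are naturally aligned, none crosses an eightbyte boundary, all offsets are below 16, and each eightbyte holds at least one field. The field type and both helpers are hypothetical names, not goffi's API.

```go
package main

import "fmt"

// ebClass is the class of one eightbyte.
type ebClass int

const (
	integer ebClass = iota
	sse
)

// field describes one scalar member of a 9-16 byte struct.
type field struct {
	offset  uintptr // byte offset inside the struct, must be < 16
	isFloat bool    // true for float32/float64 members
}

// classifyEightbytes marks an eightbyte SSE only if every field in it is
// a float; any integer field makes that eightbyte INTEGER.
func classifyEightbytes(fields []field) (lo, hi ebClass) {
	classes := [2]ebClass{sse, sse}
	for _, f := range fields {
		if !f.isFloat {
			classes[f.offset/8] = integer
		}
	}
	return classes[0], classes[1]
}

// returnRegisters maps the two classes to the register pair that carries
// the returned struct.
func returnRegisters(lo, hi ebClass) string {
	switch {
	case lo == integer && hi == integer:
		return "RAX:RDX"
	case lo == integer && hi == sse:
		return "RAX:XMM0"
	case lo == sse && hi == integer:
		return "XMM0:RAX"
	default:
		return "XMM0:XMM1"
	}
}

func main() {
	// A {float64, float64} struct such as CGPoint.
	point := []field{{0, true}, {8, true}}
	fmt.Println(returnRegisters(classifyEightbytes(point)))

	// A {int64, float64} struct.
	mixed := []field{{0, false}, {8, true}}
	fmt.Println(returnRegisters(classifyEightbytes(mixed)))
}
```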

Callback Struct Arguments (PR #42 by @pekim, closes #41)

Callbacks (C→Go) now accept struct arguments on AMD64 Unix. All three size classes supported. First pure Go FFI library with this capability.

Race Detector Compatible (PR #39)

CGO_ENABLED=1 go test -race now passes cleanly — checkptr violations fixed with double-indirection pattern (Go proposal #58625).

E2E Test Infrastructure

gcc-compiled C test library (testdata/structtest.c) validates struct passing and return across all size classes and ABI modes.

Contributors

  • @jiyeyuran — CGO_ENABLED=1 support, C-thread callback test, ongoing CGO path maintainer
  • @pekim — callback struct arguments, struct bug reports (#33, #41)

Full Changelog

See CHANGELOG.md for complete details.

v0.5.0 — Windows ARM64 + FreeBSD support

29 Mar 11:44
ca3231c

What's New

Two new platforms -- goffi now supports 7 targets:

Windows ARM64 (Snapdragon X)

  • Extended AAPCS64 ARM64 implementation to Windows via build tag changes
  • Zero new assembly -- Windows ARM64 ABI is identical to Unix ARM64
  • runtime.cgocall works on Windows without fakecgo
  • Tested on Samsung Galaxy Book 4 Edge (Snapdragon X Elite) by @SideFx

FreeBSD amd64

  • Added libc.so.7 dynamic loading support
  • System V ABI identical to Linux -- same assembly code
  • Requires -gcflags="github.com/go-webgpu/goffi/internal/fakecgo=-std" for CGO_ENABLED=0 builds

CI improvements

  • New cross-compilation job validates all 7 platforms compile correctly

Platform Support (7 targets)

Platform | Arch  | ABI      | Status
Windows  | amd64 | Win64    | Production
Windows  | arm64 | AAPCS64  | NEW -- tested on Snapdragon X
Linux    | amd64 | System V | Production
Linux    | arm64 | AAPCS64  | Production
macOS    | amd64 | System V | Production
macOS    | arm64 | AAPCS64  | Production
FreeBSD  | amd64 | System V | NEW -- cross-compile verified

Full Changelog

See CHANGELOG.md

v0.4.2

03 Mar 19:49
c8ef100

purego Compatibility Fix

Fixed

  • Unix: duplicate symbol conflict with purego — added build tag nofakecgo to resolve _cgo_init linker collision when goffi and purego coexist with CGO_ENABLED=0 (#22)

Workaround

When using both goffi and purego in the same binary:

CGO_ENABLED=0 go build -tags nofakecgo ./...

This disables goffi's internal fakecgo, relying on purego's identical copy.

Also included

  • Unit tests for types and internal/arch/amd64 packages
  • Test coverage increased from 75% to 89% (-coverpkg=./...)
  • Dynamic Codecov badge in README

Closes #22

v0.4.1 — ABI Compliance Hotfix

02 Mar 00:27
7e87b11

ABI Compliance Hotfix

Full forward call path audit — 10 of 11 identified ABI gaps fixed.

Fixed

  • Float32 argument encoding bug: arguments are now encoded with math.Float32bits instead of float64 widening, which corrupted XMM bit patterns
  • AMD64 Unix: stack spill for arguments 7+ — args beyond 6 GP registers now correctly pushed to stack before CALL
  • ARM64 Unix: stack spill for arguments 9+ — args beyond 8 GP registers now correctly pushed to stack before BL
  • AMD64 struct return 9-16 bytes — RAX+RDX register pair correctly assembled into output buffer
  • AMD64 sret hidden pointer — structs >16B inject caller buffer as first arg (RDI), per System V ABI
  • ARM64 HFA stack spill — HFA overflow correctly spills entire aggregate to stack per AAPCS64
  • runtime.KeepAlive — added after each FFI call to prevent GC of argument pointers
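
The Float32 fix in the list above is easy to see by comparing bit patterns. In this sketch the helper names encodeFloat32 and encodeWidened are hypothetical; only math.Float32bits and math.Float64bits are real standard library calls.

```go
package main

import (
	"fmt"
	"math"
)

// encodeFloat32 returns the register image for a float32 argument: the
// 32-bit IEEE 754 pattern placed in the low half of a 64-bit slot.
func encodeFloat32(f float32) uint64 {
	return uint64(math.Float32bits(f))
}

// encodeWidened shows the buggy alternative: widening to float64 first
// yields a double-precision bit pattern, which a callee expecting a
// single-precision float misreads.
func encodeWidened(f float32) uint64 {
	return math.Float64bits(float64(f))
}

func main() {
	const f float32 = 1.5
	fmt.Printf("Float32bits: %#016x\n", encodeFloat32(f))
	fmt.Printf("widened:     %#016x\n", encodeWidened(f))
}
```

For 1.5 the single-precision pattern is 0x3fc00000, while the widened pattern is 0x3ff8000000000000, whose low 32 bits are all zero. A callee reading a float from the low bits of the register would see 0.0 in that case.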

Added

  • Overflow detection: ErrTooManyArguments for >15 args
  • Regression tests: TestWindowsStackArguments, TestWindowsStackArgumentsFileIO, TestWindowsStackArguments10Args, TestFloat32ArgEncoding, TestOverflowDetection, TestUnixStackSpill7Args

Removed

  • Dead callUnix64 assembly experiment

Known Limitation (documented)

  • Windows: float return from XMM0: syscall.SyscallN only returns RAX, not XMM0

Verification

  • Build: all 5 platforms cross-compile OK
  • Tests: all PASS, coverage 89.6%
  • Linter: 0 issues

Closes #19

v0.4.0 — crosscall2 Integration

27 Feb 09:35
6fe9a0b

What's New

crosscall2 integration — callbacks now work from C-library-created threads (Metal, wgpu-native).

Added

  • crosscall2 integration for C-thread callback support (#16)
    • Dispatchers route through crosscall2 → runtime·load_g → runtime·cgocallback
    • Supports callbacks from arbitrary C threads
    • callbackWrap_call closure for ABIInternal fn ptr from assembly
    • go_asm.h constants for callbackArgs struct offsets

Fixed

  • fakecgo trampoline register bugs (synced with purego v0.10.0)
    • ARM64: R26→R9, R2→R9, threadentry callee-save/restore
    • AMD64: DX→R11, CX→R11, BX→R11, JMP tail calls, PUSH_REGS_HOST_TO_ABI0

Verification

  • All CI checks pass (Linux, Windows, macOS)
  • 89.6% test coverage
  • 0 linter issues
  • 5-platform cross-compile verified

Full Changelog: v0.3.9...v0.4.0

v0.3.9 — ARM64 Callback Fixes

18 Feb 10:46
aa78271

What's Changed

Fixed

  • ARM64 callback trampoline rewrite — replaced BL dispatcher with MOVD $index, R12 + B dispatcher pattern (matching Go runtime and purego conventions). Fixes LR corruption and entrySize mismatch for callbacks at index > 0.
  • Symbol rename — callback assembly symbols renamed to package-scoped (·callbackTrampoline/·callbackDispatcher) to avoid linker collision with purego (#15)

Known Limitations

  • crosscall2 bypass — callbacks invoked from C-library-created threads (e.g., Metal addCompletedHandler:) may fail because goffi calls Go directly without crosscall2 → runtime·cgocallback. Tracked in #16, planned for v0.4.0.

Upgrading

go get github.com/go-webgpu/goffi@v0.3.9

If you use goffi callbacks on ARM64 (macOS Apple Silicon / Linux ARM64), this update is strongly recommended.

Full Changelog: v0.3.8...v0.3.9

v0.3.8: Enterprise-grade CGO_ENABLED=1 Error Handling

24 Jan 20:07
84e93e8

What's Changed

This release fixes confusing linker errors that occurred when building on Linux/macOS with a C compiler (gcc/clang) installed.

Fixed

  • CGO_ENABLED=1 build error handling (gogpu/wgpu#43)
    • Users now see a clear compile-time error: undefined: GOFFI_REQUIRES_CGO_ENABLED_0
    • Opening the source file shows full instructions in godoc comment

Added

  • Compile-time CGO detection with descriptive error identifier
  • Requirements section in README.md with clear CGO_ENABLED=0 instructions
  • Runtime panic fallback with detailed fix instructions (defense in depth)

Changed

  • Added !cgo build constraint to:
    • ffi/dl_unix.go, ffi/dl_darwin.go
    • internal/dl/dl_stubs_unix.s, internal/dl/dl_wrappers_unix.s
    • internal/dl/dl_stubs_arm64.s, internal/dl/dl_wrappers_arm64.s

User Experience

Before (v0.3.7):

# github.com/go-webgpu/goffi/ffi
.../dl_unix.go:54:20: undefined: dl.Dlopen

Confusing - no indication of how to fix

After (v0.3.8):

# github.com/go-webgpu/goffi/ffi
ffi/cgo_unsupported.go:28:9: undefined: GOFFI_REQUIRES_CGO_ENABLED_0

Clear - identifier name tells user exactly what's needed

Quick Fix

CGO_ENABLED=0 go build ./...

Or set permanently:

go env -w CGO_ENABLED=0

Full Changelog: v0.3.7...v0.3.8

v0.3.7 - ARM64 Darwin Comprehensive Support

03 Jan 07:18

ARM64 Darwin Comprehensive Support

This release adds comprehensive ARM64 darwin (Apple Silicon) support, tested on M3 Pro.

Added

  • ARM64 Darwin comprehensive support (PR #9 by @ppoage)
    • Tested on Apple Silicon M3 Pro (64 ns/op benchmark)
    • Nested struct handling via placeStructRegisters()
    • Mixed int/float struct support via countStructRegUsage()
    • ensureStructLayout() for auto-computing size/alignment
    • Assembly shim (abi_capture_test.s) for ABI verification
    • Comprehensive darwin ObjC tests (747 lines)
    • Struct argument tests (537 lines)
  • r2 (X1) return for 9-16 byte struct returns
    • Call8Float now returns both X0 and X1
    • Fixes struct returns between 9-16 bytes on ARM64
  • uint64 bit patterns for float registers
    • Cleaner handling of mixed float32/float64 arguments

Fixed

  • BenchmarkGoffiStringOutput segfault on darwin
    • Pointer argument now correctly passed as unsafe.Pointer(&strPtr)

Contributors

  • @ppoage - ARM64 Darwin fixes, ObjC tests, assembly shim

Full Changelog: v0.3.6...v0.3.7