fix(aix): add aix directory synchronization support #2115

Merged
matthewmcneely merged 4 commits into dgraph-io:main from pmur:murp/aix-sync-support
Feb 4, 2026

Conversation

@pmur
Contributor

@pmur pmur commented Sep 26, 2024

AIX doesn't support a proper flock like Linux, but it seems to have enough support for process-level file locking using fcntl.

For #2035

Problem

GOOS=aix does not build. AIX does not support a Linux-like flock, and unix.Flock is not exported when building for AIX.

Solution

Create an AIX-specific directory-locking implementation using the AIX version of flock.

@pmur pmur requested a review from a team September 26, 2024 20:16
@CLAassistant

CLAassistant commented Sep 26, 2024

CLA assistant check
All committers have signed the CLA.

@netlify

netlify bot commented Sep 26, 2024

Deploy Preview for badger-docs canceled.

Name Link
🔨 Latest commit ceab022
🔍 Latest deploy log https://app.netlify.com/sites/badger-docs/deploys/67121e7a9919ca000806bdce

@mangalaman93 mangalaman93 force-pushed the murp/aix-sync-support branch 2 times, most recently from 21435d9 to ceab022 Compare October 18, 2024 08:38
@pmur
Contributor Author

pmur commented Oct 31, 2024

Hi, is there something I need to fix in the CL, or is this a CI issue?

@github-actions

This PR has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

@github-actions github-actions bot added the Stale label Dec 31, 2024
@pmur
Contributor Author

pmur commented Jan 2, 2025

This is still useful. Ping.

@github-actions github-actions bot removed the Stale label Jan 3, 2025
@ryanfoxtyler ryanfoxtyler requested a review from a team February 15, 2025 00:04
@ryanfoxtyler
Contributor

@pmur tests are passing now, though I'm concerned about the maintenance cost of this when we don't have an environment to test on. Is that something you're able/willing to help with?

@dmitshur

dmitshur commented Feb 15, 2025

In case it's helpful, I'll point out that you can get some signal from compile-only testing via cross-compilation. For example, GOOS=aix GOARCH=ppc64 go test -c -o=/dev/null can be run on any platform you have access to, and will at least check for build errors. (The -c flag tells the go command to compile the test binary but not run it, while the -o flag is used to discard the result, since you only care about the exit code.)
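If that compile-only check needs to be driven from a Go program rather than a shell, a small wrapper might look like the sketch below. The `crossCompileCheck` helper and the command-line handling are hypothetical; only the `go test -c -o=/dev/null` invocation and the `GOOS`/`GOARCH` variables come from the comment above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// crossCompileCheck (hypothetical helper) prepares, but does not start, the
// equivalent of:
//
//	GOOS=<goos> GOARCH=<goarch> go test -c -o=/dev/null
//
// run from pkgDir. The caller decides when to execute it with cmd.Run().
func crossCompileCheck(pkgDir, goos, goarch string) *exec.Cmd {
	cmd := exec.Command("go", "test", "-c", "-o=/dev/null")
	cmd.Dir = pkgDir
	// Entries appended later override earlier duplicates in the child environment.
	cmd.Env = append(os.Environ(), "GOOS="+goos, "GOARCH="+goarch)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: buildcheck <package-dir>")
		return
	}
	cmd := crossCompileCheck(os.Args[1], "aix", "ppc64")
	if err := cmd.Run(); err != nil {
		// A non-zero exit status means the package did not compile for aix/ppc64.
		fmt.Fprintln(os.Stderr, "aix/ppc64 build check failed:", err)
		os.Exit(1)
	}
	fmt.Println("aix/ppc64 build check passed")
}
```

This only proves that the package and its tests compile for the target; it says nothing about runtime behavior on AIX.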

@pmur
Contributor Author

pmur commented Feb 17, 2025

I can provide some help in a limited capacity. Note, I have been using publicly available resources to implement and test this (the aix golang CI instance, and the aix gcc compiler farm instance).

I hope @ayappanec might be able to provide more official help. He is part of the IBM team maintaining OSS software on aix.

@ayappanec

@pmur Thanks for tagging me here.
We are currently in the process of creating new AIX VMs in the OSU lab, which we give to various open source communities to enable AIX support. I will try to get a VM for this.

@pmur
Contributor Author

pmur commented Mar 19, 2025

ping

@pmur
Contributor Author

pmur commented Jan 21, 2026

This has been collecting dust for a while. Is this still blocking the transition of the Go AIX CI instance?

@dmitshur

dmitshur commented Jan 21, 2026

Is this still blocking the transition of the Go AIX CI instance?

Yeah, I re-checked golang/go#67299 (comment) and LUCI's cas dependency still fails to cross-compile for aix/ppc64 today:

$ GOOS=aix GOARCH=ppc64 CGO_ENABLED=0 go install go.chromium.org/luci/client/cmd/cas@latest
go: downloading go.chromium.org/luci v0.0.0-20260121191827-3391df2e9ccc
[...]
# github.com/dgraph-io/badger/v3
go/pkg/mod/github.com/dgraph-io/badger/[email protected]/dir_unix.go:62:13: undefined: unix.Flock

An alternative might be to investigate how viable or disruptive it'd be to remove the dependency on badger from its dependency tree.

@matthewmcneely matthewmcneely requested a review from a team as a code owner January 21, 2026 22:25
Collaborator

@matthewmcneely matthewmcneely left a comment

@pmur Sorry for the delay. If we get this through, I'll add some aix-specific build commands to our CD to validate that it at least compiles without error.

AIX doesn't support a proper flock like linux, but it seems
to have enough support for process level file locking using
fcntl.

Likewise, it does not support directory level fsync.
@pmur pmur force-pushed the murp/aix-sync-support branch from e370555 to 58b15a5 Compare January 23, 2026 21:10
@pmur
Contributor Author

pmur commented Jan 23, 2026

I suspect there is more work to get this running fully on AIX beyond implementing this. It runs, but some unit tests seem to run incredibly slow. Though, that may be related to the only machine I have access to (gcc119 in the gcc compiler farm).

@pmur
Contributor Author

pmur commented Jan 30, 2026

Ping

@matthewmcneely
Collaborator

matthewmcneely commented Jan 30, 2026

I suspect there is more work to get this running fully on AIX beyond implementing this

but some unit tests seem to run incredibly slow

I'm not keen on releasing AIX support if it's half-baked and non-performant. I do not have access to a machine running AIX.

@pmur
Contributor Author

pmur commented Jan 30, 2026

I'm not keen on releasing AIX support if it's half-baked and non-performant. I do not have access to a machine running AIX.

I wouldn't draw that conclusion from my N=1 sample of available machines. The community machine I have access to is slow to run anything; it's a shared community machine running old (power8) hardware.

The lack of directory fsync support doesn't seem like a blocker. The documentation seems to indicate it's only best effort for OS crashes (loss of power, or otherwise).

@ayappanec could you run the tests on more representative hardware? I don't trust the gcc farm machine for anything performance related.

@ayappanec

@pmur Sure. Let me check and get back.

@ayappanec

Is "jemalloc" a prerequisite for this?

# ./test.sh
go version go1.25.5 aix/ppc64
/badger/badger /badger
github.com/dgraph-io/ristretto/v2/z
# github.com/dgraph-io/ristretto/v2/z
/go/pkg/mod/github.com/dgraph-io/ristretto/[email protected]/z/calloc_jemalloc.go:13:10: fatal error: jemalloc/jemalloc.h: No such file or directory
   13 | #include <jemalloc/jemalloc.h>
      |          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

@pmur
Contributor Author

pmur commented Feb 2, 2026

I ran the tests on gcc119 with something like GOMAXPROCS=8 go test -v -timeout=1h. You could try unsetting tags in test.sh for aix and running it too.

As for jemalloc, @ayappanec you may know better. I tried building the tip of jemalloc on gcc119, and it failed with a nonsensical error:

configure: error: Unsupported pointer size: 0

@ayappanec

Thanks @pmur
Just ran GOMAXPROCS=8 go test -v -timeout=1h. It took around 30 minutes to complete.

Level Done
--- PASS: ExampleOpen (1.21s)
=== RUN   ExampleTxn_NewIterator
badger 2026/02/02 10:14:43 INFO: All 0 tables opened in 0s
badger 2026/02/02 10:14:43 INFO: Discard stats nextEmptySlot: 0
badger 2026/02/02 10:14:43 INFO: Set nextTxnTs to 0
badger 2026/02/02 10:14:43 INFO: Lifetime L0 stalled for: 0s
badger 2026/02/02 10:14:44 INFO: 
Level 0 [ ]: NumTables: 01. Size: 15 KiB of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done
--- PASS: ExampleTxn_NewIterator (1.23s)
PASS
ok      github.com/dgraph-io/badger/v4  1777.993s

@matthewmcneely
Collaborator

@ayappanec Can you describe the cpu and memory characteristics for the machine on which you ran that test?

@ayappanec

# lparstat

System configuration: type=Shared mode=Uncapped smt=2 lcpu=30 mem=40960MB psize=17 ent=1.50 

%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint  %nsp  %utcyc
----- ----- ------ ------ ----- ----- ------ ----- ----- -----  ------
  0.1   0.3    0.0   99.7  0.01   0.5    3.3 3900039994     0   101   0.78 

It has 15 virtual CPUs (shared mode) with SMT 2, so there are 2 x 15 = 30 logical CPUs. It's a POWER8 machine.
Memory is 40 GB.

@matthewmcneely matthewmcneely merged commit d3b6b86 into dgraph-io:main Feb 4, 2026
10 checks passed