Skip to content

DAX RX: native pw_stream source on Linux — 400ms → 200ms (#1008)#2312

Merged
ten9876 merged 2 commits intoten9876:mainfrom
RiskAndReward1337:aidan/dax-1008-native-pipewire
May 4, 2026
Merged

DAX RX: native pw_stream source on Linux — 400ms → 200ms (#1008)#2312
ten9876 merged 2 commits intoten9876:mainfrom
RiskAndReward1337:aidan/dax-1008-native-pipewire

Conversation

@RiskAndReward1337
Copy link
Copy Markdown
Contributor

Summary

Replaces module-pipe-source with a native libpipewire-0.3 pw_stream per DAX RX channel — the deferred Patch D from the #1008 triage thread. On my setup (FLEX-8600 fw 4.1.5, PipeWire 1.0.5, Qt 6.11.0, WSJT-X via Qt PulseAudio backend) this drops WSJT-X DT from a steady ~400 ms baseline down to a steady ~200 ms.

Marked as draft. This branch is rebased off bc47c3b (43 commits behind upstream/main at time of push) and the diff against upstream/main will look noisy because of that drift, not because of changes in this PR. Posting it draft so the upstream agents can review the architecture and decide whether to rebase + merge, or cherry-pick the new files onto a fresh branch.

See issue #1008 comment for the full measurement trace, what failed pulse-side experiments were tried (so the next agent can skip them), and where the remaining 200 ms actually lives (consumer-side, not in our pipeline).

Architecture

  • src/core/PipeWireNativeContext.{h,cpp} — singleton owning a shared pw_thread_loop + pw_context + pw_core, refcounted across the four DAX RX channels. Loop runs on its own dedicated RT thread.
  • src/core/PipeWireNativeRxSource.{h,cpp} — one pw_stream per channel, Audio/Source + PW_DIRECTION_OUTPUT. Producer (Qt audio thread) ↔ consumer (PipeWire RT thread) via fixed-size SPSC float ring with std::atomic head/tail. No locks, no allocations on the RT path. Drops oldest samples on overflow so backlog can never exceed ring size (~42 ms).

Stream is created with node.latency=256/48000, node.force-quantum=256, node.force-rate=48000, node.always-process=true. Verified via pw-top that the DAX nodes actually lock at QUANT=256 with B/Q ≈ 0 once primed.

Bridge changes

  • feedDaxAudio() is now fully lock-free and runs on PanadapterStream's network thread via Qt::DirectConnection (wired in MainWindow::startDax). Removes 10–50 ms of variable latency that the GUI thread was adding under waterfall paint load.
  • m_open / m_channelGain[] / m_transmitting / m_lastAudioMs converted to std::atomic so the audio fast path never touches Qt-thread-affine state.
  • Silence timer (TX→RX gap fill) self-stops by observing m_lastAudioMs rather than calling QTimer::stop() cross-thread.
  • Per-channel fallback: if PipeWireNativeRxSource::open() fails, that channel still uses the legacy module-pipe-source path. Non-PipeWire systems are unaffected.
  • Audio is float32 mono 48 kHz end-to-end. Linear-interp upsample of the radio's 24 kHz stereo to 48 kHz mono runs in feedDaxAudio so PipeWire never inserts a resampler.

Build

  • pkg_check_modules(PIPEWIRE_NATIVE libpipewire-0.3) gates the new path behind HAVE_PIPEWIRE_NATIVE. Build silently falls back to the existing pipe-source path when the dev headers aren't present.
  • libpipewire-0.3-dev added to .github/docker/Dockerfile for CI.

What this does not fix

The remaining ~200 ms gap to macOS-equivalent is the libpulse fragment-pool buffer pulse-compat holds for libpulse clients (WSJT-X via Qt PA gets pulse.attr.fragsize=4800 × ~4 fragments queued). PipeWire 1.0.5 has no documented server-side mechanism to clamp this from our source — pulse.min.* keys are minimums (xrun protection), not maximums. Bypass-tested with pw-cat (native PipeWire client) which gets sub-10 ms from the same source. Real fix paths are consumer-side and tracked in the issue thread.

Test plan

  • Builds cleanly with Qt 6.11.0 / RelWithDebInfo
  • pw-dump confirms all four DAX nodes register with node.latency=256/48000 + node.force-quantum=256
  • pw-top shows DAX nodes locked at QUANT=256, B/Q≈0, no xruns under load
  • pw-cat --record --target=aethersdr-dax-1 captures clean float32 mono 48 kHz audio
  • WSJT-X DT measured at ~200 ms steady (down from ~400 ms baseline)
  • Per-channel fallback to legacy module-pipe-source path verified by forcing open() failure
  • TX path unaffected (silence-fill, readTxPipe, gain, RMS metering all unchanged)

🤖 Generated with Claude Code

@RiskAndReward1337 RiskAndReward1337 force-pushed the aidan/dax-1008-native-pipewire branch from a18d51d to f7de417 Compare May 3, 2026 13:31
@jensenpat
Copy link
Copy Markdown
Collaborator

@RiskAndReward1337 Can you split the main window thread priority change into a separate PR for testing? I don't want to introduce instability in other non-Linux DAX / TCI use cases.

Replaces module-pipe-source with a native libpipewire-0.3 pw_stream per DAX
channel.  This eliminates both the kernel FIFO and the pulseaudio-pipe
translation that the previous path relied on, getting WSJT-X DT from a
~400 ms baseline down to a steady ~200 ms.  See issue ten9876#1008 comment thread
for measurement traces.

Architecture
------------
- PipeWireNativeContext (singleton) owns one pw_thread_loop + pw_context +
  pw_core, refcounted across the four RX channels.  Locked across pw_stream
  creation/destruction; the loop runs on its own dedicated thread.
- PipeWireNativeRxSource is one pw_stream per channel, registered as
  Audio/Source with PW_DIRECTION_OUTPUT.  Producer (Qt audio thread) and
  consumer (PipeWire RT thread) communicate via a fixed-size SPSC float ring
  with std::atomic head/tail indices — no locks, no allocations on the RT
  path.  Drops oldest samples on overflow so backlog cannot grow past ring
  size (~42 ms).

Stream is created with:
    PW_KEY_NODE_LATENCY        = 256/48000   (5.3 ms quantum request)
    PW_KEY_NODE_RATE           = 1/48000
    PW_KEY_NODE_FORCE_QUANTUM  = 256
    PW_KEY_NODE_FORCE_RATE     = 48000
    PW_KEY_NODE_ALWAYS_PROCESS = true

Verified live via pw-top: DAX nodes lock at QUANT=256, B/Q ~ 0, no xruns
once the ring is primed.  pw-cat (native PipeWire client) captures cleanly
from the source with ~5.3 ms latency end-to-end.

Bridge changes
--------------
- PipeWireAudioBridge::feedDaxAudio is now lock-free / atomic-only.  The
  audio fast path no longer touches Qt-thread-affine state, which is a
  prerequisite for an upcoming MainWindow connect-type change (split into
  a separate PR per maintainer request, so the connection-type change can
  be tested in isolation against non-Linux DAX/TCI use cases).
- m_open / m_channelGain / m_transmitting / m_lastAudioMs converted to
  std::atomic.
- Silence timer self-stops by observing m_lastAudioMs from the main thread
  rather than calling QTimer::stop() from the audio thread.
- Per-channel native source falls back to legacy module-pipe-source if the
  native open() fails, so non-PipeWire systems still work.
- Audio is float32 mono 48 kHz end-to-end.  Linear-interp upsample of the
  radio's 24 kHz stereo to 48 kHz mono runs in feedDaxAudio so the
  PipeWire graph does not have to insert a resampler.

Build
-----
- pkg_check_modules(PIPEWIRE_NATIVE libpipewire-0.3) gates the new path
  behind a HAVE_PIPEWIRE_NATIVE define.  When the dev headers are missing,
  the build silently falls back to the legacy module-pipe-source path
  (issue ten9876#1008's PR ten9876#1901 work, still present and used).
- libpipewire-0.3-dev added to .github/docker/Dockerfile for CI.

Remaining gap (~200 ms, consumer-side)
--------------------------------------
The remaining 200 ms is the libpulse fragment-pool buffer that pulse-compat
holds for clients using the PulseAudio API (WSJT-X via Qt PulseAudio
backend connects with pulse.attr.fragsize=4800 = 50 ms × ~4 fragments
queued).  Confirmed via pw-cat bypass test — native PipeWire clients get
sub-10 ms latency from the same source.  PipeWire 1.0.5 does not expose a
documented server-side mechanism to clamp pulse.attr.fragsize from the
source side; pulse.min.* keys are minimums (xrun protection), not maximums.
See issue ten9876#1008 for full measurement trace and the failed pulse.rules
attempts.  Real fix paths are consumer-side (PULSE_LATENCY_MSEC env, Qt
native PipeWire backend) and tracked separately.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@RiskAndReward1337
Copy link
Copy Markdown
Contributor Author

@jensenpat done — split into #2318. The MainWindow.cpp Qt::DirectConnection swap is now the only change in that PR so it can be reviewed/tested in isolation against non-Linux DAX / TCI use cases.

This PR has been rebased onto current upstream/main (ef84ad5) and the MainWindow.cpp hunk dropped — the bridge-side atomic conversions stay here as a prereq for #2318. Build is clean (Qt 6.11.0, RelWithDebInfo).

Recommended merge order: this PR first (so PipeWireAudioBridge::feedDaxAudio is lock-free), then #2318. I flagged in #2318 that VirtualAudioBridge::feedDaxAudio should be reviewed by someone with macOS hardware before merging that one.

@ten9876
Copy link
Copy Markdown
Owner

ten9876 commented May 3, 2026

Claude here on Jeremy's behalf — really nice writeup, and the architectural direction (native pw_stream + DirectConnection on the audio fast path) is sound. The 400 → 200 ms result with explicit identification of where the remaining latency lives (consumer-side libpulse fragsize, not in your pipeline) is exactly the level of rigor we love to see. Account checks out, no security concerns in the diff (Dockerfile / CMake / pactl call sites are all clean, no shell-injection surface introduced).

A few items to address before merge — the first is a real (rare) data race; the rest are smaller:

Must-fix:

  1. SPSC ring invariant is violated. PipeWireNativeRxSource.h:980-985 documents "Producer writes m_writeIdx; consumer writes m_readIdx", but the overflow path in PipeWireNativeRxSource.cpp:838 calls m_readIdx.fetch_add(drop, std::memory_order_release) from the producer side. Two writers to m_readIdx is no longer SPSC: when onProcess is mid-memcpy from [oldR..oldR+toRead-1] and the producer's drop advances m_readIdx past the slot about to be overwritten by the wrap, you get a torn read. Only triggers when the consumer stalls and the producer overflows in the same window — rare, but it's a real race the docs explicitly say can't happen. Suggested fix: drop new samples on overflow instead, or expose a separate atomic drop counter and let the consumer detect the gap on its next read.

  2. Raw new for the listener hook. PipeWireNativeRxSource.cpp:761: pw_stream_add_listener(m_stream, new spa_hook{}, &kStreamEvents, this). The project's CLAUDE.md style guide rules out raw new/delete — and besides, this leaks a hook per open() call. spa_hook is POD; just make it a member (spa_hook m_listenerHook{};) and pass &m_listenerHook.

Worth fixing while you're in there:

  1. PipeWireNativeContext::acquire() race. PipeWireNativeContext.cpp:475 — if thread B's fetch_add returns 1 while thread A is still mid-init (post-fetch_add, pre-m_loop = ...), B sees m_loop == nullptr and returns false even though init is in progress. Header comment says "Safe to call from any thread"; today they're effectively serialized through the main thread, but a std::once_flag or short mutex around init/teardown matches the documented contract.

  2. reinterpret_cast of the state-changed callback. PipeWireNativeRxSource.cpp:665 casts an (int, int, ...) signature to libpipewire's (enum pw_stream_state, enum pw_stream_state, ...) — works on Linux x86_64/aarch64 because the enums are int-sized, but it's UB in strict C++. Just declare onStateChanged with the enum types and drop the cast.

  3. Stale doc. PipeWireNativeRxSource.h:957 says "caps backlog at the ring size (~170 ms)" but RING_SIZE = 2048 / 48000 ≈ 42 ms. The 170 ms number is from the earlier 8192 sizing.

Nice-to-have, not blocking:

  • Run CI on the branch so the Linux build path with libpipewire-0.3-dev gets validated end-to-end before merge.

Once #1 and #2 are addressed I'd be happy to give a fresh pass and recommend merge. The latency win is meaningful and the rest of the work is genuinely careful — the upsampler's cross-packet prev retention and TX-gap reset detail in particular is the kind of thing easy to miss.

73, Jeremy KK7GWY & Claude (AI dev partner)

…ten9876#1008)

Addresses the review on ten9876#2312:

1. SPSC ring invariant.  feedAudio() previously advanced m_readIdx from
   the producer side on overflow ("drop oldest"), which is a real (rare)
   race against the consumer's in-flight memcpy when a wraparound write
   crosses the slot the consumer has already claimed.  Switch to
   drop-newest: clamp toCopy to the available space and bail if zero.
   m_readIdx is now consumer-only as documented.  The 42 ms latency cap
   is preserved; in steady state the ring is near-empty so the choice
   between drop-oldest and drop-newest only matters during consumer
   stalls (which in practice only occur under extreme system load).

2. Raw new for the listener hook.  pw_stream_add_listener was being
   handed a heap-allocated spa_hook that leaked on every open().  Make
   the hook a by-value member of PipeWireNativeRxSource so its lifetime
   is tied to the source itself — no raw new/delete, and zero leaks.

3. PipeWireNativeContext init/teardown race.  acquire() did
   m_refCount.fetch_add(1) and then read m_loop without serialisation,
   so a second thread arriving while the first was mid-init would see
   m_loop == nullptr and bail incorrectly.  Add a std::mutex around the
   init/teardown body in acquire()/release() so the documented "safe
   from any thread" contract is real instead of relying on incidental
   main-thread serialisation.  m_refCount no longer needs to be atomic
   under the lock; it is now a plain int.

4. reinterpret_cast on the state-changed callback.  Replace the cast of
   our int-arg static onStateChanged into pipewire's enum-arg slot with
   a small anonymous-namespace trampoline that has the correct C
   signature and forwards via static_cast<int>.  Header stays free of
   <pipewire/stream.h>.

5. Stale doc.  The feedAudio() comment still claimed "~170 ms" backlog
   from the earlier 8192 ring sizing.  RING_SIZE has been 2048 (~42 ms)
   for some time — fix the comment.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@RiskAndReward1337
Copy link
Copy Markdown
Contributor Author

@ten9876 Thanks Jeremy & Claude — that's a really sharp review, especially catching the SPSC violation. All five items addressed in 2cf2d15:

Must-fix #1 — SPSC ring race. Switched to drop-newest in feedAudio(). m_readIdx is now consumer-only as the docs claim. The 42 ms latency cap is preserved; the only behavioural change is which samples get dropped during a consumer stall, and in steady state the ring is near-empty so the choice doesn't matter. I went with the simpler of your two suggestions (drop-newest) rather than the gap-counter option because consumer stalls only occur under extreme system load in this codepath, so the rare drop-oldest-vs-drop-newest distinction isn't worth the extra coordination.

Must-fix #2 — raw new spa_hook. Made it a by-value member (spa_hook m_listenerHook{}). Lifetime now matches the RxSource itself, no leak across re-open() calls, and complies with the no-raw-new rule in CLAUDE.md. Pulled <spa/utils/hook.h> into the header for the member type.

Worth-fixing #3acquire() race. Wrapped init/teardown bodies in a std::mutex. m_refCount is no longer atomic since it's now under the lock — plain int is sufficient. The "safe from any thread" header contract is now real instead of incidentally true via main-thread serialisation.

Worth-fixing #4reinterpret_cast UB. Replaced with an anonymous-namespace trampoline (onStateChangedTrampoline) that has the correct C signature (enum pw_stream_state args) and forwards via static_cast<int> into the static member. Header stays free of <pipewire/stream.h>.

Worth-fixing #5 — stale doc. Comment on feedAudio() now correctly says "~42 ms" and reflects the new drop-newest semantics + SPSC invariant.

CI. The previous push's CI run on this branch was green (#25286904608). The new push's run is currently in action_required waiting on maintainer approval.

Builds clean locally on Qt 6.11.0 / RelWithDebInfo with no new warnings on the four touched files. Verified the binary still drives DAX correctly post-changes.

Ready for a fresh pass whenever convenient. Thanks for the careful read — that ring race genuinely could have shipped.

@ten9876 ten9876 marked this pull request as ready for review May 4, 2026 23:45
@ten9876 ten9876 merged commit 9834c05 into ten9876:main May 4, 2026
5 checks passed
@ten9876
Copy link
Copy Markdown
Owner

ten9876 commented May 4, 2026

Claude here on Jeremy's behalf — merged via admin squash. All five review items cleanly addressed in 2cf2d15; the SPSC fix in particular landed exactly the way you described.

Jeremy is going to run the hardware smoke test on his FLEX-8600 + Linux PipeWire setup against WSJT-X. We'll holler back here if the on-radio path surfaces anything — your fallback to legacy module-pipe-source on per-channel open() failure means worst case is "no latency improvement," not "DAX broken."

Thanks for the careful workmanship and the well-reasoned writeup. The 400 → 200 ms result with explicit identification of the remaining consumer-side libpulse fragsize gap is the kind of detail that makes review easy.

73, Jeremy KK7GWY & Claude (AI dev partner)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants