fix(cluster): restore auto-reconnect after failed connection attempt (#2380)#2388
fix(cluster): restore auto-reconnect after failed connection attempt (#2380)#2388M7HNF-Ian wants to merge 6 commits intoten9876:mainfrom
Conversation
Adds a Spot Lines: Enabled/Disabled toggle to SpotHub → Display tab. When disabled, the vertical dotted lines drawn from the spectrum base up to each spot label are hidden. Useful during contest operation where dozens of simultaneous spots create visual clutter that obscures the FFT trace. Spot callsigns and frequencies remain fully visible. Setting persists via AppSettings key SpotShowLines, defaults to Enabled, and takes effect immediately without a restart.
Per triage feedback on PR ten9876#2349, align naming with the existing IsSpotsEnabled / IsSpotsOverrideColorsEnabled / setSpotOverrideColors pattern: - AppSettings key: SpotShowLines → IsSpotsLinesEnabled - SpectrumWidget setter: setSpotLines → setSpotShowLines - SpectrumWidget getter: spotLines() → spotShowLines() - SpectrumWidget member: m_spotLines → m_spotShowLines - SpotSettingsDialog member: m_spotLines → m_spotShowLines Add a one-shot migration in MainWindow ctor to silently upgrade any saved SpotShowLines key to IsSpotsLinesEnabled on first run. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Remove the SpotShowLines → IsSpotsLinesEnabled migration block from MainWindow ctor: the old key was never in a released build so no real users can have it set, making the migration dead weight. - Align SpotSettingsDialog tooltip with DxClusterDialog: both now say "Show vertical lines from the spectrum up to each spot label.\nDisable during contests to reduce clutter." (was truncated in legacy dialog). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…en9876#2385) TciServer is a QObject child of MainWindow, so Qt deletes it during ~QWidget::deleteChildren() — which runs *after* MainWindow's value members (including m_radioModel) have already been destroyed. This meant stop() → releaseDaxForTci() → m_model->isConnected() dereferenced a destroyed object on every clean exit when a radio was connected, causing EXC_BAD_ACCESS at 0x38 on the main thread. Primary fix: explicitly delete m_tciServer in ~MainWindow() after the audio thread is stopped (no more daxAudioReady cross-thread signals) but while m_radioModel is still fully alive (DAX stream-remove commands reach the radio). TciApplet's raw back-reference is nulled first via setTciServer(nullptr) to prevent a dangling pointer surviving in the widget tree between the explicit delete and ~QWidget::deleteChildren(). Belt-and-braces: promote TciServer::m_model from RadioModel* to QPointer<RadioModel>. QPointer auto-clears when the pointee is destroyed, so the existing `if (!m_model) return;` guard in releaseDaxForTci() now covers the post-destruct case — a future refactor cannot silently re-introduce the crash.
…wport (ten9876#2384) The radio encodes FFT bins as fractional pixel positions within its own internal dBm range. Adjusting the display viewport (min_dbm/max_dbm) does NOT cause the radio to re-encode subsequent frames with the new range. Previously, RadioModel::handlePanadapterStatus called PanadapterStream::setDbmRange on every min_dbm/max_dbm echo — including viewport changes triggered by the user dragging the amplitude scale. This caused the decode formula: bins[i] = maxDbm − (buf[i] / (yPix−1)) × (maxDbm − minDbm) to use the new viewport range while the radio was still encoding with its original range, producing a permanent dBm shift equal to the viewport offset. Two independent reporters confirmed a ~30–40 dBm shift matching their scale adjustment (ten9876#2384, ten9876#2310). Fix: PanadapterStream::setDbmRange now locks the decode range to the first value reported per stream and silently ignores subsequent calls for that stream. The initial value comes from the radio's first status echo (typically the −130/−40 range AetherSDR requests at pan creation, or a profile-restored range if the pan was pre-existing). The decode range is unlocked when the stream is unregistered, so reconnects and pan recreations pick up the correct initial range. The display viewport path (PanadapterModel::levelChanged → SpectrumWidget::setDbmRange) is unchanged and continues to handle scale/offset changes correctly.
…en9876#2380) The exponential-backoff reconnect chain broke permanently the first time a connection attempt failed while the network was down. The root cause was that Qt does not emit disconnected() when a socket in ConnectingState encounters an error — only disconnected() armed the reconnect timer, so one failed attempt silently halted all further retries. Three paths now all call the new scheduleReconnect() helper: 1. onDisconnected() — live connection dropped (was already working) 2. onSocketError() — connection attempt failed before reaching Connected 3. Connect-timeout lambda — 10 s timeout while still in ConnectingState scheduleReconnect() guards against double-scheduling with an isActive() check (errorOccurred and the timeout can both fire for a single attempt), and against firing after an intentional disconnect. A per-call epoch counter (m_connectEpoch) is captured in the timeout lambda so a stale timeout from attempt N cannot abort a later attempt N+1 that has since succeeded. connectToCluster() now also guards against overlapping calls when the socket is already in ConnectingState. Affects DX cluster and RBN (both use DxClusterClient). SpotCollector and WSJT-X (UDP) are unaffected.
There was a problem hiding this comment.
Thanks for the careful root-cause work on this — the cluster reconnect logic itself looks well-reasoned. Before reviewing further, though, I need to flag a scope issue that affects mergeability.
Scope: diff doesn't match the PR description
The PR description says only DxClusterClient.h and DxClusterClient.cpp are modified, but the diff touches 11 files and bundles three independent concerns:
DxClusterClient.{h,cpp}— the actual auto-reconnect fix described here. ✅ matches the description.TciServer.h+MainWindow.cpp(the~MainWindow()delete m_tciServerblock) — these are the same changes in your PR #2386 (TciServer crash on quit).PanadapterStream.{h,cpp}— these are the same changes in your PR #2387 (FFT decode-range lock).DxClusterDialog.cpp,SpectrumWidget.{h,cpp},SpotSettingsDialog.{h,cpp}, plus the spot-lines wiring inMainWindow.cpp— a new Spot Lines enable/disable UI feature that isn't described in the PR body or referenced in #2380 at all.
This makes the PR hard to review, hard to revert if any one part regresses, and it conflicts with the parallel #2386/#2387 PRs. Could you rebase this branch on a clean main so it contains only the DxClusterClient changes? The TciServer and PanadapterStream work should stay in their dedicated PRs, and the spot-lines feature should land as its own separately-described PR (it's a UI/UX change, not part of fixing #2380).
On the in-scope DxClusterClient change
The logic is solid:
- The
m_connectEpochcounter is a nice touch — it correctly invalidates a stale 10 s timeout from attempt N if attempt N+1 has already started. - The
m_reconnectTimer->isActive()guard inscheduleReconnect()correctly preventserrorOccurredand the connect-timeout lambda from compounding the backoff for the same failed attempt. - The
if (!m_connected)guard inonSocketError()correctly defers toonDisconnected()for live-drop cases (whereerrorOccurredfires beforedisconnected()andm_connectedis still true), avoiding a duplicate schedule. m_reconnectAttemptsreset inonConnected()(line 97) is unchanged, so a successful reconnect mid-backoff correctly restarts the chain at 5 s — matches the test plan.
One minor observation, not blocking:
In connectToCluster(), the new guard m_socket->state() == QAbstractSocket::ConnectingState is fine, but HostLookupState is also a transient pre-connect state where calling connectToHost() again would be similarly wrong. Consider broadening to m_socket->state() != QAbstractSocket::UnconnectedState. Not required to merge — ConnectingState covers the rapid-retry case you described.
Once the diff is trimmed down to just the cluster-client work, this should be ready.
…2380) (#2394) SpotHub (DX cluster / RBN) did not automatically reconnect after a Wi-Fi drop or any other failed connection attempt. The app would try once when the network came back, fail, and stop retrying — requiring a manual reconnect. Root cause: Qt's QAbstractSocket only emits disconnected() on a Connected → Unconnected transition. When a socket fails during ConnectingState (host blocked, refused, or timed out), Qt emits errorOccurred but NOT disconnected. The reconnect timer was only armed inside onDisconnected(), so a single failed connect attempt permanently halted the backoff chain. Two additional failure paths had the same gap: onSocketError() emitted the UI error only and never re-armed the timer, and the 10-second connect timeout lambda called abort() without scheduling a retry. Fix: extracts a scheduleReconnect() helper called from all three failure paths, with two guards — m_intentionalDisconnect (don't override user-initiated disconnect) and m_reconnectTimer->isActive() (prevents double-scheduling when errorOccurred and the connect timeout both fire for the same failed attempt). A per-call epoch counter (m_connectEpoch) captured by the timeout lambda ensures a stale timeout from attempt N cannot abort a later attempt N+1 that has since succeeded. connectToCluster() also now rejects calls when the socket is already in ConnectingState to prevent overlapping attempts during rapid retries. Backoff sequence is unchanged: 5 s → 10 s → 20 s → 40 s → 60 s (cap). Cherry-picked from PR #2388 (M7HNF-Ian). The rest of that branch contained already-merged commits (#2349 Spot Lines toggle, #2386 TCI crash fix), the rejected #2387 spectrum fix, and two unrelated spot-lines naming-refinement commits that belong in their own focused PR. Co-authored-by: Ian M7HNF <[email protected]> Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
|
Claude here on Jeremy's behalf — thanks @M7HNF-Ian, the diagnosis was spot-on (Qt's Closing this PR in favor of #2394. Two follow-up notes on what was on the branch:
Really nice contribution — looking forward to the next one. 73, Jeremy KK7GWY & Claude (AI dev partner) |
Summary
Fixes #2380
SpotHub (DX cluster / RBN) did not automatically reconnect after a Wi-Fi drop. The app would try once when the network came back, fail, and then stop retrying — requiring a manual reconnect from the user.
Root cause
Qt does not emit
disconnected()when a socket inConnectingStatefails — it only emitserrorOccurred. The reconnect timer was only armed insideonDisconnected(), so one failed attempt permanently halted the backoff chain.Two additional failure paths had the same gap:
onSocketError()— emitted UI error only, never re-armed the timerabort()but didn't schedule a retryFix
Extracted a
scheduleReconnect()helper with two guards:m_intentionalDisconnect— won't retry after a user-initiated disconnectm_reconnectTimer->isActive()— prevents double-scheduling whenerrorOccurredand the connect timeout both fire for the same failed attemptAll three failure paths now call
scheduleReconnect():A per-call epoch counter (
m_connectEpoch) is captured in the timeout lambda so a stale timeout from attempt N cannot abort a later attempt N+1 that has since succeeded.connectToCluster()now also rejects calls when the socket is already inConnectingStateto prevent overlapping attempts during rapid retries.Backoff sequence is unchanged: 5 s → 10 s → 20 s → 40 s → 60 s (cap).
Files modified
src/core/DxClusterClient.h— addscheduleReconnect()declaration,m_connectEpochmembersrc/core/DxClusterClient.cpp— implementscheduleReconnect(); updateconnectToCluster(),onDisconnected(),onSocketError()Test plan
m_reconnectAttemptsresets to 0 and the next disconnect starts the backoff from 5 s againReported by luigiverdicchio1-prog (Lou) on Windows 10 with a FLEX-6400.