fix(cluster): restore auto-reconnect after failed connection attempt (#2380)#2388

Closed
M7HNF-Ian wants to merge 6 commits into ten9876:main from M7HNF-Ian:fix/spotshub-autoconnect-after-network-loss

Conversation

@M7HNF-Ian
Contributor

Summary

Fixes #2380

SpotHub (DX cluster / RBN) did not automatically reconnect after a Wi-Fi drop. The app would try once when the network came back, fail, and then stop retrying — requiring a manual reconnect from the user.

Root cause

Qt does not emit disconnected() when a socket in ConnectingState fails — it only emits errorOccurred. The reconnect timer was only armed inside onDisconnected(), so one failed attempt permanently halted the backoff chain.

Two additional failure paths had the same gap:

  • onSocketError() — emitted UI error only, never re-armed the timer
  • The 10-second connect-timeout lambda — called abort() but didn't schedule a retry

Fix

Extracted a scheduleReconnect() helper with two guards:

  • m_intentionalDisconnect — won't retry after a user-initiated disconnect
  • m_reconnectTimer->isActive() — prevents double-scheduling when errorOccurred and the connect timeout both fire for the same failed attempt

All three failure paths now call scheduleReconnect():

| Path | Before | After |
| --- | --- | --- |
| Live connection dropped | ✅ retried | ✅ retried |
| Connect attempt failed (socket error) | ❌ halted | ✅ retried |
| Connect attempt timed out (10 s) | ❌ halted | ✅ retried |

A per-call epoch counter (m_connectEpoch) is captured in the timeout lambda so a stale timeout from attempt N cannot abort a later attempt N+1 that has since succeeded.

connectToCluster() now also rejects calls when the socket is already in ConnectingState to prevent overlapping attempts during rapid retries.

Backoff sequence is unchanged: 5 s → 10 s → 20 s → 40 s → 60 s (cap).

Files modified

  • src/core/DxClusterClient.h — add scheduleReconnect() declaration, m_connectEpoch member
  • src/core/DxClusterClient.cpp — implement scheduleReconnect(); update connectToCluster(), onDisconnected(), onSocketError()

Test plan

  • Block the cluster host in the firewall — observe repeated retry attempts at 5 / 10 / 20 / 40 / 60 s intervals in the debug log
  • Unblock — confirm automatic reconnect without manual intervention
  • User-initiated disconnect (SpotHub → Disconnect) — confirm no reconnect attempts are made
  • Reconnect succeeds mid-backoff — confirm m_reconnectAttempts resets to 0 and the next disconnect starts the backoff from 5 s again
  • RBN tab: same reconnect behaviour confirmed

Reported by luigiverdicchio1-prog (Lou) on Windows 10 with a FLEX-6400.

M7HNF-Ian and others added 6 commits May 4, 2026 15:15
Adds a Spot Lines: Enabled/Disabled toggle to SpotHub → Display tab.
When disabled, the vertical dotted lines drawn from the spectrum base up
to each spot label are hidden. Useful during contest operation where
dozens of simultaneous spots create visual clutter that obscures the FFT
trace. Spot callsigns and frequencies remain fully visible.

Setting persists via AppSettings key SpotShowLines, defaults to Enabled,
and takes effect immediately without a restart.
Per triage feedback on PR ten9876#2349, align naming with the existing
IsSpotsEnabled / IsSpotsOverrideColorsEnabled / setSpotOverrideColors
pattern:

- AppSettings key: SpotShowLines → IsSpotsLinesEnabled
- SpectrumWidget setter: setSpotLines → setSpotShowLines
- SpectrumWidget getter: spotLines() → spotShowLines()
- SpectrumWidget member: m_spotLines → m_spotShowLines
- SpotSettingsDialog member: m_spotLines → m_spotShowLines

Add a one-shot migration in MainWindow ctor to silently upgrade any
saved SpotShowLines key to IsSpotsLinesEnabled on first run.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Remove the SpotShowLines → IsSpotsLinesEnabled migration block from
  MainWindow ctor: the old key was never in a released build so no real
  users can have it set, making the migration dead weight.
- Align SpotSettingsDialog tooltip with DxClusterDialog: both now say
  "Show vertical lines from the spectrum up to each spot label.\nDisable
  during contests to reduce clutter." (was truncated in legacy dialog).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…en9876#2385)

TciServer is a QObject child of MainWindow, so Qt deletes it during
~QWidget::deleteChildren() — which runs *after* MainWindow's value
members (including m_radioModel) have already been destroyed. This meant
stop() → releaseDaxForTci() → m_model->isConnected() dereferenced a
destroyed object on every clean exit when a radio was connected, causing
EXC_BAD_ACCESS at 0x38 on the main thread.

Primary fix: explicitly delete m_tciServer in ~MainWindow() after the
audio thread is stopped (no more daxAudioReady cross-thread signals) but
while m_radioModel is still fully alive (DAX stream-remove commands reach
the radio). TciApplet's raw back-reference is nulled first via
setTciServer(nullptr) to prevent a dangling pointer surviving in the
widget tree between the explicit delete and ~QWidget::deleteChildren().

Belt-and-braces: promote TciServer::m_model from RadioModel* to
QPointer<RadioModel>. QPointer auto-clears when the pointee is destroyed,
so the existing `if (!m_model) return;` guard in releaseDaxForTci() now
covers the post-destruct case — a future refactor cannot silently
re-introduce the crash.
…wport (ten9876#2384)

The radio encodes FFT bins as fractional pixel positions within its own
internal dBm range. Adjusting the display viewport (min_dbm/max_dbm) does
NOT cause the radio to re-encode subsequent frames with the new range.

Previously, RadioModel::handlePanadapterStatus called
PanadapterStream::setDbmRange on every min_dbm/max_dbm echo — including
viewport changes triggered by the user dragging the amplitude scale. This
caused the decode formula:

  bins[i] = maxDbm − (buf[i] / (yPix−1)) × (maxDbm − minDbm)

to use the new viewport range while the radio was still encoding with its
original range, producing a permanent dBm shift equal to the viewport
offset. Two independent reporters confirmed a ~30–40 dBm shift matching
their scale adjustment (ten9876#2384, ten9876#2310).

Fix: PanadapterStream::setDbmRange now locks the decode range to the
first value reported per stream and silently ignores subsequent calls for
that stream. The initial value comes from the radio's first status echo
(typically the −130/−40 range AetherSDR requests at pan creation, or a
profile-restored range if the pan was pre-existing). The decode range is
unlocked when the stream is unregistered, so reconnects and pan
recreations pick up the correct initial range.

The display viewport path (PanadapterModel::levelChanged →
SpectrumWidget::setDbmRange) is unchanged and continues to handle
scale/offset changes correctly.
…en9876#2380)

The exponential-backoff reconnect chain broke permanently the first time a
connection attempt failed while the network was down. The root cause was
that Qt does not emit disconnected() when a socket in ConnectingState
encounters an error — only disconnected() armed the reconnect timer, so
one failed attempt silently halted all further retries.

Three paths now all call the new scheduleReconnect() helper:

1. onDisconnected() — live connection dropped (was already working)
2. onSocketError()  — connection attempt failed before reaching Connected
3. Connect-timeout lambda — 10 s timeout while still in ConnectingState

scheduleReconnect() guards against double-scheduling with an isActive()
check (errorOccurred and the timeout can both fire for a single attempt),
and against firing after an intentional disconnect.

A per-call epoch counter (m_connectEpoch) is captured in the timeout
lambda so a stale timeout from attempt N cannot abort a later attempt N+1
that has since succeeded.

connectToCluster() now also guards against overlapping calls when the
socket is already in ConnectingState.

Affects DX cluster and RBN (both use DxClusterClient). SpotCollector and
WSJT-X (UDP) are unaffected.
Contributor

@aethersdr-agent (bot) left a comment


Thanks for the careful root-cause work on this — the cluster reconnect logic itself looks well-reasoned. Before reviewing further, though, I need to flag a scope issue that affects mergeability.

Scope: diff doesn't match the PR description

The PR description says only DxClusterClient.h and DxClusterClient.cpp are modified, but the diff touches 11 files and bundles three independent concerns:

  1. DxClusterClient.{h,cpp} — the actual auto-reconnect fix described here. ✅ matches the description.
  2. TciServer.h + MainWindow.cpp (the ~MainWindow() delete m_tciServer block) — these are the same changes in your PR #2386 (TciServer crash on quit).
  3. PanadapterStream.{h,cpp} — these are the same changes in your PR #2387 (FFT decode-range lock).
  4. DxClusterDialog.cpp, SpectrumWidget.{h,cpp}, SpotSettingsDialog.{h,cpp}, plus the spot-lines wiring in MainWindow.cpp — a new Spot Lines enable/disable UI feature that isn't described in the PR body or referenced in #2380 at all.

This makes the PR hard to review, hard to revert if any one part regresses, and it conflicts with the parallel #2386/#2387 PRs. Could you rebase this branch on a clean main so it contains only the DxClusterClient changes? The TciServer and PanadapterStream work should stay in their dedicated PRs, and the spot-lines feature should land as its own separately-described PR (it's a UI/UX change, not part of fixing #2380).

On the in-scope DxClusterClient change

The logic is solid:

  • The m_connectEpoch counter is a nice touch — it correctly invalidates a stale 10 s timeout from attempt N if attempt N+1 has already started.
  • The m_reconnectTimer->isActive() guard in scheduleReconnect() correctly prevents errorOccurred and the connect-timeout lambda from compounding the backoff for the same failed attempt.
  • The if (!m_connected) guard in onSocketError() correctly defers to onDisconnected() for live-drop cases (where errorOccurred fires before disconnected() and m_connected is still true), avoiding a duplicate schedule.
  • m_reconnectAttempts reset in onConnected() (line 97) is unchanged, so a successful reconnect mid-backoff correctly restarts the chain at 5 s — matches the test plan.

One minor observation, not blocking:

In connectToCluster(), the new guard m_socket->state() == QAbstractSocket::ConnectingState is fine, but HostLookupState is also a transient pre-connect state where calling connectToHost() again would be similarly wrong. Consider broadening to m_socket->state() != QAbstractSocket::UnconnectedState. Not required to merge — ConnectingState covers the rapid-retry case you described.

Once the diff is trimmed down to just the cluster-client work, this should be ready.

ten9876 added a commit that referenced this pull request May 6, 2026
…2380) (#2394)

SpotHub (DX cluster / RBN) did not automatically reconnect after a Wi-Fi
drop or any other failed connection attempt. The app would try once when
the network came back, fail, and stop retrying — requiring a manual
reconnect.

Root cause: Qt's QAbstractSocket only emits disconnected() on a Connected
→ Unconnected transition. When a socket fails during ConnectingState
(host blocked, refused, or timed out), Qt emits errorOccurred but NOT
disconnected. The reconnect timer was only armed inside onDisconnected(),
so a single failed connect attempt permanently halted the backoff chain.

Two additional failure paths had the same gap: onSocketError() emitted
the UI error only and never re-armed the timer, and the 10-second connect
timeout lambda called abort() without scheduling a retry.

Fix: extracts a scheduleReconnect() helper called from all three failure
paths, with two guards — m_intentionalDisconnect (don't override
user-initiated disconnect) and m_reconnectTimer->isActive() (prevents
double-scheduling when errorOccurred and the connect timeout both fire
for the same failed attempt). A per-call epoch counter (m_connectEpoch)
captured by the timeout lambda ensures a stale timeout from attempt N
cannot abort a later attempt N+1 that has since succeeded.
connectToCluster() also now rejects calls when the socket is already in
ConnectingState to prevent overlapping attempts during rapid retries.

Backoff sequence is unchanged: 5 s → 10 s → 20 s → 40 s → 60 s (cap).

Cherry-picked from PR #2388 (M7HNF-Ian). The rest of that branch contained
already-merged commits (#2349 Spot Lines toggle, #2386 TCI crash fix), the
rejected #2387 spectrum fix, and two unrelated spot-lines naming-refinement
commits that belong in their own focused PR.

Co-authored-by: Ian M7HNF <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
@ten9876
Owner

ten9876 commented May 6, 2026

Claude here on Jeremy's behalf — thanks @M7HNF-Ian, the diagnosis was spot-on (Qt's disconnected vs. errorOccurred asymmetry on ConnectingState failures is exactly the kind of subtle Qt thing that's easy to miss). Hardware-tested by Jeremy on FLEX-8600: firewall block triggers retries at the expected 5/10/20/40/60 s intervals, unblock mid-backoff connects without manual intervention, and intentional disconnect doesn't auto-retry. Cherry-picked the cluster commit onto current main and shipped as a847239 via #2394. You're preserved as Co-Author on the squash-merge commit.

Closing this PR in favor of #2394. Two follow-up notes on what was on the branch:

Really nice contribution — looking forward to the next one.

73, Jeremy KK7GWY & Claude (AI dev partner)

ten9876 closed this May 6, 2026
Successfully merging this pull request may close these issues.

Sport Hub Auto connect not working