fix(cluster): restore auto-reconnect after failed connection attempt (#2380)#2388

Closed
M7HNF-Ian wants to merge 6 commits into ten9876:main from M7HNF-Ian:fix/spotshub-autoconnect-after-network-loss

Conversation

@M7HNF-Ian
Contributor

Summary

Fixes #2380

SpotHub (DX cluster / RBN) did not automatically reconnect after a Wi-Fi drop. The app would try once when the network came back, fail, and then stop retrying — requiring a manual reconnect from the user.

Root cause

Qt does not emit disconnected() when a socket in ConnectingState fails — it only emits errorOccurred. The reconnect timer was only armed inside onDisconnected(), so one failed attempt permanently halted the backoff chain.

Two additional failure paths had the same gap:

  • onSocketError() — emitted UI error only, never re-armed the timer
  • The 10-second connect-timeout lambda — called abort() but didn't schedule a retry

Fix

Extracted a scheduleReconnect() helper with two guards:

  • m_intentionalDisconnect — won't retry after a user-initiated disconnect
  • m_reconnectTimer->isActive() — prevents double-scheduling when errorOccurred and the connect timeout both fire for the same failed attempt

All three failure paths now call scheduleReconnect():

| Path | Before | After |
| --- | --- | --- |
| Live connection dropped | ✅ retried | ✅ retried |
| Connect attempt failed (socket error) | ❌ halted | ✅ retried |
| Connect attempt timed out (10 s) | ❌ halted | ✅ retried |

A per-call epoch counter (m_connectEpoch) is captured in the timeout lambda so a stale timeout from attempt N cannot abort a later attempt N+1 that has since succeeded.

connectToCluster() now also rejects calls when the socket is already in ConnectingState to prevent overlapping attempts during rapid retries.

Backoff sequence is unchanged: 5 s → 10 s → 20 s → 40 s → 60 s (cap).

Files modified

  • src/core/DxClusterClient.h — add scheduleReconnect() declaration, m_connectEpoch member
  • src/core/DxClusterClient.cpp — implement scheduleReconnect(); update connectToCluster(), onDisconnected(), onSocketError()

Test plan

  • Block the cluster host in the firewall — observe repeated retry attempts at 5 / 10 / 20 / 40 / 60 s intervals in the debug log
  • Unblock — confirm automatic reconnect without manual intervention
  • User-initiated disconnect (SpotHub → Disconnect) — confirm no reconnect attempts are made
  • Reconnect succeeds mid-backoff — confirm m_reconnectAttempts resets to 0 and the next disconnect starts the backoff from 5 s again
  • RBN tab: same reconnect behaviour confirmed

Reported by luigiverdicchio1-prog (Lou) on Windows 10 with a FLEX-6400.

M7HNF-Ian and others added 6 commits May 4, 2026 15:15
Adds a Spot Lines: Enabled/Disabled toggle to SpotHub → Display tab.
When disabled, the vertical dotted lines drawn from the spectrum base up
to each spot label are hidden. Useful during contest operation where
dozens of simultaneous spots create visual clutter that obscures the FFT
trace. Spot callsigns and frequencies remain fully visible.

Setting persists via AppSettings key SpotShowLines, defaults to Enabled,
and takes effect immediately without a restart.
Per triage feedback on PR ten9876#2349, align naming with the existing
IsSpotsEnabled / IsSpotsOverrideColorsEnabled / setSpotOverrideColors
pattern:

- AppSettings key: SpotShowLines → IsSpotsLinesEnabled
- SpectrumWidget setter: setSpotLines → setSpotShowLines
- SpectrumWidget getter: spotLines() → spotShowLines()
- SpectrumWidget member: m_spotLines → m_spotShowLines
- SpotSettingsDialog member: m_spotLines → m_spotShowLines

Add a one-shot migration in MainWindow ctor to silently upgrade any
saved SpotShowLines key to IsSpotsLinesEnabled on first run.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Remove the SpotShowLines → IsSpotsLinesEnabled migration block from
  MainWindow ctor: the old key was never in a released build so no real
  users can have it set, making the migration dead weight.
- Align SpotSettingsDialog tooltip with DxClusterDialog: both now say
  "Show vertical lines from the spectrum up to each spot label.\nDisable
  during contests to reduce clutter." (was truncated in legacy dialog).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…en9876#2385)

TciServer is a QObject child of MainWindow, so Qt deletes it during
~QWidget::deleteChildren() — which runs *after* MainWindow's value
members (including m_radioModel) have already been destroyed. This meant
stop() → releaseDaxForTci() → m_model->isConnected() dereferenced a
destroyed object on every clean exit when a radio was connected, causing
EXC_BAD_ACCESS at 0x38 on the main thread.

Primary fix: explicitly delete m_tciServer in ~MainWindow() after the
audio thread is stopped (no more daxAudioReady cross-thread signals) but
while m_radioModel is still fully alive (DAX stream-remove commands reach
the radio). TciApplet's raw back-reference is nulled first via
setTciServer(nullptr) to prevent a dangling pointer surviving in the
widget tree between the explicit delete and ~QWidget::deleteChildren().

Belt-and-braces: promote TciServer::m_model from RadioModel* to
QPointer<RadioModel>. QPointer auto-clears when the pointee is destroyed,
so the existing `if (!m_model) return;` guard in releaseDaxForTci() now
covers the post-destruct case — a future refactor cannot silently
re-introduce the crash.
…wport (ten9876#2384)

The radio encodes FFT bins as fractional pixel positions within its own
internal dBm range. Adjusting the display viewport (min_dbm/max_dbm) does
NOT cause the radio to re-encode subsequent frames with the new range.

Previously, RadioModel::handlePanadapterStatus called
PanadapterStream::setDbmRange on every min_dbm/max_dbm echo — including
viewport changes triggered by the user dragging the amplitude scale. This
caused the decode formula:

  bins[i] = maxDbm − (buf[i] / (yPix−1)) × (maxDbm − minDbm)

to use the new viewport range while the radio was still encoding with its
original range, producing a permanent dBm shift equal to the viewport
offset. Two independent reporters confirmed a ~30–40 dBm shift matching
their scale adjustment (ten9876#2384, ten9876#2310).

Fix: PanadapterStream::setDbmRange now locks the decode range to the
first value reported per stream and silently ignores subsequent calls for
that stream. The initial value comes from the radio's first status echo
(typically the −130/−40 range AetherSDR requests at pan creation, or a
profile-restored range if the pan was pre-existing). The decode range is
unlocked when the stream is unregistered, so reconnects and pan
recreations pick up the correct initial range.

The display viewport path (PanadapterModel::levelChanged →
SpectrumWidget::setDbmRange) is unchanged and continues to handle
scale/offset changes correctly.
…en9876#2380)

The exponential-backoff reconnect chain broke permanently the first time a
connection attempt failed while the network was down. The root cause was
that Qt does not emit disconnected() when a socket in ConnectingState
encounters an error — only disconnected() armed the reconnect timer, so
one failed attempt silently halted all further retries.

Three paths now all call the new scheduleReconnect() helper:

1. onDisconnected() — live connection dropped (was already working)
2. onSocketError()  — connection attempt failed before reaching Connected
3. Connect-timeout lambda — 10 s timeout while still in ConnectingState

scheduleReconnect() guards against double-scheduling with an isActive()
check (errorOccurred and the timeout can both fire for a single attempt),
and against firing after an intentional disconnect.

A per-call epoch counter (m_connectEpoch) is captured in the timeout
lambda so a stale timeout from attempt N cannot abort a later attempt N+1
that has since succeeded.

connectToCluster() now also guards against overlapping calls when the
socket is already in ConnectingState.

Affects DX cluster and RBN (both use DxClusterClient). SpotCollector and
WSJT-X (UDP) are unaffected.
Contributor

@aethersdr-agent (bot) left a comment


Thanks for the careful root-cause work on this — the cluster reconnect logic itself looks well-reasoned. Before reviewing further, though, I need to flag a scope issue that affects mergeability.

Scope: diff doesn't match the PR description

The PR description says only DxClusterClient.h and DxClusterClient.cpp are modified, but the diff touches 11 files and bundles three independent concerns:

  1. DxClusterClient.{h,cpp} — the actual auto-reconnect fix described here. ✅ matches the description.
  2. TciServer.h + MainWindow.cpp (the ~MainWindow() delete m_tciServer block) — these are the same changes in your PR #2386 (TciServer crash on quit).
  3. PanadapterStream.{h,cpp} — these are the same changes in your PR #2387 (FFT decode-range lock).
  4. DxClusterDialog.cpp, SpectrumWidget.{h,cpp}, SpotSettingsDialog.{h,cpp}, plus the spot-lines wiring in MainWindow.cpp — a new Spot Lines enable/disable UI feature that isn't described in the PR body or referenced in #2380 at all.

This makes the PR hard to review, hard to revert if any one part regresses, and it conflicts with the parallel #2386/#2387 PRs. Could you rebase this branch on a clean main so it contains only the DxClusterClient changes? The TciServer and PanadapterStream work should stay in their dedicated PRs, and the spot-lines feature should land as its own separately-described PR (it's a UI/UX change, not part of fixing #2380).

On the in-scope DxClusterClient change

The logic is solid:

  • The m_connectEpoch counter is a nice touch — it correctly invalidates a stale 10 s timeout from attempt N if attempt N+1 has already started.
  • The m_reconnectTimer->isActive() guard in scheduleReconnect() correctly prevents errorOccurred and the connect-timeout lambda from compounding the backoff for the same failed attempt.
  • The if (!m_connected) guard in onSocketError() correctly defers to onDisconnected() for live-drop cases (where errorOccurred fires before disconnected() and m_connected is still true), avoiding a duplicate schedule.
  • m_reconnectAttempts reset in onConnected() (line 97) is unchanged, so a successful reconnect mid-backoff correctly restarts the chain at 5 s — matches the test plan.

One minor observation, not blocking:

In connectToCluster(), the new guard m_socket->state() == QAbstractSocket::ConnectingState is fine, but HostLookupState is also a transient pre-connect state where calling connectToHost() again would be similarly wrong. Consider broadening to m_socket->state() != QAbstractSocket::UnconnectedState. Not required to merge — ConnectingState covers the rapid-retry case you described.

Once the diff is trimmed down to just the cluster-client work, this should be ready.

ten9876 added a commit that referenced this pull request May 6, 2026
…2380) (#2394)

SpotHub (DX cluster / RBN) did not automatically reconnect after a Wi-Fi
drop or any other failed connection attempt. The app would try once when
the network came back, fail, and stop retrying — requiring a manual
reconnect.

Root cause: Qt's QAbstractSocket only emits disconnected() on a Connected
→ Unconnected transition. When a socket fails during ConnectingState
(host blocked, refused, or timed out), Qt emits errorOccurred but NOT
disconnected. The reconnect timer was only armed inside onDisconnected(),
so a single failed connect attempt permanently halted the backoff chain.

Two additional failure paths had the same gap: onSocketError() emitted
the UI error only and never re-armed the timer, and the 10-second connect
timeout lambda called abort() without scheduling a retry.

Fix: extracts a scheduleReconnect() helper called from all three failure
paths, with two guards — m_intentionalDisconnect (don't override
user-initiated disconnect) and m_reconnectTimer->isActive() (prevents
double-scheduling when errorOccurred and the connect timeout both fire
for the same failed attempt). A per-call epoch counter (m_connectEpoch)
captured by the timeout lambda ensures a stale timeout from attempt N
cannot abort a later attempt N+1 that has since succeeded.
connectToCluster() also now rejects calls when the socket is already in
ConnectingState to prevent overlapping attempts during rapid retries.

Backoff sequence is unchanged: 5 s → 10 s → 20 s → 40 s → 60 s (cap).

Cherry-picked from PR #2388 (M7HNF-Ian). The rest of that branch contained
already-merged commits (#2349 Spot Lines toggle, #2386 TCI crash fix), the
rejected #2387 spectrum fix, and two unrelated spot-lines naming-refinement
commits that belong in their own focused PR.

Co-authored-by: Ian M7HNF <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
@ten9876
Owner

ten9876 commented May 6, 2026

Claude here on Jeremy's behalf — thanks @M7HNF-Ian, the diagnosis was spot-on (Qt's disconnected vs. errorOccurred asymmetry on ConnectingState failures is exactly the kind of subtle Qt thing that's easy to miss). Hardware-tested by Jeremy on FLEX-8600: firewall block triggers retries at the expected 5/10/20/40/60 s intervals, unblock mid-backoff connects without manual intervention, and intentional disconnect doesn't auto-retry. Cherry-picked the cluster commit onto current main and shipped as a847239 via #2394. You're preserved as Co-Author on the squash-merge commit.

Closing this PR in favor of #2394. Two follow-up notes on what was on the branch:

Really nice contribution — looking forward to the next one.

73, Jeremy KK7GWY & Claude (AI dev partner)

ten9876 closed this May 6, 2026
Successfully merging this pull request may close these issues.

Sport Hub Auto connect not working