Skip to content

Fix SmartLink reconnect after WAN drop#2282

Merged
ten9876 merged 3 commits intoten9876:mainfrom
rfoust:codex/fix-wan-auto-reconnect
May 2, 2026
Merged

Fix SmartLink reconnect after WAN drop#2282
ten9876 merged 3 commits intoten9876:mainfrom
rfoust:codex/fix-wan-auto-reconnect

Conversation

@rfoust
Copy link
Copy Markdown
Contributor

@rfoust rfoust commented May 2, 2026

Summary

  • Reconnect SmartLink/WAN sessions after an unexpected radio TLS drop instead of leaving the app stuck in the disconnected/waiting state.
  • Refresh SmartLink auth with the stored refresh token before reconnecting, and stop retrying when SmartLink reports auth cannot recover.
  • Make ping-miss disconnect one-shot and close the WAN connection immediately so the reconnect UI appears promptly when Wi-Fi drops.
  • Reset panadapter GPU resources during shutdown to reduce QRhi/Metal teardown crash risk after disconnect flows.

Bug Addressed

When the network drops during SmartLink operation, AetherSDR can show the "Radio disconnected - waiting for reconnect" popup without actually recovering. The terminal logs showed repeated ping-miss "forcing disconnect" messages, followed by WAN socket closure and no successful SmartLink reconnect. Clicking Disconnect from that stale state could also lead to a macOS QRhi/Metal teardown crash during main-window destruction.

Fix Details

  • MainWindow now owns a WAN reconnect timer that re-requests a SmartLink radio connection using the last selected WAN radio when RadioModel reports an unexpected WAN disconnect.
  • SmartLinkClient::reconnect() refreshes Auth0 credentials via the saved refresh token before reconnecting to the SmartLink server, avoiding reuse of an expired id_token.
  • If SmartLink rejects registration or auth refresh fails while reconnecting, the retry loop stops and the connection panel is shown with a sign-in-required status instead of retrying forever.
  • RadioModel::forceDisconnect() now handles WAN connections, so missed pings transition the app to disconnected/reconnecting promptly instead of waiting for the TLS socket to notice the dead network.
  • The ping watchdog logs and forces disconnect once per outage rather than every second.
  • PanadapterStack::prepareShutdown() hides SpectrumWidget instances and releases GPU resources before main-window teardown, matching the crash stack seen in QRhi backing-store cleanup.

Validation

  • Built locally on macOS with cmake --build build.
  • Tested by dropping Wi-Fi during SmartLink: reconnect started after disconnect, SmartLink reconnected, WAN TLS re-established, UDP registration resumed, and remote_audio_rx was recreated/adopted.
  • Checked against PR Fix SmartLink disconnect teardown #2278. Both branches touch WAN disconnect handling in RadioModel.cpp, but in different paths: PR Fix SmartLink disconnect teardown #2278 keeps manual WAN disconnect teardown asynchronous, while this PR handles unexpected WAN ping failure/reconnect. If both land together, that shared file needs normal maintainer review during merge.

Copilot AI review requested due to automatic review settings May 2, 2026 12:33
@rfoust rfoust requested review from jensenpat and ten9876 as code owners May 2, 2026 12:33
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves SmartLink/WAN resilience by driving an explicit reconnect flow after unexpected WAN/TLS drops, tightening ping-miss disconnect behavior so the UI transitions promptly, and reducing macOS GPU teardown crash risk during disconnect/shutdown.

Changes:

  • Add a MainWindow-owned WAN reconnect timer that re-requests SmartLink WAN connections after unexpected disconnects, and stops on unrecoverable auth failures.
  • Extend disconnect forcing to cover WAN connections and make ping-miss disconnect one-shot to avoid repeated disconnect storms.
  • Release panadapter GPU resources during shutdown to reduce QRhi/Metal teardown crashes.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/models/RadioModel.h Adds a flag to make ping-miss disconnect one-shot per outage.
src/models/RadioModel.cpp Forces WAN disconnect on ping-miss and adjusts WAN disconnect handling.
src/gui/PanadapterStack.cpp Releases SpectrumWidget GPU resources during shutdown.
src/gui/MainWindow.h Adds WAN reconnect timer + reconnect helper slot; adjusts WAN connect API.
src/gui/MainWindow.cpp Implements SmartLink/WAN reconnect loop and stops it on user disconnect/auth failure.
src/core/SmartLinkClient.h Adds SmartLinkClient::reconnect() API.
src/core/SmartLinkClient.cpp Implements reconnect and stops retrying on registration_invalid.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/models/RadioModel.cpp
qCDebug(lcProtocol) << "RadioModel: WAN disconnected";
m_wanConn->disconnect(this);
m_wanConn = nullptr;
} else if (!m_intentionalDisconnect && !m_lastInfo.address.isNull()) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3d039bc. Manual WAN disconnect now keeps the WanConnection alive long enough to run RadioModel::onDisconnected(): disconnectFromRadio wires a one-shot disconnected handler, calls disconnectFromRadio, and immediately runs cleanup if the socket is already idle.

Comment on lines +245 to +261
void SmartLinkClient::reconnect()
{
if (m_serverConnected) {
return;
}

if (!m_refreshToken.isEmpty()) {
loginWithRefreshToken(m_refreshToken);
return;
}

if (m_idToken.isEmpty()) {
return;
}

connectToServer();
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3d039bc. SmartLinkClient now guards connectToServer on an idle socket, aborts a Closing socket during reconnect, and skips duplicate Auth0 refresh work while a refresh request is already in progress.

Comment thread src/gui/MainWindow.cpp
Comment on lines +706 to +725
void MainWindow::requestWanReconnect()
{
if (m_userDisconnected || m_radioModel.isConnected()
|| m_pendingWanRadio.serial.isEmpty()) {
m_wanReconnectTimer.stop();
return;
}

m_connPanel->setStatusText("Reconnecting via SmartLink...");
setPanadapterConnectionAnimation(true, "Reconnecting to remote radio...");

if (!m_smartLink.isConnected()) {
m_smartLink.reconnect();
m_wanReconnectTimer.start();
return;
}

startWanRadioConnect(m_pendingWanRadio, false);
m_wanReconnectTimer.start();
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3d039bc. MainWindow now tracks m_wanReconnectAttemptInProgress so the 5s timer does not stack overlapping reconnect attempts. The flag is cleared on auth failure, SmartLink server connect/disconnect, connect_ready, WAN connected/disconnected, user disconnect, and shutdown.

Comment thread src/gui/MainWindow.cpp Outdated
Comment on lines +714 to +715
m_connPanel->setStatusText("Reconnecting via SmartLink...");
setPanadapterConnectionAnimation(true, "Reconnecting to remote radio...");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 3d039bc. The new reconnect status strings now use the same ellipsis style as the surrounding connection UI.

@aethersdr-agent
Copy link
Copy Markdown
Contributor

Hi @rfoust — thanks again for the contribution! Good news: this CI failure is not related to your code.

What failed

The only red check on 714ccfd is the Cleanup artifacts job (run 25251999590), specifically the Delete artifacts step. Everything that actually verifies your changes passed:

  • build — success
  • check-paths — success
  • Prepare / Agent / Upload results — success
  • analyze (cpp) — still in progress at the time of this comment

Cleanup artifacts is a post-build housekeeping job that uses gh api to remove the intermediate build artifacts the workflow uploaded. The Delete artifacts step is the only one that failed — it's a CI infrastructure issue (typically a transient GitHub API hiccup or a token-permission edge case when an artifact has already been GC'd), not anything you did.

Recommended action: nothing on your end. A maintainer can re-run that job, or it can simply be ignored — it doesn't affect the buildability or correctness of your patch. If it keeps failing across PRs, that's a workflow issue to address in .github/workflows/, separate from this PR.

Heads-up on the Copilot review comments

While you wait, the Copilot review flagged a few things on the related PRs in this stack that are worth a look (these are review notes, not CI failures):

  • src/models/RadioModel.cpp:1759 — on WAN disconnect, m_wanConn is being nulled before the WanConnection::disconnected signal can reach onDisconnected(), which can skip the panadapter/slice/meter/stream cleanup path and the connectionStateChanged(false) emission. Worth wiring a one-shot disconnected handler before disconnectFromRadio, or calling onDisconnected() directly when the WAN socket is already idle.
  • src/core/SmartLinkClient.cpp:261reconnect() / connectToServer() can call connectToHostEncrypted while the socket is still in Connecting/Connected/Closing. Guarding on m_socket.state() == UnconnectedState (or aborting the prior connection first) would prevent no-op reconnect attempts that race with the MainWindow retry timer.
  • src/gui/MainWindow.cpp:725requestWanReconnect() unconditionally restarts the 5s one-shot timer after kicking off SmartLinkClient::reconnect() or re-requesting connect_ready. With the network down, this can stack overlapping Auth0 refresh requests every 5s. Tracking an "attempt in progress" flag (and ideally adding backoff/jitter) would make the retry path safer.
  • src/gui/MainWindow.cpp:715 — small UI consistency nit: the new status strings use "..." while nearby code uses the Unicode ellipsis "…" (e.g. "Requesting SmartLink connection…"). Worth normalizing.

None of these block the CI; just flagging them so they don't get lost. Really appreciate the careful work on the SmartLink/WAN disconnect plumbing — this is gnarly state-machine territory and the fix path is the right shape.

Copy link
Copy Markdown
Contributor

@aethersdr-agent aethersdr-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rfoust — this is a thoughtful, well-scoped fix and the round of follow-up commits (3d039bc) addresses all four Copilot findings cleanly:

  • m_authRequestInProgress and the state() != UnconnectedState guard in connectToServer()/reconnect() correctly de-dup Auth0 refresh and TLS connect attempts.
  • m_wanReconnectAttemptInProgress prevents the 5s timer from stacking overlapping reconnect attempts. Reset paths (auth fail, server connect/disconnect, connect_ready, WAN connect/disconnect, user disconnect, shutdown) all look complete.
  • The new one-shot disconnected handler in RadioModel::disconnectFromRadio() plus the wan->isSocketIdle() synchronous fallback gives WAN sessions the same onDisconnected() cleanup as LAN, which fixes the gap Copilot called out and lines up with #2278.
  • registration_invalid now actually closes the socket and clears auth state instead of just emitting the signal — that should stop the stale "authenticated but no token" state.

A couple of small things to consider, all non-blocking:

  1. requestWanReconnect() retry cadence — fixed 5 s with no backoff/jitter. With the in-progress flag this won't stack requests, but during a long outage it will hammer Auth0/SmartLink at a steady 5 s/attempt completion. Not critical, but a modest exponential backoff (e.g., 5 → 10 → 30 s, capped) would be friendlier to the SmartLink endpoint and reduce log noise. Easy to add later.

  2. SmartLinkClient::reconnect() "deferred" path — if the socket is in ConnectingState/HostLookupState when reconnect() is called, the function returns without arming any retry. We rely on the eventual serverConnected/serverDisconnected to clear m_wanReconnectAttemptInProgress and let the next 5 s timer tick try again. That should resolve in practice (Qt will eventually time out the connect), but a comment noting that assumption — or an explicit retry hook — would make the state machine easier to follow.

  3. RadioModel::onDisconnected() now does m_wanConn->disconnect(this); — this is defensive but redundant in the manual-disconnect path (we already disconnected in disconnectFromRadio()). Harmless, just worth a brief comment that it covers the unsolicited-disconnect path where signals are still wired.

  4. PanadapterStack scopeprepareShutdown() GPU reset is a bit tangential to the SmartLink reconnect story, but the rationale (QRhi/Metal teardown crash observed in the same disconnect flow) is reasonable and the change is minimal. Worth a one-line code comment pointing to the crash stack so a future reader understands why hide+resetGpuResources is needed here in addition to the existing per-window teardown.

Nothing here is a blocker — the core fix is sound and the validation steps are convincing. As you noted, watch for the merge interaction with #2278 since both branches touch RadioModel::disconnectFromRadio().

…connect

# Conflicts:
#	src/gui/MainWindow.cpp
#	src/models/RadioModel.cpp
@ten9876 ten9876 merged commit 1384452 into ten9876:main May 2, 2026
3 checks passed
ten9876 added a commit that referenced this pull request May 2, 2026
The v0.9.5.1 entry was originally cut for just the TCI TX hotfix
(#2285), but six additional fixes had already landed in main between
the v0.9.5 tag and v0.9.5.1:

- #2275 NR2 wisdom generation off the audio thread
- #2278 SmartLink disconnect teardown
- #2279 Reset RX slice tabs on disconnect (#2254)
- #2280 macOS panadapter pop-out refresh + multi-pan dock layout
- #2282 SmartLink reconnect after WAN drop
- #2284 Qt log handler serialization fixes macOS tune-time crash

CHANGELOG.md gets a per-fix entry with root cause + behavior change.
WhatsNewData.cpp regenerated via scripts/gen_whatsnew.py so the
in-app What's New dialog reflects the full v0.9.5.1 contents.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
ten9876 added a commit that referenced this pull request May 2, 2026
Match the v0.9.5 `(#NNNN, contributor)` heading style and call out
@rfoust and @jensenpat in the summary for the bulk of this release.

Authors per PR:
- @rfoust: #2278, #2279, #2280, #2282 (SmartLink + macOS popout)
- @jensenpat: #2275, #2284 (NR2 wisdom + Qt log serialization)

WhatsNewData.cpp regenerated.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants