Fix SmartLink reconnect after WAN drop#2282
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves SmartLink/WAN resilience by driving an explicit reconnect flow after unexpected WAN/TLS drops, tightening ping-miss disconnect behavior so the UI transitions promptly, and reducing macOS GPU teardown crash risk during disconnect/shutdown.
Changes:
- Add a MainWindow-owned WAN reconnect timer that re-requests SmartLink WAN connections after unexpected disconnects, and stops on unrecoverable auth failures.
- Extend disconnect forcing to cover WAN connections and make ping-miss disconnect one-shot to avoid repeated disconnect storms.
- Release panadapter GPU resources during shutdown to reduce QRhi/Metal teardown crashes.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/models/RadioModel.h | Adds a flag to make ping-miss disconnect one-shot per outage. |
| src/models/RadioModel.cpp | Forces WAN disconnect on ping-miss and adjusts WAN disconnect handling. |
| src/gui/PanadapterStack.cpp | Releases SpectrumWidget GPU resources during shutdown. |
| src/gui/MainWindow.h | Adds WAN reconnect timer + reconnect helper slot; adjusts WAN connect API. |
| src/gui/MainWindow.cpp | Implements SmartLink/WAN reconnect loop and stops it on user disconnect/auth failure. |
| src/core/SmartLinkClient.h | Adds SmartLinkClient::reconnect() API. |
| src/core/SmartLinkClient.cpp | Implements reconnect and stops retrying on registration_invalid. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| qCDebug(lcProtocol) << "RadioModel: WAN disconnected"; | ||
| m_wanConn->disconnect(this); | ||
| m_wanConn = nullptr; | ||
| } else if (!m_intentionalDisconnect && !m_lastInfo.address.isNull()) { |
There was a problem hiding this comment.
Addressed in 3d039bc. Manual WAN disconnect now keeps the WanConnection alive long enough to run RadioModel::onDisconnected(): disconnectFromRadio wires a one-shot disconnected handler, calls disconnectFromRadio, and immediately runs cleanup if the socket is already idle.
| void SmartLinkClient::reconnect() | ||
| { | ||
| if (m_serverConnected) { | ||
| return; | ||
| } | ||
|
|
||
| if (!m_refreshToken.isEmpty()) { | ||
| loginWithRefreshToken(m_refreshToken); | ||
| return; | ||
| } | ||
|
|
||
| if (m_idToken.isEmpty()) { | ||
| return; | ||
| } | ||
|
|
||
| connectToServer(); | ||
| } |
There was a problem hiding this comment.
Addressed in 3d039bc. SmartLinkClient now guards connectToServer on an idle socket, aborts a Closing socket during reconnect, and skips duplicate Auth0 refresh work while a refresh request is already in progress.
| void MainWindow::requestWanReconnect() | ||
| { | ||
| if (m_userDisconnected || m_radioModel.isConnected() | ||
| || m_pendingWanRadio.serial.isEmpty()) { | ||
| m_wanReconnectTimer.stop(); | ||
| return; | ||
| } | ||
|
|
||
| m_connPanel->setStatusText("Reconnecting via SmartLink..."); | ||
| setPanadapterConnectionAnimation(true, "Reconnecting to remote radio..."); | ||
|
|
||
| if (!m_smartLink.isConnected()) { | ||
| m_smartLink.reconnect(); | ||
| m_wanReconnectTimer.start(); | ||
| return; | ||
| } | ||
|
|
||
| startWanRadioConnect(m_pendingWanRadio, false); | ||
| m_wanReconnectTimer.start(); | ||
| } |
There was a problem hiding this comment.
Addressed in 3d039bc. MainWindow now tracks m_wanReconnectAttemptInProgress so the 5s timer does not stack overlapping reconnect attempts. The flag is cleared on auth failure, SmartLink server connect/disconnect, connect_ready, WAN connected/disconnected, user disconnect, and shutdown.
| m_connPanel->setStatusText("Reconnecting via SmartLink..."); | ||
| setPanadapterConnectionAnimation(true, "Reconnecting to remote radio..."); |
There was a problem hiding this comment.
Addressed in 3d039bc. The new reconnect status strings now use the same ellipsis style as the surrounding connection UI.
|
Hi @rfoust — thanks again for the contribution! Good news: this CI failure is not related to your code. What failedThe only red check on
Recommended action: nothing on your end. A maintainer can re-run that job, or it can simply be ignored — it doesn't affect the buildability or correctness of your patch. If it keeps failing across PRs, that's a workflow issue to address in Heads-up on the Copilot review commentsWhile you wait, the Copilot review flagged a few things on the related PRs in this stack that are worth a look (these are review notes, not CI failures):
None of these block the CI; just flagging them so they don't get lost. Really appreciate the careful work on the SmartLink/WAN disconnect plumbing — this is gnarly state-machine territory and the fix path is the right shape. |
There was a problem hiding this comment.
Thanks @rfoust — this is a thoughtful, well-scoped fix and the round of follow-up commits (3d039bc) addresses all four Copilot findings cleanly:
m_authRequestInProgressand thestate() != UnconnectedStateguard inconnectToServer()/reconnect()correctly de-dup Auth0 refresh and TLS connect attempts.m_wanReconnectAttemptInProgressprevents the 5s timer from stacking overlapping reconnect attempts. Reset paths (auth fail, server connect/disconnect, connect_ready, WAN connect/disconnect, user disconnect, shutdown) all look complete.- The new one-shot
disconnectedhandler inRadioModel::disconnectFromRadio()plus thewan->isSocketIdle()synchronous fallback gives WAN sessions the sameonDisconnected()cleanup as LAN, which fixes the gap Copilot called out and lines up with #2278. registration_invalidnow actually closes the socket and clears auth state instead of just emitting the signal — that should stop the stale "authenticated but no token" state.
A couple of small things to consider, all non-blocking:
-
requestWanReconnect()retry cadence — fixed 5 s with no backoff/jitter. With the in-progress flag this won't stack requests, but during a long outage it will hammer Auth0/SmartLink at a steady 5 s/attempt completion. Not critical, but a modest exponential backoff (e.g., 5 → 10 → 30 s, capped) would be friendlier to the SmartLink endpoint and reduce log noise. Easy to add later. -
SmartLinkClient::reconnect()"deferred" path — if the socket is inConnectingState/HostLookupStatewhenreconnect()is called, the function returns without arming any retry. We rely on the eventualserverConnected/serverDisconnectedto clearm_wanReconnectAttemptInProgressand let the next 5 s timer tick try again. That should resolve in practice (Qt will eventually time out the connect), but a comment noting that assumption — or an explicit retry hook — would make the state machine easier to follow. -
RadioModel::onDisconnected()now doesm_wanConn->disconnect(this);— this is defensive but redundant in the manual-disconnect path (we already disconnected indisconnectFromRadio()). Harmless, just worth a brief comment that it covers the unsolicited-disconnect path where signals are still wired. -
PanadapterStack scope —
prepareShutdown()GPU reset is a bit tangential to the SmartLink reconnect story, but the rationale (QRhi/Metal teardown crash observed in the same disconnect flow) is reasonable and the change is minimal. Worth a one-line code comment pointing to the crash stack so a future reader understands why hide+resetGpuResources is needed here in addition to the existing per-window teardown.
Nothing here is a blocker — the core fix is sound and the validation steps are convincing. As you noted, watch for the merge interaction with #2278 since both branches touch RadioModel::disconnectFromRadio().
…connect # Conflicts: # src/gui/MainWindow.cpp # src/models/RadioModel.cpp
The v0.9.5.1 entry was originally cut for just the TCI TX hotfix (#2285), but six additional fixes had already landed in main between the v0.9.5 tag and v0.9.5.1: - #2275 NR2 wisdom generation off the audio thread - #2278 SmartLink disconnect teardown - #2279 Reset RX slice tabs on disconnect (#2254) - #2280 macOS panadapter pop-out refresh + multi-pan dock layout - #2282 SmartLink reconnect after WAN drop - #2284 Qt log handler serialization fixes macOS tune-time crash CHANGELOG.md gets a per-fix entry with root cause + behavior change. WhatsNewData.cpp regenerated via scripts/gen_whatsnew.py so the in-app What's New dialog reflects the full v0.9.5.1 contents. Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Match the v0.9.5 `(#NNNN, contributor)` heading style and call out @rfoust and @jensenpat in the summary for the bulk of this release. Authors per PR: - @rfoust: #2278, #2279, #2280, #2282 (SmartLink + macOS popout) - @jensenpat: #2275, #2284 (NR2 wisdom + Qt log serialization) WhatsNewData.cpp regenerated. Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Bug Addressed
When the network drops during SmartLink operation, AetherSDR can show the "Radio disconnected - waiting for reconnect" popup without actually recovering. The terminal logs showed repeated ping-miss "forcing disconnect" messages, followed by WAN socket closure and no successful SmartLink reconnect. Clicking Disconnect from that stale state could also lead to a macOS QRhi/Metal teardown crash during main-window destruction.
Fix Details
MainWindownow owns a WAN reconnect timer that re-requests a SmartLink radio connection using the last selected WAN radio whenRadioModelreports an unexpected WAN disconnect.SmartLinkClient::reconnect()refreshes Auth0 credentials via the saved refresh token before reconnecting to the SmartLink server, avoiding reuse of an expiredid_token.RadioModel::forceDisconnect()now handles WAN connections, so missed pings transition the app to disconnected/reconnecting promptly instead of waiting for the TLS socket to notice the dead network.PanadapterStack::prepareShutdown()hidesSpectrumWidgetinstances and releases GPU resources before main-window teardown, matching the crash stack seen in QRhi backing-store cleanup.Validation
cmake --build build.remote_audio_rxwas recreated/adopted.RadioModel.cpp, but in different paths: PR Fix SmartLink disconnect teardown #2278 keeps manual WAN disconnect teardown asynchronous, while this PR handles unexpected WAN ping failure/reconnect. If both land together, that shared file needs normal maintainer review during merge.