Fix sequenced radio disconnect teardown#2247
Conversation
There was a problem hiding this comment.
Thanks for the well-researched fix @jensenpat — the PR description and root-cause analysis are clear, and the overall approach (sequenced teardown with sendCommandAndWait, removing the self client disconnect, and adding the 0x04 marker) is sound.
A few things I'd like to discuss:
1. QEventLoop re-entrancy in sendCommandAndWait
The nested QEventLoop in sendCommandAndWait is invoked during disconnect teardown. Because loop.exec() pumps all events, other queued signals (status updates, further command responses, timer firings) will be delivered while we're waiting for the stream-remove ACK. This is generally safe here since heartbeat is already stopped and we're tearing down, but it's worth a comment noting this is intentional and safe only because we're in teardown state.
More concretely: if onSocketError fires during the loop (e.g., the radio drops TCP mid-teardown), the new code at line ~249 calls setState(Disconnected) which emits stateChanged — but the loop also connects m_socket::disconnected to quit. Is there a race where the loop could exit via the timer and the disconnect signal fires afterward, leaving dangling state? The disconnect() calls after loop.exec() should prevent double-delivery, but worth a quick sanity check.
2. writeDisconnectMarker() called unconditionally in disconnectFromRadio()
disconnectFromRadio() is now called from both gracefulDisconnect (after sequenced teardown) and from the destructor / other paths. The 0x04 marker is correct for self-disconnect, but is it safe to send during a forced disconnect where the radio already kicked us? If the socket is in an error state, write() will silently fail (which is fine), but the waitForBytesWritten(250) will block for 250ms unnecessarily. Consider guarding:
void RadioConnection::writeDisconnectMarker()
{
if (!m_socket || m_socket->state() != QAbstractSocket::ConnectedState)
return;
// ...
}Using ConnectedState instead of just checking != UnconnectedState avoids blocking on sockets in ClosingState or error states.
3. m_reconnectTimer retry on connection error
The new retry logic in onConnectionError starts m_reconnectTimer on a refused connection. The timer lambda calls m_connection->connectToRadio(m_lastInfo) — but connectToHost has an early return if m_state != Disconnected. After onSocketError sets Error state and then Disconnected (new code), this should work. Just confirming: is there a window where m_state is Error when the timer fires? If so, the reconnect would silently no-op. The new setState(Disconnected) after error on UnconnectedState should cover this, but it's a subtle ordering dependency.
4. forceDisconnect capture change — nice
The [conn = m_connection, handle] capture in forceDisconnect is a good fix — capturing this in a non-blocking lambda where the object might be destroyed is risky. Good catch.
5. Minor: m_reconnectTimer.stop() in onConnected
Correct addition — prevents a stale timer from firing after we've already reconnected successfully.
Overall this is a solid fix with good diagnostics logging. The sequenced approach matches how FlexLib handles teardown. The main actionable item is #2 (tightening the guard in writeDisconnectMarker to avoid a needless 250ms block on error-state sockets). The rest are observations for your consideration.
Nice work, and thanks for tracking down the client disconnect self-rejection behavior.
Reliability sweep release. Headline is jensenpat's DAX2 coexistence policy refactor (#2271): platform-aware decision table, lazy dax_tx stream creation, anti-stomp on incoming foreign DAX TX status, LAN VITA UDP rebind on Flex's "Port/IP pair already in use" error, dead- orphan DAX RX detection, and 24+ test assertions. Substantial reliability fixes around disconnect teardown (#2247 — sequenced stream remove with response wait, dropped self-disconnect, dying-gasp byte), worker-thread shutdown (#2248 — no more killTimer warnings), and panadapter zoom (#2246 — no spectrum flash). Closes 12 issues via two paired triage-cleanup sweep PRs (#2270 + #2272) covering small UI bugs, contrast, indicator routing, click-to- tune behaviour, and visible regressions from recent feature work. Also includes the #2273 follow-up to #2271: setDax() and DAX TX stream creation are now both gated on platforms that can actually feed DAX audio, so Linux non-PipeWire builds keep digital TX on the physical mic input rather than going silent. Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
|
Wohoo! Thanks, team, for your efforts on this sticky TCP/session management issue. I've connected/disconnected multiple times in a row this morning, and no hangs/lockups. I'll spend the day playing around with BCB listening and FT8. Will put AetherSDR to work today with 3rd party programs, DAX, CAT, logging, etc. Appreciate your efforts on AetherSDR. This looks to be a fantastic piece of software. BTW, I love how you guys have implemented DAX and CAT. It just works. Too bad y'all didn't write the Windows version of DAX. :-) 73, -Jason, NK9B |
Summary
stream removeand wait for the radio response before closing TCP.client disconnectfor our own handle; the radio rejects that path, so normal self-disconnect now sends the Flex-style0x04marker and closes the socket.Root Cause
Issue #2218 logs showed that AetherSDR could close TCP before the radio processed teardown, leaving the radio unavailable on the next immediate reconnect. Follow-up local logs also showed that
client disconnect <our handle>returns50000062 Cannot disconnect yourself, which matches FlexLib behavior: self-disconnect closes the command connection rather than kicking its own client handle.Fixes #2113
Refs #2218
Validation
stream removereceivedR111|0|, TCP disconnected cleanly, and no selfclient disconnectwas sent.git diff --checkcmake --build build --target AetherSDR -j10👨🏼💻 Generated with OpenAI Codex (GPT-5.5 Pro 4/23) and tested by @jensenpat