Skip to content

Have server ack app_heartbeat messages received from client#13810

Merged
vdonato merged 8 commits intodevelopfrom
vdonato/ack-heartbeats-from-server
Feb 6, 2026
Merged

Have server ack app_heartbeat messages received from client#13810
vdonato merged 8 commits intodevelopfrom
vdonato/ack-heartbeats-from-server

Conversation

@vdonato
Copy link
Copy Markdown
Collaborator

@vdonato vdonato commented Feb 4, 2026

Describe your changes

In hosted Streamlit app products like SiS and Community Cloud, we're currently very slow to detect when
a user's internet connection is running into problems. This is because we don't have a mechanism to quickly
detect when our websocket connection runs into trouble, and it's possible for minutes to pass after our
websocket connection actually drops before we finally see the network error dialog.

To make this process faster, we extend the existing SEND_APP_HEARTBEAT mechanism, which exists
to allow the host of a Streamlit app to request that the OSS client send a heartbeat to the server for the sole
purpose of keeping its websocket connection alive in environments where it'll otherwise time out. With these
new changes, we also have the server send an ack that it received this message back to the client. If no
ack is received, we assume that we may have a connection error and attempt to reconnect our websocket
connection, which will cause us to either

  • hit the connection error dialog if the network problems are real, or
  • successfully reconnect with little consequence otherwise.

In the normal OSS world, we don't receive SEND_APP_HEARTBEAT messages, so none of this new code
is ever invoked.

Testing Plan

  • Updated python/typescript unit tests
  • Manual testing performed by:
    • Updated hostframe.html to add a new button to send an app heartbeat
    • Manually removed the code to ack the heartbeat from app_session.py to simulate a network issue
    • Verified that the client disconnects its websocket connection after the timeout.

@vdonato vdonato added security-assessment-completed change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users labels Feb 4, 2026
@snyk-io
Copy link
Copy Markdown
Contributor

snyk-io bot commented Feb 4, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-13810/streamlit-1.53.1-py3-none-any.whl
📦 @streamlit/component-v2-lib Download from artifacts
🕹️ Preview app pr-13810.streamlit.app (☁️ Deploy here if not accessible)

@vdonato vdonato marked this pull request as ready for review February 4, 2026 00:33
Copilot AI review requested due to automatic review settings February 4, 2026 00:33
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

📈 Frontend coverage change detected

The frontend unit test (vitest) coverage has increased by 0.1800%

  • Current PR: 86.6900% (13901 lines, 1850 missed)
  • Latest develop: 86.5100% (13875 lines, 1871 missed)

🎉 Great job on improving test coverage!

📊 View detailed coverage comparison

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a server-side acknowledgment mechanism for app heartbeat messages to enable faster detection of connection issues in hosted Streamlit environments (SiS and Community Cloud). When the client sends an app heartbeat, the server now responds with a heartbeat_ack message. If the client doesn't receive this acknowledgment within a timeout (29 seconds), it attempts to reconnect, which either surfaces the connection error dialog immediately or successfully reconnects with minimal disruption.

Changes:

  • Added heartbeat_ack field to the ForwardMsg protobuf definition for server-to-client acknowledgment
  • Modified backend _handle_app_heartbeat_request to send acknowledgment ForwardMsg when heartbeat is received
  • Implemented heartbeat timeout tracking in ConnectionManager with automatic reconnection on timeout
  • Added reconnect() method to WebsocketConnection to trigger reconnection without permanent disconnection

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
proto/streamlit/proto/ForwardMsg.proto Added heartbeat_ack boolean field (26) to ForwardMsg for server acknowledgments
lib/streamlit/runtime/app_session.py Modified _handle_app_heartbeat_request to send heartbeat_ack ForwardMsg response
lib/tests/streamlit/runtime/app_session_test.py Added test verifying that heartbeat handler sends proper ack message
frontend/connection/src/constants.ts Added HEARTBEAT_ACK_TIMEOUT_MS constant (29 seconds) for ack timeout
frontend/connection/src/WebsocketConnection.tsx Added reconnect() method to close connection and transition to PINGING_SERVER state
frontend/connection/src/WebsocketConnection.test.tsx Added tests for reconnect behavior in different connection states
frontend/connection/src/ConnectionManager.ts Implemented heartbeat ack timeout tracking and reconnection logic
frontend/connection/src/ConnectionManager.test.ts Added comprehensive tests for heartbeat timeout, ack handling, and cleanup
frontend/app/src/App.tsx Integrated heartbeat ack handling by calling ConnectionManager methods
frontend/app/src/App.test.tsx Added tests verifying onHeartbeatSent and onHeartbeatAckReceived are called correctly
e2e_playwright/test_assets/hostframe.html Added "Send Heartbeat" button for manual testing

@sfc-gh-lmasuch sfc-gh-lmasuch added the ai-review If applied to PR or issue will run AI review workflow label Feb 4, 2026
@github-actions github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

Summary

This PR extends the existing SEND_APP_HEARTBEAT mechanism to include server acknowledgments. When the frontend sends an app_heartbeat BackMsg, the server now responds with a heartbeat_ack ForwardMsg. The frontend tracks these acks and triggers a websocket reconnect if an ack isn't received within 29 seconds (1 second shorter than the expected 30s heartbeat interval in hosted environments).

Main changes:

  • Added heartbeat_ack field (field 26) to ForwardMsg.proto
  • Backend: _handle_app_heartbeat_request() now enqueues a heartbeat_ack ForwardMsg
  • Frontend: ConnectionManager tracks heartbeat timeouts and triggers reconnects on timeout
  • Frontend: WebsocketConnection added reconnect() method that transitions FSM to PINGING_SERVER
  • Frontend: App.tsx handles heartbeatAck messages and notifies ConnectionManager

Code Quality

The code is well-structured and follows existing patterns in the codebase.

Strengths:

  • Clear separation of concerns: ConnectionManager handles timeout tracking, WebsocketConnection handles FSM transitions
  • Proper cleanup in disconnect() to prevent memory leaks (line 227 in ConnectionManager.ts)
  • Good defensive programming with isConnected() check before reconnect (line 250 in ConnectionManager.ts)
  • Well-documented constants with rationale (HEARTBEAT_ACK_TIMEOUT_MS in constants.ts:76-83)
  • Clean docstrings following numpydoc style in the Python code

Minor observations:

  • The awaitingHeartbeatAck flag (line 104 in ConnectionManager.ts) is technically redundant since heartbeatAckTimeoutId !== undefined serves the same purpose, but it adds clarity and is an acceptable defensive pattern.

Test Coverage

Test coverage is comprehensive and follows best practices.

Python tests (app_session_test.py:1279-1291):

  • Tests that heartbeat_ack ForwardMsg is enqueued with correct type
  • Includes negative assertion (WhichOneof("type") == "heartbeat_ack") to verify correct message type

Frontend tests:

  1. ConnectionManager.test.ts (new file, 194 lines):

    • Tests timeout starting when heartbeat is sent
    • Tests reconnect triggered on timeout
    • Tests timeout clearing when new heartbeat sent (proper debounce)
    • Tests timeout clearing on ack received
    • Tests no error when ack received without pending heartbeat
    • Tests timeout cleared on disconnect
    • Tests no reconnect attempt when already disconnected
    • Good use of vi.useFakeTimers() for deterministic timing tests
  2. WebsocketConnection.test.tsx (lines 1018-1032):

    • Tests reconnect() transitions to PINGING_SERVER when connected
    • Tests reconnect() does nothing when not connected (negative assertion)
  3. App.test.tsx (lines 4590-4606):

    • Tests onHeartbeatSent called on SEND_APP_HEARTBEAT message
    • Tests onHeartbeatAckReceived called on heartbeatAck ForwardMsg

E2E test infrastructure:

  • Updated hostframe.html with "Send Heartbeat" button for manual testing

Backwards Compatibility

Fully backwards compatible:

  • Proto field 26 is purely additive; old clients ignore unknown fields
  • In OSS Streamlit, SEND_APP_HEARTBEAT is never sent by hosts, so this code path is never executed
  • Old servers won't send heartbeat_ack, but this only matters in hosted environments that would update both client and server together
  • The reconnect() method uses existing FSM transitions (PINGING_SERVER), not new states

Security & Risk

No security concerns identified:

  • No new attack vectors introduced
  • Heartbeat mechanism doesn't expose sensitive information
  • No new authentication or authorization paths

Risk assessment:

  • Low risk: The feature is opt-in (only triggered by hosted environments sending SEND_APP_HEARTBEAT)
  • Graceful degradation: If server doesn't respond with ack, worst case is a reconnect attempt
  • Timeout value: 29s timeout assumes 30s heartbeat interval. If configured differently, could cause unnecessary reconnects, but this is documented and acceptable for hosted use cases

Recommendations

No blocking issues found. The implementation is clean, well-tested, and follows established patterns.

Optional consideration (non-blocking):

  1. The timeout constant HEARTBEAT_ACK_TIMEOUT_MS = 29 * 1000 assumes a 30s heartbeat interval. If this interval ever becomes configurable per-deployment, consider making the timeout configurable as well. This is documented in the code comments and acceptable for current use cases.

Verdict

APPROVED: This is a well-implemented feature with comprehensive test coverage, proper cleanup handling, and full backwards compatibility. The code follows existing patterns and best practices, and the PR description clearly explains the motivation and testing approach.


This is an automated AI review using opus-4.5-thinking. Please verify the feedback and use your judgment.

* Timeout for heartbeat acknowledgment in milliseconds.
* If we send a heartbeat and don't receive an ack within this time,
* we consider the connection unhealthy. This time is set to be 1 second
* shorter than the heartbeat interval of 30s that we expect to be commonly
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Is there any possibility of race conditions with this value?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. We could have the ack be received and timeout processed nearly simultaneously, but since browser JS is single-threaded and non-preemptive, we should have one of the two clearly happening first.

I think the only thing we risk with having some sort of race is potentially taking a bit longer to finally show the network error dialog, so the consequences aren't too dire if there is something we're missing.

To err on the side of caution, though, I think we should change this to 59s since SiS currently sends the heartbeat request every 60s. I initially wanted to see if we could go drop it on the SiS side to 30s so that we'd be more responsive to network issues, but thinking back, we have been burned previously by timeouts that I thought were very generous being problematic on some user's internet connections, so better safe than sorry I guess.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You bring up a good point about timing constraints. Would it make sense to turn this into a config option so that SiS could tweak this value without also having to make a code change here? Or do you feel more confident with a hardcoded value?

Copy link
Copy Markdown
Collaborator Author

@vdonato vdonato Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think coordinating having parameters to control whether we turn on the ack behavior as well as setting a config option for the timeout might start to get messy (not sure if we can easily get parameter values plumbed to config options if we wanted to stage that rollout, so that itself might end up being involved if we wanted to control this via config option).

We might be able to change the expectAck field to be something like ackTimeout, and control both whether the ack is expected and what its timeout should be when truthy. Then we'll just have one param to worry about rollout for on the SiS side.

Does this seem reasonable?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the OSS side, that seems reasonable to me. If you think that would be beneficial for SiS, then please feel free to implement it. If you think it's over-engineering also feel free to push back.

Copy link
Copy Markdown
Collaborator Author

@vdonato vdonato Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something we'd want to do on the SiS side. I'm not sure if we'll look into pinging more often / with shorter timeouts soon, but if we eventually choose to do so it'll be good to have this configurable so that we don't have to worry about the timeout value on older Streamlit versions. I ended up naming this new message field ackTimeoutMilliseconds.

@vdonato vdonato force-pushed the vdonato/ack-heartbeats-from-server branch from cf9e264 to b520036 Compare February 5, 2026 23:16
@vdonato
Copy link
Copy Markdown
Collaborator Author

vdonato commented Feb 6, 2026

@sfc-gh-bnisco thanks for the review / sorry for the slow turnaround on my side!

I've addressed your PR feedback + also had a realization that we'll want some way to control the rollout from the SiS side, so I made another change adding a new expectAck field to the SEND_APP_HEARTBEAT message type so that we can control whether the OSS side uses the new reconnect behavior from an app's host.

@vdonato vdonato force-pushed the vdonato/ack-heartbeats-from-server branch from 501d6a7 to 098ab1e Compare February 6, 2026 20:58
@sfc-gh-bnisco sfc-gh-bnisco added the ai-review If applied to PR or issue will run AI review workflow label Feb 6, 2026
@github-actions github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 6, 2026

Summary

This PR extends the existing SEND_APP_HEARTBEAT host communication mechanism to support server-side acknowledgment and connection health monitoring. When a host (e.g., SiS, Community Cloud) sends a heartbeat, the server now responds with a heartbeat_ack ForwardMsg. The host can optionally specify an ackTimeoutMilliseconds value; if an ack is not received within that window, the frontend will proactively close and reconnect its WebSocket, enabling faster detection of broken connections.

Key changes across the stack:

  • Proto: New bool heartbeat_ack = 26 field in ForwardMsg.
  • Python backend: _handle_app_heartbeat_request now enqueues a heartbeat_ack ForwardMsg instead of being a noop.
  • Frontend host comm: SEND_APP_HEARTBEAT gains an optional ackTimeoutMilliseconds field.
  • Frontend App: Dispatches heartbeatAck messages and passes timeout to ConnectionManager.
  • Frontend ConnectionManager: New heartbeat timeout tracking with onHeartbeatSent / onHeartbeatAckReceived, plus a private reconnect() method.
  • Frontend WebsocketConnection: New public reconnect() method that closes the connection and transitions the FSM to PINGING_SERVER.

Code Quality

The code is well-structured and follows existing patterns throughout the codebase.

Strengths:

  • Clean separation of concerns: WebsocketConnection.reconnect() handles the FSM transition, ConnectionManager handles the timeout logic, and App wires the dispatch.
  • Good defensive programming: the isConnected() guard inside the timeout callback prevents reconnect attempts on already-disconnected sockets; clearHeartbeatAckTimeout() is called from all appropriate teardown paths (disconnect, reconnect, state change).
  • The setConnectionState callback correctly clears stale timeouts when transitioning away from CONNECTED, preventing misleading log messages.
  • Optional chaining (this.connectionManager?.onHeartbeatSent(...)) is used consistently.
  • JSDoc comments are thorough and explain the "why" behind the timeout mechanism.

Minor observations:

  1. In ConnectionManager.onHeartbeatSent (line 237-249 of ConnectionManager.ts), inside the setTimeout callback, this.heartbeatAckTimeoutId is set to undefined before calling this.reconnect(), which itself calls clearHeartbeatAckTimeout(). This is harmless (the clear method checks for undefined), but the manual assignment on line 241 is slightly redundant since reconnect() will clear it anyway.

  2. The guard if (!ackTimeoutMilliseconds) in onHeartbeatSent (line 233) correctly handles 0, NaN, and undefined. Note that a negative value would pass this guard (since -1 is truthy) and create a near-immediate timeout. This is extremely unlikely since the host controls this value, but worth being aware of.

Test Coverage

Test coverage is thorough and follows best practices.

Python tests (app_session_test.py):

  • test_app_heartbeat_sends_ack properly verifies the ForwardMsg is enqueued with the correct type field (WhichOneof("type") == "heartbeat_ack"). Good negative assertion pattern.

Frontend - ConnectionManager.test.ts (new file, 241 lines):

  • Comprehensive coverage of all heartbeat timeout scenarios: start/clear timeout, reconnect on timeout, timer reset on new heartbeat, ack clears timeout, no-error on spurious ack, disconnect cleanup, no reconnect when already disconnected, timeout cleared on state transition.
  • Uses vi.useFakeTimers() appropriately with proper cleanup in afterEach.

Frontend - App.test.tsx:

  • Tests for SEND_APP_HEARTBEAT with and without ackTimeoutMilliseconds.
  • Tests that heartbeatAck ForwardMsg triggers onHeartbeatAckReceived.
  • Each test includes a complementary negative assertion (e.g., sending heartbeat doesn't trigger ack handler, and vice versa).

Frontend - HostCommunicationManager.test.tsx:

  • Existing test updated and new test added for ackTimeoutMilliseconds passthrough.

Frontend - WebsocketConnection.test.tsx:

  • reconnect() tested for both connected and non-connected states.

E2E:

  • hostframe.html updated with a "Send Heartbeat" button and button count incremented in the test. Updated snapshots provided.

Overall, the test coverage is excellent for this feature.

Backwards Compatibility

This change is fully backwards compatible:

  1. Host communication: ackTimeoutMilliseconds is optional in the SEND_APP_HEARTBEAT message type and defaults to 0 when absent (message.ackTimeoutMilliseconds ?? 0). When 0, no timeout is started, preserving existing behavior.
  2. Server-side: The server now always responds with heartbeat_ack, even for existing hosts that don't monitor for it. The frontend ignores it unless a timeout is pending. The heartbeat_ack message is a single boolean and has negligible bandwidth impact.
  3. OSS usage: In the OSS world, SEND_APP_HEARTBEAT messages are never sent, so none of this new code is ever invoked.
  4. sendAppHeartbeat signature change: The function signature changes from () => void to (ackTimeoutMilliseconds: number) => void. The AppNavigation.test.ts mock is updated, and the HostCommunicationProps interface is updated. All callers go through the host communication manager which defaults the value to 0.

Security & Risk

  • Low risk: The feature is opt-in (host must send a non-zero ackTimeoutMilliseconds). The worst case for a misbehaving host sending a very small timeout is extra reconnection cycles, which is bounded by the existing reconnection logic.
  • No new user inputs are processed; the ackTimeoutMilliseconds comes from the trusted host frame.
  • The heartbeat_ack proto field is a simple boolean with no payload, posing no deserialization risk.
  • No credentials, tokens, or sensitive data are involved in this flow.

Accessibility

This PR has no user-facing UI changes. The only frontend-visible change is a new "Send Heartbeat" button in the test-only hostframe.html file, which is not user-facing. No accessibility concerns.

Recommendations

No blocking issues found. Minor suggestions for consideration:

  1. Consider validating ackTimeoutMilliseconds: In ConnectionManager.onHeartbeatSent, you could add a guard for negative values (e.g., if (ackTimeoutMilliseconds <= 0) return). While unlikely to be sent by a host, negative values would create a near-immediate timeout leading to a reconnect cycle. This is a very minor hardening suggestion.

  2. Redundant cleanup in timeout callback: In ConnectionManager.ts line 241, this.heartbeatAckTimeoutId = undefined is set before this.reconnect(), which also calls clearHeartbeatAckTimeout(). This is harmless but could be simplified by removing the manual assignment and letting reconnect() handle it. Very minor, not blocking.

Verdict

APPROVED: Well-designed, backwards-compatible extension of the heartbeat mechanism with comprehensive test coverage across all layers. The opt-in approach with host-configurable timeout is a clean design. No blocking issues identified.


This is an automated AI review using opus-4.6-thinking. Please verify the feedback and use your judgment.

Copy link
Copy Markdown
Collaborator

@sfc-gh-bnisco sfc-gh-bnisco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revisions, one possible suggestion inline, but LGTM!


if (message.type === "SEND_APP_HEARTBEAT") {
this.props.sendAppHeartbeat()
this.props.sendAppHeartbeat(message.ackTimeoutMilliseconds ?? 0)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): Might be good to do some lightweight validation on this value:

typeof message.ackTimeoutMilliseconds === 'number' && 
Number.isFinite(message.ackTimeoutMilliseconds) && 
message.ackTimeoutMilliseconds > 0

@vdonato vdonato enabled auto-merge (squash) February 6, 2026 22:01
@vdonato vdonato merged commit ed04c1c into develop Feb 6, 2026
43 checks passed
@vdonato vdonato deleted the vdonato/ack-heartbeats-from-server branch February 6, 2026 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:feature PR contains new feature or enhancement implementation impact:users PR changes affect end users

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants