
Close WebRTC session when cloud WebSocket disconnects#638

Merged
mjh1 merged 1 commit into main from emran/webrtc-close on Mar 10, 2026

Conversation

Contributor

@emranemran commented Mar 9, 2026

Problem

When the fal WebSocket closes (e.g. MAX_DURATION_EXCEEDED after 60 minutes), the video stream keeps playing indefinitely instead of stopping.

Why this happens

There are 3 actors on the fal machine:

  1. fal_app.py — WebSocket proxy, owns the 60-minute timer
  2. Scope backend — runs the pipeline, owns the WebRTC peer connection
  3. Client — browser (direct mode) or local backend (relay mode)

The WebRTC peer connection uses UDP and is completely independent of the WebSocket (TCP). When fal_app.py closes the WebSocket, the scope backend has no idea — nobody tells it to stop. The pipeline keeps running and frames keep flowing over UDP.
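The independence of the two channels can be illustrated with plain sockets (this is an analogy, not the repo's code): closing a TCP socket, which stands in for the signaling WebSocket, has no effect on a separate UDP socket, which stands in for the WebRTC media path.

```python
# Illustration only: closing the TCP transport leaves the UDP transport untouched.
import socket

tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # stands in for the WebSocket
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # stands in for WebRTC media
udp.bind(("127.0.0.1", 0))

tcp.close()  # the WebSocket going away...
assert tcp.fileno() == -1   # ...closes only the TCP side
assert udp.fileno() != -1   # ...while the UDP socket stays fully usable
udp.close()
```

This is why tearing down the WebSocket requires an explicit, separate step to tear down the peer connection.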

Current (broken) sequence

           Client                    fal_app.py                Scope backend (on fal)
           (browser or               (WebSocket proxy)         (runs pipeline, owns
            local backend)                                      WebRTC peer connection)
               │                          │                          │
               │◄─── WebRTC (UDP) ────────┼──────────────────────────►│
               │     video frames flowing both directions             │
               │                          │                          │
               │◄──── WebSocket ─────────►│                          │
               │     signaling channel    │                          │
               │                          │                          │
          ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ ─ ─  60 min timer ─ ─ ─│─ ─ ─
               │                          │                          │
               │    {"type":"error",      │                          │
               │◄────"MAX_DURATION_       │                          │
               │      EXCEEDED"}          │                          │
               │                          │                          │
               │     WebSocket CLOSED     │                          │
               │◄─────────────────────────│                          │
               │                          │                          │
               │  (client detects close)  │   finally block:         │
               │  sets _connected=False   │   - cancel log forwarder │
               │  publishes kafka event   │   - publish kafka event  │
               │                          │   - cleanup files/plugins│
               │  ┌─────────────────┐     │                          │
               │  │ BUT: does NOT   │     │  ┌─────────────────────┐ │
               │  │ close WebRTC    │     │  │ KNOWS NOTHING about │ │
               │  │ connection      │     │  │ WebSocket closing.  │ │
               │  └─────────────────┘     │  │ Pipeline keeps      │ │
               │                          │  │ running, frames     │ │
               │◄─── WebRTC (UDP) ────────┼──┤ keep processing.    │ │
               │     STILL FLOWING        │  └─────────────────────┘ │
               │                          │                          │
               ▼                          ▼                          ▼
         Video continues              Handler exits           Happily running
         indefinitely              (awaits next conn)         with no idea

Fix

Fixed sequence

           Client                    fal_app.py                Scope backend (on fal)
               │                          │                          │
          ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ ─ ─  60 min timer ─ ─ ─│─ ─ ─
               │                          │                          │
               │◄── error: MAX_DURATION ──│                          │
               │◄── WebSocket CLOSED ─────│                          │
               │                          │                          │
               │                          │  DELETE /webrtc/offer/   │
               │                          │───── {session_id} ──────►│
               │                          │                          │
               │                          │                    closes RTCPeerConnection
               │                          │                    stops pipeline
               │                          │                          │
               │   WebRTC track ends      │                          │
               │◄─────────────────────────┼──────────────────────────│
               │   (MediaStreamError)     │                          │
               │                          │                          │
         Stream stops cleanly         Cleanup done             Session removed

Changes

  1. DELETE /api/v1/webrtc/offer/{session_id} — new endpoint on the scope backend that closes and removes a WebRTC session via webrtc_manager.remove_session()

  2. fal_app.py finally block — calls the DELETE endpoint with the session ID before cleaning up files/plugins, so the WebRTC peer connection is torn down when the WebSocket closes for any reason (MAX_DURATION_EXCEEDED, client disconnect, errors)
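The teardown path above can be reduced to a minimal sketch, with a plain dict standing in for the repo's WebRTCManager and a stub in place of aiortc's RTCPeerConnection (class and method bodies here are illustrative assumptions, not the actual implementation):

```python
# Minimal sketch of the session-removal logic behind the new DELETE endpoint.
# FakePeerConnection is a stand-in; the real manager holds aiortc connections.
import asyncio

class FakePeerConnection:
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # The real close() would stop the pipeline and the ICE transport.
        self.closed = True

class WebRTCManager:
    def __init__(self) -> None:
        self._sessions: dict[str, FakePeerConnection] = {}

    def add_session(self, session_id: str, pc: FakePeerConnection) -> None:
        self._sessions[session_id] = pc

    async def remove_session(self, session_id: str) -> bool:
        """What DELETE /api/v1/webrtc/offer/{session_id} would invoke."""
        pc = self._sessions.pop(session_id, None)
        if pc is None:
            return False  # unknown session: the endpoint could answer 404
        await pc.close()
        return True

async def demo() -> None:
    manager = WebRTCManager()
    pc = FakePeerConnection()
    manager.add_session("abc123", pc)
    assert await manager.remove_session("abc123")
    assert pc.closed
    assert not await manager.remove_session("abc123")  # second delete is a no-op

asyncio.run(demo())
```

The key property is that removal both unregisters the session and closes the peer connection, so a WebSocket-driven DELETE fully stops the media path.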

Test plan

  • Stream for 60+ minutes (or temporarily lower MAX_CONNECTION_DURATION_SECONDS) and verify the stream stops when the timer fires
  • Verify logs show Closed WebRTC session <id> in fal output
  • Verify normal disconnect (user closes browser) still cleans up properly
  • Verify reconnection works after session cleanup

Summary by CodeRabbit

  • New Features
    • Added automatic cleanup of WebRTC sessions when connections disconnect
    • Added new HTTP API endpoint for closing WebRTC sessions

When the fal WebSocket closes (e.g. MAX_DURATION_EXCEEDED timer), the
WebRTC peer connection (UDP) between the client and the scope backend
continues running independently — the pipeline keeps processing frames
and the video keeps streaming.

Add a DELETE /api/v1/webrtc/offer/{session_id} endpoint to the backend,
and call it from fal_app.py's finally block to explicitly tear down the
WebRTC session when the signaling WebSocket closes.

Signed-off-by: emranemran <[email protected]>

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

These changes implement proper WebRTC session lifecycle management by adding a server-side HTTP DELETE endpoint for closing sessions and client-side logic to invoke it during WebSocket disconnect cleanup, ensuring symmetric session teardown.

Changes

Cohort / File(s) Summary
WebRTC Session Teardown
src/scope/cloud/fal_app.py, src/scope/server/app.py
Server-side DELETE endpoint at /api/v1/webrtc/offer/{session_id} removes WebRTC sessions via WebRTCManager; client-side invokes this endpoint during WebSocket disconnect cleanup with error handling and logging.

Sequence Diagram

sequenceDiagram
    participant Client as WebSocket Client<br/>(fal_app.py)
    participant Server as Scope Server<br/>(app.py)
    participant Manager as WebRTCManager
    
    Client->>Client: WebSocket disconnect event
    Client->>Server: DELETE /api/v1/webrtc/offer/{session_id}
    Server->>Manager: remove_session(session_id)
    Manager->>Manager: Clean up session resources
    Manager-->>Server: Success
    Server-->>Client: 204 No Content
    Client->>Client: Cleanup complete

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Whiskers twitch with delight,
Sessions close with DELETE's might,
Disconnect flows cleanup true,
WebRTC bids farewell to you! 🌐✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check: ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled)
  • Title check: ✅ Passed (the title accurately and concisely describes the main change: closing WebRTC sessions when the cloud WebSocket disconnects)
  • Docstring coverage: ✅ Passed (100.00% coverage; the required threshold is 80.00%)



emranemran requested review from leszko and mjh1 on March 9, 2026 at 21:40

coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scope/cloud/fal_app.py (1)

1154-1189: ⚠️ Potential issue | 🟠 Major

Move _cleanup_event.clear() before starting the WebRTC DELETE.

Because Lines 591-595 wait on _cleanup_event before sending ready, leaving the event set until after this DELETE returns lets a fast reconnect bypass the gate while remove_session() is still closing the previous peer connection. Clear the event before the teardown starts, then set it again after the rest of cleanup finishes.

Suggested fix
         finally:
+            event = _get_cleanup_event()
+            event.clear()
+            try:
                 # Cancel log forwarder task
                 if log_forwarder_task is not None:
                     log_forwarder_task.cancel()
                     try:
                         await log_forwarder_task
@@
                 if kafka_publisher and kafka_publisher.is_running:
                     end_time = time.time()
                     elapsed_ms = int((end_time - connection_start_time) * 1000)
                     await kafka_publisher.publish(
                         "websocket_disconnected",
@@
                 if session_id:
                     import httpx

                     try:
                         async with httpx.AsyncClient() as client:
                             resp = await client.delete(
@@
                     except Exception as e:
                         print(
                             f"[{log_prefix()}] Warning: Failed to close WebRTC "
                             f"session {session_id}: {e}"
                         )

-            # Clean up session data to prevent data leakage between users.
-            # Block the next connection's "ready" message until cleanup finishes.
-            event = _get_cleanup_event()
-            event.clear()
-            try:
+                # Clean up session data to prevent data leakage between users.
+                # Block the next connection's "ready" message until cleanup finishes.
                 await cleanup_installed_plugins()
                 cleanup_session_data()
             finally:
                 event.set()
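The ordering concern in this comment boils down to a small asyncio pattern: close the gate before any slow teardown starts, so a fast reconnect blocks instead of racing it. The sketch below uses stand-in names (`cleanup`, `next_connection`) rather than the actual fal_app.py helpers.

```python
# Sketch of the gate ordering the review suggests: clear before teardown,
# set only after all cleanup has finished. Function names are stand-ins.
import asyncio

async def cleanup(gate: asyncio.Event, steps: list) -> None:
    gate.clear()                  # close the gate FIRST, before any teardown
    await asyncio.sleep(0)        # a reconnect arriving now still blocks
    try:
        steps.append("webrtc-delete")     # the httpx DELETE would happen here
        steps.append("session-cleanup")   # plugin/file cleanup would happen here
    finally:
        gate.set()                # reopen the gate once everything finished

async def next_connection(gate: asyncio.Event, steps: list) -> None:
    await gate.wait()             # the "ready" gate the next connection waits on
    steps.append("ready")

async def main() -> list:
    gate = asyncio.Event()
    gate.set()                    # set means "safe for the next connection"
    steps: list = []
    await asyncio.gather(cleanup(gate, steps), next_connection(gate, steps))
    return steps

steps = asyncio.run(main())
assert steps == ["webrtc-delete", "session-cleanup", "ready"]
```

With the clear moved ahead of the teardown, "ready" can only be appended after both teardown steps, which is exactly the property the review asks for.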
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scope/cloud/fal_app.py` around lines 1154 - 1189, Move the cleanup gating
so the `_cleanup_event()` is cleared before beginning the WebRTC teardown: call
_get_cleanup_event() and event.clear() immediately when entering the cleanup
block (before the HTTP DELETE that uses session_id), then perform the DELETE to
SCOPE_LOCAL_URL for the WebRTC session (the existing
session_id/httpx.AsyncClient block), followed by cleanup_installed_plugins() and
cleanup_session_data(), and finally call event.set() in the finally block; this
ensures the `_cleanup_event` prevents a fast reconnect from sending "ready"
while remove_session/peer-connection teardown is still in progress.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 10e86be0-85d4-470a-befa-249de4880f52

📥 Commits

Reviewing files that changed from the base of the PR and between 3a76284 and e1c49ed.

📒 Files selected for processing (2)
  • src/scope/cloud/fal_app.py
  • src/scope/server/app.py

@github-actions
Contributor

github-actions bot commented Mar 9, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-638--preview
WebSocket wss://fal.run/daydream/scope-pr-638--preview/ws
Commit e1c49ed

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-638--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

github-actions bot commented Mar 9, 2026

✅ E2E Tests passed

Status passed
fal App daydream/scope-pr-638--preview
Run View logs

Test Artifacts

Check the workflow run for screenshots.

@emranemran
Contributor Author

Tested by overriding the timeout to 5 minutes for a quick check; it worked as expected: the stream stopped and the backend shut down cleanly.

mjh1 merged commit 3e19e6d into main on Mar 10, 2026
10 checks passed
mjh1 deleted the emran/webrtc-close branch on March 10, 2026 at 12:59