
Fix WebRTC VP8 decode errors and PLI keyframe requests#614

Merged
emranemran merged 1 commit into main from
emran/webrtc-retry-fix-vp8
Mar 9, 2026

Conversation

@emranemran
Contributor

@emranemran emranemran commented Mar 6, 2026

Problem

Cloud relay WebRTC connections would permanently die from transient VP8 decode errors. When the decoder receives P-frames before a keyframe (common at connection start), the resulting exception killed the frame receive loop with no recovery path.

Additionally, aiortc 1.14.0 changed _send_rtcp_pli() to require a media_ssrc argument, causing PLI keyframe requests to fail with warnings on every cloud connection.

Two fixes

1. VP8 decode error retry (tracks.py, cloud_track.py, cloud_webrtc_client.py)

All three frame input loops now tolerate transient errors instead of dying on the first one:

  • Count consecutive errors, sleeping 10ms between retries
  • Any successful frame resets the counter to 0
  • Bail out only after 10 consecutive failures (~400ms at 25fps)
  • Covers the window between connection start and first keyframe arrival
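A minimal sketch of that retry pattern, as a self-contained simulation: `recv`, `handle_frame`, and the `EndOfStream` sentinel are illustrative stand-ins for the real track's `recv()`, the frame-processor hand-off, and aiortc's `MediaStreamError` (the actual loops live in tracks.py, cloud_track.py, and cloud_webrtc_client.py):

```python
import asyncio

class EndOfStream(Exception):
    """Stand-in for aiortc's MediaStreamError (clean end of source)."""

MAX_CONSECUTIVE_ERRORS = 10   # ~400ms of back-to-back failures at 25fps
RETRY_SLEEP_S = 0.01          # 10ms pause between retries

async def receive_loop(recv, handle_frame):
    """Tolerate transient decode errors; stop only after a run of failures."""
    consecutive_errors = 0
    while True:
        try:
            frame = await recv()
        except EndOfStream:
            break  # clean disconnect: end the loop quietly
        except Exception:
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                break  # source looks permanently broken, give up
            await asyncio.sleep(RETRY_SLEEP_S)
            continue
        consecutive_errors = 0  # any good frame resets the counter
        handle_frame(frame)
```

The key property is that the counter only grows across back-to-back failures, so the startup window of undecodable P-frames is ridden out, while a genuinely dead source still terminates the loop after ten straight failures.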

2. PLI keyframe request fix (cloud_webrtc_client.py)

Requests a keyframe from the cloud encoder to speed up decoder startup:

  • Polls every 100ms until aiortc's remote_streams dict is populated (up to 5s)
  • Sends a single PLI with the correct media_ssrc (required by aiortc >= 1.14.0)
  • Logs the SSRC on success for verification
  • Logs first frame receipt to confirm the keyframe roundtrip
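The polling-then-PLI flow can be sketched the same way; `get_remote_streams` and `send_pli` below are hypothetical callables standing in for aiortc's internal `_RTCRtpReceiver__remote_streams` dict and `receiver._send_rtcp_pli(media_ssrc)`:

```python
import asyncio

async def request_keyframe(get_remote_streams, send_pli,
                           timeout=5.0, poll_interval=0.1):
    """Poll until RTP packets have arrived, then send one PLI with their SSRC."""
    elapsed = 0.0
    while elapsed < timeout:
        await asyncio.sleep(poll_interval)
        elapsed += poll_interval
        streams = get_remote_streams()  # empty until the first RTP packet lands
        if streams:
            media_ssrc = next(iter(streams))  # aiortc >= 1.14.0 needs this arg
            await send_pli(media_ssrc)
            return media_ssrc
    return None  # no RTP within timeout: skip the PLI, rely on encoder keyframe
```

Returning the SSRC (or None) keeps the caller's logging straightforward: log success with the SSRC, or a debug line when the PLI was skipped.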

How they work together

Connection established
  ├─ PLI task starts polling for remote_streams
  ├─ Frame receive loop starts
  │   ├─ P-frame arrives → decode error → retry (counter=1)
  │   ├─ P-frame arrives → decode error → retry (counter=2)
  │   ├─ PLI sent once remote_streams populated
  │   ├─ Keyframe arrives → decoded successfully → counter=0
  │   └─ Normal operation continues
  └─ "First frame received from cloud (keyframe decoded successfully)"

In practice, the cloud encoder sends a keyframe on its own at connection start, so the first frame usually arrives before the PLI is even sent. The PLI is a safety net, and the retry logic is a safety net for the safety net.

Expected logs

INFO - Sent PLI keyframe request (media_ssrc=2642438216)
INFO - First frame received from cloud (keyframe decoded successfully)
INFO - [FRAME-PROCESSOR] First frame produced, playback ready (ttff=1312ms)

Test plan

  • Verify server starts without errors (uv run daydream-scope)
  • Test cloud relay mode — no PLI warnings in logs
  • Verify "Sent PLI keyframe request" and "First frame received" appear in logs
  • VP8 decode warnings should be logged but not kill the session
  • Clean disconnects still work (MediaStreamError path)

@coderabbitai

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

Introduces consecutive-error tolerance (a counter with a max of 10) to server input loops and treats MediaStreamError as end-of-source. Updates cloud_webrtc_client to track and reset consecutive errors, log first-frame and periodic frame progress, and to request keyframes by polling for remote streams (0.1s interval, up to 5s) and sending a PLI using the remote stream SSRC when available.

Changes

Cohort / File(s) Summary
Input loop error tolerance
src/scope/server/cloud_track.py, src/scope/server/tracks.py
Add consecutive_errors with max_consecutive_errors = 10; reset on successful frame; treat MediaStreamError as end-of-source; on other exceptions increment counter, log transient warnings, sleep 0.01s between retries, stop and log when threshold reached.
WebRTC client receive & keyframe logic
src/scope/server/cloud_webrtc_client.py
In _receive_frames track/reset consecutive errors, handle MediaStreamError to terminate loop, log first-frame and per-100-frame progress. Rework _request_keyframe to poll for remote_streams (0.1s intervals, up to 5s), extract a media SSRC from found remote streams, and send a PLI via receiver._send_rtcp_pli(media_ssrc); skip PLI if no remote streams found within timeout and log accordingly.

Sequence Diagram

sequenceDiagram
    participant Client as WebRTC Caller
    participant Receiver as Video Receiver
    participant Remote as Remote Streams Registry

    Client->>Receiver: request_keyframe()
    Receiver->>Remote: poll for remote_streams (0.1s intervals, up to 5s)
    alt remote_streams found
        Remote-->>Receiver: provide remote_streams (media SSRC)
        Receiver->>Receiver: extract media_ssrc
        Receiver->>Receiver: call _send_rtcp_pli(media_ssrc)
        Receiver-->>Client: log success / return
    else no remote_streams within timeout
        Remote-->>Receiver: no streams
        Receiver-->>Client: skip PLI, debug log
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nudge the stream with patient hops,

I count the misses, mend the drops.
A little wait, a gentle plea,
The SSRC hums — frames come to me.
Hooray — the picture dances free!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly reflects the main objectives: addressing WebRTC VP8 decode errors through retry logic and fixing PLI keyframe requests, which are the core changes across all three modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d78d11f1-6edc-4f6e-ad0e-8bf7a4a1f6d2

📥 Commits

Reviewing files that changed from the base of the PR and between 0a52f7c and a746734.

📒 Files selected for processing (3)
  • src/scope/server/cloud_track.py
  • src/scope/server/cloud_webrtc_client.py
  • src/scope/server/tracks.py

Comment on lines +336 to 342
consecutive_errors += 1
if consecutive_errors >= max_consecutive_errors:
    logger.error(
        f"Error receiving frame, stopping after "
        f"{consecutive_errors} consecutive errors: {e}"
    )
    break

@coderabbitai coderabbitai bot Mar 6, 2026


⚠️ Potential issue | 🟠 Major

Mark the client failed before exiting the receive loop.

Once this branch trips, the task stops but is_connected can still stay true, so callers may keep queueing frames against a dead output path and wait forever for processed video. Flip the connection state here before break.

Suggested fix
                 except Exception as e:
                     consecutive_errors += 1
                     if consecutive_errors >= max_consecutive_errors:
                         logger.error(
                             f"Error receiving frame, stopping after "
                             f"{consecutive_errors} consecutive errors: {e}"
                         )
+                        self._connected = False
+                        self._stats["connection_state"] = "failed"
                         break

Contributor

@mjh1 mjh1 Mar 9, 2026


seems the same as the previous behaviour though? not that that means it's fine, but if this is a bug it doesn't seem like it's being introduced here

@coderabbitai coderabbitai bot replied

@mjh1 — you're right, this isn't a regression introduced by this PR. Neither the old except Exception path nor the new one updated self._connected before breaking, so the gap existed before these changes too.

The suggestion is still worthwhile as a correctness improvement (callers checking is_connected after the loop exits won't know the receive path has died), but it's a pre-existing latent issue rather than something this PR introduces. Happy to leave it for a follow-up if you'd prefer to keep this PR focused on the VP8/PLI fixes.

Would you like me to open a follow-up issue to track it?

Comment on lines +365 to +383
delays = [0.1, 0.5, 1.0]
for attempt, delay in enumerate(delays):
    await asyncio.sleep(delay)
    sent = False
    for receiver in self.pc.getReceivers():
        if receiver.track and receiver.track.kind == "video":
            try:
                await receiver._send_rtcp_pli()
                logger.info(
                    f"Sent PLI (keyframe request), attempt {attempt + 1}"
                )
                sent = True
            except Exception as e:
                logger.warning(
                    f"Could not send PLI (attempt {attempt + 1}): {e}"
                )
    if sent:
        return
logger.warning("Failed to send PLI after all retry attempts")


⚠️ Potential issue | 🟡 Minor


Add self.pc None check to prevent crash if connection closes during keyframe request.

_request_keyframe() runs as a detached task and sleeps up to 1.6s total across retries. If disconnect() runs during this window, self.pc becomes None (line 428), causing self.pc.getReceivers() to crash with AttributeError. Snapshot self.pc at each retry attempt and check for None or closed state before iterating receivers.

Suggested fix
         delays = [0.1, 0.5, 1.0]
         for attempt, delay in enumerate(delays):
             await asyncio.sleep(delay)
+            pc = self.pc
+            if pc is None or pc.connectionState == "closed":
+                return
             sent = False
-            for receiver in self.pc.getReceivers():
+            for receiver in pc.getReceivers():
                 if receiver.track and receiver.track.kind == "video":
                     try:
                         await receiver._send_rtcp_pli()

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-614--preview
WebSocket wss://fal.run/daydream/scope-pr-614--preview/ws
Commit bb69e0a

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-614--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

✅ E2E Tests passed

Status passed
fal App daydream/scope-pr-614--preview
Run View logs

Test Artifacts

Check the workflow run for screenshots.

@emranemran emranemran changed the title from "Fix WebRTC input loops dying on transient VP8 decode errors" to "Fix WebRTC VP8 decode errors and PLI keyframe requests" on Mar 9, 2026
@emranemran emranemran force-pushed the emran/webrtc-retry-fix-vp8 branch from a746734 to 6411c07 on March 9, 2026 06:52
@emranemran
Contributor Author

@leszko @mjh1: FYI I'm still testing this PR and it's not ready for merge yet.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/scope/server/cloud_webrtc_client.py (1)

311-312: Consider moving import to file level for consistency.

MediaStreamError is imported inside the method, but tracks.py imports it at the file level (line 8). Since line 24 already imports from aiortc.mediastreams, adding MediaStreamError there maintains consistency.

Suggested change

At line 24:

-from aiortc.mediastreams import VIDEO_TIME_BASE, MediaStreamTrack
+from aiortc.mediastreams import VIDEO_TIME_BASE, MediaStreamError, MediaStreamTrack

Then remove lines 311-312 from _receive_frames.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7697451f-d207-4edb-8a2f-2d31fc5c332f

📥 Commits

Reviewing files that changed from the base of the PR and between a746734 and 6411c07.

📒 Files selected for processing (3)
  • src/scope/server/cloud_track.py
  • src/scope/server/cloud_webrtc_client.py
  • src/scope/server/tracks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/scope/server/cloud_track.py

Comment on lines +377 to +386
# Get remote SSRC from receiver's internal stream tracking
# (required by aiortc >= 1.14.0)
remote_streams = receiver._RTCRtpReceiver__remote_streams
if not remote_streams:
    logger.debug(
        f"No remote streams yet (attempt {attempt + 1})"
    )
    continue
media_ssrc = next(iter(remote_streams))
await receiver._send_rtcp_pli(media_ssrc)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

aiortc 1.14.0 _send_rtcp_pli media_ssrc parameter

💡 Result:

In aiortc 1.14.0, RTCRtpReceiver._send_rtcp_pli(media_ssrc) expects the SSRC of the media source you want the sender to generate a new keyframe for—i.e., the SSRC of the incoming RTP stream (the “media source” SSRC in the RTCP feedback common header). aiortc then puts that value into the RTCP PSFB PLI packet’s media_ssrc field. (raw.githubusercontent.com)

Notes:

  • ssrc=self.__rtcp_ssrc is the SSRC of the feedback sender (your receiver endpoint), while media_ssrc is the SSRC of the media stream being referenced. (raw.githubusercontent.com)
  • If RTX is in use, aiortc unwraps RTX and rewrites the packet SSRC to the original (primary) SSRC before potentially sending PLI, so the PLI targets the main media SSRC, not the RTX SSRC. (raw.githubusercontent.com)



Private attribute access is fragile despite defensive handling.

receiver._RTCRtpReceiver__remote_streams accesses a name-mangled private attribute. While the code already wraps this in a try/except block with retry logic, the direct access remains vulnerable to silent breakage if aiortc's internals change. The broad exception handling provides defense, but consider adding a version check at runtime (the comment indicates aiortc >= 1.14.0 is required) or documenting the minimum aiortc version as a hard dependency to catch incompatibilities earlier.


- Add retry logic for transient VP8 decode errors in cloud WebRTC input loops
- Fix PLI keyframe request for aiortc >= 1.14.0 (pass media_ssrc)
- Poll for remote streams before sending PLI instead of blind retries
- Log first frame receipt to verify PLI/keyframe roundtrip

Signed-off-by: emranemran <[email protected]>
@emranemran emranemran force-pushed the emran/webrtc-retry-fix-vp8 branch from 6411c07 to bb69e0a on March 9, 2026 18:24

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scope/server/cloud_track.py (1)

132-154: ⚠️ Potential issue | 🟠 Major

Reset consecutive_errors after the cloud handoff succeeds.

consecutive_errors is cleared before self.frame_processor.put(frame) runs, so a persistent put() failure never reaches the 10-error cutoff — every successful recv() zeroes the counter first.

🐛 Proposed fix
                 try:
                     # Get frame from browser
                     frame = await self._source_track.recv()
-                    consecutive_errors = 0

                     # Send through FrameProcessor (which relays to cloud)
                     if self.frame_processor:
                         self.frame_processor.put(frame)
+                    consecutive_errors = 0
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scope/server/cloud_webrtc_client.py`:
- Around line 370-399: The current PLI send logic polls for remote_streams then
sends a single RTCP PLI via receiver._send_rtcp_pli(media_ssrc) and bails if it
fails; change this to retry sending PLIs on a bounded schedule (e.g.,
exponential or fixed-interval backoff with a max attempts/timeout) until a video
frame arrives or the retry budget is exhausted: after detecting remote_streams
and selecting receiver/media_ssrc, loop up to N attempts (or until a global
timeout) calling receiver._send_rtcp_pli(media_ssrc), awaiting a short interval
between attempts, break if the first frame is observed (check r.track.readyState
or other indicator from pc.getReceivers()/remote_streams), and log each attempt
and final failure using logger (replace the single try/except block around
receiver._send_rtcp_pli with this retry loop and preserve existing debug/info
messages).

In `@src/scope/server/tracks.py`:
- Around line 63-92: The loop calls self.frame_processor.put() but
self.frame_processor is lazily created by await self.track.recv(), so initialize
the processor before entering the retry loop: perform an initial await
self.track.recv() (or call the existing FrameProcessor initializer)
outside/above the while self.input_task_running loop to force creation of
self.frame_processor, handling asyncio.CancelledError and MediaStreamError the
same way as in-loop, then continue into the loop using
self.frame_processor.put() as before so transient recv-driven creation cannot
cause perpetual single-error retries.

---

Outside diff comments:
In `@src/scope/server/cloud_track.py`:
- Around line 132-154: The bug is that consecutive_errors is reset immediately
after a successful _source_track.recv() so failures in
self.frame_processor.put(frame) never accumulate; move the reset to only run
after the cloud handoff succeeds by wrapping the handoff in its own try/except:
call self.frame_processor.put(frame) (or await if needed) inside a try block, on
success set consecutive_errors = 0, and on exception increment
consecutive_errors and log/handle using the existing max_consecutive_errors
logic (use the same logger messages and break behavior), keeping the recv()
exception handling for MediaStreamError and other recv errors unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9433c2e5-b52b-4495-802e-32712d1485cc

📥 Commits

Reviewing files that changed from the base of the PR and between 6411c07 and bb69e0a.

📒 Files selected for processing (3)
  • src/scope/server/cloud_track.py
  • src/scope/server/cloud_webrtc_client.py
  • src/scope/server/tracks.py

Comment on lines +370 to +399
# Poll until remote_streams is populated (RTP packets have arrived)
timeout = 5.0
poll_interval = 0.1
elapsed = 0.0
receiver = None
remote_streams = None

while elapsed < timeout:
    await asyncio.sleep(poll_interval)
    elapsed += poll_interval
    for r in self.pc.getReceivers():
        if r.track and r.track.kind == "video":
            streams = r._RTCRtpReceiver__remote_streams
            if streams:
                receiver = r
                remote_streams = streams
                break
    if remote_streams:
        break

if not remote_streams or not receiver:
    logger.debug("No remote streams after %.1fs, skipping PLI", timeout)
    return

try:
    media_ssrc = next(iter(remote_streams))
    await receiver._send_rtcp_pli(media_ssrc)
    logger.info(f"Sent PLI keyframe request (media_ssrc={media_ssrc})")
except Exception as e:
    logger.debug(f"Could not send PLI: {e}")


⚠️ Potential issue | 🟠 Major

Retry the PLI instead of sending it once.

Polling remote_streams fixes the “too early” case, but after that this code makes exactly one RTCP PLI attempt. If that packet is lost or the sender still is not ready, startup falls back to waiting for the next natural keyframe and the VP8 decode loop can keep failing. Keep a bounded retry schedule until the first frame arrives or the retry budget is exhausted.

🔁 Proposed fix
```diff
-        try:
-            media_ssrc = next(iter(remote_streams))
-            await receiver._send_rtcp_pli(media_ssrc)
-            logger.info(f"Sent PLI keyframe request (media_ssrc={media_ssrc})")
-        except Exception as e:
-            logger.debug(f"Could not send PLI: {e}")
+        media_ssrc = next(iter(remote_streams))
+        for attempt, delay in enumerate((0.0, 0.5, 1.0, 2.0), start=1):
+            if delay:
+                await asyncio.sleep(delay)
+            if self._stats["frames_received"] > 0:
+                return
+            try:
+                await receiver._send_rtcp_pli(media_ssrc)
+                logger.info(
+                    "Sent PLI keyframe request (media_ssrc=%s, attempt %s)",
+                    media_ssrc,
+                    attempt,
+                )
+            except Exception as e:
+                logger.debug("Could not send PLI on attempt %s: %s", attempt, e)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scope/server/cloud_webrtc_client.py` around lines 370 - 399, The current
PLI send logic polls for remote_streams then sends a single RTCP PLI via
receiver._send_rtcp_pli(media_ssrc) and bails if it fails; change this to retry
sending PLIs on a bounded schedule (e.g., exponential or fixed-interval backoff
with a max attempts/timeout) until a video frame arrives or the retry budget is
exhausted: after detecting remote_streams and selecting receiver/media_ssrc,
loop up to N attempts (or until a global timeout) calling
receiver._send_rtcp_pli(media_ssrc), awaiting a short interval between attempts,
break if the first frame is observed (check r.track.readyState or other
indicator from pc.getReceivers()/remote_streams), and log each attempt and final
failure using logger (replace the single try/except block around
receiver._send_rtcp_pli with this retry loop and preserve existing debug/info
messages).

Comment on lines +63 to +92
```diff
+        consecutive_errors = 0
+        max_consecutive_errors = 10
         while self.input_task_running:
             try:
                 input_frame = await self.track.recv()
+                consecutive_errors = 0

                 # Store raw VideoFrame for later processing (tracks input FPS internally)
                 self.frame_processor.put(input_frame)

             except asyncio.CancelledError:
                 break
-            except Exception as e:
-                # Stop the input loop on connection errors to avoid spam
-                logger.error(f"Error in input loop, stopping: {e}")
+            except MediaStreamError:
+                logger.info("Source track ended")
                 self.input_task_running = False
                 break
+            except Exception as e:
+                consecutive_errors += 1
+                if consecutive_errors >= max_consecutive_errors:
+                    logger.error(
+                        f"Error in input loop, stopping after "
+                        f"{consecutive_errors} consecutive errors: {e}"
+                    )
+                    self.input_task_running = False
+                    break
+                logger.warning(
+                    f"Transient error in input loop "
+                    f"({consecutive_errors}/{max_consecutive_errors}): {e}"
+                )
+                await asyncio.sleep(0.01)
```

⚠️ Potential issue | 🟠 Major

Initialize the processor before starting this retry loop.

recv() is what lazily creates self.frame_processor, but this loop dereferences it earlier. If input starts first, put() can fail on every frame, and because the counter is reset before the call, the loop can stay at 1/10 forever.

🐛 Proposed fix
```diff
     def initialize_input_processing(self, track: MediaStreamTrack):
         self.track = track
+        self.initialize_output_processing()
         self.input_task_running = True
         self.input_task = asyncio.create_task(self.input_loop())
```

```diff
         while self.input_task_running:
             try:
                 input_frame = await self.track.recv()
-                consecutive_errors = 0

                 # Store raw VideoFrame for later processing (tracks input FPS internally)
                 self.frame_processor.put(input_frame)
+                consecutive_errors = 0
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-        consecutive_errors = 0
-        max_consecutive_errors = 10
-        while self.input_task_running:
-            try:
-                input_frame = await self.track.recv()
-                consecutive_errors = 0
-                # Store raw VideoFrame for later processing (tracks input FPS internally)
-                self.frame_processor.put(input_frame)
-            except asyncio.CancelledError:
-                break
-            except Exception as e:
-                # Stop the input loop on connection errors to avoid spam
-                logger.error(f"Error in input loop, stopping: {e}")
-            except MediaStreamError:
-                logger.info("Source track ended")
-                self.input_task_running = False
-                break
-            except Exception as e:
-                consecutive_errors += 1
-                if consecutive_errors >= max_consecutive_errors:
-                    logger.error(
-                        f"Error in input loop, stopping after "
-                        f"{consecutive_errors} consecutive errors: {e}"
-                    )
-                    self.input_task_running = False
-                    break
-                logger.warning(
-                    f"Transient error in input loop "
-                    f"({consecutive_errors}/{max_consecutive_errors}): {e}"
-                )
-                await asyncio.sleep(0.01)
+        consecutive_errors = 0
+        max_consecutive_errors = 10
+        while self.input_task_running:
+            try:
+                input_frame = await self.track.recv()
+                # Store raw VideoFrame for later processing (tracks input FPS internally)
+                self.frame_processor.put(input_frame)
+                consecutive_errors = 0
+            except asyncio.CancelledError:
+                break
+            except MediaStreamError:
+                logger.info("Source track ended")
+                self.input_task_running = False
+                break
+            except Exception as e:
+                consecutive_errors += 1
+                if consecutive_errors >= max_consecutive_errors:
+                    logger.error(
+                        f"Error in input loop, stopping after "
+                        f"{consecutive_errors} consecutive errors: {e}"
+                    )
+                    self.input_task_running = False
+                    break
+                logger.warning(
+                    f"Transient error in input loop "
+                    f"({consecutive_errors}/{max_consecutive_errors}): {e}"
+                )
+                await asyncio.sleep(0.01)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scope/server/tracks.py` around lines 63 - 92, The loop calls
self.frame_processor.put() but self.frame_processor is lazily created by await
self.track.recv(), so initialize the processor before entering the retry loop:
perform an initial await self.track.recv() (or call the existing FrameProcessor
initializer) outside/above the while self.input_task_running loop to force
creation of self.frame_processor, handling asyncio.CancelledError and
MediaStreamError the same way as in-loop, then continue into the loop using
self.frame_processor.put() as before so transient recv-driven creation cannot
cause perpetual single-error retries.
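The counter-reset hazard the reviewer describes can be modeled in a few lines. This is a toy model, not the project's code: `budget` and the raised RuntimeError are hypothetical stand-ins for the real loop and a `frame_processor.put()` that fails on every frame.

```python
# Toy model of the reviewer's point: resetting the consecutive-error
# counter *before* the failing call keeps the loop at 1/10 forever,
# while resetting only after success bails out at max_errors.
def run_loop(reset_before_call: bool, max_errors: int = 10, budget: int = 100) -> int:
    errors = 0
    iterations = 0
    for _ in range(budget):
        iterations += 1
        try:
            if reset_before_call:
                errors = 0  # reset happens before the call that raises
            raise RuntimeError("put() failed")  # stands in for frame_processor.put()
        except RuntimeError:
            errors += 1
            if errors >= max_errors:
                break
    return iterations

print(run_loop(reset_before_call=True))   # → 100 (never bails; budget exhausted)
print(run_loop(reset_before_call=False))  # → 10 (bails after max_errors)
```

This is why the suggestion moves `consecutive_errors = 0` to after the `put()` call: the counter should only clear once the whole iteration has actually succeeded.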

@emranemran emranemran merged commit 3a76284 into main Mar 9, 2026
10 checks passed