
fix: batch-level FPS tracking for smooth frame delivery#617

Merged
leszko merged 2 commits into main from fix/smooth-frame-pacing on Mar 9, 2026

Conversation

@ryanontheinside
Collaborator

@ryanontheinside ryanontheinside commented Mar 7, 2026

Replace per-frame delta FPS tracking with batch-level throughput measurement (sum(frames) / sum(intervals)) in PipelineProcessor.

The old approach measured inter-frame deltas, but in batched pipelines (e.g., 12 frames emitted at once), near-zero intra-batch deltas mixed with large inter-batch gaps caused the FPS estimate to oscillate permanently. This produced unstable frame pacing downstream.

The new approach records one sample per pipeline call as a (num_frames, interval) tuple, then computes FPS over a sliding window of 10 batches. This gives a stable, accurate estimate regardless of batch size.
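The sliding-window throughput calculation described above can be sketched as follows. This is an illustrative sketch, not the actual `PipelineProcessor` code: the class name `BatchFPSTracker` and methods `record_batch`/`fps` are hypothetical, while `BATCH_FPS_SAMPLE_SIZE`, the `(num_frames, interval)` sample shape, and the `sum(frames) / sum(intervals)` formula come from the PR description.

```python
from collections import deque

BATCH_FPS_SAMPLE_SIZE = 10  # sliding window of batch samples

class BatchFPSTracker:
    """Illustrative sketch of batch-level FPS tracking."""

    def __init__(self, window: int = BATCH_FPS_SAMPLE_SIZE) -> None:
        # Each sample is a (num_frames, interval_seconds) tuple; the
        # deque's maxlen gives us the sliding window for free.
        self._samples: deque[tuple[int, float]] = deque(maxlen=window)

    def record_batch(self, num_frames: int, interval: float) -> None:
        # One sample per pipeline call, regardless of batch size.
        self._samples.append((num_frames, interval))

    def fps(self) -> float:
        # Throughput over the window: sum(frames) / sum(intervals).
        total_frames = sum(n for n, _ in self._samples)
        total_time = sum(t for _, t in self._samples)
        return total_frames / total_time if total_time > 0 else 0.0
```

For example, recording five batches of 12 frames that each took 0.5 s yields a steady 24 FPS, whereas averaging the 60 inter-frame deltas would mix near-zero intra-batch gaps with the 0.5 s inter-batch gaps.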

Benchmark results (10s runs, real PipelineProcessor with stub pipelines):

Scenario            | Metric         |     Old |     New
--------------------|----------------|---------|--------
LongLive (batch=12) | Jitter         |  84.1%  |   0.3%
                    | Stalls (>1.5x) |   7.9%  |   0.0%
                    | FPS error      |  34.3%  |   0.2%
                    | FPS stdev      |   7.23  |   0.01
SDv2 (batch=4)      | Jitter         |  29.1%  |   5.8%
                    | Stalls (>1.5x) |   4.6%  |   0.0%
                    | FPS error      |  13.8%  |   1.0%
                    | FPS stdev      |   5.24  |   0.69
Single-frame        | Jitter         |   2.3%  |   4.5%
                    | FPS error      |   2.7%  |   2.7%
                    | FPS stdev      |   1.89  |   1.35
Pause/resume        | Jitter         | 112.9%  |   0.3%
                    | Stalls (>1.5x) |   7.3%  |   0.0%
                    | FPS error      |  53.8%  |   0.2%
                    | FPS stdev      |  12.10  |   0.01
Single-frame results are equivalent, confirming no regression for non-batching pipelines.

Summary by CodeRabbit

  • Refactor

    • Improved performance metrics tracking by switching to batch-based FPS calculation, reducing per-frame overhead.
  • Tests

    • Added comprehensive frame delivery validation tests covering varying batch sizes, processing delays, and pause/resume scenarios.

@coderabbitai

coderabbitai bot commented Mar 7, 2026

📝 Walkthrough

Refactored FPS tracking in PipelineProcessor from per-frame delta measurements to batch-based throughput sampling. Replaced frame-level state tracking with batch accumulation (num_frames, interval) for improved averaging. Added comprehensive test suite validating frame delivery under various batch sizes, delays, and pause/resume scenarios.

Changes

FPS Tracking Refactoring (src/scope/server/pipeline_processor.py):
Replaced per-frame FPS logic with batch-based sampling. Swapped _track_output_frame() for _track_output_batch(num_frames, processing_time). Changed constants from OUTPUT_FPS_SAMPLE_SIZE/OUTPUT_FPS_MIN_SAMPLES to BATCH_FPS_SAMPLE_SIZE. Updated internal state: a _batch_samples deque and _last_batch_time replace frame-level tracking. Revised pause handling and FPS calculation to operate on accumulated batch data.

Frame Delivery Test Suite (tests/test_frame_delivery.py):
New test module with a TestFrameDelivery class containing four test methods covering large-batch, small-batch, single-frame, and pause/resume delivery scenarios. Includes a StubPipeline helper for simulating configurable processing delays and a _measure_delivery() orchestrator that captures FPS, jitter, stall, and timing metrics across varying conditions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Batches now flow smooth as clover,
No more frame-by-frame we hover,
Throughput tallied, intervals measured true,
Delivery tested in every hue,
Pipeline hops with newfound pace! 🌟

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 76.92%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately summarizes the main change, replacing per-frame FPS tracking with batch-level FPS tracking, which directly addresses the core problem of smooth frame delivery in batched scenarios.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/scope/server/pipeline_processor.py (1)

510-525: ⚠️ Potential issue | 🟠 Major

Track only frames that actually made it into output_queue.

num_frames is the tensor batch size, but Lines 516-522 can drop frames on queue.Full. Feeding the full batch into _track_output_batch() inflates current_output_fps under backpressure, so consumers pace against frames they will never see.

Suggested fix
-            for frame in output:
+            enqueued_frames = 0
+            for frame in output:
                 frame = frame.unsqueeze(0)
                 try:
                     self.output_queue.put_nowait(frame)
+                    enqueued_frames += 1
                 except queue.Full:
                     logger.debug(
                         f"Output queue full for {self.pipeline_id}, dropping processed frame"
                     )
                     continue

             # Track batch-level throughput for FPS calculation
-            self._track_output_batch(num_frames, processing_time)
+            if enqueued_frames > 0:
+                self._track_output_batch(enqueued_frames, processing_time)

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 362446c8-a606-40c5-b09b-2363c385aab7

📥 Commits

Reviewing files that changed from the base of the PR and between 0a52f7c and be5befe.

📒 Files selected for processing (2)
  • src/scope/server/pipeline_processor.py
  • tests/test_frame_delivery.py

Comment on lines +39 to +46
def _measure_delivery(
batch_size: int,
delay: float,
duration: float,
variance: float = 0.0,
pause_at: float | None = None,
pause_duration: float = 2.0,
) -> dict:


⚠️ Potential issue | 🟠 Major

These tests are likely to be slow and flaky in CI.

The helper measures jitter from real time.sleep() / time.time() behavior, and the four callers keep the process alive for about 45 seconds total. On a busy runner, those thresholds can fail because of scheduler noise rather than an FPS regression. Please move this behind a slow/integration marker or inject a fake clock/sleeper so the pacing math is deterministic.

Also applies to: 68-121, 137-196
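The injectable-clock approach the reviewer suggests can be sketched as follows. The `time_fn`/`sleep_fn` parameters and the `FakeClock` helper are the proposed additions, not existing code; the body of `_measure_delivery` here is a stand-in loop, since the real measurement logic lives in the test file.

```python
import time
from typing import Callable

class FakeClock:
    """Deterministic clock: sleep() advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds

def _measure_delivery(
    batch_size: int,
    delay: float,
    duration: float,
    time_fn: Callable[[], float] = time.time,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> dict:
    # Stand-in for the real measurement loop: every time.time() call
    # becomes time_fn() and every time.sleep(d) becomes sleep_fn(d).
    frames = 0
    start = time_fn()
    while time_fn() - start < duration:
        sleep_fn(delay)  # simulated per-batch processing delay
        frames += batch_size
    return {"fps": frames / (time_fn() - start)}

# Unit tests pass the fake clock, so a "10 second" run finishes instantly
# and the pacing math is exact rather than subject to scheduler noise.
clock = FakeClock()
result = _measure_delivery(12, 0.5, 10.0, clock.time, clock.sleep)
```

Production callers keep the default `time.time`/`time.sleep`, so only the explicit integration test pays the real 45 seconds of wall-clock time.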


@github-actions
Contributor

github-actions bot commented Mar 7, 2026

🚀 fal.ai Preview Deployment

App ID: daydream/scope-pr-617--preview
WebSocket: wss://fal.run/daydream/scope-pr-617--preview/ws
Commit: 71791c2

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-617--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

github-actions bot commented Mar 7, 2026

✅ E2E Tests passed

Status: passed
fal App: daydream/scope-pr-617--preview
Run: View logs

Test Artifacts

Check the workflow run for screenshots.

Collaborator

@leszko leszko left a comment


I checked, and this change caused the LongLive FPS to drop from 18.8 to 17.2 FPS. I wonder what the cause is. Maybe it is related to the fact that we now measure time after all frames pass through frame = frame.unsqueeze(0). Or maybe it's something about the fallback mechanism: we now fall back to processing_time, which may be lower while the pipeline warms up.

Definitely something we need to investigate before merging this PR.

But conceptually, it looks fine.

@ryanontheinside
Collaborator Author

The lower FPS you're seeing is expected. The old per-frame delta tracking was overestimating FPS for batched pipelines.

The root cause is in next_timestamp() in tracks.py (line 81), which actively sleeps based on frame_ptime = 1.0 / get_fps(). When get_fps() returns an inflated number, frames get paced faster than they're actually produced.

Here's why the old code inflates FPS for batched pipelines: when 12 frames are enqueued in a tight loop, 11 of the 30 sampled deltas are near-zero (intra-batch), which drags the average delta down and pushes the FPS estimate up.
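The inflation mechanism above can be reproduced with a toy simulation. The numbers mirror the LongLive scenario described below (batch of 12 frames every 0.5 s, true throughput 24 FPS); the ~1 ms intra-batch delta is an assumed value, and the 30-delta window stands in for the old sampling logic.

```python
# Simulated inter-frame deltas: each batch of 12 frames produces one
# large inter-batch gap followed by 11 near-zero intra-batch deltas.
deltas = []
for _ in range(10):  # 10 batches = 120 frames over ~5 s
    deltas.append(0.5 - 11 * 0.001)  # inter-batch gap (~0.489 s)
    deltas.extend([0.001] * 11)      # near-zero intra-batch deltas

# Old approach: FPS = 1 / mean(last 30 deltas). The near-zero deltas
# drag the average delta down and push the FPS estimate up.
old_fps = 1.0 / (sum(deltas[-30:]) / 30)

# New approach: FPS = total frames / total elapsed time.
new_fps = (10 * 12) / sum(deltas)
```

With these inputs `old_fps` lands near 30 while `new_fps` stays at the true 24, matching the inflation described in the LongLive bullet points below.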

Simulating both algorithms against identical production patterns:

LongLive (batch=12, 0.5s per batch, true throughput = 24 FPS):

  • Old estimate: 30 FPS (25% inflation), oscillates between 20-30
  • New estimate: 24 FPS (accurate), rock steady
  • Old code makes next_timestamp() sleep 8.3ms less per frame than it should

SDv2 (batch=4, 0.2s per batch, true throughput = 20 FPS):

  • Old estimate: 20.4 FPS (1.8% inflation), oscillates between 18.8-21.4
  • New estimate: 20 FPS (accurate), steady

Single-frame (no batching, true throughput = 30 FPS):

  • Both algorithms report 30.3 FPS, identical behavior

The pipeline isn't producing frames any slower. The old number was inflated by burst delivery, and the client-side WebRTC stats were averaging over those bursts. The new code paces frames evenly at the true production rate, which looks smoother but shows a lower (accurate) number.
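The 8.3 ms figure quoted for the LongLive case follows directly from the frame_ptime = 1.0 / get_fps() pacing math, as a quick sanity check shows:

```python
# Per-frame sleep at the inflated 30 FPS estimate vs. the true 24 FPS
# throughput: the old code under-sleeps by the difference.
shortfall_ms = (1 / 24 - 1 / 30) * 1000
```

That is roughly 8.3 ms less sleep per frame than the true production rate warrants, which is exactly the burst-then-stall pacing the jitter numbers capture.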

@leszko
Copy link
Copy Markdown
Collaborator

leszko commented Mar 9, 2026


Ok, I reduced the OUTPUT_QUEUE_MAX_SIZE_FACTOR from 3 to 2. I think it's ok for now. This makes the latency slightly better.

@leszko leszko merged commit 66e8d31 into main Mar 9, 2026
7 checks passed