fix: batch-level FPS tracking for smooth frame delivery #617
Replace per-frame delta FPS tracking with batch-level throughput
measurement (sum(frames) / sum(intervals)) in PipelineProcessor.
The old approach measured inter-frame deltas, but in batched pipelines
(e.g., 12 frames emitted at once), near-zero intra-batch deltas mixed
with large inter-batch gaps caused the FPS estimate to oscillate
permanently. This produced unstable frame pacing downstream.
The new approach records one sample per pipeline call as a
(num_frames, interval) tuple, then computes FPS over a sliding window
of 10 batches. This gives a stable, accurate estimate regardless of
batch size.
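The batch-level computation can be sketched as follows. This is a minimal illustration of the idea, not the actual PipelineProcessor code; `BatchFPSTracker` is a hypothetical name:

```python
from collections import deque


class BatchFPSTracker:
    """Sliding-window FPS from (num_frames, interval) batch samples."""

    def __init__(self, window_size: int = 10):
        # Keep only the most recent `window_size` batch samples.
        self._samples: deque[tuple[int, float]] = deque(maxlen=window_size)

    def record(self, num_frames: int, interval: float) -> None:
        """Record one pipeline call: frames emitted and elapsed time."""
        self._samples.append((num_frames, interval))

    @property
    def fps(self) -> float:
        # Throughput = total frames / total time over the window,
        # which is insensitive to how frames cluster within a batch.
        total_frames = sum(n for n, _ in self._samples)
        total_time = sum(t for _, t in self._samples)
        return total_frames / total_time if total_time > 0 else 0.0


tracker = BatchFPSTracker()
for _ in range(5):
    tracker.record(12, 0.5)  # 12 frames every 0.5 s
print(tracker.fps)  # → 24.0
```

Because each sample carries its own frame count, a batch of 12 and a batch of 1 contribute to the estimate in exact proportion to the frames they delivered.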
Benchmark results (10s runs, real PipelineProcessor with stub pipelines):
| Scenario | Metric | Old | New |
|---|---|---|---|
| LongLive (batch=12) | Jitter | 84.1% | 0.3% |
| | Stalls (>1.5x) | 7.9% | 0.0% |
| | FPS error | 34.3% | 0.2% |
| | FPS stdev | 7.23 | 0.01 |
| SDv2 (batch=4) | Jitter | 29.1% | 5.8% |
| | Stalls (>1.5x) | 4.6% | 0.0% |
| | FPS error | 13.8% | 1.0% |
| | FPS stdev | 5.24 | 0.69 |
| Single-frame | Jitter | 2.3% | 4.5% |
| | FPS error | 2.7% | 2.7% |
| | FPS stdev | 1.89 | 1.35 |
| Pause/resume | Jitter | 112.9% | 0.3% |
| | Stalls (>1.5x) | 7.3% | 0.0% |
| | FPS error | 53.8% | 0.2% |
| | FPS stdev | 12.10 | 0.01 |
Single-frame results are equivalent, confirming no regression for
non-batching pipelines.
Signed-off-by: RyanOnTheInside <[email protected]>
📝 Walkthrough: Refactored FPS tracking in PipelineProcessor from per-frame delta measurements to batch-based throughput sampling. Replaced frame-level state tracking with batch accumulation (num_frames, interval) for improved averaging. Added a comprehensive test suite validating frame delivery under various batch sizes, delays, and pause/resume scenarios.
Actionable comments posted: 1
⚠️ Outside diff range comments (1)
src/scope/server/pipeline_processor.py (1)
Lines 510-525: ⚠️ Potential issue | 🟠 Major — Track only frames that actually made it into `output_queue`.

`num_frames` is the tensor batch size, but lines 516-522 can drop frames on `queue.Full`. Feeding the full batch into `_track_output_batch()` inflates `current_output_fps` under backpressure, so consumers pace against frames they will never see.

Suggested fix:
```diff
-        for frame in output:
+        enqueued_frames = 0
+        for frame in output:
             frame = frame.unsqueeze(0)
             try:
                 self.output_queue.put_nowait(frame)
+                enqueued_frames += 1
             except queue.Full:
                 logger.debug(
                     f"Output queue full for {self.pipeline_id}, dropping processed frame"
                 )
                 continue

         # Track batch-level throughput for FPS calculation
-        self._track_output_batch(num_frames, processing_time)
+        if enqueued_frames > 0:
+            self._track_output_batch(enqueued_frames, processing_time)
```
In `tests/test_frame_delivery.py`:

```python
def _measure_delivery(
    batch_size: int,
    delay: float,
    duration: float,
    variance: float = 0.0,
    pause_at: float | None = None,
    pause_duration: float = 2.0,
) -> dict:
```
These tests are likely to be slow and flaky in CI.
The helper measures jitter from real time.sleep() / time.time() behavior, and the four callers keep the process alive for about 45 seconds total. On a busy runner, those thresholds can fail because of scheduler noise rather than an FPS regression. Please move this behind a slow/integration marker or inject a fake clock/sleeper so the pacing math is deterministic.
Also applies to: 68-121, 137-196
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_frame_delivery.py` around lines 39 - 46, The tests rely on real
time via the helper _measure_delivery which makes them slow and flaky; modify
_measure_delivery to accept injectable time and sleep callables (e.g., time_fn
and sleep_fn) and use those instead of time.time()/time.sleep(), then update its
callers to pass a deterministic fake clock/sleeper in unit tests (keeping the
real time functions only for an explicit integration/slow test). Alternatively,
mark the existing tests with a pytest marker like pytest.mark.slow or
pytest.mark.integration so CI can skip them by default; reference the
_measure_delivery helper and all its callers in this file when applying the
change.
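One way to implement the injectable-clock suggestion above is a fake clock whose `sleep()` advances time instantly. This is a sketch only; `FakeClock` and `measure_interval` are hypothetical names, not part of the test suite:

```python
class FakeClock:
    """Deterministic clock: sleep() advances simulated time instantly."""

    def __init__(self, start: float = 0.0):
        self.now = start

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        # No real waiting: just move the simulated clock forward.
        self.now += seconds


def measure_interval(sleep_fn, time_fn, delay: float) -> float:
    """Illustrative helper shaped like _measure_delivery's timing core."""
    start = time_fn()
    sleep_fn(delay)
    return time_fn() - start


clock = FakeClock()
# Measures exactly 0.5 s of simulated time with zero wall-clock cost.
print(measure_interval(clock.sleep, clock.time, 0.5))  # → 0.5
```

Passing `time.time` and `time.sleep` instead restores the real-time behavior for an explicit integration test, so the same helper serves both modes.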
🚀 fal.ai Preview Deployment: E2E tests run automatically against this deployment.
✅ E2E Tests passed. Test artifacts: check the workflow run for screenshots.
leszko left a comment:
I checked and this change caused a LongLive FPS drop from 18.8 to 17.2 FPS. I wonder what the cause is; maybe it's related to the fact that we now measure time after all frames pass through `frame = frame.unsqueeze(0)`. Or maybe it's something about the fallback mechanism: we now fall back to `processing_time`, which may be lower while the pipeline warms up.
Definitely something we need to investigate before merging this PR.
But conceptually, it looks fine.
The lower FPS you're seeing is expected: the old per-frame delta tracking was overestimating FPS for batched pipelines. The root cause is the per-frame delta sampling itself. When 12 frames are enqueued in a tight loop, 11 of the 30 sampled deltas are near-zero (intra-batch), which drags the average delta down and pushes the FPS estimate up. Simulating both algorithms against identical production patterns confirms this for LongLive (batch=12, 0.5s per batch, true throughput = 24 FPS), SDv2 (batch=4, 0.2s per batch, true throughput = 20 FPS), and single-frame (no batching, true throughput = 30 FPS).
The pipeline isn't producing frames any slower. The old number was inflated by burst delivery, and the client-side WebRTC stats were averaging over those bursts. The new code paces frames evenly at the true production rate, which looks smoother but shows a lower (accurate) number.
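To make the oscillation concrete, here is a standalone simulation (not code from this PR; the numbers mirror the LongLive scenario) comparing a mean-delta estimator over the last 30 inter-frame deltas with the batch-throughput estimator:

```python
# Simulate burst delivery: 12 frames emitted back-to-back every 0.5 s,
# so the true throughput is 24 FPS.
batch_size, batch_period, intra_delta = 12, 0.5, 0.001

timestamps = []
t = 0.0
for _ in range(20):  # 20 batches
    for i in range(batch_size):
        timestamps.append(t + i * intra_delta)
    t += batch_period

deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]

# Old-style estimator: 1 / mean of the last 30 inter-frame deltas,
# evaluated at every possible window position (phase).
window = 30
old_fps = [
    window / sum(deltas[i : i + window])
    for i in range(len(deltas) - window + 1)
]

# New-style estimator: sum(frames) / sum(intervals) over 10 batch samples.
samples = [(batch_size, batch_period)] * 10
new_fps = sum(n for n, _ in samples) / sum(p for _, p in samples)

print(f"old: {min(old_fps):.1f}-{max(old_fps):.1f} FPS, new: {new_fps:.1f} FPS")
```

Depending on where the 30-delta window lands relative to batch boundaries, it contains either two or three of the large inter-batch gaps, so the old estimate swings between roughly 20 and 30 FPS while the batch-level estimate stays pinned at 24.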
Signed-off-by: Rafal Leszko <[email protected]>
Ok, I reduced the