Conversation
…ia clock in webrtc. Signed-off-by: BuffMcBigHuge <[email protected]>
…dio. Signed-off-by: BuffMcBigHuge <[email protected]>
j0sh left a comment
Thanks, this does seem somewhat simpler than the last iteration.
I don't want to block this for the sake of shipping something if it seems to be working alright for now, but there are a couple things I don't quite understand.
- Video and audio are being paced differently. Video effectively uses wall-clock while audio is using sample counts. Is there a reason for this?
- Using wall-clock makes things susceptible to pipeline jitter.
- If audio output is a little delayed, it doesn't get a chance for another 20ms. This accumulates and will lead to desync. Conversely, audio that might be a little bursty can be unnecessarily delayed.
- Is there a reason the audio queue is non-blocking? It seems preferable to block (up to a reasonable duration, then silence can be inserted) instead of sleeping. But maybe I'm missing something about why media pulls are intended to be non-blocking.
- In general, I'd consider using input timestamps or a "reference clock" to timestamp and pace the output, rather than depending on wall-clock after the pipeline. Pipelines may produce output at different rates and this architecture generally doesn't account for that. More on this in Discord.
Thank you @j0sh, let's continue in Discord.
src/scope/server/tracks.py (outdated)

# Interleave into buffer: [L0, R0, L1, R1, ...]
for i in range(audio_np.shape[1]):
    for ch in range(self.channels):
        self._audio_buffer.append(audio_np[ch, i])

# Serve a 20ms frame from the buffer
samples_needed = self._samples_per_frame * self.channels
if len(self._audio_buffer) >= samples_needed:
    samples = [self._audio_buffer.popleft() for _ in range(samples_needed)]
Do these loop over individual samples? That is probably pretty expensive; best to work on complete 20ms frames if possible. There's probably some numpy wizardry for fast interleaving and effective frame chunking.
✅ E2E Tests passed
…ioProcessingTrack

Addresses review feedback on #534. The audio buffer interleaving and frame extraction used O(n) Python loops over individual samples, which is expensive for real-time audio. Now uses np.ravel(order="F") for interleaving and numpy slicing for frame extraction. Also adds 42 tests covering interleaving, buffering, resampling, channel conversion, frame construction, and adversarial inputs.

Signed-off-by: RyanOnTheInside <[email protected]>
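A minimal sketch of the vectorized approach the commit describes: Fortran-order ravel interleaves planar channels without Python loops, and slicing extracts complete frames. The function name and the standalone framing are illustrative, not the actual class method.

```python
import numpy as np

def interleave_and_chunk(audio_np: np.ndarray, samples_per_frame: int):
    """Interleave planar (channels, samples) audio and slice packed frames.

    Fortran-order ravel walks columns first, producing [L0, R0, L1, R1, ...]
    in one vectorized pass instead of nested per-sample loops.
    """
    channels = audio_np.shape[0]
    interleaved = audio_np.ravel(order="F")  # packed interleaved samples
    samples_needed = samples_per_frame * channels
    n_frames = interleaved.size // samples_needed
    # Slice out complete frames; the remainder would stay buffered in practice
    frames = [
        interleaved[i * samples_needed : (i + 1) * samples_needed]
        for i in range(n_frames)
    ]
    remainder = interleaved[n_frames * samples_needed :]
    return frames, remainder

# Example: stereo input, 4 samples per channel, 2-sample frames
audio = np.array([[0, 1, 2, 3], [10, 11, 12, 13]])
frames, rest = interleave_and_chunk(audio, samples_per_frame=2)
# frames[0] -> [0, 10, 1, 11]; frames[1] -> [2, 12, 3, 13]
```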
…perf Fix AudioProcessingTrack per-sample loop performance
…oProcessingTrack

Move FrameProcessor ownership from VideoProcessingTrack to Session so it can be shared cleanly between video and audio tracks. When a pipeline declares produces_video=False, skip video track creation entirely and deliver audio through the existing AudioProcessingTrack path.

Key changes:
- Session owns and manages FrameProcessor lifecycle
- FrameProcessor injected into VideoProcessingTrack via constructor (DI)
- Audio callback fires before video early-return in PipelineProcessor
- Add produces_video / requires_audio_input ClassVars to BasePipelineConfig
- Frontend shows audio-only indicator when no video track is present

Signed-off-by: RyanOnTheInside <[email protected]>
Clean up unused requires_audio_input ClassVar from BasePipelineConfig and remove test_tone pipeline from the registry. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Add explanatory comments and docstring documenting why Fortran-order ravel produces the correct packed interleaved format for PyAV s16 layout. Add round-trip test verifying samples survive interleave, int16 conversion, AudioFrame, and planes[0] readback. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Remove 15 tests that tested numpy operations (ravel, slicing, vstack, mean) rather than actual class methods. The remaining 27 tests all call real methods (recv, stop, _create_audio_frame, _resample_audio). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
- Initialize FrameProcessor.paused to False to prevent AttributeError when AudioProcessingTrack.recv() is called before any pause/resume
- Stop audio_track in Session.close() to ensure buffer cleanup on teardown
- Consolidate AUDIO_CLOCK_RATE in media_clock.py; remove duplicate from tracks.py and unused VIDEO_CLOCK_RATE from media_clock.py

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
MediaClock was only used by VideoProcessingTrack while AudioProcessingTrack used its own independent clock, so it added complexity without delivering actual synchronization. Video track now uses a simple monotonic PTS counter matching the audio pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
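The "simple monotonic PTS counter matching the audio pattern" might look like the sketch below. The class name, constants, and 50 fps/20 ms framing are assumptions for illustration, not the real code.

```python
# Illustrative monotonic PTS counter for 20 ms audio frames at a 48 kHz
# clock rate; names and constants here are assumptions, not the real code.
AUDIO_CLOCK_RATE = 48000
SAMPLES_PER_FRAME = AUDIO_CLOCK_RATE // 50  # 960 samples = 20 ms

class PtsCounter:
    def __init__(self) -> None:
        self._pts = 0

    def next_pts(self) -> int:
        """Return the PTS for the next frame and advance the counter."""
        pts = self._pts
        self._pts += SAMPLES_PER_FRAME
        return pts

counter = PtsCounter()
# Successive frames get PTS 0, 960, 1920, ...
```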
…n-audio pipelines

- recv() now drains all queued audio chunks per call instead of one, reducing latency for bursty/small-chunk pipelines
- Cap audio buffer at 1 second to prevent unbounded memory growth
- Add produces_audio ClassVar to BasePipelineConfig; only create AudioProcessingTrack when a pipeline declares audio support
- Update tests for drain-loop compatibility and add coverage for queue drain and buffer cap behaviors

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
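The drain-all-queued-chunks pattern is a standard non-blocking loop over queue.get_nowait(); a minimal sketch (the helper name is hypothetical):

```python
import queue

def drain_all(q: "queue.Queue") -> list:
    """Pull every currently queued chunk without blocking.

    Returns an empty list when nothing is queued, so the caller can
    fall back to serving silence or previously buffered samples.
    """
    chunks = []
    while True:
        try:
            chunks.append(q.get_nowait())
        except queue.Empty:
            break
    return chunks

q = queue.Queue()
for chunk in (b"a", b"b", b"c"):
    q.put(chunk)
# drain_all(q) returns [b"a", b"b", b"c"] and leaves the queue empty
```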
…, use deque buffer

- FFT resampling caused spectral leakage artifacts at chunk boundaries; linear interpolation (np.interp) is chunk-boundary safe
- Restore None guard on output_dict before .get("audio") to prevent AttributeError when pipeline returns None
- Replace np.concatenate/slice buffer with collections.deque to avoid O(buffer_size) copies on every chunk append

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
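A sketch of chunk-boundary-safe resampling with np.interp, as the commit describes. The function name and rounding choice are assumptions; the real code may differ in how it tracks fractional positions across chunks.

```python
import numpy as np

def resample_linear(samples: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a 1-D chunk with linear interpolation (np.interp).

    Unlike FFT-based resampling, linear interpolation introduces no
    spectral leakage at chunk boundaries, at the cost of some
    high-frequency rolloff.
    """
    n_src = samples.shape[0]
    n_dst = int(round(n_src * dst_rate / src_rate))
    # Map output sample positions back onto the input time axis
    src_positions = np.arange(n_src)
    dst_positions = np.linspace(0, n_src - 1, n_dst)
    return np.interp(dst_positions, src_positions, samples)

# Example: upsample a 4-sample ramp from 24 kHz to 48 kHz
chunk = np.array([0.0, 1.0, 2.0, 3.0])
out = resample_linear(chunk, 24000, 48000)
# out has 8 samples spanning the same 0..3 ramp
```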
…ame creation

Move chain_produces_video/chain_produces_audio from webrtc.py into PipelineRegistry as class methods for better encapsulation. Guard _create_audio_frame against NaN/inf with nan_to_num. Consolidate and clean up audio processing track tests using parametrize.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
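The NaN/inf guard on sample conversion can be sketched as below; the function name and clip/scale constants are illustrative, not the actual _create_audio_frame internals.

```python
import numpy as np

def to_int16_samples(audio: np.ndarray) -> np.ndarray:
    """Sanitize float audio and convert to int16 for an s16 AudioFrame.

    np.nan_to_num maps NaN to 0 and +/-inf to the clip bounds, so a
    misbehaving pipeline cannot emit undefined sample values.
    """
    clean = np.nan_to_num(audio, nan=0.0, posinf=1.0, neginf=-1.0)
    clipped = np.clip(clean, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)

bad = np.array([0.5, np.nan, np.inf, -np.inf, 2.0])
# to_int16_samples(bad) -> [16383, 0, 32767, -32767, 32767]
```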
The log message assumed a non-video pipeline must be audio-only, but the audio check happens separately afterwards. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Consistent with cloud_track.py pattern — each track type gets its own file. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Fix audio-only pipeline bugs where pause, parameter broadcasts, and get_frame_processor() silently skipped sessions without a video track. All three now fall back to session.frame_processor when video_track is None.

Add audio recording support: RecordingManager accepts optional video and audio tracks; recording setup moved outside the produces_video block so it works for video-only, audio-only, or combined sessions. AudioTimestampNormalizingTrack normalizes PTS for recording restarts.

Fix test import to use audio_track module after earlier refactor.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Audio output from pipelines now flows through a queue on PipelineProcessor (audio_output_queue) instead of a callback, making it symmetric with how video frames are delivered. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
…xtraction
Move audio pacing responsibility to individual pipelines instead of
the processor loop. Remove redundant second output_dict.get("video") call.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Port the audio-beep plugin into the core pipelines directory so it is available without installing a separate plugin. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Cloud pipelines that produce audio now deliver it to the browser. The audio flows through: cloud WebRTC track → CloudWebRTCClient → CloudConnectionManager → FrameProcessor cloud audio queue → AudioProcessingTrack → browser.

Key changes:
- CloudWebRTCClient handles audio tracks and receives audio frames
- CloudConnectionManager forwards audio via callbacks
- FrameProcessor queues cloud audio for AudioProcessingTrack
- CloudTrack accepts an injected FrameProcessor (shared with audio)
- handle_offer_with_relay creates AudioProcessingTrack and wires the browser audio transceiver

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Without this, the cloud handle_offer has no audio m-line to attach an AudioProcessingTrack to, so pipeline audio is produced but never sent back -- filling and overflowing the audio output queue. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Cloud sends packed s16 stereo where to_ndarray() returns (1, 1920) -- interleaved channel pairs in a single plane. Without de-interleaving, AudioProcessingTrack misidentifies this as mono and duplicates it, producing 2x the expected data rate (constant buffer overflow) and garbled output. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
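The de-interleave step described above reduces to a reshape and transpose; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def deinterleave_packed_stereo(packed: np.ndarray, channels: int = 2) -> np.ndarray:
    """Convert a packed (1, n) interleaved array to planar (channels, n // channels).

    PyAV's to_ndarray() on packed s16 stereo returns a single plane of
    shape (1, 1920) for a 960-sample frame: [L0, R0, L1, R1, ...].
    Reshaping to (-1, channels) and transposing recovers one row per channel.
    """
    flat = packed.reshape(-1)            # drop the singleton plane axis
    return flat.reshape(-1, channels).T  # (channels, samples_per_channel)

packed = np.array([[0, 10, 1, 11, 2, 12]])  # L and R interleaved in one plane
planar = deinterleave_packed_stereo(packed)
# planar -> [[0, 1, 2], [10, 11, 12]]
```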
In relay mode we cannot know the cloud pipeline capabilities, so we always create both CloudTrack and AudioProcessingTrack. For audio-only pipelines the CloudTrack emits small black frames at 1fps to keep the browser MediaStream active so audio playback is not blocked. For video-only pipelines the AudioProcessingTrack sends silence harmlessly. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
…ities

Expose produces_video and produces_audio flags from the pipeline registry through the status endpoint and into WebRTC negotiation. This avoids creating unnecessary tracks (no video track for audio-only pipelines, no audio transceiver for video-only pipelines) and removes the black-frame fallback hack from CloudTrack.

Also increases the audio buffer to 3s to handle bursty pipelines like LTX2, and improves buffer trimming to keep newest samples instead of dropping entire chunks.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
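Declaring capabilities as ClassVars keeps track creation decisions data-driven; a sketch under the assumption that the real BasePipelineConfig has many more fields, with a hypothetical subclass and helper for illustration:

```python
from typing import ClassVar

# Illustrative sketch only: the real BasePipelineConfig has more fields,
# and the subclass / helper names below are hypothetical.
class BasePipelineConfig:
    produces_video: ClassVar[bool] = True
    produces_audio: ClassVar[bool] = False

class AudioOnlyPipelineConfig(BasePipelineConfig):
    # Hypothetical audio-only pipeline: no video track should be created
    produces_video: ClassVar[bool] = False
    produces_audio: ClassVar[bool] = True

def needs_audio_track(config_cls: type) -> bool:
    """Create an AudioProcessingTrack only when the pipeline declares audio."""
    return getattr(config_cls, "produces_audio", False)
```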
These test pipelines are available as separate plugins and don't need to be bundled in core. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Rafał Leszko <[email protected]>
Audio Support for Scope (Reworked)
Overall, this approach is simplified and seems to solve audio quality issues. The complex gating mechanics and resampling have been removed, leveraging WebRTC aiortc-style timing.
Summary
Adds end-to-end audio support to Scope's WebRTC streaming pipeline. Pipelines can return audio alongside video in their output dict; the server streams audio over WebRTC. This is a simplified rewrite of the audio path that fixes clipping and audio quality issues reported in PR #480.
What's New
Backend
- Pipelines can return {"video": ..., "audio": ..., "audio_sample_rate": ...}. Audio keys are optional; pipelines that don't produce audio are unchanged.
- audio_callback instead of a queue. Only the last processor in a chain receives the callback.
- audio_queue for raw (audio_tensor, sample_rate) tuples. No background drain thread, no video-gated release, no resampling.
- MediaStream. Adds a recvonly audio transceiver so the SDP offer includes an audio m-line for the backend to attach its track.

WebRTC Handshake
The browser adds addTransceiver("audio", { direction: "recvonly" }) so the offer includes an audio m-line. After setRemoteDescription, the backend finds the audio transceiver, attaches its AudioProcessingTrack, and sets direction to sendonly. The answer then indicates that the server will send audio.

Why This Version (audio-sync-2)
PR #480 received feedback about:
This branch is a simplified rewrite that:
Architecture
Trade-offs
Files Changed
- src/scope/server/frame_processor.py – Simplified audio path (~185 net lines removed)
- src/scope/server/pipeline_processor.py – Callback-based audio delivery
- src/scope/server/tracks.py – Stereo AudioProcessingTrack with per-channel resampling
- src/scope/server/webrtc.py – Audio track wiring (no MediaClock)

Related