
Audio Support for Scope Rework#534

Merged
leszko merged 45 commits into main from marco/feat/audio-sync-2
Mar 19, 2026

Conversation

@BuffMcBigHuge
Collaborator

Audio Support for Scope (Reworked)

Overall, this approach is simpler and appears to resolve the audio quality issues. The complex gating mechanics and resampling have been removed in favor of aiortc-style WebRTC timing.

Summary

Adds end-to-end audio support to Scope's WebRTC streaming pipeline. Pipelines can return audio alongside video in their output dict; the server streams audio over WebRTC. This is a simplified rewrite of the audio path that fixes clipping and audio quality issues reported in PR #480.

What's New

Backend

  • Pipeline interface: Pipelines may return {"video": ..., "audio": ..., "audio_sample_rate": ...}. Audio keys are optional; pipelines that don't produce audio are unchanged.
  • PipelineProcessor: Uses an audio_callback instead of a queue. Only the last processor in a chain receives the callback.
  • FrameProcessor: Simple audio_queue for raw (audio_tensor, sample_rate) tuples. No background drain thread, no video-gated release, no resampling.
  • AudioProcessingTrack: Receives raw audio from FrameProcessor, resamples to 48 kHz when needed (per-channel, preserves stereo), buffers samples, and delivers 20ms stereo frames for WebRTC/Opus. Uses aiortc-style monotonic pacing.
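The pipeline contract described above can be sketched as follows. This is an illustrative toy pipeline (the class name and tone generation are hypothetical, not from the PR); the point is the shape of the output dict:

```python
import numpy as np

class TonePipeline:
    """Hypothetical pipeline showing the optional audio keys."""

    def __call__(self, frame):
        # Video is required; "audio" and "audio_sample_rate" are optional extras.
        video = np.zeros((512, 512, 3), dtype=np.uint8)
        t = np.arange(1024) / 48000
        tone = 0.1 * np.sin(2 * np.pi * 440 * t)   # 440 Hz test tone
        audio = np.stack([tone, tone])             # (channels, samples), stereo
        return {"video": video, "audio": audio, "audio_sample_rate": 48000}
```

Pipelines that produce no audio simply omit the extra keys and are unchanged.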

Frontend

  • VideoOutput: Mute/unmute toggle (speaker icon). Starts muted to satisfy browser autoplay policy; user can unmute once the stream is playing.
  • useUnifiedWebRTC: Merges video and audio tracks into a single MediaStream. Adds a recvonly audio transceiver so the SDP offer includes an audio m-line for the backend to attach its track.

WebRTC Handshake

The browser adds addTransceiver("audio", { direction: "recvonly" }) so the offer includes an audio m-line. After setRemoteDescription, the backend finds the audio transceiver, attaches its AudioProcessingTrack, and sets direction to sendonly. The answer then indicates that the server will send audio.
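The server-side attach step might look like the sketch below. It is written against duck-typed stand-ins rather than real aiortc objects (`transceivers` and `audio_track` are assumptions, not the PR's actual names):

```python
def attach_server_audio(transceivers, audio_track):
    """Find the browser-offered audio m-line and answer sendonly.

    A sketch of the backend step described above: after setRemoteDescription,
    locate the audio transceiver, attach the server's track, and flip the
    direction so the answer advertises server-to-browser audio.
    """
    for t in transceivers:
        if t.kind == "audio":
            t.sender.replaceTrack(audio_track)
            t.direction = "sendonly"
            return t
    return None  # browser offered no audio m-line
```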

Why This Version (audio-sync-2)

PR #480 received feedback about:

  • Audio quality / clipping – video and audio not synchronized, poor experience
  • Code complexity – video-gated release, drain thread, MediaClock coupling

This branch is a simplified rewrite that:

  1. Keeps stereo – No forced mono mixdown. Pipelines (e.g. LTX-2) output stereo; we pass it through. Forced mono can cause phase cancellation and artifacts.
  2. Passthrough when possible – If the pipeline outputs 48 kHz (WebRTC standard), resampling is skipped entirely.
  3. Per-channel resampling – When resampling is needed, each channel is resampled separately, preserving stereo.
  4. Simpler architecture – Callback → queue → track. No drain thread, no video-gated slicing, no MediaClock for audio.
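Points 2 and 3 can be sketched together. This is a minimal illustration (function name is hypothetical), using linear interpolation per channel so stereo is preserved and 48 kHz input passes through untouched:

```python
import numpy as np

def resample_per_channel(audio: np.ndarray, src_rate: int, dst_rate: int = 48000) -> np.ndarray:
    """Resample (channels, samples) audio to dst_rate, one channel at a time."""
    if src_rate == dst_rate:
        return audio  # passthrough: pipeline already outputs WebRTC-standard 48 kHz
    n_src = audio.shape[1]
    n_dst = int(round(n_src * dst_rate / src_rate))
    src_t = np.arange(n_src) / src_rate
    dst_t = np.arange(n_dst) / dst_rate
    # np.interp is chunk-boundary safe, avoiding FFT spectral-leakage artifacts
    return np.vstack([np.interp(dst_t, src_t, ch) for ch in audio])
```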

Architecture

Pipeline.__call__() → {"video": tensor, "audio": tensor, "audio_sample_rate": int}
    │
    ▼
PipelineProcessor.audio_callback(audio_tensor, sample_rate)
    │
    ▼
FrameProcessor.audio_queue  (raw tuples)
    │
    ▼
AudioProcessingTrack.recv()  (resample if needed, buffer, 20ms frames)
    │
    ▼
WebRTC

Trade-offs

  • NDI audio – Not included in this PR; NDI output sinks do not receive audio. Can be re-added in a follow-up.
  • A/V sync – Previous video-gated approach removed. Uses independent aiortc-style pacing. Please test sync behavior.

Files Changed

  • src/scope/server/frame_processor.py – Simplified audio path (~185 net lines removed)
  • src/scope/server/pipeline_processor.py – Callback-based audio delivery
  • src/scope/server/tracks.py – Stereo AudioProcessingTrack with per-channel resampling
  • src/scope/server/webrtc.py – Audio track wiring (no MediaClock)

Related

Signed-off-by: BuffMcBigHuge <[email protected]>
Contributor

@j0sh j0sh left a comment


Thanks, this does seem somewhat simpler than the last iteration.

I don't want to block this for the sake of shipping something if it seems to be working alright for now, but there are a couple things I don't quite understand.

  • Video and audio are being paced differently. Video effectively uses wall-clock while audio is using sample counts. Is there a reason for this?
  • Using wall-clock makes things susceptible to pipeline jitter.
  • If audio output is a little delayed, it doesn't get a chance for another 20ms. This accumulates and will lead to desync. Conversely, audio that might be a little bursty can be unnecessarily delayed.
  • Is there a reason the audio queue is non-blocking? It seems preferable to block (up to a reasonable duration, then silence can be inserted) instead of sleeping. But maybe I'm missing something about why media pulls are intended to be non-blocking.
  • In general, I'd consider using input timestamps or a "reference clock" to timestamp and pace the output, rather than depending on wall-clock after the pipeline. Pipelines may produce output at different rates and this architecture generally doesn't account for that. More on this in Discord.
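The sample-count pacing under discussion (the "aiortc-style" approach the PR uses) can be sketched like this. Names are illustrative; the key property is that the sleep target is derived from frames already sent, not from when the current chunk happened to arrive:

```python
import asyncio
import time

FRAME_SEC = 0.020  # one 20 ms Opus frame

async def pace_next_frame(start: float, frames_sent: int) -> None:
    """Sleep until the slot for frame `frames_sent`, anchored to `start`.

    Because the target is start + n * 20ms, a late frame shortens the next
    sleep instead of permanently shifting the schedule.
    """
    target = start + frames_sent * FRAME_SEC
    delay = target - time.monotonic()
    if delay > 0:
        await asyncio.sleep(delay)
```

j0sh's concern is that this schedule is still anchored to wall-clock at the output, so pipeline jitter and bursty audio are absorbed by the buffer rather than by timestamps carried from the input.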

@BuffMcBigHuge
Collaborator Author

Thank you @j0sh, let's continue in Discord.

Comment on lines +313 to +321
# Interleave into buffer: [L0, R0, L1, R1, ...]
for i in range(audio_np.shape[1]):
    for ch in range(self.channels):
        self._audio_buffer.append(audio_np[ch, i])

# Serve a 20ms frame from the buffer
samples_needed = self._samples_per_frame * self.channels
if len(self._audio_buffer) >= samples_needed:
    samples = [self._audio_buffer.popleft() for _ in range(samples_needed)]
Contributor


Do these loop over individual samples? That is probably pretty expensive; best to work on complete 20ms frames if possible. There's probably some numpy wizardry for fast interleaving and effective frame chunking.
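The "numpy wizardry" suggested here (and later adopted in #633, per the commit notes) amounts to Fortran-order ravel for interleaving plus slicing for frame extraction. A minimal sketch, with illustrative names:

```python
import numpy as np

def interleave(audio_np: np.ndarray) -> np.ndarray:
    """(channels, n) -> packed [L0, R0, L1, R1, ...] with no Python loops.

    Fortran order walks columns first, which for a (channels, samples)
    array yields exactly the packed interleaved layout.
    """
    return audio_np.ravel(order="F")

def take_frames(buf: np.ndarray, samples_per_frame: int, channels: int):
    """Split a packed buffer into complete frames plus the remainder."""
    per_frame = samples_per_frame * channels
    n_frames = len(buf) // per_frame
    frames = buf[: n_frames * per_frame].reshape(n_frames, per_frame)
    return frames, buf[n_frames * per_frame:]
```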

Collaborator


I attempted to address this in #633.

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-534--preview
WebSocket wss://fal.run/daydream/scope-pr-534--preview/ws
Commit 7f5e460

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-534--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

✅ E2E Tests passed

Status passed
fal App daydream/scope-pr-534--preview
Run View logs

Test Artifacts

Check the workflow run for screenshots.

…ioProcessingTrack

Addresses review feedback on #534. The audio buffer interleaving and
frame extraction used O(n) Python loops over individual samples, which
is expensive for real-time audio. Now uses np.ravel(order="F") for
interleaving and numpy slicing for frame extraction.

Also adds 42 tests covering interleaving, buffering, resampling,
channel conversion, frame construction, and adversarial inputs.

Signed-off-by: RyanOnTheInside <[email protected]>
…perf

Fix AudioProcessingTrack per-sample loop performance
@coderabbitai

coderabbitai bot commented Mar 9, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ccfac82d-94d7-41ea-849d-7a68cd50815d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


BuffMcBigHuge and others added 25 commits March 17, 2026 16:16
…oProcessingTrack

Move FrameProcessor ownership from VideoProcessingTrack to Session so it
can be shared cleanly between video and audio tracks. When a pipeline
declares produces_video=False, skip video track creation entirely and
deliver audio through the existing AudioProcessingTrack path.

Key changes:
- Session owns and manages FrameProcessor lifecycle
- FrameProcessor injected into VideoProcessingTrack via constructor (DI)
- Audio callback fires before video early-return in PipelineProcessor
- Add produces_video / requires_audio_input ClassVars to BasePipelineConfig
- Frontend shows audio-only indicator when no video track is present

Signed-off-by: RyanOnTheInside <[email protected]>
Clean up unused requires_audio_input ClassVar from BasePipelineConfig
and remove test_tone pipeline from the registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Add explanatory comments and docstring documenting why Fortran-order
ravel produces the correct packed interleaved format for PyAV s16
layout. Add round-trip test verifying samples survive interleave,
int16 conversion, AudioFrame, and planes[0] readback.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Remove 15 tests that tested numpy operations (ravel, slicing, vstack,
mean) rather than actual class methods. The remaining 27 tests all call
real methods (recv, stop, _create_audio_frame, _resample_audio).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
- Initialize FrameProcessor.paused to False to prevent AttributeError
  when AudioProcessingTrack.recv() is called before any pause/resume
- Stop audio_track in Session.close() to ensure buffer cleanup on teardown
- Consolidate AUDIO_CLOCK_RATE in media_clock.py, remove duplicate from
  tracks.py and unused VIDEO_CLOCK_RATE from media_clock.py

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
MediaClock was only used by VideoProcessingTrack while
AudioProcessingTrack used its own independent clock, so it added
complexity without delivering actual synchronization. Video track
now uses a simple monotonic PTS counter matching the audio pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
…n-audio pipelines

- recv() now drains all queued audio chunks per call instead of one,
  reducing latency for bursty/small-chunk pipelines
- Cap audio buffer at 1 second to prevent unbounded memory growth
- Add produces_audio ClassVar to BasePipelineConfig; only create
  AudioProcessingTrack when a pipeline declares audio support
- Update tests for drain-loop compatibility and add coverage for
  queue drain and buffer cap behaviors

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
…, use deque buffer

- FFT resampling caused spectral leakage artifacts at chunk boundaries;
  linear interpolation (np.interp) is chunk-boundary safe
- Restore None guard on output_dict before .get("audio") to prevent
  AttributeError when pipeline returns None
- Replace np.concatenate/slice buffer with collections.deque to avoid
  O(buffer_size) copies on every chunk append

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
…ame creation

Move chain_produces_video/chain_produces_audio from webrtc.py into
PipelineRegistry as class methods for better encapsulation. Guard
_create_audio_frame against NaN/inf with nan_to_num. Consolidate and
clean up audio processing track tests using parametrize.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
The log message assumed a non-video pipeline must be audio-only, but
the audio check happens separately afterwards.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Consistent with cloud_track.py pattern — each track type gets its own file.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Fix audio-only pipeline bugs where pause, parameter broadcasts, and
get_frame_processor() silently skipped sessions without a video track.
All three now fall back to session.frame_processor when video_track is
None.

Add audio recording support: RecordingManager accepts optional video
and audio tracks, recording setup moved outside the produces_video
block so it works for video-only, audio-only, or combined sessions.
AudioTimestampNormalizingTrack normalizes PTS for recording restarts.

Fix test import to use audio_track module after earlier refactor.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Audio output from pipelines now flows through a queue on
PipelineProcessor (audio_output_queue) instead of a callback,
making it symmetric with how video frames are delivered.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
…xtraction

Move audio pacing responsibility to individual pipelines instead of
the processor loop. Remove redundant second output_dict.get("video") call.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Port the audio-beep plugin into the core pipelines directory so it is
available without installing a separate plugin.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Cloud pipelines that produce audio now deliver it to the browser.
The audio flows through: cloud WebRTC track → CloudWebRTCClient →
CloudConnectionManager → FrameProcessor cloud audio queue →
AudioProcessingTrack → browser.

Key changes:
- CloudWebRTCClient handles audio tracks and receives audio frames
- CloudConnectionManager forwards audio via callbacks
- FrameProcessor queues cloud audio for AudioProcessingTrack
- CloudTrack accepts an injected FrameProcessor (shared with audio)
- handle_offer_with_relay creates AudioProcessingTrack and wires
  the browser audio transceiver

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Without this, the cloud handle_offer has no audio m-line to attach
an AudioProcessingTrack to, so pipeline audio is produced but never
sent back -- filling and overflowing the audio output queue.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Cloud sends packed s16 stereo where to_ndarray() returns (1, 1920)
-- interleaved channel pairs in a single plane. Without de-interleaving,
AudioProcessingTrack misidentifies this as mono and duplicates it,
producing 2x the expected data rate (constant buffer overflow) and
garbled output.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
In relay mode we cannot know the cloud pipeline capabilities, so
we always create both CloudTrack and AudioProcessingTrack. For
audio-only pipelines the CloudTrack emits small black frames at 1fps
to keep the browser MediaStream active so audio playback is not
blocked. For video-only pipelines the AudioProcessingTrack sends
silence harmlessly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
…ities

Expose produces_video and produces_audio flags from the pipeline registry
through the status endpoint and into WebRTC negotiation. This avoids
creating unnecessary tracks (no video track for audio-only pipelines, no
audio transceiver for video-only pipelines) and removes the black-frame
fallback hack from CloudTrack. Also increases the audio buffer to 3s to
handle bursty pipelines like LTX2, and improves buffer trimming to keep
newest samples instead of dropping entire chunks.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
These test pipelines are available as separate plugins and don't need to
be bundled in core.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
@leszko leszko marked this pull request as ready for review March 19, 2026 15:45
@leszko leszko merged commit 53a49cf into main Mar 19, 2026
9 of 10 checks passed
