Comparing changes

Signed-off-by: BuffMcBigHuge <[email protected]>

* Audio with NDI, audio buffer in frame loop, added audio track and media clock in webrtc.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Frontend audio work.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Import fixes.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Modified order of operations for audio track.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Modification to audio handshake.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Solving issues with audio handshake.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Audio support testing and logging.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Fighting with audio connection handshake issue.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Mediaclock rework.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Added video/audio gating for syncing.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Removed gating, added frame rate handling for smooth playback with audio.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Attempt at polyphase audio version with gating.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Simplified audio sampling, added native fps handler.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Simplified re-write of audio with no resampling, just passthrough
  Signed-off-by: BuffMcBigHuge <[email protected]>
* Removed unused code.
  Signed-off-by: BuffMcBigHuge <[email protected]>
* fix: replace per-sample Python loops with vectorized numpy ops in AudioProcessingTrack
  Addresses review feedback on #534. The audio buffer interleaving and frame extraction used O(n) Python loops over individual samples, which is expensive for real-time audio. Now uses np.ravel(order="F") for interleaving and numpy slicing for frame extraction.
  Also adds 42 tests covering interleaving, buffering, resampling, channel conversion, frame construction, and adversarial inputs.
  Signed-off-by: RyanOnTheInside <[email protected]>
* feat: support audio-only pipelines, decouple FrameProcessor from VideoProcessingTrack
  Move FrameProcessor ownership from VideoProcessingTrack to Session so it can be shared cleanly between video and audio tracks. When a pipeline declares produces_video=False, skip video track creation entirely and deliver audio through the existing AudioProcessingTrack path.
  Key changes:
  - Session owns and manages FrameProcessor lifecycle
  - FrameProcessor injected into VideoProcessingTrack via constructor (DI)
  - Audio callback fires before video early-return in PipelineProcessor
  - Add produces_video / requires_audio_input ClassVars to BasePipelineConfig
  - Frontend shows audio-only indicator when no video track is present
  Signed-off-by: RyanOnTheInside <[email protected]>
* refactor: remove requires_audio_input and unregister test_tone pipeline
  Clean up unused requires_audio_input ClassVar from BasePipelineConfig and remove test_tone pipeline from the registry.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* docs: clarify s16 packed interleave correctness in AudioProcessingTrack
  Add explanatory comments and docstring documenting why Fortran-order ravel produces the correct packed interleaved format for PyAV s16 layout. Add round-trip test verifying samples survive interleave, int16 conversion, AudioFrame, and planes[0] readback.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* test: remove AudioProcessingTrack tests that only exercise numpy
  Remove 15 tests that tested numpy operations (ravel, slicing, vstack, mean) rather than actual class methods. The remaining 27 tests all call real methods (recv, stop, _create_audio_frame, _resample_audio).
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: address PR review issues in audio support
  - Initialize FrameProcessor.paused to False to prevent AttributeError when AudioProcessingTrack.recv() is called before any pause/resume
  - Stop audio_track in Session.close() to ensure buffer cleanup on teardown
  - Consolidate AUDIO_CLOCK_RATE in media_clock.py, remove duplicate from tracks.py and unused VIDEO_CLOCK_RATE from media_clock.py
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* refactor: remove MediaClock, defer A/V sync to follow-up PR
  MediaClock was only used by VideoProcessingTrack while AudioProcessingTrack used its own independent clock, so it added complexity without delivering actual synchronization. Video track now uses a simple monotonic PTS counter matching the audio pattern.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: drain audio queue fully, cap buffer, and skip audio track for non-audio pipelines
  - recv() now drains all queued audio chunks per call instead of one, reducing latency for bursty/small-chunk pipelines
  - Cap audio buffer at 1 second to prevent unbounded memory growth
  - Add produces_audio ClassVar to BasePipelineConfig; only create AudioProcessingTrack when a pipeline declares audio support
  - Update tests for drain-loop compatibility and add coverage for queue drain and buffer cap behaviors
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: replace FFT resampler with linear interp, guard None output_dict, use deque buffer
  - FFT resampling caused spectral leakage artifacts at chunk boundaries; linear interpolation (np.interp) is chunk-boundary safe
  - Restore None guard on output_dict before .get("audio") to prevent AttributeError when pipeline returns None
  - Replace np.concatenate/slice buffer with collections.deque to avoid O(buffer_size) copies on every chunk append
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* refactor: move modality checks into PipelineRegistry, harden audio frame creation
  Move chain_produces_video/chain_produces_audio from webrtc.py into PipelineRegistry as class methods for better encapsulation. Guard _create_audio_frame against NaN/inf with nan_to_num. Consolidate and clean up audio processing track tests using parametrize.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: correct misleading 'audio-only' log when video track is skipped
  The log message assumed a non-video pipeline must be audio-only, but the audio check happens separately afterwards.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* refactor: move AudioProcessingTrack into its own audio_track.py module
  Consistent with cloud_track.py pattern: each track type gets its own file.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: support audio-only sessions and add audio recording
  Fix audio-only pipeline bugs where pause, parameter broadcasts, and get_frame_processor() silently skipped sessions without a video track. All three now fall back to session.frame_processor when video_track is None.
  Add audio recording support: RecordingManager accepts optional video and audio tracks, recording setup moved outside the produces_video block so it works for video-only, audio-only, or combined sessions. AudioTimestampNormalizingTrack normalizes PTS for recording restarts.
  Fix test import to use audio_track module after earlier refactor.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* refactor: replace audio callback with queue to match video flow
  Audio output from pipelines now flows through a queue on PipelineProcessor (audio_output_queue) instead of a callback, making it symmetric with how video frames are delivered.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* refactor: remove audio-only sleep from processor, deduplicate video extraction
  Move audio pacing responsibility to individual pipelines instead of the processor loop. Remove redundant second output_dict.get("video") call.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: add missing FrameProcessor import for type checking
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* feat: add audio-beep as built-in pipeline
  Port the audio-beep plugin into the core pipelines directory so it is available without installing a separate plugin.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* feat: add audio support for cloud relay mode
  Cloud pipelines that produce audio now deliver it to the browser. The audio flows through: cloud WebRTC track → CloudWebRTCClient → CloudConnectionManager → FrameProcessor cloud audio queue → AudioProcessingTrack → browser.
  Key changes:
  - CloudWebRTCClient handles audio tracks and receives audio frames
  - CloudConnectionManager forwards audio via callbacks
  - FrameProcessor queues cloud audio for AudioProcessingTrack
  - CloudTrack accepts an injected FrameProcessor (shared with audio)
  - handle_offer_with_relay creates AudioProcessingTrack and wires the browser audio transceiver
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: add recvonly audio transceiver to cloud WebRTC offer
  Without this, the cloud handle_offer has no audio m-line to attach an AudioProcessingTrack to, so pipeline audio is produced but never sent back -- filling and overflowing the audio output queue.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* feat: add audio-video-test as built-in pipeline
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: de-interleave packed audio from cloud before queuing
  Cloud sends packed s16 stereo where to_ndarray() returns (1, 1920) -- interleaved channel pairs in a single plane. Without de-interleaving, AudioProcessingTrack misidentifies this as mono and duplicates it, producing 2x the expected data rate (constant buffer overflow) and garbled output.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* fix: always create both video and audio tracks in cloud relay mode
  In relay mode we cannot know the cloud pipeline capabilities, so we always create both CloudTrack and AudioProcessingTrack. For audio-only pipelines the CloudTrack emits small black frames at 1fps to keep the browser MediaStream active so audio playback is not blocked. For video-only pipelines the AudioProcessingTrack sends silence harmlessly.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* feat: conditionally create audio/video tracks based on pipeline modalities
  Expose produces_video and produces_audio flags from the pipeline registry through the status endpoint and into WebRTC negotiation. This avoids creating unnecessary tracks (no video track for audio-only pipelines, no audio transceiver for video-only pipelines) and removes the black-frame fallback hack from CloudTrack.
  Also increases the audio buffer to 3s to handle bursty pipelines like LTX2, and improves buffer trimming to keep newest samples instead of dropping entire chunks.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>
* Fix cloud
* Remove built-in audio-beep and audio-video-test pipelines
  These test pipelines are available as separate plugins and don't need to be bundled in core.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  Signed-off-by: Rafał Leszko <[email protected]>

---------

Signed-off-by: BuffMcBigHuge <[email protected]>
Signed-off-by: RyanOnTheInside <[email protected]>
Signed-off-by: Rafał Leszko <[email protected]>
Co-authored-by: RyanOnTheInside <[email protected]>
Co-authored-by: Rafał Leszko <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
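
The interleave steps described in these commits can be sketched in NumPy. This covers Fortran-order ravel into packed s16, the nan_to_num guard, and splitting the cloud's (1, 1920) packed plane back into channels. It is a minimal illustration, not the project's AudioProcessingTrack code. The function names are hypothetical, and the sketch assumes float input in [-1, 1] laid out as (channels, samples) and scaled to int16 by 32767.

```python
import numpy as np


def interleave_to_s16(planar: np.ndarray) -> np.ndarray:
    """Planar float audio (channels, samples) in [-1, 1] -> packed int16 of shape (1, samples * channels)."""
    # Replace NaN/inf before scaling so they do not become arbitrary values when cast to int16.
    clean = np.nan_to_num(planar, nan=0.0, posinf=1.0, neginf=-1.0)
    clean = np.clip(clean, -1.0, 1.0)
    # Fortran order varies the first (channel) axis fastest: L0, R0, L1, R1, ...
    packed = clean.ravel(order="F")
    return (packed * 32767).astype(np.int16).reshape(1, -1)


def deinterleave_packed(packed: np.ndarray, channels: int) -> np.ndarray:
    """Packed (1, samples * channels) -> planar (channels, samples)."""
    flat = packed.reshape(-1)
    if flat.size % channels:
        raise ValueError("packed length is not a multiple of the channel count")
    # Each row of the (samples, channels) view is one time step; transpose to planar.
    return flat.reshape(-1, channels).T
```

Fortran order makes the channel index vary fastest, which is the L0, R0, L1, R1 ordering a packed s16 plane expects. Reshaping to (-1, channels) and transposing undoes it. Without that inverse, a (1, 1920) stereo plane is read as 1920 mono samples, which produces the doubled data rate the de-interleave commit describes.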
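
The buffer changes in these commits fit together roughly as shown below. They are a deque instead of repeated np.concatenate, a cap measured in seconds, trimming that keeps the newest samples, and a recv() that drains every queued chunk. BoundedAudioBuffer is a hypothetical class, not the project's implementation. It assumes chunks arrive as (channels, samples) NumPy arrays on a standard queue.Queue, and it takes the cap as a constructor parameter so the same code covers both the 1 s and the later 3 s limit.

```python
from __future__ import annotations

import queue
from collections import deque

import numpy as np


class BoundedAudioBuffer:
    """Hypothetical chunk buffer holding (channels, samples) arrays, capped in seconds."""

    def __init__(self, sample_rate: int, max_seconds: float) -> None:
        self.max_samples = int(sample_rate * max_seconds)
        self.chunks: deque[np.ndarray] = deque()
        self.size = 0  # samples per channel currently buffered

    def append(self, chunk: np.ndarray) -> None:
        # O(1) append; the whole buffer is never re-concatenated on arrival.
        self.chunks.append(chunk)
        self.size += chunk.shape[1]
        self._trim()

    def _trim(self) -> None:
        # Discard the oldest samples first, slicing a chunk rather than dropping it whole.
        excess = self.size - self.max_samples
        while excess > 0:
            oldest = self.chunks[0]
            n = oldest.shape[1]
            if n <= excess:
                self.chunks.popleft()
                self.size -= n
                excess -= n
            else:
                self.chunks[0] = oldest[:, excess:]
                self.size -= excess
                excess = 0

    def drain_from(self, q: queue.Queue) -> None:
        # Pull every queued chunk per call, not just one, so bursts do not pile up.
        while True:
            try:
                self.append(q.get_nowait())
            except queue.Empty:
                break

    def pop(self, n: int) -> np.ndarray | None:
        """Return exactly n samples per channel, or None if fewer are buffered."""
        if n <= 0:
            raise ValueError("n must be positive")
        if self.size < n:
            return None
        parts = []
        needed = n
        while needed > 0:
            head = self.chunks[0]
            take = min(needed, head.shape[1])
            parts.append(head[:, :take])
            if take == head.shape[1]:
                self.chunks.popleft()
            else:
                self.chunks[0] = head[:, take:]
            needed -= take
        self.size -= n
        return np.concatenate(parts, axis=1)
```

Appending costs O(1) per chunk. Only pop() concatenates, and only the chunks needed for one output frame. Trimming slices the oldest chunk instead of discarding it whole, so a large burst never drops more audio than the cap requires.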
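
A minimal per-chunk linear resampler illustrates the switch from an FFT resampler to np.interp. FFT-based resampling (for example scipy.signal.resample) treats each chunk as one period of a periodic signal. Any discontinuity between a chunk's last and first samples therefore leaks into the output near its edges, whereas linear interpolation only looks at neighboring samples. The function name and the (channels, samples) layout are assumptions for illustration, not the project's _resample_audio.

```python
import numpy as np


def resample_linear(audio: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample (channels, samples) audio by linear interpolation, one chunk at a time."""
    if src_rate == dst_rate or audio.shape[1] == 0:
        return audio
    n_in = audio.shape[1]
    n_out = max(1, round(n_in * dst_rate / src_rate))
    # Output sample positions expressed in input-sample units.
    x_out = np.arange(n_out) * (src_rate / dst_rate)
    x_in = np.arange(n_in)
    # np.interp is 1-D, so interpolate each channel separately; positions past the
    # last input sample are clamped to its value.
    return np.stack([np.interp(x_out, x_in, channel) for channel in audio])
```

This sketch restarts its output grid at sample 0 of every chunk. When the rate ratio does not divide the chunk length evenly, a stateful version would carry the fractional read position and the last input sample into the next call to keep output length and phase exact across chunks.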
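
Replacing MediaClock with "a simple monotonic PTS counter matching the audio pattern" amounts to stamping each frame with a running count in clock-rate units. The sketch below is hypothetical: the class name is invented. The 48 kHz audio rate, 960-sample frames and 90 kHz video clock are common WebRTC values assumed here, not values stated in these commits.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class PtsCounter:
    """Monotonic presentation-timestamp counter (hypothetical helper)."""

    clock_rate: int  # ticks per second, e.g. 48000 for audio (assumed)
    step: int  # ticks per frame, e.g. samples per audio frame
    pts: int = 0

    @property
    def time_base(self) -> Fraction:
        # PyAV frames express time_base as a fractions.Fraction of a second.
        return Fraction(1, self.clock_rate)

    def next_pts(self) -> int:
        # Return the current timestamp, then advance by one frame.
        current = self.pts
        self.pts += self.step
        return current


audio = PtsCounter(clock_rate=48000, step=960)  # assumed 20 ms audio frames
video = PtsCounter(clock_rate=90000, step=90000 // 30)  # assumed 30 fps on a 90 kHz clock
```

Each track keeps its own counter, so timestamps are monotonic within a track but nothing ties audio to video. This matches the commit's note that real A/V sync is deferred to a follow-up PR.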

…ly outputs (#718)
Signed-off-by: Rafal Leszko <[email protected]>

When prompts change, drain the audio output queue and send a flush sentinel so the audio track discards buffered speech from the previous prompt. Also increases the max audio buffer to 60s for TTS pipelines and reduces the audio output queue size to 10 (since it's now flushed on prompt changes).

Signed-off-by: Rafal Leszko <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
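
The prompt-change flush can be sketched with a standard-library queue and a sentinel object. The names FLUSH, on_prompt_change and consume are hypothetical. The sketch assumes a single producer thread, so the put after draining cannot hit the maxsize=10 limit. A concurrent producer could refill the queue in between, in which case put_nowait would raise queue.Full.

```python
import queue
from collections import deque

# Unique sentinel; compared by identity so it can never collide with an audio chunk.
FLUSH = object()


def on_prompt_change(audio_output_queue: queue.Queue) -> None:
    """Producer side: discard stale audio, then tell the track to clear its buffer."""
    while True:
        try:
            audio_output_queue.get_nowait()
        except queue.Empty:
            break
    audio_output_queue.put_nowait(FLUSH)


def consume(audio_output_queue: queue.Queue, buffer: deque) -> None:
    """Track side: move queued chunks into the playback buffer, honoring flushes."""
    while True:
        try:
            item = audio_output_queue.get_nowait()
        except queue.Empty:
            return
        if item is FLUSH:
            buffer.clear()  # drop speech already buffered for the old prompt
        else:
            buffer.append(item)
```

Draining the queue alone is not enough, because the track may already hold many seconds of earlier speech in its own buffer (up to the 60s cap for TTS). The sentinel travels through the same FIFO queue, so the track clears its buffer before it sees any audio produced for the new prompt.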

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: leszko <[email protected]>

Commits on Mar 19, 2026

Commits on Mar 20, 2026
