Native video decoder via ffmpeg CLI #7607

@emilk

Description

Send samples to ffmpeg over CLI, read back the results, show the user.

A basic demo of this is in this internal repository: https://github.com/rerun-io/video-experiments

The main approach is fairly simple:

 let mut ffmpeg = FfmpegCommand::new()
        .hide_banner()
        // Keep in mind that all arguments concerning the input need to go before `.input(…)`!
        .format("h264")
        .input("-")
        //.args(&["-c", "copy", "out.mp4"]) // For testing.
        .rawvideo() // Output rgb24 on stdout. (TODO for later: any format we can read directly in re_renderer would be better!)
        .spawn()
        .expect("failed to spawn ffmpeg");

    let mut stdin = ffmpeg.take_stdin().unwrap();
    std::thread::spawn(move || {
          // Wait for samples to be sent on stdin.
          // then send them to ffmpeg.
          // See `write_sample_to_nalu_stream`
    });

    // On the main thread, run the output instance to completion
    ffmpeg.iter().unwrap().for_each(|e| match e {
        FfmpegEvent::Log(LogLevel::Error, e) => println!("Error: {}", e),
        FfmpegEvent::Progress(p) => println!("Progress: {} / 00:00:15", p.time),
        FfmpegEvent::OutputFrame(frame) => println!(
            "Received frame: time {:?} fmt {:?} size {}x{}",
            frame.timestamp, frame.pix_fmt, frame.width, frame.height
        ),
        evt => println!("Event: {evt:?}"),
    });
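The stdin-feeding thread in the snippet above is only stubbed out with comments. A minimal, hypothetical sketch of what it could look like (using a channel to receive samples, and writing to any `Write` sink so it can be tested without an actual ffmpeg process; the function name and channel design are illustrative, not from the demo repository):

```rust
use std::io::Write;
use std::sync::mpsc;
use std::thread;

/// Hypothetical sketch: forward encoded samples from a channel into ffmpeg's
/// stdin. Generic over any `Write` sink, which makes it testable without ffmpeg.
fn feed_samples<W: Write + Send + 'static>(
    rx: mpsc::Receiver<Vec<u8>>,
    mut sink: W,
) -> thread::JoinHandle<W> {
    thread::spawn(move || {
        for sample in rx {
            // The real code would run each sample through
            // `write_sample_to_nalu_stream` to produce Annex B framing first.
            sink.write_all(&sample).expect("failed to write sample");
        }
        // Returning (and eventually dropping) the sink closes the pipe,
        // which signals EOF to ffmpeg.
        sink
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let handle = feed_samples(rx, Vec::new());
    tx.send(vec![0, 0, 0, 1, 0x65]).unwrap();
    tx.send(vec![0, 0, 0, 1, 0x41]).unwrap();
    drop(tx); // Close the channel so the thread exits.
    let written = handle.join().unwrap();
    assert_eq!(written, vec![0, 0, 0, 1, 0x65, 0, 0, 0, 1, 0x41]);
}
```

Closing the sender side of the channel (and then dropping the sink) is what lets ffmpeg see end-of-stream and flush its remaining output.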

The difficult part is that we need to provide ffmpeg with a format it can stream in on stdin. In this snippet that is done with format("h264"), so ffmpeg expects the raw .h264 format, which is a stream of NAL units in Annex B framing (i.e. NAL length prefixes are replaced with start codes). Additionally, at every IDR frame, the sequence parameter sets (SPS) and picture parameter sets (PPS) need to be inserted:
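The core of that rewrite, stripped of the SPS/PPS handling and the re_mp4 plumbing shown below, is replacing each big-endian length prefix with a start code. A minimal standalone sketch (assuming 4-byte length prefixes; the function name is illustrative):

```rust
/// Annex B start code that replaces each length prefix.
const NAL_START_CODE: [u8; 4] = [0, 0, 0, 1];

/// Hypothetical minimal version of the length-prefix -> start-code rewrite,
/// assuming 4-byte big-endian length prefixes and no SPS/PPS insertion.
fn length_prefixed_to_annex_b(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut offset = 0;
    while offset + 4 <= input.len() {
        // Read the 4-byte big-endian NAL unit length.
        let len = u32::from_be_bytes(input[offset..offset + 4].try_into().unwrap()) as usize;
        offset += 4;
        // Emit a start code instead of the length, then the NAL unit payload.
        out.extend_from_slice(&NAL_START_CODE);
        out.extend_from_slice(&input[offset..offset + len]);
        offset += len;
    }
    out
}

fn main() {
    // Two NAL units: lengths 2 and 1, payloads [0x65, 0xAA] and [0x41].
    let mp4_style = [0, 0, 0, 2, 0x65, 0xAA, 0, 0, 0, 1, 0x41];
    let annex_b = length_prefixed_to_annex_b(&mp4_style);
    assert_eq!(annex_b, [0, 0, 0, 1, 0x65, 0xAA, 0, 0, 0, 1, 0x41]);
}
```

The full version below additionally handles configurable prefix sizes (from the avcC box) and SPS/PPS insertion at IDR frames.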

fn write_sample_to_nalu_stream(
    avc_box: &re_mp4::Avc1Box,
    nalu_stream: &mut dyn std::io::Write,
    sample: &re_mp4::Sample,
    video_track_data: &[u8],
    state: &mut NaluStreamState,
) -> Result<(), Box<dyn std::error::Error>> {
    let avcc = &avc_box.avcc;

    // Insert SPS & PPS NAL units in front of every IDR frame, unless the previous frame was already an IDR frame.
    // TODO(andreas): Should we detect this from the NALU stream rather than from the samples?
    if sample.is_sync && !state.previous_frame_was_idr {
        for sps in &avcc.sequence_parameter_sets {
            nalu_stream.write_all(&NAL_START_CODE)?;
            nalu_stream.write_all(&sps.bytes)?;
        }
        for pps in &avcc.picture_parameter_sets {
            nalu_stream.write_all(&NAL_START_CODE)?;
            nalu_stream.write_all(&pps.bytes)?;
        }
    }
    // Track whether the current frame is an IDR frame, so that consecutive
    // IDR frames don't each get SPS/PPS re-inserted.
    state.previous_frame_was_idr = sample.is_sync;

    // A single sample may consist of multiple NAL units, each of which needs our special treatment.
    // (Most of the time it's 1:1, but there may be extra NAL units carrying metadata, especially at the start.)
    let mut buffer_offset = sample.offset as usize;
    let sample_end = buffer_offset + sample.size as usize;
    while buffer_offset < sample_end {
        // Each NAL unit in mp4 is prefixed with a length prefix.
        // In Annex B this doesn't exist.
        let length_prefix_size = avcc.length_size_minus_one as usize + 1;

        // TODO: improve the error handling here.
        let nal_unit_size = match length_prefix_size {
            4 => u32::from_be_bytes(
                video_track_data[buffer_offset..(buffer_offset + 4)]
                    .try_into()
                    .unwrap(),
            ) as usize,
            2 => u16::from_be_bytes(
                video_track_data[buffer_offset..(buffer_offset + 2)]
                    .try_into()
                    .unwrap(),
            ) as usize,
            1 => video_track_data[buffer_offset] as usize,
            _ => panic!("invalid length prefix size"),
        };
        //println!("nal unit size: {}", nal_unit_size);

        if sample_end < buffer_offset + length_prefix_size + nal_unit_size {
            panic!(
                "sample (size {}) ends before the current NAL unit (size {nal_unit_size})",
                sample.size
            );
        }

        nalu_stream.write_all(&NAL_START_CODE)?;
        let data_start = buffer_offset + length_prefix_size; // Skip the size.
        let data_end = buffer_offset + nal_unit_size + length_prefix_size;
        let data = &video_track_data[data_start..data_end];

        // Note that we don't have to insert "emulation prevention bytes", since MP4 NAL units still contain them.
        // (Unlike the NAL start code, the emulation prevention bytes are part of the NAL unit spec!)

        nalu_stream.write_all(data)?;

        buffer_offset = data_end;
    }

    Ok(())
}

Open questions:

  • Does this work with H.265 as well? It uses the same Annex B framing overall.
  • What about other codecs?
  • Does this approach scale well to other container formats?
  • ffmpeg allows arbitrary output formats. Can we be clever about what to pick? E.g. if the decoder internally produces YUV420 (which is most of the time, but not always!), we should pass YUV420 on instead of rgb24 and process it directly.
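To make the last point concrete: passing the decoder's native YUV420 through instead of converting to rgb24 halves the bytes per frame we have to read back over stdout. A small sketch of the standard per-frame sizes for the two pixel formats (plain layout math, not an ffmpeg API):

```rust
/// Bytes per frame for packed rgb24: 3 bytes per pixel.
fn rgb24_frame_size(width: usize, height: usize) -> usize {
    width * height * 3
}

/// Bytes per frame for planar yuv420p: a full-resolution Y plane plus two
/// chroma planes subsampled 2x2 (rounded up for odd dimensions).
fn yuv420p_frame_size(width: usize, height: usize) -> usize {
    let chroma = ((width + 1) / 2) * ((height + 1) / 2);
    width * height + 2 * chroma
}

fn main() {
    // For 1080p, yuv420p is exactly half the size of rgb24:
    assert_eq!(rgb24_frame_size(1920, 1080), 6_220_800);
    assert_eq!(yuv420p_frame_size(1920, 1080), 3_110_400);
}
```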

Labels: enhancement (New feature or request), feat-video (anything video decoding, player, querying, data modelling of videos etc.)