What audio input files are supported by the ASR model "openai/whisper-large-v2" on Hugging Face?

Hi,

I am trying to use the Hugging Face pipeline to invoke the model "openai/whisper-large-v2" for an ASR task. I then wrap the model in Gradio as shown below, but when I submit audio, I get an error.

The model card says: "To transcribe audio samples, the model has to be used alongside a [WhisperProcessor]"

How can I find out which audio inputs this model supports? Also, how can I convert my audio sample (an .m4a file, the default format produced by the MacBook Voice Memos app) to a format that this model supports?

My code is below:

```
from transformers import pipeline
import gradio as gr

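# model = pipeline(model="facebook/wav2vec2-base-960h")
# Name the task explicitly so it is clear which pipeline is being built.
model = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")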

def transcribe_audio(mic=None, file = None):
    if mic is not None:
        audio = mic
    elif file is not None:
        audio = file
    else:
        return "You must either provide a mic recording or a file"
    transcription = model(audio)["text"]
    return transcription


gr.Interface(
    fn=transcribe_audio,
    inputs=[
        gr.Audio(source="microphone", type="filepath", optional=True),
        gr.Audio(source="upload", type="filepath", optional=True),
    ],
    outputs="text",
).launch(share = True)
```
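
For context on the conversion part of the question, here is a minimal sketch of one possible approach, not a confirmed fix for the error above. It assumes the `ffmpeg` command-line tool is installed and on the PATH; the helper names `build_ffmpeg_command`, `convert_to_wav`, and `describe_wav` are hypothetical and not part of transformers or Gradio. Whisper's feature extractor works on 16 kHz audio, so the sketch converts the recording to a mono 16 kHz WAV file and then checks the result with the standard-library `wave` module.

```python
import subprocess
import wave
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts `src` to mono audio at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(src),           # input file, e.g. a Voice Memos .m4a recording
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),
    ]


def convert_to_wav(src, sample_rate=16000):
    """Convert `src` to a WAV file next to it (assumes ffmpeg is installed)."""
    dst = Path(src).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst


def describe_wav(path):
    """Return (sample_rate, channels) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getnchannels()
```

With these helpers, something like `wav_path = convert_to_wav("memo.m4a")` followed by `model(str(wav_path))["text"]` could be tried, where `memo.m4a` is a placeholder file name. When the pipeline is given a file path it decodes the file with ffmpeg itself, so a missing ffmpeg installation is also worth ruling out before converting anything.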

what audio input files are supported by ASR model "openai/whisper-large-v2" on huggingface? #26075
