FR: audio preprompt

Hi,

I have a suggestion that may sound a bit radical but could make ASR really shine in professional settings.

Whisper can take a text prompt as an argument to bias the model towards a given language and specific vocabulary. It works well in my experience.

Other models, including Parakeet, don't support that. But you can build a universal audio preprompt: record a short audio clip, prepend it to the recording, then strip the preprompt's text from the transcript.

For example, if you say "épanchement" ("effusion") with no context or language hint at all, the model will struggle to make sense of it with so little data to work with. But if you prepend audio saying "Chez le patient, je note la présence de " ("In the patient, I note the presence of "), then strip that sentence from the transcript, the model is much more likely to pick up the context and transcribe accurately.
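To make the idea concrete, here is a minimal sketch of the prepend-then-strip flow. Everything in it is an assumption for illustration: `transcribe` is a hypothetical callable standing in for whatever ASR backend is used, audio is represented as a plain list of samples at a shared sample rate, and the function names are made up, not part of murmure.

```python
import re
import unicodedata
from typing import Callable, List, Sequence


def _normalize(text: str) -> str:
    """Lowercase and drop accents/punctuation so small ASR variations still match."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^\w\s]", "", no_accents)


def strip_preprompt(transcript: str, preprompt_text: str) -> str:
    """Remove the preprompt words from the start of the transcript if they match."""
    prompt_words = _normalize(preprompt_text).split()
    words = transcript.split()
    head = [_normalize(w) for w in words[: len(prompt_words)]]
    if head == prompt_words:
        return " ".join(words[len(prompt_words):])
    # Preprompt not recognized at the start: return the transcript untouched.
    return transcript


def transcribe_with_preprompt(
    audio: Sequence[float],
    preprompt_audio: Sequence[float],
    preprompt_text: str,
    transcribe: Callable[[List[float]], str],  # hypothetical ASR backend
) -> str:
    """Prepend the preprompt audio, transcribe, then strip the preprompt text."""
    # Assumes both clips share the same sample rate and channel layout.
    combined = list(preprompt_audio) + list(audio)
    return strip_preprompt(transcribe(combined), preprompt_text)
```

Exact word matching is the weak point here: if the model merges the boundary (for instance writing "d'épanchement" instead of "de épanchement"), the prefix no longer matches and the transcript is returned unstripped. Fuzzy matching or word timestamps would probably be needed in practice.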

I think it would make Murmure an outstanding productivity tool, but I acknowledge that there are UI details to work out, such as how to specify when a given preprompt should be used.

When I made [quick whisper typer](https://github.com/thiswillbeyourgithub/Quick-Whisper-Typer), I settled on having the shortcut trigger a popup that listens for the next key press. This way I could trigger any number of user-defined scripts, post-processing steps, etc. I think it makes sense UI-wise.
