Conversation
|
Sounds like a bad patch for hallucinations. And there are legitimate cases where repetition can happen.
|
Ignoring repeated prompts does not stop the output of repeated text; the model can still produce repeated text. I think this patch is a method to improve the model's robustness, similar to I understand that this change may affect some repeated sentences, but it's still better than
Of course, song lyrics are a legitimate case where repetition happens. But looking at the PR, it appears to remove repeated text only from the prompt, not from the output. The model can therefore still output repetition if there really is repetition in the audio; it is just no longer pushed further in that direction by repetition in the prompt.
Do you have an example?
Yes, I have noticed one example: https://www.youtube.com/watch?v=a6lvjW8xunc
whisper --model large --language Japanese --task transcribe audio.webm
The correct output is
However, after this patch, the
|
Inaccurate end timestamps cause the next window to start too late and miss what was spoken, and this can often be fixed by enabling word_timestamps which produces more accurate timestamps. Since your timestamps look like they're all integers, it suggests you don't have word_timestamps enabled. If that's the case, does it improve with word_timestamps? |
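The check suggested above can be sketched roughly as follows. This is only an illustration: `all_integer_timestamps` and `transcribe_with_word_timestamps` are hypothetical helper names, and the segment layout (dicts with `start` and `end` in seconds) is assumed to match what `model.transcribe` returns in the `openai-whisper` Python package.

```python
from typing import Iterable, Mapping


def all_integer_timestamps(
    segments: Iterable[Mapping[str, float]], tol: float = 1e-6
) -> bool:
    """Return True if every segment start/end is a whole number of seconds.

    Heuristic from the comment above: all-integer timestamps hint that
    word_timestamps was probably not enabled. It is not a definitive test.
    """
    segments = list(segments)
    if not segments:
        return False
    return all(
        abs(seg[key] - round(seg[key])) <= tol
        for seg in segments
        for key in ("start", "end")
    )


def transcribe_with_word_timestamps(path: str, model_name: str = "large"):
    """Re-run transcription with word-level timestamps enabled.

    Assumes the openai-whisper package is installed; it is imported lazily
    so the helper above can be used without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(path, language="ja", word_timestamps=True)


# Example with made-up segments (not real Whisper output):
sample = [{"start": 0.0, "end": 5.0}, {"start": 5.0, "end": 12.0}]
print(all_integer_timestamps(sample))
```

If the check returns True for an existing transcript, re-running with `transcribe_with_word_timestamps("audio.webm")` and comparing the segment boundaries would show whether the missed speech comes back.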
Same with openai/whisper#1253
|
May I ask if there is an official solution to the problem of repeated hallucinations? This problem is very serious. |
I am transcribing music and long audio recordings. I have noticed that when there is no speech for a long time, Whisper's output repeats the same text and keeps repeating it, without recognizing the speech that follows.
Related discussion:
Using --condition_on_previous_text False seems to solve this issue, but it may also result in lower-quality output text. Personally, I do not want to introduce additional VAD, so I have done a little prompt engineering to keep the prompt from containing repetitive text. It has been working really well for me on all my use cases.
One of my use cases:
Music with a long intro: https://www.youtube.com/watch?v=gcZvK1zvIbQ
Before this change
After this change
Similarly, it has been performing well on my 2-hour meeting recording, where the first 30 minutes had no speech. Prior to this modification, Whisper could only transcribe repetitive
..
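The idea discussed in this thread, keeping repeated text out of the prompt while leaving the output untouched, could be sketched like this. It is a minimal text-level illustration, not the code in this PR: `dedupe_prompt_segments` and `max_repeats` are hypothetical names, and it assumes the prompt is assembled from a list of previously transcribed segment strings.

```python
from typing import List, Optional


def dedupe_prompt_segments(segments: List[str], max_repeats: int = 1) -> List[str]:
    """Collapse each run of identical consecutive segments to max_repeats copies.

    Only the text fed back as the prompt is filtered; the transcript that was
    already emitted is left as-is, so genuine repetition in the audio can
    still appear in the output.
    """
    cleaned: List[str] = []
    previous: Optional[str] = None
    run_length = 0
    for text in segments:
        key = text.strip()
        if key == previous:
            run_length += 1
        else:
            previous = key
            run_length = 1
        if run_length <= max_repeats:
            cleaned.append(text)
    return cleaned


# Made-up history in which a hallucinated phrase repeats:
history = ["Thank you.", "Thank you.", "Thank you.", "Let's begin.", "Thank you."]
prompt_text = " ".join(dedupe_prompt_segments(history))
print(prompt_text)  # Thank you. Let's begin. Thank you.
```

Non-adjacent repeats are kept on purpose, so a phrase that legitimately comes back later (a song chorus, for example) still reaches the prompt once per occurrence.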