Ignore repeated prompt by heimoshuiyu · Pull Request #1253 · openai/whisper

heimoshuiyu · 2023-04-18T04:58:33Z

I am transcribing music and long audio recordings. I have noticed that when there is no speech for a long time, Whisper's output repeats the same text and keeps repeating it, failing to recognize the speech that follows.

Related discussion:

Using --condition_on_previous_text False seems to solve this issue, but it may also lower the quality of the output text. Personally, I do not want to introduce an additional VAD (voice activity detection) step, so I applied a little prompt engineering to keep the prompt from containing repetitive text. It has been working really well for me in all my use cases.
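A minimal sketch of the idea, assuming the prompt for each 30-second window is built by concatenating the token IDs of previously decoded segments (which is what Whisper does when condition_on_previous_text is enabled). The helper name `build_prompt`, the exact-match rule, and the 223-token cap are assumptions for illustration, not necessarily what this PR's diff does. Only the prompt is filtered; the transcript segments themselves are left untouched.

```python
from typing import List, Sequence


def build_prompt(segment_tokens: Sequence[Sequence[int]], max_tokens: int = 223) -> List[int]:
    """Hypothetical helper: join previous segments' token IDs into a prompt,
    skipping any segment identical to one already included.

    Only the prompt is affected; the transcript output is not filtered here.
    """
    seen = set()
    prompt: List[int] = []
    for tokens in segment_tokens:
        key = tuple(tokens)
        if key in seen:
            # Repeated segment (e.g. a looping intro credit): leave it out of the prompt.
            continue
        seen.add(key)
        prompt.extend(tokens)
    # Keep only the most recent tokens; 223 is assumed to mirror n_text_ctx // 2 - 1.
    return prompt[-max_tokens:] if max_tokens > 0 else []


segments = [[1, 2], [3], [1, 2], [1, 2], [4]]
print(build_prompt(segments))                 # [1, 2, 3, 4]
print(build_prompt(segments, max_tokens=3))   # [2, 3, 4]
```

With a looping intro such as 作詞・作曲・編曲 初音ミク, the credit line would appear in the prompt only once, however many windows re-emit it. Exact matching is the simplest rule; a fuzzier comparison would also catch near-repeats but risks dropping legitimate context.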

One of my use cases:

whisper --model large --language Japanese music.webm

Music with a long intro: https://www.youtube.com/watch?v=gcZvK1zvIbQ

Before this change

WEBVTT

00:00.000 --> 00:10.000
作詞・作曲・編曲 初音ミク

00:30.000 --> 00:40.000
作詞・作曲・編曲 初音ミク

01:00.000 --> 01:10.000
作詞・作曲・編曲 初音ミク

01:30.000 --> 01:40.000
作詞・作曲・編曲 初音ミク

01:40.000 --> 01:50.000
作詞・作曲・編曲 初音ミク

01:50.000 --> 02:00.000
作詞・作曲・編曲 初音ミク

02:00.000 --> 02:10.000
作詞・作曲・編曲 初音ミク

02:10.000 --> 02:20.000
作詞・作曲・編曲 初音ミク

02:20.000 --> 02:30.000
作詞・作曲・編曲 初音ミク

... (repeats until the end)

After this change

WEBVTT

00:00.000 --> 00:10.000
作詞・作曲・編曲 初音ミク

00:30.000 --> 00:40.000
作詞・作曲・編曲 初音ミク

01:00.000 --> 01:10.000
作詞・作曲・編曲 初音ミク

01:30.000 --> 01:40.000
作詞・作曲・編曲 初音ミク

01:53.000 --> 01:58.000
Now, I have to say again

01:58.000 --> 02:01.000
あなたの言葉を

02:01.000 --> 02:05.000
誰かが憂えるのに

... (works as expected)

Similarly, it has been performing well on my 2-hour meeting recording, in which the first 30 minutes contained no speech. Before this modification, Whisper could only transcribe repetitive ..

ExtReMLapin · 2023-04-22T11:57:32Z

Sounds like a bad patch for hallucinations.

And there are legitimate cases where repetition can happen.

heimoshuiyu · 2023-04-22T15:18:58Z

Ignoring repeated prompts does not stop the output of repeated text; the model can still produce it. I think this patch is a way to improve the model's robustness, similar to --temperature_increment_on_fallback: it reduces the chance of falling into an endless loop of repeated text. As mentioned above, I use Whisper to process songs and conference recordings, and since applying this change I have seen hardly any cases where the entire transcription is ruined by repeated hallucinations (and when it does happen, re-running once solves the problem).

I understand that this change may affect some repeated sentences, but that is still better than using --condition_on_previous_text False or having the entire transcription ruined.
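For context on the comparison with --temperature_increment_on_fallback: to my understanding, Whisper already re-decodes a window at a higher temperature when its decoded text compresses too well, which is how it flags repetition inside a single window. The sketch below reimplements that check with zlib; 2.4 is the default of the --compression_ratio_threshold option, while the helper names are illustrative.

```python
import zlib

# Default of Whisper's --compression_ratio_threshold option.
COMPRESSION_RATIO_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text scores high."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


def needs_fallback(text: str, threshold: float = COMPRESSION_RATIO_THRESHOLD) -> bool:
    """True if a decoded window looks like a repetition loop and would be
    re-decoded at a higher temperature (illustrative helper)."""
    return compression_ratio(text) > threshold


looped = "\n".join(["作詞・作曲・編曲 初音ミク"] * 10)
print(needs_fallback(looped))                      # True
print(needs_fallback("作詞・作曲・編曲 初音ミク"))  # False
```

Because the ratio is computed per window, a window that emits the credit line just once (as in the "Before" output) stays far below the threshold, so this fallback alone would not break a loop that spans many windows.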

ryanheise · 2023-04-23T01:33:34Z

And there are legitimate cases where repetition can happen.

Of course, song lyrics are a legitimate case where repetition happens. But looking at the PR, it appears to remove repeated text only from the prompt, not from the output. So the model is still allowed to output repetition if the audio really contains it; it is just not pushed further in that direction by repetition in the prompt.

I understand that this change may affect some repeated sentences

Do you have an example?

heimoshuiyu · 2023-04-23T05:45:52Z

Do you have an example?

Yes, I have noticed one example. https://www.youtube.com/watch?v=a6lvjW8xunc

whisper --model large --language Japanese --task transcribe audio.webm

The correct output is:

[01:12.000 --> 01:15.000] Always I miss you
[01:16.000 --> 01:18.000] Miss you
[01:19.000 --> 01:20.000] Miss you
[01:21.000 --> 01:22.000] Oh miss you
......
[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:38.000] Miss you
[01:38.000 --> 01:40.000] Miss you
[01:41.000 --> 01:43.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you
......
[03:13.000 --> 03:15.000] Always I miss you
[03:16.000 --> 03:18.000] Miss you
[03:19.000 --> 03:20.000] Miss you
[03:21.000 --> 03:22.000] Oh miss you
......
[03:33.000 --> 03:35.000] Always I miss you
[03:36.000 --> 03:38.000] Miss you
[03:39.000 --> 03:40.000] Miss you
[03:41.000 --> 03:42.000] Oh miss you
......
[03:54.000 --> 03:56.000] Always I miss you
[03:56.000 --> 03:59.000] Always I miss you
[04:07.000 --> 04:09.000] Miss you

However, after this patch, the [01:38.000 --> 01:40.000] Miss you segment is missing:

[01:12.000 --> 01:15.000] Always I miss you
[01:16.000 --> 01:18.000] Miss you
[01:19.000 --> 01:20.000] Miss you
[01:21.000 --> 01:22.000] Oh miss you
......
[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:41.000] Miss you
[01:41.000 --> 01:44.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you
......
[03:13.000 --> 03:15.000] Always I miss you
[03:16.000 --> 03:18.000] Miss you
[03:19.000 --> 03:20.000] Miss you
[03:21.000 --> 03:22.000] Oh miss you
......
[03:33.000 --> 03:35.000] Always I miss you
[03:36.000 --> 03:38.000] Miss you
[03:39.000 --> 03:40.000] Miss you
[03:41.000 --> 03:42.000] Oh miss you
......
[03:54.000 --> 03:56.000] Always I miss you
[03:56.000 --> 03:59.000] Always I miss you
[04:07.000 --> 04:09.000] Miss you
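To check a patch like this against a reference transcript, one option is to compare segment texts by count rather than by exact timestamps, since the patched run also re-times a neighbouring segment (01:37.000 --> 01:41.000). A small sketch, assuming the `[mm:ss.mmm --> mm:ss.mmm] text` line format used in the outputs above; `parse_segments` and `missing_texts` are hypothetical helpers, and the sample lines are copied from those outputs.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches lines like "[01:38.000 --> 01:40.000] Miss you".
SEGMENT_RE = re.compile(r"\[(\d+:\d+\.\d+) --> (\d+:\d+\.\d+)\]\s*(.*)")


def parse_segments(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (start, end, text) tuples; lines such as '......' are skipped."""
    segments = []
    for line in lines:
        match = SEGMENT_RE.match(line.strip())
        if match:
            start, end, text = match.groups()
            segments.append((start, end, text.strip()))
    return segments


def missing_texts(reference: Iterable[str], candidate: Iterable[str]) -> Counter:
    """Texts that occur more often in the reference than in the candidate."""
    ref = Counter(text for _, _, text in parse_segments(reference))
    cand = Counter(text for _, _, text in parse_segments(candidate))
    return ref - cand


correct = """[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:38.000] Miss you
[01:38.000 --> 01:40.000] Miss you
[01:41.000 --> 01:43.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you""".splitlines()

patched = """[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:41.000] Miss you
[01:41.000 --> 01:44.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you""".splitlines()

print(missing_texts(correct, patched))  # Counter({'Miss you': 1})
```

A diff on exact (start, end, text) triples would also flag the re-timed 01:37 segment; counting texts reports only the one "Miss you" that was actually lost.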

ryanheise · 2023-04-23T08:21:43Z

Inaccurate end timestamps cause the next window to start too late and miss what was spoken. This can often be fixed by enabling word_timestamps, which produces more accurate timestamps.

Since your timestamps all appear to be whole seconds, it suggests you don't have word_timestamps enabled. If that's the case, does enabling word_timestamps improve it?
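A simplified model of the failure described here, under the assumption that each new 30-second window starts where the last decoded segment of the previous window ended (Whisper's real seek logic handles more cases). The timings are made up for illustration and the helpers are hypothetical. In the real tool, the suggested fix is passing `word_timestamps=True` to `transcribe()` or `--word_timestamps True` on the command line.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def next_window_start(segments: List[Segment], window_start: float,
                      window_len: float = 30.0) -> float:
    """Simplified seek rule (assumption): resume at the end of the last
    decoded segment, or skip the whole window if nothing was decoded."""
    if not segments:
        return window_start + window_len
    return segments[-1][1]


def speech_is_skipped(speech_start: float, resume_at: float) -> bool:
    """Speech not already emitted as a segment that begins before the next
    window starts is never decoded from its beginning."""
    return speech_start < resume_at


# Hypothetical timings: one phrase ends at 98.0 s and the next one starts at 98.0 s.
accurate = [(97.0, 98.0, "Miss you")]
inflated = [(97.0, 101.0, "Miss you")]  # end timestamp reported too late

print(speech_is_skipped(98.0, next_window_start(accurate, 90.0)))  # False
print(speech_is_skipped(98.0, next_window_start(inflated, 90.0)))  # True
```

More accurate end timestamps move the resume point back toward where the speech really ended, which is why word-level timing can recover the dropped phrase.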


dfengpo · 2024-04-10T10:48:33Z

May I ask if there is an official solution to the problem of repeated hallucinations? This problem is very serious.

heimoshuiyu added 3 commits April 18, 2023 12:15

Ignore repeated prompt

5575628

fix id start

6fc4e6f

Check multiple prompts

67fac2a

heimoshuiyu mentioned this pull request Apr 19, 2023

Ignore repeated prompt SYSTRAN/faster-whisper#163

Open

format code

62e34f0

dhx mentioned this pull request May 20, 2023

Large model starts to repeat itself / gets stuck on a phrase ggml-org/whisper.cpp#924

Open

aleksandr-smechov mentioned this pull request Jun 4, 2023

Remove hallucinated repeats, caps lock items Wordcab/wordcab-transcribe#64

Closed

hoonlight added a commit to hoonlight/faster-whisper that referenced this pull request Jun 6, 2023

Ignore repeated prompt

783a57e

Same with openai/whisper#1253

hoonlight added a commit to hoonlight/faster-whisper that referenced this pull request Jul 5, 2023

Ignore repeated prompt

1a9949c

- Same with openai/whisper#1253

heimoshuiyu wants to merge 4 commits into openai:main from heimoshuiyu:prompt

4 participants