
Ignore repeated prompt #1253

Open
heimoshuiyu wants to merge 4 commits into openai:main from heimoshuiyu:prompt

Conversation

@heimoshuiyu

I am transcribing music and long audio recordings. I have noticed that when there is no speech for a long time, Whisper repeats the same text over and over and fails to recognize the speech that follows.

Related discussion:

Using --condition_on_previous_text False seems to solve this issue, but it may also lower the quality of the output text. Personally, I do not want to introduce an additional voice activity detection (VAD) step, so I did a little prompt engineering to keep repeated text out of the prompt. It has been working really well for all my use cases.
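
A minimal sketch of the idea, not the code in this PR: it assumes the decoding loop keeps the text of previously decoded segments in a list, and `dedupe_prompt_segments` is a hypothetical helper that drops consecutive repeats before that list is used as the prompt for the next window. The exact rule the PR applies may differ, and the transcription output itself is not filtered here.

```python
from typing import List


def dedupe_prompt_segments(previous_texts: List[str]) -> List[str]:
    """Drop consecutive repeats from the text used as the next prompt.

    Hypothetical helper for illustration only. It filters the prompt,
    never the transcription output.
    """
    kept: List[str] = []
    for text in previous_texts:
        normalized = text.strip()
        # Skip empty text and text identical to the last kept segment.
        if not normalized or (kept and kept[-1] == normalized):
            continue
        kept.append(normalized)
    return kept


# Four identical hallucinated segments followed by real lyrics.
history = ["作詞・作曲・編曲 初音ミク"] * 4 + ["あなたの言葉を"]
print(dedupe_prompt_segments(history))
```

With a prompt built this way, a run of identical segments contributes a single copy to the context, so the model is not nudged further toward repeating it.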

One of my use cases:

whisper --model large --language Japanese music.webm

Music with long intro https://www.youtube.com/watch?v=gcZvK1zvIbQ

Before this change

WEBVTT

00:00.000 --> 00:10.000
作詞・作曲・編曲 初音ミク

00:30.000 --> 00:40.000
作詞・作曲・編曲 初音ミク

01:00.000 --> 01:10.000
作詞・作曲・編曲 初音ミク

01:30.000 --> 01:40.000
作詞・作曲・編曲 初音ミク

01:40.000 --> 01:50.000
作詞・作曲・編曲 初音ミク

01:50.000 --> 02:00.000
作詞・作曲・編曲 初音ミク

02:00.000 --> 02:10.000
作詞・作曲・編曲 初音ミク

02:10.000 --> 02:20.000
作詞・作曲・編曲 初音ミク

02:20.000 --> 02:30.000
作詞・作曲・編曲 初音ミク

... (repeat until the end)

After this change

WEBVTT

00:00.000 --> 00:10.000
作詞・作曲・編曲 初音ミク

00:30.000 --> 00:40.000
作詞・作曲・編曲 初音ミク

01:00.000 --> 01:10.000
作詞・作曲・編曲 初音ミク

01:30.000 --> 01:40.000
作詞・作曲・編曲 初音ミク

01:53.000 --> 01:58.000
Now, I have to say again

01:58.000 --> 02:01.000
あなたの言葉を

02:01.000 --> 02:05.000
誰かが憂えるのに

... (work as expected)

Similarly, it has been performing well on my 2-hour long meeting audio recording where the first 30 minutes had no speech. Prior to this modification, Whisper could only transcribe repetitive ..

@ExtReMLapin
Contributor

ExtReMLapin commented Apr 22, 2023

Sounds like a bad patch for hallucinations.

And there are legitimate cases where repetition can happen.

@heimoshuiyu
Author

Ignoring repeated prompts does not suppress repeated output; the model can still produce repeated text. I see this patch as a way to improve the model's robustness, similar to --temperature_increment_on_fallback: it reduces the chance of falling into an endless loop of repeated text. As mentioned above, I use Whisper to process songs and conference recordings, and since applying this change I have seen hardly any cases where the entire transcription is ruined by repeated hallucinations (when it does happen, re-running once solves the problem).

I understand that this change may affect some repeated sentences, but it's still better than --condition_on_previous_text False or ruining the entire transcription.

@ryanheise
Contributor

And there are legitimate cases where repetition can happen.

Of course song lyrics are a legitimate case where repetition happens. But looking at the PR it looks to only remove repeated text from the prompt, not from the output, so the model is still allowed to output repetition if there really is repetition in the audio, but it is not influenced more so in that direction by having repetition in the prompt.

I understand that this change may affect some repeated sentences

Do you have an example?

@heimoshuiyu
Author

Do you have an example?

Yes, I have noticed one example. https://www.youtube.com/watch?v=a6lvjW8xunc

whisper --model large --language Japanese --task transcribe audio.webm

The correct output is

[01:12.000 --> 01:15.000] Always I miss you
[01:16.000 --> 01:18.000] Miss you
[01:19.000 --> 01:20.000] Miss you
[01:21.000 --> 01:22.000] Oh miss you
......
[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:38.000] Miss you
[01:38.000 --> 01:40.000] Miss you
[01:41.000 --> 01:43.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you
......
[03:13.000 --> 03:15.000] Always I miss you
[03:16.000 --> 03:18.000] Miss you
[03:19.000 --> 03:20.000] Miss you
[03:21.000 --> 03:22.000] Oh miss you
......
[03:33.000 --> 03:35.000] Always I miss you
[03:36.000 --> 03:38.000] Miss you
[03:39.000 --> 03:40.000] Miss you
[03:41.000 --> 03:42.000] Oh miss you
......
[03:54.000 --> 03:56.000] Always I miss you
[03:56.000 --> 03:59.000] Always I miss you
[04:07.000 --> 04:09.000] Miss you

However, after this patch, the segment [01:38.000 --> 01:40.000] Miss you is missing:

[01:12.000 --> 01:15.000] Always I miss you
[01:16.000 --> 01:18.000] Miss you
[01:19.000 --> 01:20.000] Miss you
[01:21.000 --> 01:22.000] Oh miss you
......
[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:41.000] Miss you
[01:41.000 --> 01:44.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you
......
[03:13.000 --> 03:15.000] Always I miss you
[03:16.000 --> 03:18.000] Miss you
[03:19.000 --> 03:20.000] Miss you
[03:21.000 --> 03:22.000] Oh miss you
......
[03:33.000 --> 03:35.000] Always I miss you
[03:36.000 --> 03:38.000] Miss you
[03:39.000 --> 03:40.000] Miss you
[03:41.000 --> 03:42.000] Oh miss you
......
[03:54.000 --> 03:56.000] Always I miss you
[03:56.000 --> 03:59.000] Always I miss you
[04:07.000 --> 04:09.000] Miss you
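
A difference like this is easy to overlook by eye. The sketch below is one way to locate it, assuming both transcripts use the "[start --> end] text" line format shown above; `segment_texts` and `missing_segments` are hypothetical helper names, and only Python's standard `difflib` is used.

```python
import difflib
import re
from typing import List, Tuple

# Matches lines such as "[01:37.000 --> 01:38.000] Miss you".
SEGMENT_RE = re.compile(r"^\[(\S+) --> (\S+)\]\s*(.*?)\s*$")


def segment_texts(transcript: str) -> List[Tuple[str, str]]:
    """Return (start_time, text) pairs for each segment line."""
    pairs = []
    for line in transcript.splitlines():
        match = SEGMENT_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(3)))
    return pairs


def missing_segments(expected: str, actual: str) -> List[Tuple[str, str]]:
    """List segments present in `expected` whose text was dropped in `actual`.

    Only the text is compared. When identical lines repeat, the reported
    start time may belong to a neighbouring repeat of the same text.
    """
    exp = segment_texts(expected)
    act = segment_texts(actual)
    matcher = difflib.SequenceMatcher(
        a=[text for _, text in exp],
        b=[text for _, text in act],
        autojunk=False,
    )
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            missing.extend(exp[i1:i2])
    return missing


expected = """[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:38.000] Miss you
[01:38.000 --> 01:40.000] Miss you
[01:41.000 --> 01:43.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you"""

patched = """[01:34.000 --> 01:36.000] Always I miss you
[01:37.000 --> 01:41.000] Miss you
[01:41.000 --> 01:44.000] Oh miss you
[01:44.000 --> 01:46.000] Miss you"""

print(missing_segments(expected, patched))
```

Because the comparison ignores timestamps, it reports that one "Miss you" line was dropped but cannot tell which of the two adjacent repeats it was.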

@ryanheise
Contributor

Inaccurate end timestamps cause the next window to start too late and miss what was spoken, and this can often be fixed by enabling word_timestamps which produces more accurate timestamps.

Since your timestamps look like they're all integers, it suggests you don't have word_timestamps enabled. If that's the case, does it improve with word_timestamps?
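
The whole-second heuristic above can be checked mechanically. This is a rough sketch, assuming the "[mm:ss.mmm --> mm:ss.mmm] text" output format quoted earlier in this thread; `all_whole_seconds` is a hypothetical helper, not part of Whisper. If it returns True, re-running with word timestamps enabled (the `word_timestamps` option mentioned above) is the suggested next step.

```python
import re
from typing import Iterable

# Captures the millisecond field of every "mm:ss.mmm" timestamp in a line.
MILLIS_RE = re.compile(r"\d{2}:\d{2}\.(\d{3})")


def all_whole_seconds(lines: Iterable[str]) -> bool:
    """Return True if every timestamp found ends in ".000".

    Returns False when no timestamps are found at all, so an empty
    transcript is not mistaken for whole-second output.
    """
    millis = [ms for line in lines for ms in MILLIS_RE.findall(line)]
    return bool(millis) and all(ms == "000" for ms in millis)


sample = [
    "[01:37.000 --> 01:41.000] Miss you",
    "[01:41.000 --> 01:44.000] Oh miss you",
]
print(all_whole_seconds(sample))
```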

hoonlight added a commit to hoonlight/faster-whisper that referenced this pull request Jun 6, 2023
hoonlight added a commit to hoonlight/faster-whisper that referenced this pull request Jul 5, 2023
@dfengpo

dfengpo commented Apr 10, 2024

May I ask if there is an official solution to the problem of repeated hallucinations? This problem is very serious.
