Version 1.0 can transcribe a 30-minute English video with local Whisper, but version 1.1.1 fails (#93)

### Problem Description

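With version 1.0 I could transcribe a 30-minute English video to text using local Whisper, but with version 1.1.1 the transcription fails. Repeated tests give the same result, and it happens with both the Whisper v2 and v3 models. The app also exits unexpectedly at times.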

### Logs (Optional)

```shell
开始创建文件任务：D:/1129/111111.mp4
获取视频信息执行命令: ffmpeg -i D:/1129/111111.mp4
视频时长: 1773.77秒
执行命令: ffmpeg -ss 00:08:52.131 -i D:/1129/111111.mp4 -vframes 1 -q:v 2 -y D:/VideoCaptioner/work-dir/111111/thumbnail.jpg
文件任务创建完成：Task(id=0, queued_at=datetime.datetime(2024, 11, 29, 16, 10, 50, 727982), started_at=datetime.datetime(2024, 11, 29, 16, 10, 50, 727982), completed_at=None, status=<Status.PENDING: '待处理'>, fraction_downloaded=0, work_dir='D:\\VideoCaptioner\\work-dir\\111111', file_path='D:\\1129\\111111.mp4', url='', source=<Source.FILE_IMPORT: '文件导入'>, original_language=None, target_language='简体中文', video_info=VideoInfo(file_name='111111', width=640, height=368, fps=20.0, duration_seconds=1773.77, bitrate_kbps=360, video_codec='h264', audio_codec='aac', audio_sampling_rate=44100, thumbnail_path='D:\\VideoCaptioner\\work-dir\\111111\\thumbnail.jpg'), audio_format='mp3', audio_save_path='D:\\VideoCaptioner\\work-dir\\111111\\111111.wav', transcribe_model=<TranscribeModelEnum.WHISPER: 'Whisper [本地]'>, transcribe_language='en', use_asr_cache=True, need_word_time_stamp=False, original_subtitle_save_path='D:\\VideoCaptioner\\work-dir\\111111\\subtitle\\【原始字幕】Whisper [本地]large-v3-English.srt', whisper_model='large-v3', whisper_api_key='', whisper_api_base='', whisper_api_model='', whisper_api_prompt='', base_url='', api_key='', llm_model='gpt-4o-mini', need_translate=True, need_optimize=False, result_subtitle_save_path='D:\\VideoCaptioner\\work-dir\\111111\\subtitle\\【翻译字幕】样式字幕.ass', thread_num=10, batch_size=10, subtitle_layout='译文在上', video_save_path='D:\\VideoCaptioner\\work-dir\\111111\\【卡卡】111111.mp4', soft_subtitle=False, subtitle_style_srt='[V4+ Styles]\nFormat: Name,Fontname,Fontsize,PrimaryColour,SecondaryColour,OutlineColour,BackColour,Bold,Italic,Underline,StrikeOut,ScaleX,ScaleY,Spacing,Angle,BorderStyle,Outline,Shadow,Alignment,MarginL,MarginR,MarginV,Encoding\nStyle: Default,微软雅黑,50,&H005aff65,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100,100,2.0,0,1,2.0,0,2,10,10,22,1\nStyle: Secondary,微软雅黑,38,&H00ffffff,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100,100,0.4,0,1,2.0,0,2,10,10,22,1')

===========转录任务开始===========
时间：2024-11-29 16:10:50.732983
开始转换音频
开始语音转录
找到模型文件: D:\VideoCaptioner\AppData\models\ggml-large-v3.bin
WhisperCPP 执行命令: whisper-cpp -m D:\VideoCaptioner\AppData\models\ggml-large-v3.bin C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav -l en -osrt
转换完成
2024-11-29 16:10:51 - whisper_asr - ERROR - 生成 SRT 文件失败: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed

2024-11-29 16:10:51 - whisper_asr - ERROR - 生成 SRT 文件失败: 生成 SRT 文件失败: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed
Traceback (most recent call last):
  File "D:\VideoCaptioner\app\core\bk_asr\WhisperASR.py", line 136, in _run
RuntimeError: 生成 SRT 文件失败: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed

2024-11-29 16:10:51 - transcript_thread - ERROR - 转录过程中发生错误: 生成 SRT 文件失败: 生成 SRT 文件失败: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed
Traceback (most recent call last):
  File "D:\VideoCaptioner\app\core\bk_asr\WhisperASR.py", line 136, in _run
RuntimeError: 生成 SRT 文件失败: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\VideoCaptioner\app\core\thread\transcript_thread.py", line 93, in run
  File "D:\VideoCaptioner\app\core\bk_asr\BaseASR.py", line 73, in run
  File "D:\VideoCaptioner\app\core\bk_asr\WhisperASR.py", line 140, in _run
RuntimeError: 生成 SRT 文件失败: 生成 SRT 文件失败: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed

终止 Whisper ASR 进程
终止 Whisper ASR 进程
```
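The line that actually fails in the log is `Unable to decode audio file "...wav", MFCreateSourceReaderFromURL failed`, so a first diagnostic step might be to check whether the temporary WAV file exists and is a readable WAV at all. The following is a minimal sketch using only the Python standard library. `check_wav` is a hypothetical helper, not part of VideoCaptioner, and it only tests for a RIFF/PCM container. It does not verify that whisper-cpp would accept the sample rate or channel layout.

```python
import os
import wave


def check_wav(path: str) -> str:
    """Return 'ok', or a short reason why the file is unlikely to decode.

    Hypothetical diagnostic helper; not part of VideoCaptioner.
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        # wave.open raises wave.Error when the file is not a RIFF/WAVE
        # container (for example, MP3 data saved with a .wav extension).
        with wave.open(path, "rb") as wav:
            if wav.getnframes() == 0:
                return "no audio frames"
    except (wave.Error, EOFError) as exc:
        return f"not a readable PCM WAV: {exc}"
    return "ok"
```

Running it against the temporary file named in the log (while that file still exists) would show whether the audio conversion step produced a usable file before whisper-cpp was started. Any result other than `ok` would point at the conversion step rather than at the Whisper model.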

