Version 1.0 can transcribe a 30-minute English video to text with local Whisper, but version 1.1.1 fails to transcribe it. #93

@i23i23

Problem Description

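With version 1.0, a 30-minute English video can be transcribed to text using local Whisper, but with version 1.1.1 the transcription fails. The result was the same across several tests, with both the Whisper v2 and v3 models. The application is also unstable and sometimes exits unexpectedly.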

Logs (Optional)

Starting file task creation: D:/1129/111111.mp4
Command run to get video info: ffmpeg -i D:/1129/111111.mp4
Video duration: 1773.77 s
Running command: ffmpeg -ss 00:08:52.131 -i D:/1129/111111.mp4 -vframes 1 -q:v 2 -y D:/VideoCaptioner/work-dir/111111/thumbnail.jpg
File task created: Task(id=0, queued_at=datetime.datetime(2024, 11, 29, 16, 10, 50, 727982), started_at=datetime.datetime(2024, 11, 29, 16, 10, 50, 727982), completed_at=None, status=<Status.PENDING: '待处理'>, fraction_downloaded=0, work_dir='D:\\VideoCaptioner\\work-dir\\111111', file_path='D:\\1129\\111111.mp4', url='', source=<Source.FILE_IMPORT: '文件导入'>, original_language=None, target_language='简体中文', video_info=VideoInfo(file_name='111111', width=640, height=368, fps=20.0, duration_seconds=1773.77, bitrate_kbps=360, video_codec='h264', audio_codec='aac', audio_sampling_rate=44100, thumbnail_path='D:\\VideoCaptioner\\work-dir\\111111\\thumbnail.jpg'), audio_format='mp3', audio_save_path='D:\\VideoCaptioner\\work-dir\\111111\\111111.wav', transcribe_model=<TranscribeModelEnum.WHISPER: 'Whisper [本地]'>, transcribe_language='en', use_asr_cache=True, need_word_time_stamp=False, original_subtitle_save_path='D:\\VideoCaptioner\\work-dir\\111111\\subtitle\\【原始字幕】Whisper [本地]large-v3-English.srt', whisper_model='large-v3', whisper_api_key='', whisper_api_base='', whisper_api_model='', whisper_api_prompt='', base_url='', api_key='', llm_model='gpt-4o-mini', need_translate=True, need_optimize=False, result_subtitle_save_path='D:\\VideoCaptioner\\work-dir\\111111\\subtitle\\【翻译字幕】样式字幕.ass', thread_num=10, batch_size=10, subtitle_layout='译文在上', video_save_path='D:\\VideoCaptioner\\work-dir\\111111\\【卡卡】111111.mp4', soft_subtitle=False, subtitle_style_srt='[V4+ Styles]\nFormat: Name,Fontname,Fontsize,PrimaryColour,SecondaryColour,OutlineColour,BackColour,Bold,Italic,Underline,StrikeOut,ScaleX,ScaleY,Spacing,Angle,BorderStyle,Outline,Shadow,Alignment,MarginL,MarginR,MarginV,Encoding\nStyle: Default,微软雅黑,50,&H005aff65,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100,100,2.0,0,1,2.0,0,2,10,10,22,1\nStyle: Secondary,微软雅黑,38,&H00ffffff,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100,100,0.4,0,1,2.0,0,2,10,10,22,1')

===========Transcription task started===========
Time: 2024-11-29 16:10:50.732983
Starting audio conversion
Starting speech transcription
Found model file: D:\VideoCaptioner\AppData\models\ggml-large-v3.bin
WhisperCPP command: whisper-cpp -m D:\VideoCaptioner\AppData\models\ggml-large-v3.bin C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav -l en -osrt
Conversion finished
2024-11-29 16:10:51 - whisper_asr - ERROR - Failed to generate SRT file: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed

2024-11-29 16:10:51 - whisper_asr - ERROR - Failed to generate SRT file: Failed to generate SRT file: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed
Traceback (most recent call last):
  File "D:\VideoCaptioner\app\core\bk_asr\WhisperASR.py", line 136, in _run
RuntimeError: Failed to generate SRT file: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed

2024-11-29 16:10:51 - transcript_thread - ERROR - Error during transcription: Failed to generate SRT file: Failed to generate SRT file: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed
Traceback (most recent call last):
  File "D:\VideoCaptioner\app\core\bk_asr\WhisperASR.py", line 136, in _run
RuntimeError: Failed to generate SRT file: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\VideoCaptioner\app\core\thread\transcript_thread.py", line 93, in run
  File "D:\VideoCaptioner\app\core\bk_asr\BaseASR.py", line 73, in run
  File "D:\VideoCaptioner\app\core\bk_asr\WhisperASR.py", line 140, in _run
RuntimeError: Failed to generate SRT file: Failed to generate SRT file: Using GPU "NVIDIA GeForce RTX 3080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 100.5 kb RAM
Loaded vocabulary, 51866 strings, 3037.2 kb RAM
Loaded 1259 GPU tensors, 2951.01 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "D:\VideoCaptioner\AppData\models\ggml-large-v3.bin" to VRAM
Unable to decode audio file "C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav", MFCreateSourceReaderFromURL failed

Terminating Whisper ASR process
Terminating Whisper ASR process
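
Every error block in the log ends with the same line: whisper-cpp loads the model, then reports that it is unable to decode the temporary WAV file (`MFCreateSourceReaderFromURL failed`). One way to narrow this down might be to check whether that file exists and parses as a PCM WAV at all. The following is a minimal diagnostic sketch, not part of VideoCaptioner: `describe_wav` and `ffmpeg_to_wav_cmd` are hypothetical helper names, and the 16 kHz mono 16-bit PCM target is an assumption about what the decoder is most likely to accept, not something the log confirms.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return a short diagnosis of whether `path` is a readable PCM WAV file.

    Hypothetical helper for triage only; it uses the standard-library `wave`
    module, which understands uncompressed PCM WAV files.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    try:
        with wave.open(str(p), "rb") as w:
            return (
                f"ok: {w.getnchannels()} ch, "
                f"{w.getframerate()} Hz, {w.getnframes()} frames"
            )
    except (wave.Error, EOFError) as exc:
        # Raised for a bad RIFF/WAVE header, a non-PCM format, or a truncated file.
        return f"unreadable: {exc}"


def ffmpeg_to_wav_cmd(src, dst):
    """Build an ffmpeg command that re-encodes `src` as 16 kHz mono 16-bit PCM WAV.

    The target format is an assumption; the command is only built, not run.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-vn",                # drop the video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
        str(dst),
    ]
```

Calling `describe_wav` on the path from the log (`C:\Users\ADMINI~1\AppData\Local\Temp\bk_asr\20241129161050111111.wav`) right after a failed run would show whether the file is missing, empty, or malformed. An "unreadable" result is only a hint: the `wave` module rejects some valid but non-PCM WAV files, so it does not prove the file is corrupt. If the file looks wrong, passing the list from `ffmpeg_to_wav_cmd` to `subprocess.run` and pointing whisper-cpp at the re-encoded file would test whether the audio conversion step is the part that changed between 1.0 and 1.1.1.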

Labels: bug
