
After VAD segmentation and auto-labeling, a test dataset fails with an error during feature extraction, even though the audio sounds normal when played back. Looking for help!

OS: Linux
Python/C++ version: 3.7.13
Package versions: torch 1.11.0+cu113, funasr 0.3.0, modelscope 1.4.1
Model: sambert-hifigan
Command:

Problem description: After VAD segmentation and auto-labeling, a test dataset fails during feature extraction, even though the audio sounds perfectly normal when played back. What causes this error? What kinds of audio trigger it, and how can it be avoided?

Error log:

```
2023-05-29:16:08:44, ERROR [utils.py:332] swipe method: calc F0 is too low.
2023-05-29:16:08:44, INFO [audio_processor.py:483] [AudioProcessor] Pitch align with mel is proceeding...
 16%|███       | 18/111 [00:03<00:13, 6.76it/s]
2023-05-29:16:08:44, ERROR [utils.py:332] rapt method: calc F0 is too low.
fft : m must be a integer of power of 2!
 96%|█████████▌| 107/111 [00:04<00:00, 24.96it/s]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/work/virtual-person-tts/modules/auto_label_and_finetune.py", line 165, in <module>
    run(args.request_id, args.user_id, args.remote_path, args.now_time, args.is_auto_label)
  File "/home/work/virtual-person-tts/modules/auto_label_and_finetune.py", line 140, in run
    trainer.train()
  File "/opt/conda/lib/python3.7/site-packages/modelscope/trainers/audio/tts_trainer.py", line 229, in train
    self.prepare_data()
  File "/opt/conda/lib/python3.7/site-packages/modelscope/trainers/audio/tts_trainer.py", line 208, in prepare_data
    se_model)
  File "/opt/conda/lib/python3.7/site-packages/modelscope/preprocessors/tts.py", line 37, in __call__
    speaker_name, target_lang, skip_script, se_model)
  File "/opt/conda/lib/python3.7/site-packages/modelscope/preprocessors/tts.py", line 57, in do_data_process
    targetLang, skip_script, se_model)
  File "/opt/conda/lib/python3.7/site-packages/kantts/preprocess/data_process.py", line 190, in process_data
    raw_metafile,
  File "/opt/conda/lib/python3.7/site-packages/kantts/preprocess/audio_processor/audio_processor.py", line 757, in process
    train_wav_dir, out_f0_dir, out_frame_f0_dir, out_frame_uv_dir
  File "/opt/conda/lib/python3.7/site-packages/kantts/preprocess/audio_processor/audio_processor.py", line 485, in pitch_extract
    result = future.result()
  File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
```
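The "calc F0 is too low" errors usually point to clips that ended up too short or nearly silent after VAD splitting, and a worker crashing on such a clip would surface exactly as the BrokenProcessPool above. A minimal pre-flight check for such clips could look like the sketch below; the duration and RMS thresholds are assumptions for illustration, not kantts defaults:

```python
import numpy as np

MIN_DURATION_S = 0.5   # assumed threshold: very short clips can break pitch extraction
MIN_RMS = 1e-3         # assumed threshold: near-silent clips yield "F0 is too low"

def is_suspect(y, sr):
    """Flag a clip that is likely to crash F0 extraction: too short or near-silent."""
    y = np.asarray(y, dtype=np.float64)
    if y.ndim > 1:              # downmix multi-channel audio for the RMS check
        y = y.mean(axis=1)
    duration = y.size / sr
    rms = float(np.sqrt(np.mean(y ** 2))) if y.size else 0.0
    return duration < MIN_DURATION_S or rms < MIN_RMS
```

Running this over every clip in the training wav directory (loading each file with e.g. librosa or soundfile) and removing the flagged ones before calling `trainer.train()` would tell you whether a degenerate clip is the culprit.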

Guest 22fwimmggr6m6 · 2023-05-29 18:55:38
1 Answer
  • Beijing Alibaba Cloud ACE President

    First, confirm that the format and sampling rate of the input audio files match what the model expects. For example, the sambert-hifigan model may require 16 kHz, 16-bit, mono input. You can use Python's librosa library (with soundfile for the bit depth) to check an audio file's sampling rate, channel count, and sample format. For example:

    ```python
    import librosa
    import soundfile as sf

    audio_file = '/path/to/audio/file.wav'
    # sr=None keeps the file's native sampling rate; mono=False keeps the channel layout
    y, sr = librosa.load(audio_file, sr=None, mono=False)
    channels = 1 if y.ndim == 1 else y.shape[0]
    # librosa decodes to float, so read the on-disk bit depth (subtype) with soundfile
    info = sf.info(audio_file)
    print(f'Sampling rate: {sr}, channels: {channels}, subtype: {info.subtype}')
    ```

    In the code above, librosa.load loads the audio file and returns the signal y and the sampling rate sr. (Note that passing sr=16000 and mono=True would silently resample and downmix, hiding the file's real properties, and that y.dtype reflects librosa's float decoding, not the file's bit depth.) If the sampling rate, channel count, or bit depth does not match the model's requirements, you can use librosa to resample, downmix channels, or convert the sample format.
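    As one concrete piece of that conversion, downmixing to mono and quantizing to 16-bit PCM can be done directly in numpy before writing the file back (e.g. with soundfile.write). This is a minimal sketch; the function name and the clipping behavior are illustrative, not part of librosa:

    ```python
    import numpy as np

    def to_mono_int16(y):
        """Downmix float audio of shape (channels, samples) or (samples,)
        to mono and quantize it to 16-bit PCM."""
        y = np.asarray(y, dtype=np.float64)
        if y.ndim == 2:
            # average across the channel axis (assume the smaller axis is channels)
            ch_axis = 0 if y.shape[0] < y.shape[1] else 1
            y = y.mean(axis=ch_axis)
        y = np.clip(y, -1.0, 1.0)       # guard against clipping overflow
        return (y * 32767).astype(np.int16)
    ```

    For resampling to 16 kHz, librosa.resample (or soxr/scipy) can be applied to y before the quantization step.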

    Second, confirm that your code handles the auto-generated labels correctly. For example, if the auto-labeling algorithm applied noise suppression or signal enhancement during labeling, the labels may no longer line up exactly with the original audio signal. You need to make sure the correspondence between labels and audio is correct before feature extraction.
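    One way to sanity-check that correspondence is to diff utterance IDs between the wav directory and the label metafile. A sketch, assuming a common "utt_id&lt;TAB&gt;text" metafile layout (not necessarily the exact format kantts produces):

    ```python
    from pathlib import Path

    def check_pairs(wav_dir, label_file):
        """Return (wavs missing a label line, label lines missing a wav),
        matching on the utterance ID (the wav filename stem)."""
        wav_ids = {p.stem for p in Path(wav_dir).glob('*.wav')}
        label_ids = set()
        for line in Path(label_file).read_text(encoding='utf-8').splitlines():
            if line.strip():
                label_ids.add(line.split('\t', 1)[0])
        return sorted(wav_ids - label_ids), sorted(label_ids - wav_ids)
    ```

    Any ID that appears on only one side is a candidate for the clip that kills the pitch-extraction worker.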

    2023-07-10 08:51:20

