OS: Ubuntu 22.04.1 LTS
Errors:
The first is: ERROR:root:Language Cantonese not supported. Using PinYin as default
The second is: size mismatch for text_encoder.sy_emb.weight: copying a param with shape torch.Size([107, 512]) from checkpoint, the shape in current model is torch.Size([147, 512]).
size mismatch for text_encoder.tone_emb.weight: copying a param with shape torch.Size([14, 512]) from checkpoint, the shape in current model is torch.Size([10, 512]).
(maas) user@user-virtual-machine:~/MyProjects/KAN-TTS-main$ python yueyu.py
2023-10-16 15:01:14,671 - modelscope - INFO - PyTorch version 1.13.1 Found.
2023-10-16 15:01:14,671 - modelscope - INFO - Loading ast index from /home/user/.cache/modelscope/ast_indexer
2023-10-16 15:01:15,039 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 32146877e89460b4730c48c414aaea6c and a total number of 941 components indexed
2023-10-16 15:01:17,399 - modelscope - INFO - Model revision not specified, use revision: v1.0.4
2023-10-16 15:01:22,417 - modelscope - INFO - initiate model from /home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k
2023-10-16 15:01:22,418 - modelscope - INFO - initiate model from location /home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k.
2023-10-16 15:01:22,420 - modelscope - INFO - initialize model from /home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k
2023-10-16 15:01:22,427 - modelscope - INFO - am_config=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/config.yaml voc_config=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/voc/config.yaml
2023-10-16 15:01:22,427 - modelscope - INFO - audio_config=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/audio_config.yaml
2023-10-16 15:01:22,427 - modelscope - INFO - am_ckpts=OrderedDict([(980000, '/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/ckpt/checkpoint_980000.pth')])
2023-10-16 15:01:22,427 - modelscope - INFO - voc_ckpts=OrderedDict([(360000, '/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/voc/ckpt/checkpoint_360000.pth')])
2023-10-16 15:01:22,427 - modelscope - INFO - se_path=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/se.npy se_model_path=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/se/ckpt/se.onnx
2023-10-16 15:01:22,427 - modelscope - INFO - mvn_path=/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k/voices/F7/am/mvn.npy
ERROR:root:Language Cantonese not supported. Using PinYin as default
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
Load pinyin_en_mix_dict failed
text.cc: festival_Text_init
2023-10-16 15:01:34,330 - modelscope - WARNING - No preprocessor field found in cfg.
2023-10-16 15:01:34,331 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-10-16 15:01:34,331 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/user/.cache/modelscope/hub/speech_tts/speech_sambert-hifigan_tts_jiajia_Cantonese_16k'}. trying to build by task and model information.
2023-10-16 15:01:34,331 - modelscope - WARNING - No preprocessor key ('sambert-hifigan', 'text-to-speech') found in PREPROCESSOR_MAP, skip building preprocessor.
2023-10-16 15:01:34,332 - modelscope - INFO - cuda is not available, using cpu instead.
Traceback (most recent call last):
  File "yueyu.py", line 8, in <module>
    output = sambert_hifigan_tts(input=text)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/pipelines/base.py", line 219, in __call__
    output = self._process_single(input, *args, **kwargs)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/pipelines/base.py", line 254, in _process_single
    out = self.forward(out, **forward_params)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/pipelines/audio/text_to_speech_pipeline.py", line 38, in forward
    output_wav = self.model.forward(input, forward_params.get('voice'))
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/sambert_hifi.py", line 272, in forward
    audio = self.synthesis_one_sentences(voice, line[1])
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/sambert_hifi.py", line 192, in synthesis_one_sentences
    return self.voices[voice_name].forward(text)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/voice.py", line 650, in forward
    self.load_am()
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/modelscope/models/audio/tts/voice.py", line 251, in load_am
    self.am.load_state_dict(state_dict['model'], strict=False)
  File "/home/user/anaconda3/envs/maas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1672, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for KanTtsSAMBERT:
	size mismatch for text_encoder.sy_emb.weight: copying a param with shape torch.Size([107, 512]) from checkpoint, the shape in current model is torch.Size([147, 512]).
	size mismatch for text_encoder.tone_emb.weight: copying a param with shape torch.Size([14, 512]) from checkpoint, the shape in current model is torch.Size([10, 512]).
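The maas virtual environment itself should be fine: it was created directly from the environment.yaml in KAN-TTS, and the ModelScope Library was installed as well, including the core components and the audio components. In this same environment, the Sichuanese, Shanghainese, and Mandarin (speaker Zhiyan) models all synthesize successfully.
I tried deleting the downloaded model from the cache and downloading it again, but the problem persists.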
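
For reference, the two mismatches from the RuntimeError can be laid out side by side. The sketch below is plain Python with no PyTorch dependency. `find_shape_mismatches` is a hypothetical helper name, not part of ModelScope or KAN-TTS, and the shapes are copied by hand from the error message. One possible reading, which is an assumption and not confirmed, is that the two errors are related: after the fallback to PinYin, the model may be built with PinYin-sized symbol and tone tables, while the Cantonese checkpoint was saved with different sizes.

```python
from typing import List, Mapping, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    ckpt_shapes: Mapping[str, Shape],
    model_shapes: Mapping[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every parameter
    present in both mappings whose shapes differ.

    Hypothetical helper for illustration only; not a ModelScope/KAN-TTS API.
    """
    mismatches = []
    # Only parameters that exist on both sides can have a size mismatch.
    for name in sorted(ckpt_shapes.keys() & model_shapes.keys()):
        ckpt_shape = tuple(ckpt_shapes[name])
        model_shape = tuple(model_shapes[name])
        if ckpt_shape != model_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Shapes copied by hand from the RuntimeError in the log above.
ckpt_shapes = {
    "text_encoder.sy_emb.weight": (107, 512),
    "text_encoder.tone_emb.weight": (14, 512),
}
model_shapes = {
    "text_encoder.sy_emb.weight": (147, 512),
    "text_encoder.tone_emb.weight": (10, 512),
}

for name, ckpt_shape, model_shape in find_shape_mismatches(ckpt_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt_shape} vs current model {model_shape}")
```

With PyTorch available, the checkpoint side could presumably be collected with something like `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu")["model"].items()}`, where the `"model"` key is taken from the `state_dict['model']` call in the traceback and `path` would be the checkpoint_980000.pth file listed under am_ckpts in the log.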