modelscope-funasr: running bash finetune.sh fails with an error while it downloads the training model speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch. How can this be resolved? The output is:
Downloading: 76%|███████▌ | 640M/840M [19:27<06:04, 575kB/s]
Downloading: 38%|███▊ | 320M/840M [19:27<31:36, 287kB/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4672) of binary: /home/dcb/anaconda3/envs/funasr/bin/python
Traceback (most recent call last):
File "/home/dcb/anaconda3/envs/funasr/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/dcb/anaconda3/envs/funasr/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/dcb/anaconda3/envs/funasr/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
run(args)
File "/home/dcb/anaconda3/envs/funasr/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/dcb/anaconda3/envs/funasr/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/dcb/anaconda3/envs/funasr/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
Failures:
[1]:
time : 2024-05-11_11:29:56
host : dcb-Legion-Y9000P-IRX8
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 4673)
error_file:
Root Cause (first observed failure):
[0]:
time : 2024-05-11_11:29:56
host : dcb-Legion-Y9000P-IRX8
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4672)
error_file:
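If this is the first run and the model has not been downloaded yet, run on a single GPU first. Once that works, switch to multiple GPUs. If you start with multiple GPUs right away, every GPU process tries to download the model from modelscope at the same time, and the downloads conflict with each other (the log above shows two progress bars for the same 840M file). This answer was compiled from the DingTalk group "modelscope-funasr社区交流" (modelscope-funasr community exchange).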
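
For anyone who wants to guard against this in their own launch script, here is a minimal Python sketch of the "only one process downloads" idea. It is not what finetune.sh or FunASR does internally. The function name ensure_model, the marker file .download_complete, and the download callback are all hypothetical names made up for illustration. The only thing taken from torchrun is the LOCAL_RANK environment variable it sets for each worker.

```python
import os
import time
from pathlib import Path
from typing import Callable


def ensure_model(
    cache_dir: str,
    download: Callable[[str], None],  # hypothetical: any function that fetches the model into a directory
    poll_seconds: float = 1.0,
    timeout: float = 3600.0,
) -> Path:
    """Let only local rank 0 download; every other rank waits for it to finish."""
    target = Path(cache_dir)
    # Hypothetical marker file that signals "download finished".
    done_marker = target / ".download_complete"

    # Already downloaded by an earlier run or by rank 0: nothing to do.
    if done_marker.exists():
        return target

    # torchrun sets LOCAL_RANK per worker; default to 0 for a plain single-process run.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    if local_rank == 0:
        target.mkdir(parents=True, exist_ok=True)
        download(str(target))
        # Create the marker only after the download returned without raising.
        done_marker.touch()
        return target

    # Other ranks poll for the marker instead of starting their own download.
    deadline = time.monotonic() + timeout
    while not done_marker.exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"model in {target} was not ready after {timeout} seconds")
        time.sleep(poll_seconds)
    return target
```

The download argument stands in for whatever call actually fetches the model, so the sketch makes no claim about the modelscope API. The simplest fix remains the one given above: do the first run on a single GPU so the model is already cached before the multi-GPU run starts.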