(modelscope) [root@localhost llm]# python llm_sft.py --model_type qwen1half-0_5b-chat --sft_type lora --tuner_backend swift --dtype AUTO --output_dir output --dataset ms-bench --train_dataset_sample 5000 --num_train_epochs 2 --max_length 1024 --check_dataset_strategy warning --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules ALL --gradient_checkpointing true --batch_size 1 --weight_decay 0.01 --learning_rate 1e-4 --gradient_accumulation_steps 16 --max_grad_norm 0.5 --warmup_ratio 0.03 --eval_steps 100 --save_steps 100 --save_total_limit 2 --logging_steps 10 --use_flash_attn false --self_cognition_sample 1000 --model_name 米强 --model_author 柯大师
2024-04-11 17:08:38,723 - modelscope - INFO - PyTorch version 2.2.2 Found.
2024-04-11 17:08:38,723 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-11 17:08:38,741 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 b69da8cf6fd4fc02f46c7e3d4aac0467 and a total number of 972 components indexed
[INFO:swift] Start time of running main: 2024-04-11 17:08:40.280420
[INFO:swift] Setting template_type: qwen
[INFO:swift] Setting args.lazy_tokenize: False
Traceback (most recent call last):
  File "llm_sft.py", line 7, in <module>
    output = sft_main()
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/swift/utils/run_utils.py", line 25, in x_main
    args, remaining_argv = parse_args(args_class, argv)
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/swift/utils/utils.py", line 98, in parse_args
    args, remaining_args = parser.parse_args_into_dataclasses(
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 134, in __init__
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/swift/llm/utils/argument.py", line 447, in __post_init__
    self._init_training_args()
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/swift/llm/utils/argument.py", line 472, in _init_training_args
    training_args = Seq2SeqTrainingArguments(
  File "<string>", line 133, in __init__
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/swift/trainers/arguments.py", line 44, in __post_init__
    super().__post_init__()
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/training_args.py", line 1551, in __post_init__
    and (self.device.type != "cuda")
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/training_args.py", line 2027, in device
    return self._setup_devices
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/utils/generic.py", line 63, in __get__
    cached = self.fget(obj)
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/training_args.py", line 1963, in _setup_devices
    self.distributed_state = PartialState(
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/accelerate/state.py", line 273, in __init__
    self.num_processes = torch.distributed.get_world_size()
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1555, in get_world_size
    return _get_group_size(group)
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 836, in _get_group_size
    default_pg = _get_default_group()
  File "/usr/local/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 977, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.

How should I handle this ModelScope problem?