What should I do when a problem occurs while training with a custom model in ModelScope?
```
# 18GB GPU memory
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1, 2, 3, 4, 5, 6, 7'

from swift.llm import DatasetName, ModelType, SftArguments, sft_main
from typing import Any, Dict

from modelscope import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from torch import dtype as Dtype
from transformers.utils.versions import require_version

from swift.llm import LoRATM, TemplateType, get_model_tokenizer, register_model
from swift.utils import get_logger

logger = get_logger()


class CustomModelType:
    tigerbot_7b = 'tigerbot-7b'
    tigerbot_13b = 'tigerbot-13b'
    tigerbot_13b_chat = 'tigerbot-13b-chat'
    edgetest1 = 'edgetest1'


class CustomTemplateType:
    tigerbot = 'tigerbot'


# Register the TigerBot checkpoints and the local merged checkpoint so that
# swift can resolve them via CustomModelType.*.
@register_model(CustomModelType.tigerbot_7b,
                'TigerResearch/tigerbot-7b-base-v3', LoRATM.llama2,
                TemplateType.default_generation)
@register_model(CustomModelType.tigerbot_13b,
                'TigerResearch/tigerbot-13b-base-v2', LoRATM.llama2,
                TemplateType.default_generation)
@register_model(CustomModelType.tigerbot_13b_chat,
                'TigerResearch/tigerbot-13b-chat-v4', LoRATM.llama2,
                CustomTemplateType.tigerbot)
@register_model(CustomModelType.edgetest1,
                '/home/edge/Documents/yi-34b/swift-main/examples/pytorch/llm/scripts/qwen_7b_chat/lora_mp_ddp/output/qwen-7b-chat/v3-20240127-130615/checkpoint-1940-merged',
                LoRATM.llama2, TemplateType.default_generation)
def get_tigerbot_model_tokenizer(model_dir: str,
                                 torch_dtype: Dtype,
                                 model_kwargs: Dict[str, Any],
                                 load_model: bool = True,
                                 **kwargs):
    use_flash_attn = kwargs.pop('use_flash_attn', False)
    if use_flash_attn:
        require_version('transformers>=4.34')
        logger.info('Setting use_flash_attention_2: True')
        model_kwargs['use_flash_attention_2'] = True
    model_config = AutoConfig.from_pretrained(
        model_dir, trust_remote_code=True)
    model_config.pretraining_tp = 1
    model_config.torch_dtype = torch_dtype
    logger.info(f'model_config: {model_config}')
    tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True)
    model = None
    if load_model:
        model = AutoModelForCausalLM.from_pretrained(
            model_dir,
            config=model_config,
            torch_dtype=torch_dtype,
            trust_remote_code=True,
            **model_kwargs)
    return model, tokenizer


# LoRA fine-tuning configuration with self-cognition samples mixed in.
sft_args = SftArguments(
    model_type=CustomModelType.tigerbot_7b,
    custom_train_dataset_path='/home/edge/llm/swift/moni_dataset.jsonl',
    train_dataset_sample=500,
    eval_steps=20,
    logging_steps=5,
    output_dir='output',
    lora_target_modules=['ALL'],
    self_cognition_sample=500
)
output = sft_main(sft_args)
best_model_checkpoint = output['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')
```

Here is the traceback from the output:

```
Traceback (most recent call last):
  File "/home/edge/llm/swift/edgeft.py", line 88, in <module>
    output = sft_main(sft_args)
  File "/home/edge/llm/swift/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/home/edge/llm/swift/swift/llm/sft.py", line 125, in llm_sft
    train_dataset = add_self_cognition_dataset(train_dataset,
  File "/home/edge/llm/swift/swift/llm/utils/dataset.py", line 1001, in add_self_cognition_dataset
    assert model_name[0] is not None
AssertionError
```
Before this error occurs, the program has already finished loading all of the model's checkpoint shards.
The code above was modified and pieced together from the official LLM self-cognition example. Am I doing something wrong, or is this a bug?
I added my own model to the registration, but whether I register tigerbot as in the example or register my own model, I hit the same problem: an AssertionError right after all the shards have loaded.
My environment is Ubuntu 22.04 with 8x P40 GPUs and Python 3.10.
Based on the code and the error message you provided, the problem is with the model_name parameter. Your SftArguments sets self_cognition_sample=500 but never sets model_name, so model_name[0] is None when add_self_cognition_dataset runs its check `assert model_name[0] is not None`, and that assertion is what raises the AssertionError once the checkpoint shards have finished loading.

To fix this, set the model_name parameter to a two-element list, for example:
```
sft_args = SftArguments(
    # ...other arguments...
    model_name=['小黄', 'Xiao Huang'],
    # ...other arguments...
)
```
With model_name set, the check on model_name[0] in add_self_cognition_dataset will pass and the error will no longer be raised.
In short: you need to either comment out self_cognition_sample=500, or uncomment (i.e. set) model_name and model_author.
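For completeness, here is a minimal sketch of the corrected configuration, assuming the same SftArguments fields as in the official swift self-cognition example; the model_author values below are illustrative placeholders:

```
sft_args = SftArguments(
    model_type=CustomModelType.tigerbot_7b,
    custom_train_dataset_path='/home/edge/llm/swift/moni_dataset.jsonl',
    train_dataset_sample=500,
    eval_steps=20,
    logging_steps=5,
    output_dir='output',
    lora_target_modules=['ALL'],
    self_cognition_sample=500,
    # Both fields below must be set when self_cognition_sample > 0, typically
    # as a [Chinese, English] pair; the author values are placeholders.
    model_name=['小黄', 'Xiao Huang'],
    model_author=['魔搭', 'ModelScope']
)
```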
-- This answer was compiled from the DingTalk group "魔搭ModelScope开发者联盟群 ①".