What should I do when training a custom model with ModelScope fails?

```
# 18GB GPU memory
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0, 1, 2, 3, 4, 5, 6, 7'

from swift.llm import DatasetName, ModelType, SftArguments, sft_main

from typing import Any, Dict

from modelscope import AutoConfig, AutoModelForCausalLM, AutoTokenizer

from torch import dtype as Dtype
from transformers.utils.versions import require_version

from swift.llm import LoRATM, TemplateType, get_model_tokenizer, register_model
from swift.utils import get_logger

logger = get_logger()

class CustomModelType:
    tigerbot_7b = 'tigerbot-7b'
    tigerbot_13b = 'tigerbot-13b'
    tigerbot_13b_chat = 'tigerbot-13b-chat'
    edgetest1 = 'edgetest1'


class CustomTemplateType:
    tigerbot = 'tigerbot'

@register_model(CustomModelType.tigerbot_7b,
                'TigerResearch/tigerbot-7b-base-v3', LoRATM.llama2,
                TemplateType.default_generation)
@register_model(CustomModelType.tigerbot_13b,
                'TigerResearch/tigerbot-13b-base-v2', LoRATM.llama2,
                TemplateType.default_generation)
@register_model(CustomModelType.tigerbot_13b_chat,
                'TigerResearch/tigerbot-13b-chat-v4', LoRATM.llama2,
                CustomTemplateType.tigerbot)
@register_model(CustomModelType.edgetest1,
                '/home/edge/Documents/yi-34b/swift-main/examples/pytorch/llm/scripts/qwen_7b_chat/lora_mp_ddp/output/qwen-7b-chat/v3-20240127-130615/checkpoint-1940-merged', LoRATM.llama2,
                TemplateType.default_generation)
def get_tigerbot_model_tokenizer(model_dir: str,
                                 torch_dtype: Dtype,
                                 model_kwargs: Dict[str, Any],
                                 load_model: bool = True,
                                 **kwargs):
    use_flash_attn = kwargs.pop('use_flash_attn', False)
    if use_flash_attn:
        require_version('transformers>=4.34')
        logger.info('Setting use_flash_attention_2: True')
        model_kwargs['use_flash_attention_2'] = True
    model_config = AutoConfig.from_pretrained(
        model_dir, trust_remote_code=True)
    model_config.pretraining_tp = 1
    model_config.torch_dtype = torch_dtype
    logger.info(f'model_config: {model_config}')
    tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True)
    model = None
    if load_model:
        model = AutoModelForCausalLM.from_pretrained(
            model_dir,
            config=model_config,
            torch_dtype=torch_dtype,
            trust_remote_code=True,
            **model_kwargs)
    return model, tokenizer

sft_args = SftArguments(
    model_type=CustomModelType.tigerbot_7b,
    # model_type=ModelType.qwen_7b_chat,
    # model_id_or_path='/home/edge/Documents/yi-34b/swift-main/examples/pytorch/llm/scripts/qwen_7b_chat/lora_mp_ddp/output/qwen-7b-chat/v3-20240127-130615/checkpoint-1940-merged',
    # model_id_or_path='edgetest1',
    # dataset=[DatasetName.alpaca_zh, DatasetName.alpaca_en],
    # model_cache_dir='/home/edge/Documents/yi-34b/swift-main/examples/pytorch/llm/scripts/qwen_7b_chat/lora_mp_ddp/output/qwen-7b-chat/v3-20240127-130615/checkpoint-1940-merged',
    custom_train_dataset_path='/home/edge/llm/swift/moni_dataset.jsonl',
    train_dataset_sample=500,
    eval_steps=20,
    logging_steps=5,
    output_dir='output',
    lora_target_modules=['ALL'],
    self_cognition_sample=500
    # model_name=['小黄', 'Xiao Huang'],
    # model_author=['魔搭', 'ModelScope']
)
output = sft_main(sft_args)
best_model_checkpoint = output['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')

```

Here is the traceback from the output:

```
Traceback (most recent call last):
  File "/home/edge/llm/swift/edgeft.py", line 88, in <module>
    output = sft_main(sft_args)
  File "/home/edge/llm/swift/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "/home/edge/llm/swift/swift/llm/sft.py", line 125, in llm_sft
    train_dataset = add_self_cognition_dataset(train_dataset,
  File "/home/edge/llm/swift/swift/llm/utils/dataset.py", line 1001, in add_self_cognition_dataset
    assert model_name[0] is not None
AssertionError
```
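Before this error was raised, the program had already finished loading the whole model.
The code above was pieced together by modifying the official LLM self-cognition example. Is there a problem with how I am doing this, or is it a bug?
I added my own model to the registration, but whether I register tigerbot as in the example or register my own model, I hit the same problem: an AssertionError after everything has finished loading.
My environment is Ubuntu 22.04 with eight P40 GPUs and Python 3.10.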
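
To narrow things down, the failing condition can be imitated without loading any model. The sketch below is not swift's code: the only line taken from the traceback is the check `model_name[0] is not None`, while `check_self_cognition_args` is a hypothetical helper and the `[None, None]` fallback for an unset `model_name` is an assumption made for illustration.

```python
from typing import List, Optional


def check_self_cognition_args(
        self_cognition_sample: int,
        model_name: Optional[List[Optional[str]]]) -> bool:
    """Hypothetical stand-in for the check that fails in the traceback.

    Returns True when the arguments would pass a check of the form
    `model_name[0] is not None`, and False when they would not.
    """
    # Assumption: the check only matters when self-cognition samples are requested.
    if self_cognition_sample <= 0:
        return True
    # Assumption: an unset model_name behaves like a list of None placeholders.
    names = model_name if model_name is not None else [None, None]
    return names[0] is not None


if __name__ == '__main__':
    # Self-cognition requested, but no model_name supplied.
    print(check_self_cognition_args(500, None))
    # Self-cognition requested, with model_name supplied.
    print(check_self_cognition_args(500, ['小黄', 'Xiao Huang']))
```

If the real check behaves like this sketch, the assertion would depend only on whether `model_name` reaches `SftArguments` while `self_cognition_sample` is greater than zero, and not on which model is registered.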
