
Error when training the StructBERT FAQ question-answering model. Could anyone take a look and help me solve it? Many thanks.

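Problem description

When training the question-answering model damo/nlp_structbert_faq-question-answering_chinese-base with code that follows the official documentation, I get the following error: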

** build_dataset error log: 'structbert is not in the custom_datasets registry group faq-question-answering. Please make sure the correct version of ModelScope library is used.'

Official documentation

Model page

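The dataset format I used is as follows:

text | label | answer
数据采集 | 1 | 数据采集
采集数据 | 1 | 数据采集
数据收集 | 1 | 数据采集
开始采集 | 1 | 数据采集
采集开始 | 1 | 数据采集
收集数据 | 1 | 数据采集
数据收集 | 1 | 数据采集
问题反馈 | 2 | 问题反馈
反馈问题 | 2 | 问题反馈
上报问题 | 2 | 问题反馈
问题上报 | 2 | 问题反馈
开始反馈 | 2 | 问题反馈
反馈开始 | 2 | 问题反馈
汇报工作 | 3 | 工作汇报
工作汇报 | 3 | 工作汇报
工作上报 | 3 | 工作汇报
上报工作 | 3 | 工作汇报
开始汇报 | 3 | 工作汇报
汇报开始 | 3 | 工作汇报
工作填报 | 4 | 工作填报
填报工作 | 4 | 工作填报
开始填报 | 4 | 工作填报
工作填报 | 4 | 工作填报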
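
Before handing a file like this to the trainer, it can help to check the basics: that the three columns are present, how many rows each label has, and whether any text is repeated (in the rows above, 数据收集 and 工作填报 each appear twice). The following is only a minimal sketch using the Python standard library. It assumes the file is comma-separated with a text,label,answer header, which the page does not confirm, and the helper name summarize_faq_csv is hypothetical, not part of ModelScope.

```python
import csv
import io
from collections import Counter

# Assumption: the CSV has exactly these header names, as in the table above.
REQUIRED_COLUMNS = ("text", "label", "answer")


def summarize_faq_csv(csv_text):
    """Return (rows per label, sorted list of texts that occur more than once)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    label_counts = Counter()
    text_counts = Counter()
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        label_counts[(row["label"] or "").strip()] += 1
        text_counts[(row["text"] or "").strip()] += 1
    duplicates = sorted(t for t, n in text_counts.items() if n > 1)
    return dict(label_counts), duplicates


# A small inline sample in the same shape as the data above.
sample = """text,label,answer
数据采集,1,数据采集
数据收集,1,数据采集
数据收集,1,数据采集
问题反馈,2,问题反馈
"""
counts, dups = summarize_faq_csv(sample)
print(counts)
print(dups)
```

For the real file, the same function could be called with the contents of ./qa.csv read as UTF-8 text. This only describes the data; it does not by itself explain the registry error.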

Full code used for debugging


import os
from modelscope.metainfo import Trainers
from modelscope.msdatasets import MsDataset
from modelscope.pipelines import pipeline
from modelscope.trainers import build_trainer
from modelscope.utils.config import Config
from modelscope.utils.hub import read_config
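
# Load the local CSV. The columns are already named text/label/answer,
# so this remap_columns call does not change anything.
train_dataset = MsDataset.load("./qa.csv", split='train').remap_columns({'text': 'text'})
print(train_dataset)
# The same data is reused for evaluation.
eval_dataset = train_dataset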
cfg: Config = read_config("damo/nlp_structbert_faq-question-answering_chinese-base")
cfg.train.train_iters_per_epoch = 30
cfg.evaluation.val_iters_per_epoch = 2
cfg.train.seed = 1234
cfg.train.optimizer.lr = 2e-5
cfg.train.hooks = [{
    'type': 'CheckpointHook',
    'by_epoch': False,
    'interval': 50
}, {
    'type': 'EvaluationHook',
    'by_epoch': False,
    'interval': 50
}, {
    'type': 'TextLoggerHook',
    'by_epoch': False,
    'rounding_digits': 5,
    'interval': 10
}]
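# Make sure the work dir exists, then write the modified config into it.
os.makedirs("./model/temp", exist_ok=True)
cfg_file = os.path.join("./model/temp", 'config.json')
cfg.dump(cfg_file)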

trainer = build_trainer(
    Trainers.faq_question_answering_trainer,
    default_args=dict(
        model="damo/nlp_structbert_faq-question-answering_chinese-base",
        work_dir="./model/temp",
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        cfg_file=cfg_file))

trainer.train()

evaluate_result = trainer.evaluate()
print(evaluate_result)

Full error output

Dataset({
    features: ['text', 'label', 'answer'],
    num_rows: 26
})
2023-06-26 20:32:21,659 - modelscope - INFO - initialize model from ./model/damo/nlp_structbert_faq-question-answering_chinese-base
2023-06-26 20:32:22,949 - modelscope - INFO - faq task build protonet network
2023-06-26 20:32:28,135 - modelscope - INFO - All model checkpoint weights were used when initializing SbertForFaqQuestionAnswering.

2023-06-26 20:32:28,136 - modelscope - INFO - All the weights of SbertForFaqQuestionAnswering were initialized from the model checkpoint If your task is similar to the task the model of the checkpoint was trained on, you can already use SbertForFaqQuestionAnswering for predictions without further training.
2023-06-26 20:32:28,137 - modelscope - WARNING - No train key and type key found in preprocessor domain of configuration.json file.
2023-06-26 20:32:28,138 - modelscope - WARNING - Cannot find available config to build preprocessor at mode train, current config: {'max_seq_length': 50, 'model_dir': './model/damo/nlp_structbert_faq-question-answering_chinese-base'}. trying to build by task and model information.
2023-06-26 20:32:28,172 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-06-26 20:32:28,172 - modelscope - WARNING - Cannot find available config to build preprocessor at mode eval, current config: {'max_seq_length': 50, 'model_dir': './model/damo/nlp_structbert_faq-question-answering_chinese-base'}. trying to build by task and model information.
2023-06-26 20:32:28,185 - modelscope - WARNING - ('CUSTOM_DATASETS', 'faq-question-answering', 'structbert') not found in ast index file
2023-06-26 20:32:28,186 - modelscope - WARNING - ('CUSTOM_DATASETS', 'faq-question-answering', 'structbert') not found in ast index file
2023-06-26 20:32:28,187 - modelscope - INFO - cuda is not available, using cpu instead.
2023-06-26 20:32:28,187 - modelscope - INFO - ==========================Training Config Start==========================
2023-06-26 20:32:28,187 - modelscope - INFO - {
    "framework": "pytorch",
    "task": "faq-question-answering",
    "pipeline": {
        "type": "faq-question-answering"
    },
    "model": {
        "type": "structbert",
        "pooling": "avg",
        "metric": "relation"
    },
    "preprocessor": {
        "max_seq_length": 50,
        "model_dir": "./model/damo/nlp_structbert_faq-question-answering_chinese-base"
    },
    "train": {
        "seed": 1234,
        "hooks": [
            {
                "type": "IterTimerHook"
            }
        ],
        "train_iters_per_epoch": 30,
        "max_epochs": 1,
        "sampler": {
            "n_way": 5,
            "k_shot": 5,
            "r_query": 5,
            "min_labels": 2
        },
        "optimizer": {
            "type": "Adam",
            "lr": 2e-05,
            "options": {
                "grad_clip": {
                    "max_norm": 5.0
                }
            }
        },
        "lr_scheduler": {
            "type": "LinearLR",
            "options": {
                "by_epoch": false
            }
        },
        "dataloader": {
            "workers_per_gpu": 1
        },
        "checkpoint": {
            "period": {
                "by_epoch": false,
                "interval": 50
            }
        },
        "logging": {
            "by_epoch": false,
            "rounding_digits": 5,
            "interval": 10
        },
        "work_dir": "./model/temp"
    },
    "evaluation": {
        "metrics": "seq-cls-metric",
        "val_iters_per_epoch": 2,
        "dataloader": {
            "workers_per_gpu": 1
        },
        "period": {
            "by_epoch": false,
            "interval": 50
        }
    }
}
2023-06-26 20:32:28,188 - modelscope - INFO - ===========================Training Config End===========================
2023-06-26 20:32:28,190 - modelscope - INFO - num. of bad sample ids:5/26
2023-06-26 20:32:28,192 - modelscope - INFO - train: label size:3.0, data size:18,                 domain_size:1
2023-06-26 20:32:28,193 - modelscope - WARNING - ('OPTIMIZER', 'default', 'Adam') not found in ast index file
2023-06-26 20:32:28,194 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LinearLR') not found in ast index file
2023-06-26 20:32:28,194 - modelscope - INFO - Stage: before_run:
    (ABOVE_NORMAL) OptimizerHook                      
    (LOW         ) LrSchedulerHook                    
    (LOW         ) CheckpointHook                     
    (VERY_LOW    ) TextLoggerHook                     
 -------------------- 
Stage: before_train_epoch:
    (LOW         ) LrSchedulerHook                    
 -------------------- 
Stage: before_train_iter:
    (ABOVE_NORMAL) OptimizerHook                      
 -------------------- 
Stage: after_train_iter:
    (ABOVE_NORMAL) OptimizerHook                      
    (NORMAL      ) EvaluationHook                     
    (LOW         ) LrSchedulerHook                    
    (LOW         ) CheckpointHook                     
    (VERY_LOW    ) TextLoggerHook                     
 -------------------- 
Stage: after_train_epoch:
    (NORMAL      ) EvaluationHook                     
    (LOW         ) LrSchedulerHook                    
    (LOW         ) CheckpointHook                     
    (VERY_LOW    ) TextLoggerHook                     
 -------------------- 
Stage: after_val_epoch:
    (VERY_LOW    ) TextLoggerHook                     
 -------------------- 
Stage: after_run:
    (LOW         ) CheckpointHook                     
 -------------------- 
2023-06-26 20:32:28,197 - modelscope - INFO - Checkpoints will be saved to ./model/temp
2023-06-26 20:32:28,197 - modelscope - INFO - Text logs will be saved to ./model/temp
** build_dataset error log: 'structbert is not in the custom_datasets registry group faq-question-answering. Please make sure the correct version of ModelScope library is used.'
** build_dataset error log: 'structbert is not in the custom_datasets registry group faq-question-answering. Please make sure the correct version of ModelScope library is used.'

Asked by user 1140532414034252 on 2023-06-26 21:08:25
3 answers
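  • I ran into the same problem. I built my own data in exactly the same format as the official example and still get this error. It looks like this model was never really meant to be trained by users.

    2023-11-01 15:02:36
    Upvotes: 1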
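  • (Answerer: head of the Beijing Alibaba Cloud ACE community)

    You can update the library with the following command:

    !pip install -U modelscope

    If the problem persists after updating, you can try registering the custom dataset manually by adding the following code:

    from modelscope.datasets import custom_datasets

    custom_datasets.register("faq-question-answering", "structbert")

    2023-07-10 07:51:09
    Upvotes: 2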
  • There is a problem with the dataset format. I suggest using JSON or CSV.

    2023-06-28 11:24:14
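
The error message itself asks to "make sure the correct version of ModelScope library is used", so when reporting or comparing results it is worth stating which version is actually installed in the environment that runs the training script. A minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical, and the package name modelscope is taken from the pip command in the thread:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None if modelscope is not installed in the current environment.
print("modelscope:", installed_version("modelscope"))
```

Running this in the same interpreter as the training script, before and after `pip install -U modelscope`, shows whether the upgrade actually took effect there.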
