The error message is shown below; the model only loads successfully on the second attempt.
2024-01-02 19:27:17,675 - modelscope - WARNING - Find task: rex-uninlu, model type: None. Insufficient information to build preprocessor, skip building preprocessor
ERROR: Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/registry.py", line 212, in build_from_cfg
return obj_cls(**args)
File "/root/.cache/modelscope/modelscope_modules/nlp_deberta_rex-uninlu_chinese-base/ms_wrapper.py", line 29, in __init__
self.model, self.trainer = self.init_model(**kwargs)
File "/root/.cache/modelscope/modelscope_modules/nlp_deberta_rex-uninlu_chinese-base/ms_wrapper.py", line 37, in init_model
tokenizer = AutoTokenizer.from_pretrained(training_args.bert_model_dir)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 751, in from_pretrained
if model_type is not None:
File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 52, in from_pretrained
return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2049, in _from_pretrained
if added_tokens_file is not None:
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
legacy_format=legacy_format,
File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/tokenization_bert.py", line 199, in init
super().init(
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in init
raise NotImplementedError
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/tokenization_bert.py", line 239, in get_vocab
return dict(self.vocab, **self.added_tokens_encoder)
AttributeError: 'BertTokenizer' object has no attribute 'vocab'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 677, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 566, in aenter
await self._router.startup()
File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 656, in startup
handler()
File "/opt/conda/lib/python3.10/site-packages/modelscope/server/core/event_handlers.py", line 37, in startup
_startup_model(app)
File "/opt/conda/lib/python3.10/site-packages/modelscope/server/core/event_handlers.py", line 17, in _startup_model
app.state.pipeline = create_pipeline(app.state.args.model_id,
File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/input_output.py", line 79, in create_pipeline
return pipeline(
File "/opt/conda/lib/python3.10/site-packages/modelscope/pipelines/builder.py", line 170, in pipeline
return build_pipeline(cfg, task_name=task)
File "/opt/conda/lib/python3.10/site-packages/modelscope/pipelines/builder.py", line 65, in build_pipeline
return build_from_cfg(
File "/opt/conda/lib/python3.10/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
AttributeError: RexUniNLUPipeline: 'BertTokenizer' object has no attribute 'vocab'
ERROR: Application startup failed. Exiting.
root@dsw-225051-7bf99b5bb8-s8h4v:/mnt/workspace# modelscope server --model_id=damo/nlp_deberta_rex-uninlu_chinese-base --revision=v1.2.1
This problem is caused by the BertTokenizer object missing its vocab attribute. You can try the following to resolve it:

1. Upgrade transformers:
pip install -U transformers

2. Pass the model_max_length parameter when creating the tokenizer, for example:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("damo/nlp_deberta_rex-uninlu_chinese-base", model_max_length=512)

Here model_max_length is set to 512, which caps the maximum input length at 512 tokens; adjust it to your actual needs.
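If the upgrade alone does not fix the first load, you can confirm which transformers version is actually installed and exercise the same code path the server runs at startup by building the pipeline directly. This is a minimal sketch assuming the task name (rex-uninlu), model id, and revision taken from the log above; see the model card for the input format expected at inference time.

import transformers
from modelscope.pipelines import pipeline

# Confirm the upgraded transformers version is the one actually being imported.
print(transformers.__version__)

# Building the pipeline runs the same tokenizer-loading code that failed at
# server startup; if this succeeds, `modelscope server` should also start cleanly.
pipe = pipeline(
    task='rex-uninlu',
    model='damo/nlp_deberta_rex-uninlu_chinese-base',
    model_revision='v1.2.1',
)
print('pipeline built successfully')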