Some weights of the model checkpoint at mypath/bert-base-chinese were not used when initializing BertModel

Summary: Some weights of the model checkpoint at mypath/bert-base-chinese were not used when initializing BertModel

Environment, warning message, and solution:

Linux, Python 3, PyTorch 1.8.1, transformers 4.12.5.


Code:

from transformers import AutoTokenizer, AutoModel

pretrained_path = "mypath/bert-base-chinese"  # local copy of the checkpoint
tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
encoder = AutoModel.from_pretrained(pretrained_path)  # bare BertModel, no task head


(Model files downloaded from bert-base-chinese · Hugging Face)


Warning message:


Some weights of the model checkpoint at mypath/bert-base-chinese were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


This is a warning, not an error. It only means that the loaded pretrained checkpoint does not fully match the model class used to load it. This checkpoint's architecture is BertForMaskedLM, so when you load it with a BERT class for a different task, one of two things happens: some checkpoint weights go unused (as in this example), or some weights are missing from the checkpoint and have to be randomly initialized.

In this case I only want the transformer's last hidden state, so the classification-head parameters listed in the warning are not needed.
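
As a minimal sketch of that usage (assuming the tokenizer and encoder loaded above), the last hidden state can be read directly off the model output:

import torch

inputs = tokenizer("这是一个例子。", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)
# shape: (batch_size, sequence_length, hidden_size), here (1, seq_len, 768)
print(outputs.last_hidden_state.shape)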


If you want to hide this message entirely, raise the transformers logging threshold to ERROR:

from transformers import logging
logging.set_verbosity_error()

(Note that logging.set_verbosity_warning() is not enough here: this message is itself emitted at WARNING level, so it still gets printed at the default WARNING threshold; only a stricter level such as ERROR silences it.)
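
For context, a minimal sketch of the suppression (assuming the same local path). The verbosity must be raised before from_pretrained runs, since the message is printed during loading:

from transformers import logging, AutoModel

logging.set_verbosity_error()  # raise the threshold first
encoder = AutoModel.from_pretrained("mypath/bert-base-chinese")  # no warning printed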


Below are some related points:


In theory the match should be exact here (the architectures field in this checkpoint's config.json is BertForMaskedLM), yet some weights are still reported as unused:

from transformers import AutoTokenizer, BertForMaskedLM

pretrained_path = "mypath/bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
# matches the architectures field in config.json
encoder = BertForMaskedLM.from_pretrained(pretrained_path)


Warning message:

Some weights of the model checkpoint at mypath/bert-base-chinese were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


According to https://github.com/huggingface/transformers/issues/5421#issuecomment-717245807, this is because the official BERT checkpoint carries two pretraining heads (MLM and NSP), so a model class for the MLM task alone does not load the NSP head (the cls.seq_relationship.* weights).
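
Accordingly, loading the checkpoint with BertForPreTraining, whose architecture contains both heads, should produce no unused-weights warning (a sketch assuming the same local path):

from transformers import BertForPreTraining

# both cls.predictions.* (MLM) and cls.seq_relationship.* (NSP) get loaded
model = BertForPreTraining.from_pretrained("mypath/bert-base-chinese")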


The case where new parameters must be randomly initialized (classes like AutoModelForSequenceClassification are meant to be fine-tuned on top of the base model; for more detail see my other post: 用huggingface.transformers.AutoModelForSequenceClassification在文本分类任务上微调预训练模型):


from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

model_path = "mypath/bert-base-chinese"
config = AutoConfig.from_pretrained(model_path, num_labels=5)  # 5-way classification head
tokenizer = AutoTokenizer.from_pretrained(model_path)
encoder = AutoModelForSequenceClassification.from_pretrained(model_path, config=config)


Warning message:

Some weights of the model checkpoint at mypath/bert-base-chinese were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at mypath/bert-base-chinese and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
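
To see exactly what was newly initialized, you can inspect the classification head (a sketch; 768 is bert-base-chinese's hidden size and 5 is the num_labels set above):

# a freshly initialized linear layer on top of the pooled output
print(encoder.classifier)  # Linear(in_features=768, out_features=5, bias=True)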