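1. Download the BERT model weights from Kaggle using the link below. Note: you must register a Kaggle account and log in before you can download the file. The download takes about five minutes, so please be patient.
Kaggle website: click to open the torch_bert_weights page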
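2. When the download finishes, extract the archive to a location you are familiar with, then reorganize the extracted files as needed. The examples in this post keep everything under F:/modelfile/Bert/.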
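The extraction in step 2 can also be scripted. The following is a minimal sketch using Python's standard zipfile module. The archive name torch_bert_weights.zip and the helper extract_archive are hypothetical, and the target folder is simply the one used in the paths later in this post, so adjust both to your own setup.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


if __name__ == "__main__":
    # Hypothetical archive name and target folder; replace with your own.
    archive_path = Path("F:/modelfile/torch_bert_weights.zip")
    if archive_path.exists():
        print(extract_archive(archive_path, "F:/modelfile/Bert"))
```

Printing the returned names is a quick way to confirm which folders and vocabulary files actually ended up in the target directory before pointing any code at them.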
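3. If you have not yet set up a PyTorch environment for running BERT, follow the article linked below to install it. The same applies if you have not yet downloaded and configured Jupyter Notebook.
- Environment setup article: click to open "Detailed Guide to Setting Up a PyTorch Environment for Learning BERT"
- Jupyter Notebook download and installation article: click to open "Jupyter Notebook Installation and Usage Guide"
- Jupyter Notebook configuration article: click to open "Configuring Code Auto-Completion in Jupyter Notebook"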
4. Running the code below in Jupyter Notebook fails with the error ValueError: unable to parse F:/modelfile/Bert/bert-base-uncased/config.json as a URL or as a local path. Note: change the file paths in the code to match where you downloaded and extracted the files.
from transformers import BertTokenizer, BertModel, BertForMaskedLM
import numpy as np
import torch

# Load the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('F:/modelfile/Bert/bert-base-uncased-vocab.txt')
# Load the BERT model; this folder contains the BERT config file and the BERT model weights file
bert = BertModel.from_pretrained('F:/modelfile/Bert/bert-base-uncased/')
ValueError                                Traceback (most recent call last)
D:\Anaconda\lib\site-packages\transformers\configuration_utils.py in _get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    603                 use_auth_token=use_auth_token,
--> 604                 user_agent=user_agent,
    605             )

D:\Anaconda\lib\site-packages\transformers\utils\hub.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
    299         # Something unknown
--> 300         raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
    301

ValueError: unable to parse F:/modelfile/Bert/bert-base-uncased/config.json as a URL or as a local path

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-10-0dc95b15dc3c> in <module>()
      5 tokenizer = BertTokenizer.from_pretrained('F:/modelfile/Bert/bert-base-uncased-vocab.txt')
      6 # Load the BERT model; this folder contains the bert_config.json config file and the model.bin weights file
----> 7 bert = BertModel.from_pretrained('F:/modelfile/Bert/bert-base-uncased/')

D:\Anaconda\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1593                 _from_auto=from_auto_class,
   1594                 _from_pipeline=from_pipeline,
-> 1595                 **kwargs,
   1596             )
   1597         else:

D:\Anaconda\lib\site-packages\transformers\configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    519         assert unused_kwargs == {"foo": False}
    520         ```"""
--> 521         config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
    522         if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
    523             logger.warning(

D:\Anaconda\lib\site-packages\transformers\configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    546         original_kwargs = copy.deepcopy(kwargs)
    547         # Get config dict associated with the base config file
--> 548         config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    549
    550         # That config file may point us toward another config file to use.

D:\Anaconda\lib\site-packages\transformers\configuration_utils.py in _get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    628         except ValueError:
    629             raise EnvironmentError(
--> 630                 f"We couldn't connect to '{HUGGINGFACE_CO_RESOLVE_ENDPOINT}' to load this model, couldn't find it in the cached "
    631                 f"files and it looks like {pretrained_model_name_or_path} is not the path to a directory containing a "
    632                 "{configuration_file} file.\nCheckout your internet connection or see how to run the library in "

OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like F:/modelfile/Bert/bert-base-uncased/ is not the path to a directory containing a {configuration_file} file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
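5. The error above says that F:/modelfile/Bert/bert-base-uncased/config.json could not be parsed as a URL or as a local path. This shows that BertModel.from_pretrained looks for a file named config.json in the given folder by default, but the folder I supplied contains only bert_config.json and no config.json. The OSError that follows, about failing to connect to https://huggingface.co, is raised while handling that same ValueError, so it is a consequence of the missing file rather than a sign of a network problem. There are two ways out: change the source code of from_pretrained, or rename the file. I recommend simply renaming bert_config.json to config.json.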
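The rename in step 5 can be done by hand in the file manager, or with a few lines of Python. Below is a minimal sketch using pathlib. The helper name ensure_config_json is hypothetical, and the folder path is the one from the example above, so change it to your own.

```python
from pathlib import Path


def ensure_config_json(model_dir):
    """Rename bert_config.json to config.json inside model_dir if needed.

    Returns the path to config.json, or raises FileNotFoundError when
    neither file exists in the folder.
    """
    folder = Path(model_dir)
    target = folder / "config.json"
    legacy = folder / "bert_config.json"
    if target.exists():
        return target  # already named the way from_pretrained expects
    if legacy.exists():
        legacy.rename(target)
        return target
    raise FileNotFoundError(f"no config.json or bert_config.json in {folder}")


if __name__ == "__main__":
    # Path taken from the example above; change it to your own folder.
    model_dir = Path("F:/modelfile/Bert/bert-base-uncased")
    if model_dir.is_dir():
        print(ensure_config_json(model_dir))
```

The function is safe to run more than once: if config.json is already present it leaves the folder untouched.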
6. Run the same code again in Jupyter Notebook. It now runs normally, although it produces some warnings.
D:\Anaconda\lib\site-packages\transformers\tokenization_utils_base.py:1656: FutureWarning: Calling BertTokenizer.from_pretrained() with the path to a single file or url is deprecated and won't be possible anymore in v5. Use a model identifier or the path to a directory instead.
  FutureWarning,
Some weights of the model checkpoint at F:/modelfile/Bert/bert-base-uncased/ were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
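7. The warnings do not stop the program from running, but they clutter the output. You can add the code below ahead of the loading code to suppress them. After you rerun the cell in Jupyter Notebook, the warnings no longer appear.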
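from transformers import logging
import warnings

# Show only error-level messages from transformers; this hides the
# "Some weights of the model checkpoint ... were not used" notice
logging.set_verbosity_error()

# Ignore the FutureWarning reported at
# D:\Anaconda\lib\site-packages\transformers\tokenization_utils_base.py:1656
# (module and lineno below correspond to that warning)
warnings.filterwarnings("ignore", category=FutureWarning, module="transformers", lineno=1656)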
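The FutureWarning can also be avoided instead of hidden, since the message itself asks for a directory rather than a single file. BertTokenizer looks for a file named vocab.txt inside the directory it is given, so one option is to copy the vocabulary file into the model folder under that name. The sketch below performs only the copy, using the standard library. The helper name place_vocab_in_model_dir is hypothetical, and the paths are the ones used in this post.

```python
import shutil
from pathlib import Path


def place_vocab_in_model_dir(vocab_file, model_dir):
    """Copy vocab_file to model_dir/vocab.txt unless that file already exists."""
    target = Path(model_dir) / "vocab.txt"
    if not target.exists():
        shutil.copyfile(vocab_file, target)
    return target


if __name__ == "__main__":
    # Paths from this post; change them to your own locations.
    vocab_file = Path("F:/modelfile/Bert/bert-base-uncased-vocab.txt")
    model_dir = Path("F:/modelfile/Bert/bert-base-uncased")
    if vocab_file.exists() and model_dir.is_dir():
        place_vocab_in_model_dir(vocab_file, model_dir)
        # The tokenizer can then be loaded from the folder, for example:
        # tokenizer = BertTokenizer.from_pretrained(str(model_dir))
```

With the vocabulary in place, the tokenizer and the model can both be loaded from the same folder, which should remove the need for the lineno-specific warning filter.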