Direct Use
Please open "Multimodal CLIP Image-Text Retrieval with EasyNLP" and click "Open in DSW" in the upper-right corner.
Multimodal CLIP Image-Text Retrieval with EasyNLP
EasyNLP is an easy-to-use and feature-rich NLP algorithm framework developed in PyTorch by the Alibaba Cloud PAI algorithm team ( https://github.com/alibaba/EasyNLP ). It supports common Chinese pre-trained models and techniques for putting large models into production, and provides a one-stop NLP development experience from training to deployment. EasyNLP offers concise interfaces for building NLP models, including the NLP application zoo AppZoo and the pre-trained ModelZoo, together with tooling that helps users efficiently bring very large pre-trained models into their business, so that NLP developers can build models quickly and apply them in production. With the growing demand for cross-modal understanding, EasyNLP will also support a variety of cross-modal models, especially Chinese cross-modal models, in order to serve more NLP and multimodal algorithm developers and researchers.
Image-text retrieval is a mainstream cross-modal retrieval task that is widely used in web applications. This tutorial shows how to quickly perform cross-modal image-text retrieval with CLIP on top of EasyNLP in PAI-DSW.
About CLIP
CLIP (Contrastive Language-Image Pre-training) is an image-text representation model based on contrastive learning, released by OpenAI in February 2021. The model builds separate encoders for images and text to extract their features. The image encoder backbone can be a classic ResNet-family model or a more recent Transformer-style model such as ViT; the text encoder is usually a BERT-family model, including variants such as RoBERTa. CLIP is trained with a contrastive objective on large-scale image-text data; its accuracy on multiple datasets exceeds various ImageNet-based models, and it also shows strong zero-shot learning ability.
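To make the contrastive objective concrete, below is a minimal PyTorch sketch (not EasyNLP's implementation) of how a CLIP-style model scores a batch of matched image-text pairs: both encoders' outputs are L2-normalized, a temperature-scaled cosine-similarity matrix is computed, and a symmetric cross-entropy loss pulls each image toward its own text and vice versa. The feature tensors and the temperature value used here are placeholders for illustration only.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, logit_scale):
    # L2-normalize so that dot products equal cosine similarities
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # [batch, batch] similarity matrix; entry (i, j) scores image i against text j
    logits_per_image = logit_scale.exp() * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()
    # The i-th image matches the i-th text, so the targets are the diagonal indices
    targets = torch.arange(image_features.size(0), device=image_features.device)
    return (F.cross_entropy(logits_per_image, targets) +
            F.cross_entropy(logits_per_text, targets)) / 2

# Toy usage with random stand-ins for encoder outputs
img, txt = torch.randn(4, 512), torch.randn(4, 512)
print(clip_contrastive_loss(img, txt, torch.tensor(2.6592)))  # 2.6592 matches the logit_scale_init_value seen later in the model config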
Runtime Environment Requirements
Recommended environment: Python 3.6, a PyTorch 1.8 image, a P100 or V100 GPU instance, and at least 32 GB of memory.
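Before installing anything, a quick environment sanity check can save time. The cell below is only an illustrative check against the recommendations above, not something EasyNLP requires you to run.

import sys
import torch

print("Python:", sys.version.split()[0])        # expected around 3.6
print("PyTorch:", torch.__version__)            # expected around 1.8
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))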
Installing EasyNLP
We recommend downloading the EasyNLP source code from GitHub and installing it with the following commands:
! git clone https://github.com/alibaba/EasyNLP.git
! pip install -r EasyNLP/requirements.txt
! cd EasyNLP && python setup.py install
You can verify that the installation succeeded with the following command:
! which easynlp
/home/pai/bin/easynlp
If the easynlp CLI tool is found on your system, the EasyNLP library has been installed.
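As an optional extra check, you can also confirm that the Python package imports from the current kernel (this simply assumes the installed module is named easynlp, matching the CLI above):

import easynlp
print(easynlp.__file__)  # location of the installed package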
Data Preparation
First, change into the specified directory and download the training and validation data used in this example, as well as the single-column test data used to extract embeddings for vector retrieval:
! cd examples/appzoo_tutorials/text_vision
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_train_base64_part.tsv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_valid_base64_part.tsv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_test_base64_part_text.tsv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_test_base64_part_image.tsv
/bin/bash: line 0: cd: examples/appzoo_tutorials/text_vision: No such file or directory --2022-07-20 11:40:29-- https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_train_base64_part.tsv Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27 Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 7466122 (7.1M) [text/tab-separated-values] Saving to: ‘MUGE_MR_train_base64_part.tsv’ MUGE_MR_train_base6 100%[===================>] 7.12M 14.1MB/s in 0.5s 2022-07-20 11:40:30 (14.1 MB/s) - ‘MUGE_MR_train_base64_part.tsv’ saved [7466122/7466122] --2022-07-20 11:40:31-- https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_valid_base64_part.tsv Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27 Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 3783806 (3.6M) [text/tab-separated-values] Saving to: ‘MUGE_MR_valid_base64_part.tsv’ MUGE_MR_valid_base6 100%[===================>] 3.61M 13.8MB/s in 0.3s 2022-07-20 11:40:31 (13.8 MB/s) - ‘MUGE_MR_valid_base64_part.tsv’ saved [3783806/3783806] --2022-07-20 11:40:31-- https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_test_base64_part_text.tsv Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27 Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 168 [text/tab-separated-values] Saving to: ‘MUGE_MR_test_base64_part_text.tsv’ MUGE_MR_test_base64 100%[===================>] 168 --.-KB/s in 0s 2022-07-20 11:40:32 (125 MB/s) - ‘MUGE_MR_test_base64_part_text.tsv’ saved [168/168] --2022-07-20 11:40:32-- https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/CLIP/MUGE_MR_test_base64_part_image.tsv Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27 Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 104557 (102K) [text/tab-separated-values] Saving to: ‘MUGE_MR_test_base64_part_image.tsv’ MUGE_MR_test_base64 100%[===================>] 102.11K --.-KB/s in 0.08s 2022-07-20 11:40:32 (1.20 MB/s) - ‘MUGE_MR_test_base64_part_image.tsv’ saved [104557/104557]
Both the training and validation data are .tsv files. Each line is one sample with two tab-separated (\t) columns: the first column is the text, and the second column is the base64 encoding of the image. The test data used to extract embeddings for vector retrieval has a single column, containing only text or only the base64-encoded image.
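For reference, here is a small, hypothetical sketch of how one line of the two-column training tsv could be produced: base64-encode an image and join it with the query text using a tab. The file names example.jpg and my_train.tsv and the sample text are placeholders, not files shipped with this tutorial; the single-column test files would contain only the text or only the base64 string per line.

import base64

text = "连衣裙"  # sample query text (placeholder)
with open("example.jpg", "rb") as f:  # placeholder image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# One line of the train/valid tsv: text <tab> base64(image)
with open("my_train.tsv", "a", encoding="utf-8") as out:
    out.write(text + "\t" + image_b64 + "\n")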
Initialization
In a Python 3.6 environment, we first import the libraries the model needs from the EasyNLP package we just installed and perform some initialization. In this tutorial we use the CLIP model clip_chinese_roberta_large_with_vit_large, whose image encoder is a ViT and whose text encoder is a Chinese RoBERTa. EasyNLP integrates a rich library of pre-trained models; to try other pre-trained models, or other CLIP combinations, modify user_defined_parameters accordingly. The available model names can be found in the model list.
# To avoid conflicts between EasyNLP's args and Jupyter's own arguments, sys.argv must be set manually;
# otherwise initialization will fail.
# This cell can be skipped when running the code from the command line or a .py file.
import sys
sys.argv = ['main.py']
import torch.cuda
from easynlp.appzoo import MultiModalDataset
from easynlp.appzoo import get_application_predictor, get_application_model, get_application_evaluator, get_application_model_for_evaluation
from easynlp.core import Trainer, PredictorManager
from easynlp.utils import initialize_easynlp, get_args, get_pretrain_model_path
from easynlp.utils.global_vars import parse_user_defined_parameters

initialize_easynlp()
args = get_args()

user_defined_parameters = parse_user_defined_parameters('pretrain_model_name_or_path=clip_chinese_roberta_large_with_vit_large fix_vision=True mode=finetune')
args.checkpoint_dir = "./clip_model/"
args.pretrained_model_name_or_path = "clip_chinese_roberta_large_with_vit_large"
/home/pai/lib/python3.6/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release. from cryptography import x509 Please ignore the following import error if you are using tunnel table io. No module named '_common_io'
No module named 'easy_predict' ------------------------ arguments ------------------------ app_name ........................................ text_classify append_cols ..................................... None buckets ......................................... None checkpoint_dir .................................. None chief_hosts ..................................... data_threads .................................... 10 distributed_backend ............................. nccl do_lower_case ................................... False epoch_num ....................................... 3.0 export_tf_checkpoint_type ....................... easytransfer first_sequence .................................. None gradient_accumulation_steps ..................... 1 input_schema .................................... None is_chief ........................................ is_master_node .................................. True job_name ........................................ None label_enumerate_values .......................... None label_name ...................................... None learning_rate ................................... 5e-05 local_rank ...................................... None logging_steps ................................... 100 master_port ..................................... 23456 max_grad_norm ................................... 1.0 micro_batch_size ................................ 2 mode ............................................ train modelzoo_base_dir ............................... n_cpu ........................................... 1 n_gpu ........................................... 1 odps_config ..................................... None optimizer_type .................................. AdamW output_schema ................................... outputs ......................................... None predict_queue_size .............................. 1024 predict_slice_size .............................. 4096 predict_table_read_thread_num ................... 16 predict_thread_num .............................. 2 ps_hosts ........................................ random_seed ..................................... 1234 rank ............................................ 0 read_odps ....................................... False restore_works_dir ............................... ./.easynlp_predict_restore_works_dir resume_from_checkpoint .......................... None save_all_checkpoints ............................ False save_checkpoint_steps ........................... None second_sequence ................................. None sequence_length ................................. 16 skip_first_line ................................. False tables .......................................... None task_count ...................................... 1 task_index ...................................... 0 use_amp ......................................... False use_torchacc .................................... False user_defined_parameters ......................... None user_entry_file ................................. None user_script ..................................... None warmup_proportion ............................... 0.1 weight_decay .................................... 0.0001 worker_count .................................... 1 worker_cpu ...................................... -1 worker_gpu ...................................... -1 worker_hosts .................................... None world_size ...................................... 
1 -------------------- end of arguments --------------------- > initializing torch distributed ... Init dist done. World size: 1, rank 0, l_rank 0 > setting random seeds to 1234 ...
Note: if the code above fails with an "Address already in use" error, run the following commands to find and kill the process occupying the port.
netstat -tunlp|grep 6000
kill -9 PID  (replace PID with the process ID shown in the output of the previous command)
Loading the Data
We use the MultiModalDataset class built into EasyNLP to load the training and validation data. Its main parameters are:
- pretrained_model_name_or_path: the name or path of the pre-trained model; here we use the helper function get_pretrain_model_path to resolve the model name "clip_chinese_roberta_large_with_vit_large" to its local path and download the model automatically
- max_seq_length: the maximum text length; longer texts are truncated and shorter ones are padded
- input_schema: the schema of the input tsv data; the comma-separated items each correspond to one tab-separated column of the data file, and each item starts with its field name, e.g. label, sent1
- first_sequence, second_sequence: which fields in input_schema serve as the first/second input column
- is_training: whether this is the training phase; True for train_dataset and False for valid_dataset
train_dataset = MultiModalDataset(
    pretrained_model_name_or_path=get_pretrain_model_path("clip_chinese_roberta_large_with_vit_large"),
    data_file="MUGE_MR_train_base64_part.tsv",
    max_seq_length=32,
    input_schema="text:str:1,image:str:1",
    first_sequence="text",
    second_sequence="image",
    is_training=True)

valid_dataset = MultiModalDataset(
    pretrained_model_name_or_path=get_pretrain_model_path("clip_chinese_roberta_large_with_vit_large"),
    data_file="MUGE_MR_valid_base64_part.tsv",
    max_seq_length=32,
    input_schema="text:str:1,image:str:1",
    first_sequence="text",
    second_sequence="image",
    is_training=False)
`/root/.easynlp/modelzoo/alibaba-pai/clip_chinese_roberta_large_with_vit_large.tgz` already exists `/root/.easynlp/modelzoo/alibaba-pai/clip_chinese_roberta_large_with_vit_large.tgz` already exists
/root/.local/lib/python3.6/site-packages/pai_easynlp-0.0.6-py3.6.egg/easynlp/modelzoo/tokenization_utils_base.py:1632: FutureWarning: Calling BertTokenizer.from_pretrained() with the path to a single file or url is deprecated and won't be possible anymore in v5. Use a model identifier or the path to a directory instead. FutureWarning,
Since we selected clip_chinese_roberta_large_with_vit_large earlier, the corresponding pre-trained model is automatically downloaded and loaded here.
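If you want to see where the downloaded archive is cached, the log above shows it under ~/.easynlp/modelzoo/alibaba-pai/; the snippet below is only an optional way to inspect that cache from the notebook.

import os

cache_dir = os.path.expanduser("~/.easynlp/modelzoo/alibaba-pai")
print(os.listdir(cache_dir))  # should include clip_chinese_roberta_large_with_vit_large.tgz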
Model Training
With the data prepared and the model specified, we can start training. We use EasyNLP's get_application_model function to build the model for training; its parameters are:
- app_name: the task name; here we use "clip" for image-text retrieval
- pretrained_model_name_or_path: the name or path of the pre-trained model; here we use the helper function get_pretrain_model_path to resolve the model name "clip_chinese_roberta_large_with_vit_large" to its local path and download the model automatically
- user_defined_parameters: user-defined parameters; pass in the user_defined_parameters parsed above
model = get_application_model(
    app_name="clip",
    pretrained_model_name_or_path=get_pretrain_model_path("clip_chinese_roberta_large_with_vit_large"),
    user_defined_parameters=user_defined_parameters)
`/root/.easynlp/modelzoo/alibaba-pai/clip_chinese_roberta_large_with_vit_large.tgz` already exists
Loaded weights of the model: [embeddings.position_ids, embeddings.word_embeddings.weight, embeddings.position_embeddings.weight, ..., pooler.dense.weight, pooler.dense.bias] (full list of text-encoder weight names omitted). All weights are initialized.
Loaded weights of the model: [vision_model.embeddings.class_embedding, vision_model.embeddings.position_ids, ..., vision_model.post_layernorm.weight, vision_model.post_layernorm.bias] (full list of vision-encoder weight names omitted). All weights are initialized.
The logs show that the pre-trained model's weights have been loaded. Next, we create a training instance with EasyNLP's Trainer class and start training.
trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    user_defined_parameters=user_defined_parameters,
    evaluator=get_application_evaluator(
        app_name="clip",
        valid_dataset=valid_dataset,
        user_defined_parameters=user_defined_parameters,
        eval_batch_size=32))
trainer.train()
/home/pai/lib/python3.6/site-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 10 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary. cpuset_checked)) [2022-07-20 17:10:55,821 INFO] ========== Initializing Tensorboard ========== [2022-07-20 17:10:55,829 INFO] ========== Training Start ========== [2022-07-20 17:10:55,832 INFO] Num of GPUs (all) = 1 [2022-07-20 17:10:55,833 INFO] Num of CPUs per worker = 1 [2022-07-20 17:10:55,833 INFO] Num dataset examples = 80 [2022-07-20 17:10:55,833 INFO] Num training examples = 80 [2022-07-20 17:10:55,834 INFO] Num validation examples = 40 [2022-07-20 17:10:55,834 INFO] Train. batch size = 2 [2022-07-20 17:10:55,835 INFO] Train. micro batch size = 2 [2022-07-20 17:10:55,835 INFO] Train. batch no. = 120 [2022-07-20 17:10:55,837 INFO] Evaluation batch size = 2 [2022-07-20 17:10:55,837 INFO] Total training steps = 120 [2022-07-20 17:10:55,838 INFO] Sequence length = 16 [2022-07-20 17:10:55,839 INFO] Saving steps = None [2022-07-20 17:10:55,840 INFO] Distributed_backend = nccl [2022-07-20 17:10:55,840 INFO] Worker Count = 1 [2022-07-20 17:10:55,841 INFO] Worker CPU = -1 [2022-07-20 17:10:55,841 INFO] Worker data threads = 10 [2022-07-20 17:10:55,846 INFO] num model params = 630,275,073 [2022-07-20 17:10:55,847 INFO] num trainable params = 327,095,297 [2022-07-20 17:10:55,847 INFO] [2022-07-20 17:10:55,851 INFO] ========== Model Config ========== [2022-07-20 17:10:55,852 INFO] {"return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "use_bfloat16": false, "pruned_heads": {}, "tie_word_embeddings": true, "is_encoder_decoder": false, "is_decoder": false, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null, "num_return_sequences": 1, "chunk_size_feed_forward": 0, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "architectures": null, "finetuning_task": null, "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1}, "tokenizer_class": null, "prefix": null, "bos_token_id": null, "pad_token_id": null, "eos_token_id": null, "sep_token_id": null, "decoder_start_token_id": null, "task_specific_params": null, "problem_type": null, "_name_or_path": "", "easynlp_version": null, "text_config_dict": {"return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "use_bfloat16": false, "pruned_heads": {}, "tie_word_embeddings": true, "is_encoder_decoder": false, "is_decoder": false, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, 
"bad_words_ids": null, "num_return_sequences": 1, "chunk_size_feed_forward": 0, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "architectures": ["BertForMaskedLM"], "finetuning_task": null, "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1}, "tokenizer_class": null, "prefix": null, "bos_token_id": 0, "pad_token_id": 0, "eos_token_id": 2, "sep_token_id": null, "decoder_start_token_id": null, "task_specific_params": null, "problem_type": null, "_name_or_path": "", "easynlp_version": "0.0.3", "directionality": "bidi", "gradient_checkpointing": false, "model_type": "clip_text_model", "output_past": true, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "use_cache": true, "vocab_size": 21128, "hidden_size": 1024, "intermediate_size": 4096, "dropout": 0.0, "num_hidden_layers": 24, "num_attention_heads": 16, "max_position_embeddings": 512, "layer_norm_eps": 1e-12, "hidden_act": "gelu", "initializer_range": 0.02, "initializer_factor": 1.0, "attention_dropout": 0.0, "type_vocab_size": 2, "hidden_dropout_prob": 0.1, "attention_probs_dropout_prob": 0.1}, "vision_config_dict": {"return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "use_bfloat16": false, "pruned_heads": {}, "tie_word_embeddings": true, "is_encoder_decoder": false, "is_decoder": false, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null, "num_return_sequences": 1, "chunk_size_feed_forward": 0, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "architectures": null, "finetuning_task": null, "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1}, "tokenizer_class": null, "prefix": null, "bos_token_id": null, "pad_token_id": null, "eos_token_id": null, "sep_token_id": null, "decoder_start_token_id": null, "task_specific_params": null, "problem_type": null, "_name_or_path": "", "easynlp_version": "0.0.3", "cross_attention_hidden_size": null, "model_type": "clip_vision_model", "torch_dtype": null, "transformers_version": "4.16.0.dev0", "hidden_size": 1024, "intermediate_size": 4096, "dropout": 0.0, "num_hidden_layers": 24, "num_attention_heads": 16, "patch_size": 14, "image_size": 224, "initializer_range": 0.02, "initializer_factor": 1.0, "attention_dropout": 0.0, "layer_norm_eps": 1e-05, "hidden_act": "quick_gelu"}, "text_config": {"return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "use_bfloat16": false, "pruned_heads": {}, "tie_word_embeddings": true, "is_encoder_decoder": false, "is_decoder": false, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, 
"no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null, "num_return_sequences": 1, "chunk_size_feed_forward": 0, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "architectures": ["BertForMaskedLM"], "finetuning_task": null, "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1}, "tokenizer_class": null, "prefix": null, "bos_token_id": 0, "pad_token_id": 0, "eos_token_id": 2, "sep_token_id": null, "decoder_start_token_id": null, "task_specific_params": null, "problem_type": null, "_name_or_path": "", "easynlp_version": "0.0.3", "directionality": "bidi", "gradient_checkpointing": false, "model_type": "clip_text_model", "output_past": true, "pooler_fc_size": 768, "pooler_num_attention_heads": 12, "pooler_num_fc_layers": 3, "pooler_size_per_head": 128, "pooler_type": "first_token_transform", "position_embedding_type": "absolute", "use_cache": true, "vocab_size": 21128, "hidden_size": 1024, "intermediate_size": 4096, "dropout": 0.0, "num_hidden_layers": 24, "num_attention_heads": 16, "max_position_embeddings": 512, "layer_norm_eps": 1e-12, "hidden_act": "gelu", "initializer_range": 0.02, "initializer_factor": 1.0, "attention_dropout": 0.0, "type_vocab_size": 2, "hidden_dropout_prob": 0.1, "attention_probs_dropout_prob": 0.1}, "vision_config": {"return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "use_bfloat16": false, "pruned_heads": {}, "tie_word_embeddings": true, "is_encoder_decoder": false, "is_decoder": false, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null, "num_return_sequences": 1, "chunk_size_feed_forward": 0, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "architectures": null, "finetuning_task": null, "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1}, "tokenizer_class": null, "prefix": null, "bos_token_id": null, "pad_token_id": null, "eos_token_id": null, "sep_token_id": null, "decoder_start_token_id": null, "task_specific_params": null, "problem_type": null, "_name_or_path": "", "easynlp_version": "0.0.3", "cross_attention_hidden_size": null, "model_type": "clip_vision_model", "torch_dtype": null, "transformers_version": "4.16.0.dev0", "hidden_size": 1024, "intermediate_size": 4096, "dropout": 0.0, "num_hidden_layers": 24, "num_attention_heads": 16, "patch_size": 14, "image_size": 224, "initializer_range": 0.02, "initializer_factor": 1.0, "attention_dropout": 0.0, "layer_norm_eps": 1e-05, "hidden_act": "quick_gelu"}, "projection_dim": 512, "logit_scale_init_value": 2.6592, "initializer_factor": 1.0, "model_type": "clip"}
optimizer type: AdamW
/root/.local/lib/python3.6/site-packages/pai_easynlp-0.0.6-py3.6.egg/easynlp/core/optimizers.py:441: UserWarning: This overload of add_ is deprecated: add_(Number alpha, Tensor other) Consider using one of the following signatures instead: add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.) exp_avg.mul_(beta1).add_(1.0 - beta1, grad) /home/pai/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`. warnings.warn("To get the last learning rate computed by the scheduler, " [2022-07-20 17:11:23,694 INFO] Epoch [ 2/ 3], step [100/120], lr 0.000010, 7.11 s [2022-07-20 17:11:23,696 INFO] loss : 0.0004 [2022-07-20 17:11:27,046 INFO] Saving best model to ./clip_model/pytorch_model.bin...
Training Time: 31.7812922000885, rank 0, gsteps 120
[2022-07-20 17:12:03,450 INFO] Training Time: 68.18616724014282
Model Evaluation
After training completes, the trained model is saved to the checkpoint_dir specified at the beginning, i.e. the local path "./clip_model/". We can now evaluate the trained model. As before, we first use the get_application_model_for_evaluation method from EasyNLP to build the evaluation model.
model = get_application_model_for_evaluation(app_name="clip", pretrained_model_name_or_path="./clip_model/", user_defined_parameters=user_defined_parameters)
Loaded weights of the model: [embeddings.position_ids,embeddings.word_embeddings.weight,embeddings.position_embeddings.weight,embeddings.token_type_embeddings.weight,embeddings.LayerNorm.weight,embeddings.LayerNorm.bias,encoder.layer.0.attention.self.query.weight,encoder.layer.0.attention.self.query.bias,encoder.layer.0.attention.self.key.weight,encoder.layer.0.attention.self.key.bias,encoder.layer.0.attention.self.value.weight,encoder.layer.0.attention.self.value.bias,encoder.layer.0.attention.output.dense.weight,encoder.layer.0.attention.output.dense.bias,encoder.layer.0.attention.output.LayerNorm.weight,encoder.layer.0.attention.output.LayerNorm.bias,encoder.layer.0.intermediate.dense.weight,encoder.layer.0.intermediate.dense.bias,encoder.layer.0.output.dense.weight,encoder.layer.0.output.dense.bias,encoder.layer.0.output.LayerNorm.weight,encoder.layer.0.output.LayerNorm.bias,encoder.layer.1.attention.self.query.weight,encoder.layer.1.attention.self.query.bias,encoder.layer.1.attention.self.key.weight,encoder.layer.1.attention.self.key.bias,encoder.layer.1.attention.self.value.weight,encoder.layer.1.attention.self.value.bias,encoder.layer.1.attention.output.dense.weight,encoder.layer.1.attention.output.dense.bias,encoder.layer.1.attention.output.LayerNorm.weight,encoder.layer.1.attention.output.LayerNorm.bias,encoder.layer.1.intermediate.dense.weight,encoder.layer.1.intermediate.dense.bias,encoder.layer.1.output.dense.weight,encoder.layer.1.output.dense.bias,encoder.layer.1.output.LayerNorm.weight,encoder.layer.1.output.LayerNorm.bias,encoder.layer.2.attention.self.query.weight,encoder.layer.2.attention.self.query.bias,encoder.layer.2.attention.self.key.weight,encoder.layer.2.attention.self.key.bias,encoder.layer.2.attention.self.value.weight,encoder.layer.2.attention.self.value.bias,encoder.layer.2.attention.output.dense.weight,encoder.layer.2.attention.output.dense.bias,encoder.layer.2.attention.output.LayerNorm.weight,encoder.layer.2.attention.output.LayerNorm.bias,encoder.layer.2.intermediate.dense.weight,encoder.layer.2.intermediate.dense.bias,encoder.layer.2.output.dense.weight,encoder.layer.2.output.dense.bias,encoder.layer.2.output.LayerNorm.weight,encoder.layer.2.output.LayerNorm.bias,encoder.layer.3.attention.self.query.weight,encoder.layer.3.attention.self.query.bias,encoder.layer.3.attention.self.key.weight,encoder.layer.3.attention.self.key.bias,encoder.layer.3.attention.self.value.weight,encoder.layer.3.attention.self.value.bias,encoder.layer.3.attention.output.dense.weight,encoder.layer.3.attention.output.dense.bias,encoder.layer.3.attention.output.LayerNorm.weight,encoder.layer.3.attention.output.LayerNorm.bias,encoder.layer.3.intermediate.dense.weight,encoder.layer.3.intermediate.dense.bias,encoder.layer.3.output.dense.weight,encoder.layer.3.output.dense.bias,encoder.layer.3.output.LayerNorm.weight,encoder.layer.3.output.LayerNorm.bias,encoder.layer.4.attention.self.query.weight,encoder.layer.4.attention.self.query.bias,encoder.layer.4.attention.self.key.weight,encoder.layer.4.attention.self.key.bias,encoder.layer.4.attention.self.value.weight,encoder.layer.4.attention.self.value.bias,encoder.layer.4.attention.output.dense.weight,encoder.layer.4.attention.output.dense.bias,encoder.layer.4.attention.output.LayerNorm.weight,encoder.layer.4.attention.output.LayerNorm.bias,encoder.layer.4.intermediate.dense.weight,encoder.layer.4.intermediate.dense.bias,encoder.layer.4.output.dense.weight,encoder.layer.4.output.dense.bias,encoder.layer.4.output.LayerNorm.weight,encoder.layer
.4.output.LayerNorm.bias,encoder.layer.5.attention.self.query.weight,encoder.layer.5.attention.self.query.bias,encoder.layer.5.attention.self.key.weight,encoder.layer.5.attention.self.key.bias,encoder.layer.5.attention.self.value.weight,encoder.layer.5.attention.self.value.bias,encoder.layer.5.attention.output.dense.weight,encoder.layer.5.attention.output.dense.bias,encoder.layer.5.attention.output.LayerNorm.weight,encoder.layer.5.attention.output.LayerNorm.bias,encoder.layer.5.intermediate.dense.weight,encoder.layer.5.intermediate.dense.bias,encoder.layer.5.output.dense.weight,encoder.layer.5.output.dense.bias,encoder.layer.5.output.LayerNorm.weight,encoder.layer.5.output.LayerNorm.bias,encoder.layer.6.attention.self.query.weight,encoder.layer.6.attention.self.query.bias,encoder.layer.6.attention.self.key.weight,encoder.layer.6.attention.self.key.bias,encoder.layer.6.attention.self.value.weight,encoder.layer.6.attention.self.value.bias,encoder.layer.6.attention.output.dense.weight,encoder.layer.6.attention.output.dense.bias,encoder.layer.6.attention.output.LayerNorm.weight,encoder.layer.6.attention.output.LayerNorm.bias,encoder.layer.6.intermediate.dense.weight,encoder.layer.6.intermediate.dense.bias,encoder.layer.6.output.dense.weight,encoder.layer.6.output.dense.bias,encoder.layer.6.output.LayerNorm.weight,encoder.layer.6.output.LayerNorm.bias,encoder.layer.7.attention.self.query.weight,encoder.layer.7.attention.self.query.bias,encoder.layer.7.attention.self.key.weight,encoder.layer.7.attention.self.key.bias,encoder.layer.7.attention.self.value.weight,encoder.layer.7.attention.self.value.bias,encoder.layer.7.attention.output.dense.weight,encoder.layer.7.attention.output.dense.bias,encoder.layer.7.attention.output.LayerNorm.weight,encoder.layer.7.attention.output.LayerNorm.bias,encoder.layer.7.intermediate.dense.weight,encoder.layer.7.intermediate.dense.bias,encoder.layer.7.output.dense.weight,encoder.layer.7.output.dense.bias,encoder.layer.7.output.LayerNorm.weight,encoder.layer.7.output.LayerNorm.bias,encoder.layer.8.attention.self.query.weight,encoder.layer.8.attention.self.query.bias,encoder.layer.8.attention.self.key.weight,encoder.layer.8.attention.self.key.bias,encoder.layer.8.attention.self.value.weight,encoder.layer.8.attention.self.value.bias,encoder.layer.8.attention.output.dense.weight,encoder.layer.8.attention.output.dense.bias,encoder.layer.8.attention.output.LayerNorm.weight,encoder.layer.8.attention.output.LayerNorm.bias,encoder.layer.8.intermediate.dense.weight,encoder.layer.8.intermediate.dense.bias,encoder.layer.8.output.dense.weight,encoder.layer.8.output.dense.bias,encoder.layer.8.output.LayerNorm.weight,encoder.layer.8.output.LayerNorm.bias,encoder.layer.9.attention.self.query.weight,encoder.layer.9.attention.self.query.bias,encoder.layer.9.attention.self.key.weight,encoder.layer.9.attention.self.key.bias,encoder.layer.9.attention.self.value.weight,encoder.layer.9.attention.self.value.bias,encoder.layer.9.attention.output.dense.weight,encoder.layer.9.attention.output.dense.bias,encoder.layer.9.attention.output.LayerNorm.weight,encoder.layer.9.attention.output.LayerNorm.bias,encoder.layer.9.intermediate.dense.weight,encoder.layer.9.intermediate.dense.bias,encoder.layer.9.output.dense.weight,encoder.layer.9.output.dense.bias,encoder.layer.9.output.LayerNorm.weight,encoder.layer.9.output.LayerNorm.bias,encoder.layer.10.attention.self.query.weight,encoder.layer.10.attention.self.query.bias,encoder.layer.10.attention.self.key.weight,encoder.layer.10.attention.self.key.bia
s,encoder.layer.10.attention.self.value.weight,encoder.layer.10.attention.self.value.bias,encoder.layer.10.attention.output.dense.weight,encoder.layer.10.attention.output.dense.bias,encoder.layer.10.attention.output.LayerNorm.weight,encoder.layer.10.attention.output.LayerNorm.bias,encoder.layer.10.intermediate.dense.weight,encoder.layer.10.intermediate.dense.bias,encoder.layer.10.output.dense.weight,encoder.layer.10.output.dense.bias,encoder.layer.10.output.LayerNorm.weight,encoder.layer.10.output.LayerNorm.bias,encoder.layer.11.attention.self.query.weight,encoder.layer.11.attention.self.query.bias,encoder.layer.11.attention.self.key.weight,encoder.layer.11.attention.self.key.bias,encoder.layer.11.attention.self.value.weight,encoder.layer.11.attention.self.value.bias,encoder.layer.11.attention.output.dense.weight,encoder.layer.11.attention.output.dense.bias,encoder.layer.11.attention.output.LayerNorm.weight,encoder.layer.11.attention.output.LayerNorm.bias,encoder.layer.11.intermediate.dense.weight,encoder.layer.11.intermediate.dense.bias,encoder.layer.11.output.dense.weight,encoder.layer.11.output.dense.bias,encoder.layer.11.output.LayerNorm.weight,encoder.layer.11.output.LayerNorm.bias,encoder.layer.12.attention.self.query.weight,encoder.layer.12.attention.self.query.bias,encoder.layer.12.attention.self.key.weight,encoder.layer.12.attention.self.key.bias,encoder.layer.12.attention.self.value.weight,encoder.layer.12.attention.self.value.bias,encoder.layer.12.attention.output.dense.weight,encoder.layer.12.attention.output.dense.bias,encoder.layer.12.attention.output.LayerNorm.weight,encoder.layer.12.attention.output.LayerNorm.bias,encoder.layer.12.intermediate.dense.weight,encoder.layer.12.intermediate.dense.bias,encoder.layer.12.output.dense.weight,encoder.layer.12.output.dense.bias,encoder.layer.12.output.LayerNorm.weight,encoder.layer.12.output.LayerNorm.bias,encoder.layer.13.attention.self.query.weight,encoder.layer.13.attention.self.query.bias,encoder.layer.13.attention.self.key.weight,encoder.layer.13.attention.self.key.bias,encoder.layer.13.attention.self.value.weight,encoder.layer.13.attention.self.value.bias,encoder.layer.13.attention.output.dense.weight,encoder.layer.13.attention.output.dense.bias,encoder.layer.13.attention.output.LayerNorm.weight,encoder.layer.13.attention.output.LayerNorm.bias,encoder.layer.13.intermediate.dense.weight,encoder.layer.13.intermediate.dense.bias,encoder.layer.13.output.dense.weight,encoder.layer.13.output.dense.bias,encoder.layer.13.output.LayerNorm.weight,encoder.layer.13.output.LayerNorm.bias,encoder.layer.14.attention.self.query.weight,encoder.layer.14.attention.self.query.bias,encoder.layer.14.attention.self.key.weight,encoder.layer.14.attention.self.key.bias,encoder.layer.14.attention.self.value.weight,encoder.layer.14.attention.self.value.bias,encoder.layer.14.attention.output.dense.weight,encoder.layer.14.attention.output.dense.bias,encoder.layer.14.attention.output.LayerNorm.weight,encoder.layer.14.attention.output.LayerNorm.bias,encoder.layer.14.intermediate.dense.weight,encoder.layer.14.intermediate.dense.bias,encoder.layer.14.output.dense.weight,encoder.layer.14.output.dense.bias,encoder.layer.14.output.LayerNorm.weight,encoder.layer.14.output.LayerNorm.bias,encoder.layer.15.attention.self.query.weight,encoder.layer.15.attention.self.query.bias,encoder.layer.15.attention.self.key.weight,encoder.layer.15.attention.self.key.bias,encoder.layer.15.attention.self.value.weight,encoder.layer.15.attention.self.value.bias,encoder.layer.15.attentio
n.output.dense.weight,encoder.layer.15.attention.output.dense.bias,encoder.layer.15.attention.output.LayerNorm.weight,encoder.layer.15.attention.output.LayerNorm.bias,encoder.layer.15.intermediate.dense.weight,encoder.layer.15.intermediate.dense.bias,encoder.layer.15.output.dense.weight,encoder.layer.15.output.dense.bias,encoder.layer.15.output.LayerNorm.weight,encoder.layer.15.output.LayerNorm.bias,encoder.layer.16.attention.self.query.weight,encoder.layer.16.attention.self.query.bias,encoder.layer.16.attention.self.key.weight,encoder.layer.16.attention.self.key.bias,encoder.layer.16.attention.self.value.weight,encoder.layer.16.attention.self.value.bias,encoder.layer.16.attention.output.dense.weight,encoder.layer.16.attention.output.dense.bias,encoder.layer.16.attention.output.LayerNorm.weight,encoder.layer.16.attention.output.LayerNorm.bias,encoder.layer.16.intermediate.dense.weight,encoder.layer.16.intermediate.dense.bias,encoder.layer.16.output.dense.weight,encoder.layer.16.output.dense.bias,encoder.layer.16.output.LayerNorm.weight,encoder.layer.16.output.LayerNorm.bias,encoder.layer.17.attention.self.query.weight,encoder.layer.17.attention.self.query.bias,encoder.layer.17.attention.self.key.weight,encoder.layer.17.attention.self.key.bias,encoder.layer.17.attention.self.value.weight,encoder.layer.17.attention.self.value.bias,encoder.layer.17.attention.output.dense.weight,encoder.layer.17.attention.output.dense.bias,encoder.layer.17.attention.output.LayerNorm.weight,encoder.layer.17.attention.output.LayerNorm.bias,encoder.layer.17.intermediate.dense.weight,encoder.layer.17.intermediate.dense.bias,encoder.layer.17.output.dense.weight,encoder.layer.17.output.dense.bias,encoder.layer.17.output.LayerNorm.weight,encoder.layer.17.output.LayerNorm.bias,encoder.layer.18.attention.self.query.weight,encoder.layer.18.attention.self.query.bias,encoder.layer.18.attention.self.key.weight,encoder.layer.18.attention.self.key.bias,encoder.layer.18.attention.self.value.weight,encoder.layer.18.attention.self.value.bias,encoder.layer.18.attention.output.dense.weight,encoder.layer.18.attention.output.dense.bias,encoder.layer.18.attention.output.LayerNorm.weight,encoder.layer.18.attention.output.LayerNorm.bias,encoder.layer.18.intermediate.dense.weight,encoder.layer.18.intermediate.dense.bias,encoder.layer.18.output.dense.weight,encoder.layer.18.output.dense.bias,encoder.layer.18.output.LayerNorm.weight,encoder.layer.18.output.LayerNorm.bias,encoder.layer.19.attention.self.query.weight,encoder.layer.19.attention.self.query.bias,encoder.layer.19.attention.self.key.weight,encoder.layer.19.attention.self.key.bias,encoder.layer.19.attention.self.value.weight,encoder.layer.19.attention.self.value.bias,encoder.layer.19.attention.output.dense.weight,encoder.layer.19.attention.output.dense.bias,encoder.layer.19.attention.output.LayerNorm.weight,encoder.layer.19.attention.output.LayerNorm.bias,encoder.layer.19.intermediate.dense.weight,encoder.layer.19.intermediate.dense.bias,encoder.layer.19.output.dense.weight,encoder.layer.19.output.dense.bias,encoder.layer.19.output.LayerNorm.weight,encoder.layer.19.output.LayerNorm.bias,encoder.layer.20.attention.self.query.weight,encoder.layer.20.attention.self.query.bias,encoder.layer.20.attention.self.key.weight,encoder.layer.20.attention.self.key.bias,encoder.layer.20.attention.self.value.weight,encoder.layer.20.attention.self.value.bias,encoder.layer.20.attention.output.dense.weight,encoder.layer.20.attention.output.dense.bias,encoder.layer.20.attention.output.LayerNorm.weig
ht,encoder.layer.20.attention.output.LayerNorm.bias,encoder.layer.20.intermediate.dense.weight,encoder.layer.20.intermediate.dense.bias,encoder.layer.20.output.dense.weight,encoder.layer.20.output.dense.bias,encoder.layer.20.output.LayerNorm.weight,encoder.layer.20.output.LayerNorm.bias,encoder.layer.21.attention.self.query.weight,encoder.layer.21.attention.self.query.bias,encoder.layer.21.attention.self.key.weight,encoder.layer.21.attention.self.key.bias,encoder.layer.21.attention.self.value.weight,encoder.layer.21.attention.self.value.bias,encoder.layer.21.attention.output.dense.weight,encoder.layer.21.attention.output.dense.bias,encoder.layer.21.attention.output.LayerNorm.weight,encoder.layer.21.attention.output.LayerNorm.bias,encoder.layer.21.intermediate.dense.weight,encoder.layer.21.intermediate.dense.bias,encoder.layer.21.output.dense.weight,encoder.layer.21.output.dense.bias,encoder.layer.21.output.LayerNorm.weight,encoder.layer.21.output.LayerNorm.bias,encoder.layer.22.attention.self.query.weight,encoder.layer.22.attention.self.query.bias,encoder.layer.22.attention.self.key.weight,encoder.layer.22.attention.self.key.bias,encoder.layer.22.attention.self.value.weight,encoder.layer.22.attention.self.value.bias,encoder.layer.22.attention.output.dense.weight,encoder.layer.22.attention.output.dense.bias,encoder.layer.22.attention.output.LayerNorm.weight,encoder.layer.22.attention.output.LayerNorm.bias,encoder.layer.22.intermediate.dense.weight,encoder.layer.22.intermediate.dense.bias,encoder.layer.22.output.dense.weight,encoder.layer.22.output.dense.bias,encoder.layer.22.output.LayerNorm.weight,encoder.layer.22.output.LayerNorm.bias,encoder.layer.23.attention.self.query.weight,encoder.layer.23.attention.self.query.bias,encoder.layer.23.attention.self.key.weight,encoder.layer.23.attention.self.key.bias,encoder.layer.23.attention.self.value.weight,encoder.layer.23.attention.self.value.bias,encoder.layer.23.attention.output.dense.weight,encoder.layer.23.attention.output.dense.bias,encoder.layer.23.attention.output.LayerNorm.weight,encoder.layer.23.attention.output.LayerNorm.bias,encoder.layer.23.intermediate.dense.weight,encoder.layer.23.intermediate.dense.bias,encoder.layer.23.output.dense.weight,encoder.layer.23.output.dense.bias,encoder.layer.23.output.LayerNorm.weight,encoder.layer.23.output.LayerNorm.bias,pooler.dense.weight,pooler.dense.bias]. All weights are initialized. 
Loaded weights of the model: [vision_model.embeddings.class_embedding,vision_model.embeddings.position_ids,vision_model.embeddings.patch_embedding.weight,vision_model.embeddings.position_embedding.weight,vision_model.pre_layrnorm.weight,vision_model.pre_layrnorm.bias,vision_model.encoder.layers.0.self_attn.k_proj.weight,vision_model.encoder.layers.0.self_attn.k_proj.bias,vision_model.encoder.layers.0.self_attn.v_proj.weight,vision_model.encoder.layers.0.self_attn.v_proj.bias,vision_model.encoder.layers.0.self_attn.q_proj.weight,vision_model.encoder.layers.0.self_attn.q_proj.bias,vision_model.encoder.layers.0.self_attn.out_proj.weight,vision_model.encoder.layers.0.self_attn.out_proj.bias,vision_model.encoder.layers.0.layer_norm1.weight,vision_model.encoder.layers.0.layer_norm1.bias,vision_model.encoder.layers.0.mlp.fc1.weight,vision_model.encoder.layers.0.mlp.fc1.bias,vision_model.encoder.layers.0.mlp.fc2.weight,vision_model.encoder.layers.0.mlp.fc2.bias,vision_model.encoder.layers.0.layer_norm2.weight,vision_model.encoder.layers.0.layer_norm2.bias,vision_model.encoder.layers.1.self_attn.k_proj.weight,vision_model.encoder.layers.1.self_attn.k_proj.bias,vision_model.encoder.layers.1.self_attn.v_proj.weight,vision_model.encoder.layers.1.self_attn.v_proj.bias,vision_model.encoder.layers.1.self_attn.q_proj.weight,vision_model.encoder.layers.1.self_attn.q_proj.bias,vision_model.encoder.layers.1.self_attn.out_proj.weight,vision_model.encoder.layers.1.self_attn.out_proj.bias,vision_model.encoder.layers.1.layer_norm1.weight,vision_model.encoder.layers.1.layer_norm1.bias,vision_model.encoder.layers.1.mlp.fc1.weight,vision_model.encoder.layers.1.mlp.fc1.bias,vision_model.encoder.layers.1.mlp.fc2.weight,vision_model.encoder.layers.1.mlp.fc2.bias,vision_model.encoder.layers.1.layer_norm2.weight,vision_model.encoder.layers.1.layer_norm2.bias,vision_model.encoder.layers.2.self_attn.k_proj.weight,vision_model.encoder.layers.2.self_attn.k_proj.bias,vision_model.encoder.layers.2.self_attn.v_proj.weight,vision_model.encoder.layers.2.self_attn.v_proj.bias,vision_model.encoder.layers.2.self_attn.q_proj.weight,vision_model.encoder.layers.2.self_attn.q_proj.bias,vision_model.encoder.layers.2.self_attn.out_proj.weight,vision_model.encoder.layers.2.self_attn.out_proj.bias,vision_model.encoder.layers.2.layer_norm1.weight,vision_model.encoder.layers.2.layer_norm1.bias,vision_model.encoder.layers.2.mlp.fc1.weight,vision_model.encoder.layers.2.mlp.fc1.bias,vision_model.encoder.layers.2.mlp.fc2.weight,vision_model.encoder.layers.2.mlp.fc2.bias,vision_model.encoder.layers.2.layer_norm2.weight,vision_model.encoder.layers.2.layer_norm2.bias,vision_model.encoder.layers.3.self_attn.k_proj.weight,vision_model.encoder.layers.3.self_attn.k_proj.bias,vision_model.encoder.layers.3.self_attn.v_proj.weight,vision_model.encoder.layers.3.self_attn.v_proj.bias,vision_model.encoder.layers.3.self_attn.q_proj.weight,vision_model.encoder.layers.3.self_attn.q_proj.bias,vision_model.encoder.layers.3.self_attn.out_proj.weight,vision_model.encoder.layers.3.self_attn.out_proj.bias,vision_model.encoder.layers.3.layer_norm1.weight,vision_model.encoder.layers.3.layer_norm1.bias,vision_model.encoder.layers.3.mlp.fc1.weight,vision_model.encoder.layers.3.mlp.fc1.bias,vision_model.encoder.layers.3.mlp.fc2.weight,vision_model.encoder.layers.3.mlp.fc2.bias,vision_model.encoder.layers.3.layer_norm2.weight,vision_model.encoder.layers.3.layer_norm2.bias,vision_model.encoder.layers.4.self_attn.k_proj.weight,vision_model.encoder.layers.4.self_attn.k_proj.bi
as,vision_model.encoder.layers.4.self_attn.v_proj.weight,vision_model.encoder.layers.4.self_attn.v_proj.bias,vision_model.encoder.layers.4.self_attn.q_proj.weight,vision_model.encoder.layers.4.self_attn.q_proj.bias,vision_model.encoder.layers.4.self_attn.out_proj.weight,vision_model.encoder.layers.4.self_attn.out_proj.bias,vision_model.encoder.layers.4.layer_norm1.weight,vision_model.encoder.layers.4.layer_norm1.bias,vision_model.encoder.layers.4.mlp.fc1.weight,vision_model.encoder.layers.4.mlp.fc1.bias,vision_model.encoder.layers.4.mlp.fc2.weight,vision_model.encoder.layers.4.mlp.fc2.bias,vision_model.encoder.layers.4.layer_norm2.weight,vision_model.encoder.layers.4.layer_norm2.bias,vision_model.encoder.layers.5.self_attn.k_proj.weight,vision_model.encoder.layers.5.self_attn.k_proj.bias,vision_model.encoder.layers.5.self_attn.v_proj.weight,vision_model.encoder.layers.5.self_attn.v_proj.bias,vision_model.encoder.layers.5.self_attn.q_proj.weight,vision_model.encoder.layers.5.self_attn.q_proj.bias,vision_model.encoder.layers.5.self_attn.out_proj.weight,vision_model.encoder.layers.5.self_attn.out_proj.bias,vision_model.encoder.layers.5.layer_norm1.weight,vision_model.encoder.layers.5.layer_norm1.bias,vision_model.encoder.layers.5.mlp.fc1.weight,vision_model.encoder.layers.5.mlp.fc1.bias,vision_model.encoder.layers.5.mlp.fc2.weight,vision_model.encoder.layers.5.mlp.fc2.bias,vision_model.encoder.layers.5.layer_norm2.weight,vision_model.encoder.layers.5.layer_norm2.bias,vision_model.encoder.layers.6.self_attn.k_proj.weight,vision_model.encoder.layers.6.self_attn.k_proj.bias,vision_model.encoder.layers.6.self_attn.v_proj.weight,vision_model.encoder.layers.6.self_attn.v_proj.bias,vision_model.encoder.layers.6.self_attn.q_proj.weight,vision_model.encoder.layers.6.self_attn.q_proj.bias,vision_model.encoder.layers.6.self_attn.out_proj.weight,vision_model.encoder.layers.6.self_attn.out_proj.bias,vision_model.encoder.layers.6.layer_norm1.weight,vision_model.encoder.layers.6.layer_norm1.bias,vision_model.encoder.layers.6.mlp.fc1.weight,vision_model.encoder.layers.6.mlp.fc1.bias,vision_model.encoder.layers.6.mlp.fc2.weight,vision_model.encoder.layers.6.mlp.fc2.bias,vision_model.encoder.layers.6.layer_norm2.weight,vision_model.encoder.layers.6.layer_norm2.bias,vision_model.encoder.layers.7.self_attn.k_proj.weight,vision_model.encoder.layers.7.self_attn.k_proj.bias,vision_model.encoder.layers.7.self_attn.v_proj.weight,vision_model.encoder.layers.7.self_attn.v_proj.bias,vision_model.encoder.layers.7.self_attn.q_proj.weight,vision_model.encoder.layers.7.self_attn.q_proj.bias,vision_model.encoder.layers.7.self_attn.out_proj.weight,vision_model.encoder.layers.7.self_attn.out_proj.bias,vision_model.encoder.layers.7.layer_norm1.weight,vision_model.encoder.layers.7.layer_norm1.bias,vision_model.encoder.layers.7.mlp.fc1.weight,vision_model.encoder.layers.7.mlp.fc1.bias,vision_model.encoder.layers.7.mlp.fc2.weight,vision_model.encoder.layers.7.mlp.fc2.bias,vision_model.encoder.layers.7.layer_norm2.weight,vision_model.encoder.layers.7.layer_norm2.bias,vision_model.encoder.layers.8.self_attn.k_proj.weight,vision_model.encoder.layers.8.self_attn.k_proj.bias,vision_model.encoder.layers.8.self_attn.v_proj.weight,vision_model.encoder.layers.8.self_attn.v_proj.bias,vision_model.encoder.layers.8.self_attn.q_proj.weight,vision_model.encoder.layers.8.self_attn.q_proj.bias,vision_model.encoder.layers.8.self_attn.out_proj.weight,vision_model.encoder.layers.8.self_attn.out_proj.bias,vision_model.encoder.layers.8.layer_norm1.weig
ht,vision_model.encoder.layers.8.layer_norm1.bias,vision_model.encoder.layers.8.mlp.fc1.weight,vision_model.encoder.layers.8.mlp.fc1.bias,vision_model.encoder.layers.8.mlp.fc2.weight,vision_model.encoder.layers.8.mlp.fc2.bias,vision_model.encoder.layers.8.layer_norm2.weight,vision_model.encoder.layers.8.layer_norm2.bias,vision_model.encoder.layers.9.self_attn.k_proj.weight,vision_model.encoder.layers.9.self_attn.k_proj.bias,vision_model.encoder.layers.9.self_attn.v_proj.weight,vision_model.encoder.layers.9.self_attn.v_proj.bias,vision_model.encoder.layers.9.self_attn.q_proj.weight,vision_model.encoder.layers.9.self_attn.q_proj.bias,vision_model.encoder.layers.9.self_attn.out_proj.weight,vision_model.encoder.layers.9.self_attn.out_proj.bias,vision_model.encoder.layers.9.layer_norm1.weight,vision_model.encoder.layers.9.layer_norm1.bias,vision_model.encoder.layers.9.mlp.fc1.weight,vision_model.encoder.layers.9.mlp.fc1.bias,vision_model.encoder.layers.9.mlp.fc2.weight,vision_model.encoder.layers.9.mlp.fc2.bias,vision_model.encoder.layers.9.layer_norm2.weight,vision_model.encoder.layers.9.layer_norm2.bias,vision_model.encoder.layers.10.self_attn.k_proj.weight,vision_model.encoder.layers.10.self_attn.k_proj.bias,vision_model.encoder.layers.10.self_attn.v_proj.weight,vision_model.encoder.layers.10.self_attn.v_proj.bias,vision_model.encoder.layers.10.self_attn.q_proj.weight,vision_model.encoder.layers.10.self_attn.q_proj.bias,vision_model.encoder.layers.10.self_attn.out_proj.weight,vision_model.encoder.layers.10.self_attn.out_proj.bias,vision_model.encoder.layers.10.layer_norm1.weight,vision_model.encoder.layers.10.layer_norm1.bias,vision_model.encoder.layers.10.mlp.fc1.weight,vision_model.encoder.layers.10.mlp.fc1.bias,vision_model.encoder.layers.10.mlp.fc2.weight,vision_model.encoder.layers.10.mlp.fc2.bias,vision_model.encoder.layers.10.layer_norm2.weight,vision_model.encoder.layers.10.layer_norm2.bias,vision_model.encoder.layers.11.self_attn.k_proj.weight,vision_model.encoder.layers.11.self_attn.k_proj.bias,vision_model.encoder.layers.11.self_attn.v_proj.weight,vision_model.encoder.layers.11.self_attn.v_proj.bias,vision_model.encoder.layers.11.self_attn.q_proj.weight,vision_model.encoder.layers.11.self_attn.q_proj.bias,vision_model.encoder.layers.11.self_attn.out_proj.weight,vision_model.encoder.layers.11.self_attn.out_proj.bias,vision_model.encoder.layers.11.layer_norm1.weight,vision_model.encoder.layers.11.layer_norm1.bias,vision_model.encoder.layers.11.mlp.fc1.weight,vision_model.encoder.layers.11.mlp.fc1.bias,vision_model.encoder.layers.11.mlp.fc2.weight,vision_model.encoder.layers.11.mlp.fc2.bias,vision_model.encoder.layers.11.layer_norm2.weight,vision_model.encoder.layers.11.layer_norm2.bias,vision_model.encoder.layers.12.self_attn.k_proj.weight,vision_model.encoder.layers.12.self_attn.k_proj.bias,vision_model.encoder.layers.12.self_attn.v_proj.weight,vision_model.encoder.layers.12.self_attn.v_proj.bias,vision_model.encoder.layers.12.self_attn.q_proj.weight,vision_model.encoder.layers.12.self_attn.q_proj.bias,vision_model.encoder.layers.12.self_attn.out_proj.weight,vision_model.encoder.layers.12.self_attn.out_proj.bias,vision_model.encoder.layers.12.layer_norm1.weight,vision_model.encoder.layers.12.layer_norm1.bias,vision_model.encoder.layers.12.mlp.fc1.weight,vision_model.encoder.layers.12.mlp.fc1.bias,vision_model.encoder.layers.12.mlp.fc2.weight,vision_model.encoder.layers.12.mlp.fc2.bias,vision_model.encoder.layers.12.layer_norm2.weight,vision_model.encoder.layers.12.layer_norm2.bias,v
ision_model.encoder.layers.13.self_attn.k_proj.weight,vision_model.encoder.layers.13.self_attn.k_proj.bias,vision_model.encoder.layers.13.self_attn.v_proj.weight,vision_model.encoder.layers.13.self_attn.v_proj.bias,vision_model.encoder.layers.13.self_attn.q_proj.weight,vision_model.encoder.layers.13.self_attn.q_proj.bias,vision_model.encoder.layers.13.self_attn.out_proj.weight,vision_model.encoder.layers.13.self_attn.out_proj.bias,vision_model.encoder.layers.13.layer_norm1.weight,vision_model.encoder.layers.13.layer_norm1.bias,vision_model.encoder.layers.13.mlp.fc1.weight,vision_model.encoder.layers.13.mlp.fc1.bias,vision_model.encoder.layers.13.mlp.fc2.weight,vision_model.encoder.layers.13.mlp.fc2.bias,vision_model.encoder.layers.13.layer_norm2.weight,vision_model.encoder.layers.13.layer_norm2.bias,vision_model.encoder.layers.14.self_attn.k_proj.weight,vision_model.encoder.layers.14.self_attn.k_proj.bias,vision_model.encoder.layers.14.self_attn.v_proj.weight,vision_model.encoder.layers.14.self_attn.v_proj.bias,vision_model.encoder.layers.14.self_attn.q_proj.weight,vision_model.encoder.layers.14.self_attn.q_proj.bias,vision_model.encoder.layers.14.self_attn.out_proj.weight,vision_model.encoder.layers.14.self_attn.out_proj.bias,vision_model.encoder.layers.14.layer_norm1.weight,vision_model.encoder.layers.14.layer_norm1.bias,vision_model.encoder.layers.14.mlp.fc1.weight,vision_model.encoder.layers.14.mlp.fc1.bias,vision_model.encoder.layers.14.mlp.fc2.weight,vision_model.encoder.layers.14.mlp.fc2.bias,vision_model.encoder.layers.14.layer_norm2.weight,vision_model.encoder.layers.14.layer_norm2.bias,vision_model.encoder.layers.15.self_attn.k_proj.weight,vision_model.encoder.layers.15.self_attn.k_proj.bias,vision_model.encoder.layers.15.self_attn.v_proj.weight,vision_model.encoder.layers.15.self_attn.v_proj.bias,vision_model.encoder.layers.15.self_attn.q_proj.weight,vision_model.encoder.layers.15.self_attn.q_proj.bias,vision_model.encoder.layers.15.self_attn.out_proj.weight,vision_model.encoder.layers.15.self_attn.out_proj.bias,vision_model.encoder.layers.15.layer_norm1.weight,vision_model.encoder.layers.15.layer_norm1.bias,vision_model.encoder.layers.15.mlp.fc1.weight,vision_model.encoder.layers.15.mlp.fc1.bias,vision_model.encoder.layers.15.mlp.fc2.weight,vision_model.encoder.layers.15.mlp.fc2.bias,vision_model.encoder.layers.15.layer_norm2.weight,vision_model.encoder.layers.15.layer_norm2.bias,vision_model.encoder.layers.16.self_attn.k_proj.weight,vision_model.encoder.layers.16.self_attn.k_proj.bias,vision_model.encoder.layers.16.self_attn.v_proj.weight,vision_model.encoder.layers.16.self_attn.v_proj.bias,vision_model.encoder.layers.16.self_attn.q_proj.weight,vision_model.encoder.layers.16.self_attn.q_proj.bias,vision_model.encoder.layers.16.self_attn.out_proj.weight,vision_model.encoder.layers.16.self_attn.out_proj.bias,vision_model.encoder.layers.16.layer_norm1.weight,vision_model.encoder.layers.16.layer_norm1.bias,vision_model.encoder.layers.16.mlp.fc1.weight,vision_model.encoder.layers.16.mlp.fc1.bias,vision_model.encoder.layers.16.mlp.fc2.weight,vision_model.encoder.layers.16.mlp.fc2.bias,vision_model.encoder.layers.16.layer_norm2.weight,vision_model.encoder.layers.16.layer_norm2.bias,vision_model.encoder.layers.17.self_attn.k_proj.weight,vision_model.encoder.layers.17.self_attn.k_proj.bias,vision_model.encoder.layers.17.self_attn.v_proj.weight,vision_model.encoder.layers.17.self_attn.v_proj.bias,vision_model.encoder.layers.17.self_attn.q_proj.weight,vision_model.encoder.layers.17.self_a
ttn.q_proj.bias,vision_model.encoder.layers.17.self_attn.out_proj.weight,vision_model.encoder.layers.17.self_attn.out_proj.bias,vision_model.encoder.layers.17.layer_norm1.weight,vision_model.encoder.layers.17.layer_norm1.bias,vision_model.encoder.layers.17.mlp.fc1.weight,vision_model.encoder.layers.17.mlp.fc1.bias,vision_model.encoder.layers.17.mlp.fc2.weight,vision_model.encoder.layers.17.mlp.fc2.bias,vision_model.encoder.layers.17.layer_norm2.weight,vision_model.encoder.layers.17.layer_norm2.bias,vision_model.encoder.layers.18.self_attn.k_proj.weight,vision_model.encoder.layers.18.self_attn.k_proj.bias,vision_model.encoder.layers.18.self_attn.v_proj.weight,vision_model.encoder.layers.18.self_attn.v_proj.bias,vision_model.encoder.layers.18.self_attn.q_proj.weight,vision_model.encoder.layers.18.self_attn.q_proj.bias,vision_model.encoder.layers.18.self_attn.out_proj.weight,vision_model.encoder.layers.18.self_attn.out_proj.bias,vision_model.encoder.layers.18.layer_norm1.weight,vision_model.encoder.layers.18.layer_norm1.bias,vision_model.encoder.layers.18.mlp.fc1.weight,vision_model.encoder.layers.18.mlp.fc1.bias,vision_model.encoder.layers.18.mlp.fc2.weight,vision_model.encoder.layers.18.mlp.fc2.bias,vision_model.encoder.layers.18.layer_norm2.weight,vision_model.encoder.layers.18.layer_norm2.bias,vision_model.encoder.layers.19.self_attn.k_proj.weight,vision_model.encoder.layers.19.self_attn.k_proj.bias,vision_model.encoder.layers.19.self_attn.v_proj.weight,vision_model.encoder.layers.19.self_attn.v_proj.bias,vision_model.encoder.layers.19.self_attn.q_proj.weight,vision_model.encoder.layers.19.self_attn.q_proj.bias,vision_model.encoder.layers.19.self_attn.out_proj.weight,vision_model.encoder.layers.19.self_attn.out_proj.bias,vision_model.encoder.layers.19.layer_norm1.weight,vision_model.encoder.layers.19.layer_norm1.bias,vision_model.encoder.layers.19.mlp.fc1.weight,vision_model.encoder.layers.19.mlp.fc1.bias,vision_model.encoder.layers.19.mlp.fc2.weight,vision_model.encoder.layers.19.mlp.fc2.bias,vision_model.encoder.layers.19.layer_norm2.weight,vision_model.encoder.layers.19.layer_norm2.bias,vision_model.encoder.layers.20.self_attn.k_proj.weight,vision_model.encoder.layers.20.self_attn.k_proj.bias,vision_model.encoder.layers.20.self_attn.v_proj.weight,vision_model.encoder.layers.20.self_attn.v_proj.bias,vision_model.encoder.layers.20.self_attn.q_proj.weight,vision_model.encoder.layers.20.self_attn.q_proj.bias,vision_model.encoder.layers.20.self_attn.out_proj.weight,vision_model.encoder.layers.20.self_attn.out_proj.bias,vision_model.encoder.layers.20.layer_norm1.weight,vision_model.encoder.layers.20.layer_norm1.bias,vision_model.encoder.layers.20.mlp.fc1.weight,vision_model.encoder.layers.20.mlp.fc1.bias,vision_model.encoder.layers.20.mlp.fc2.weight,vision_model.encoder.layers.20.mlp.fc2.bias,vision_model.encoder.layers.20.layer_norm2.weight,vision_model.encoder.layers.20.layer_norm2.bias,vision_model.encoder.layers.21.self_attn.k_proj.weight,vision_model.encoder.layers.21.self_attn.k_proj.bias,vision_model.encoder.layers.21.self_attn.v_proj.weight,vision_model.encoder.layers.21.self_attn.v_proj.bias,vision_model.encoder.layers.21.self_attn.q_proj.weight,vision_model.encoder.layers.21.self_attn.q_proj.bias,vision_model.encoder.layers.21.self_attn.out_proj.weight,vision_model.encoder.layers.21.self_attn.out_proj.bias,vision_model.encoder.layers.21.layer_norm1.weight,vision_model.encoder.layers.21.layer_norm1.bias,vision_model.encoder.layers.21.mlp.fc1.weight,vision_model.encoder.layers.21.mlp.
fc1.bias,vision_model.encoder.layers.21.mlp.fc2.weight,vision_model.encoder.layers.21.mlp.fc2.bias,vision_model.encoder.layers.21.layer_norm2.weight,vision_model.encoder.layers.21.layer_norm2.bias,vision_model.encoder.layers.22.self_attn.k_proj.weight,vision_model.encoder.layers.22.self_attn.k_proj.bias,vision_model.encoder.layers.22.self_attn.v_proj.weight,vision_model.encoder.layers.22.self_attn.v_proj.bias,vision_model.encoder.layers.22.self_attn.q_proj.weight,vision_model.encoder.layers.22.self_attn.q_proj.bias,vision_model.encoder.layers.22.self_attn.out_proj.weight,vision_model.encoder.layers.22.self_attn.out_proj.bias,vision_model.encoder.layers.22.layer_norm1.weight,vision_model.encoder.layers.22.layer_norm1.bias,vision_model.encoder.layers.22.mlp.fc1.weight,vision_model.encoder.layers.22.mlp.fc1.bias,vision_model.encoder.layers.22.mlp.fc2.weight,vision_model.encoder.layers.22.mlp.fc2.bias,vision_model.encoder.layers.22.layer_norm2.weight,vision_model.encoder.layers.22.layer_norm2.bias,vision_model.encoder.layers.23.self_attn.k_proj.weight,vision_model.encoder.layers.23.self_attn.k_proj.bias,vision_model.encoder.layers.23.self_attn.v_proj.weight,vision_model.encoder.layers.23.self_attn.v_proj.bias,vision_model.encoder.layers.23.self_attn.q_proj.weight,vision_model.encoder.layers.23.self_attn.q_proj.bias,vision_model.encoder.layers.23.self_attn.out_proj.weight,vision_model.encoder.layers.23.self_attn.out_proj.bias,vision_model.encoder.layers.23.layer_norm1.weight,vision_model.encoder.layers.23.layer_norm1.bias,vision_model.encoder.layers.23.mlp.fc1.weight,vision_model.encoder.layers.23.mlp.fc1.bias,vision_model.encoder.layers.23.mlp.fc2.weight,vision_model.encoder.layers.23.mlp.fc2.bias,vision_model.encoder.layers.23.layer_norm2.weight,vision_model.encoder.layers.23.layer_norm2.bias,vision_model.post_layernorm.weight,vision_model.post_layernorm.bias]. All weights are initialized.
Next, we use get_application_evaluator from EasyNLP to initialize the evaluator, move the model to the current device, and run the evaluation.
evaluator = get_application_evaluator(app_name="clip", valid_dataset=valid_dataset, user_defined_parameters=user_defined_parameters, eval_batch_size=32)
model.to(torch.cuda.current_device())
evaluator.evaluate(model=model)
[2022-07-20 17:05:47,057 INFO] Inference time = 0.40s, [9.9234 ms / sample]
r1_num:31 r5_num:37 r10_num:38 query_num:40 r1(%):77.5 r5(%):92.5 r10(%):95.0 mean_recall(%):88.33333333333334
[('mean_recall', 0.8833333333333334)]
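For reference, the reported mean_recall is the average of Recall@1, Recall@5 and Recall@10 over the 40 evaluation queries. The short sketch below (plain Python, not part of EasyNLP) reproduces the numbers printed above:
# Reproduce the evaluator's summary metrics from the raw hit counts above.
r1, r5, r10 = 31 / 40, 37 / 40, 38 / 40    # hits within top-1 / top-5 / top-10 over 40 queries
mean_recall = (r1 + r5 + r10) / 3          # (0.775 + 0.925 + 0.95) / 3
print(r1, r5, r10, mean_recall)            # 0.775 0.925 0.95 0.8833333333333334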
Model Prediction
We can also use the trained model for prediction, i.e. to extract feature vectors for texts and images. We first create a predictor and use it to instantiate a PredictorManager. Taking text feature extraction as an example, we specify MUGE_MR_test_base64_part_text.tsv as the input file, write the predicted results to "text_feat.tsv", and set the output schema to "text_feat".
predictor = get_application_predictor(app_name="clip", model_dir="./clip_model/", first_sequence="text", second_sequence="image", sequence_length=32, user_defined_parameters=user_defined_parameters)
predictor_manager = PredictorManager(predictor=predictor, input_file="MUGE_MR_test_base64_part_text.tsv", input_schema="text:str:1", output_file="text_feat.tsv", output_schema="text_feat", append_cols="text", batch_size=2)
predictor_manager.run()
exit()
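Image-side feature vectors can be extracted in the same way by reusing the predictor with the image test file downloaded earlier. The sketch below only swaps the input file and the input/output schema names; the exact schema strings ("image:str:1", "image_feat") are assumptions made by analogy with the text case and are not shown in this notebook. If you extract both modalities in one session, run it before the exit() call above.
# Hypothetical image feature extraction, reusing the predictor defined above (schema names assumed).
predictor_manager = PredictorManager(predictor=predictor, input_file="MUGE_MR_test_base64_part_image.tsv", input_schema="image:str:1", output_file="image_feat.tsv", output_schema="image_feat", append_cols="image", batch_size=2)
predictor_manager.run()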
Loaded weights of the model: [embeddings.position_ids,embeddings.word_embeddings.weight,embeddings.position_embeddings.weight,embeddings.token_type_embeddings.weight,embeddings.LayerNorm.weight,embeddings.LayerNorm.bias,encoder.layer.0.attention.self.query.weight,encoder.layer.0.attention.self.query.bias,encoder.layer.0.attention.self.key.weight,encoder.layer.0.attention.self.key.bias,encoder.layer.0.attention.self.value.weight,encoder.layer.0.attention.self.value.bias,encoder.layer.0.attention.output.dense.weight,encoder.layer.0.attention.output.dense.bias,encoder.layer.0.attention.output.LayerNorm.weight,encoder.layer.0.attention.output.LayerNorm.bias,encoder.layer.0.intermediate.dense.weight,encoder.layer.0.intermediate.dense.bias,encoder.layer.0.output.dense.weight,encoder.layer.0.output.dense.bias,encoder.layer.0.output.LayerNorm.weight,encoder.layer.0.output.LayerNorm.bias,encoder.layer.1.attention.self.query.weight,encoder.layer.1.attention.self.query.bias,encoder.layer.1.attention.self.key.weight,encoder.layer.1.attention.self.key.bias,encoder.layer.1.attention.self.value.weight,encoder.layer.1.attention.self.value.bias,encoder.layer.1.attention.output.dense.weight,encoder.layer.1.attention.output.dense.bias,encoder.layer.1.attention.output.LayerNorm.weight,encoder.layer.1.attention.output.LayerNorm.bias,encoder.layer.1.intermediate.dense.weight,encoder.layer.1.intermediate.dense.bias,encoder.layer.1.output.dense.weight,encoder.layer.1.output.dense.bias,encoder.layer.1.output.LayerNorm.weight,encoder.layer.1.output.LayerNorm.bias,encoder.layer.2.attention.self.query.weight,encoder.layer.2.attention.self.query.bias,encoder.layer.2.attention.self.key.weight,encoder.layer.2.attention.self.key.bias,encoder.layer.2.attention.self.value.weight,encoder.layer.2.attention.self.value.bias,encoder.layer.2.attention.output.dense.weight,encoder.layer.2.attention.output.dense.bias,encoder.layer.2.attention.output.LayerNorm.weight,encoder.layer.2.attention.output.LayerNorm.bias,encoder.layer.2.intermediate.dense.weight,encoder.layer.2.intermediate.dense.bias,encoder.layer.2.output.dense.weight,encoder.layer.2.output.dense.bias,encoder.layer.2.output.LayerNorm.weight,encoder.layer.2.output.LayerNorm.bias,encoder.layer.3.attention.self.query.weight,encoder.layer.3.attention.self.query.bias,encoder.layer.3.attention.self.key.weight,encoder.layer.3.attention.self.key.bias,encoder.layer.3.attention.self.value.weight,encoder.layer.3.attention.self.value.bias,encoder.layer.3.attention.output.dense.weight,encoder.layer.3.attention.output.dense.bias,encoder.layer.3.attention.output.LayerNorm.weight,encoder.layer.3.attention.output.LayerNorm.bias,encoder.layer.3.intermediate.dense.weight,encoder.layer.3.intermediate.dense.bias,encoder.layer.3.output.dense.weight,encoder.layer.3.output.dense.bias,encoder.layer.3.output.LayerNorm.weight,encoder.layer.3.output.LayerNorm.bias,encoder.layer.4.attention.self.query.weight,encoder.layer.4.attention.self.query.bias,encoder.layer.4.attention.self.key.weight,encoder.layer.4.attention.self.key.bias,encoder.layer.4.attention.self.value.weight,encoder.layer.4.attention.self.value.bias,encoder.layer.4.attention.output.dense.weight,encoder.layer.4.attention.output.dense.bias,encoder.layer.4.attention.output.LayerNorm.weight,encoder.layer.4.attention.output.LayerNorm.bias,encoder.layer.4.intermediate.dense.weight,encoder.layer.4.intermediate.dense.bias,encoder.layer.4.output.dense.weight,encoder.layer.4.output.dense.bias,encoder.layer.4.output.LayerNorm.weight,encoder.layer
.4.output.LayerNorm.bias,encoder.layer.5.attention.self.query.weight,encoder.layer.5.attention.self.query.bias,encoder.layer.5.attention.self.key.weight,encoder.layer.5.attention.self.key.bias,encoder.layer.5.attention.self.value.weight,encoder.layer.5.attention.self.value.bias,encoder.layer.5.attention.output.dense.weight,encoder.layer.5.attention.output.dense.bias,encoder.layer.5.attention.output.LayerNorm.weight,encoder.layer.5.attention.output.LayerNorm.bias,encoder.layer.5.intermediate.dense.weight,encoder.layer.5.intermediate.dense.bias,encoder.layer.5.output.dense.weight,encoder.layer.5.output.dense.bias,encoder.layer.5.output.LayerNorm.weight,encoder.layer.5.output.LayerNorm.bias,encoder.layer.6.attention.self.query.weight,encoder.layer.6.attention.self.query.bias,encoder.layer.6.attention.self.key.weight,encoder.layer.6.attention.self.key.bias,encoder.layer.6.attention.self.value.weight,encoder.layer.6.attention.self.value.bias,encoder.layer.6.attention.output.dense.weight,encoder.layer.6.attention.output.dense.bias,encoder.layer.6.attention.output.LayerNorm.weight,encoder.layer.6.attention.output.LayerNorm.bias,encoder.layer.6.intermediate.dense.weight,encoder.layer.6.intermediate.dense.bias,encoder.layer.6.output.dense.weight,encoder.layer.6.output.dense.bias,encoder.layer.6.output.LayerNorm.weight,encoder.layer.6.output.LayerNorm.bias,encoder.layer.7.attention.self.query.weight,encoder.layer.7.attention.self.query.bias,encoder.layer.7.attention.self.key.weight,encoder.layer.7.attention.self.key.bias,encoder.layer.7.attention.self.value.weight,encoder.layer.7.attention.self.value.bias,encoder.layer.7.attention.output.dense.weight,encoder.layer.7.attention.output.dense.bias,encoder.layer.7.attention.output.LayerNorm.weight,encoder.layer.7.attention.output.LayerNorm.bias,encoder.layer.7.intermediate.dense.weight,encoder.layer.7.intermediate.dense.bias,encoder.layer.7.output.dense.weight,encoder.layer.7.output.dense.bias,encoder.layer.7.output.LayerNorm.weight,encoder.layer.7.output.LayerNorm.bias,encoder.layer.8.attention.self.query.weight,encoder.layer.8.attention.self.query.bias,encoder.layer.8.attention.self.key.weight,encoder.layer.8.attention.self.key.bias,encoder.layer.8.attention.self.value.weight,encoder.layer.8.attention.self.value.bias,encoder.layer.8.attention.output.dense.weight,encoder.layer.8.attention.output.dense.bias,encoder.layer.8.attention.output.LayerNorm.weight,encoder.layer.8.attention.output.LayerNorm.bias,encoder.layer.8.intermediate.dense.weight,encoder.layer.8.intermediate.dense.bias,encoder.layer.8.output.dense.weight,encoder.layer.8.output.dense.bias,encoder.layer.8.output.LayerNorm.weight,encoder.layer.8.output.LayerNorm.bias,encoder.layer.9.attention.self.query.weight,encoder.layer.9.attention.self.query.bias,encoder.layer.9.attention.self.key.weight,encoder.layer.9.attention.self.key.bias,encoder.layer.9.attention.self.value.weight,encoder.layer.9.attention.self.value.bias,encoder.layer.9.attention.output.dense.weight,encoder.layer.9.attention.output.dense.bias,encoder.layer.9.attention.output.LayerNorm.weight,encoder.layer.9.attention.output.LayerNorm.bias,encoder.layer.9.intermediate.dense.weight,encoder.layer.9.intermediate.dense.bias,encoder.layer.9.output.dense.weight,encoder.layer.9.output.dense.bias,encoder.layer.9.output.LayerNorm.weight,encoder.layer.9.output.LayerNorm.bias,encoder.layer.10.attention.self.query.weight,encoder.layer.10.attention.self.query.bias,encoder.layer.10.attention.self.key.weight,encoder.layer.10.attention.self.key.bia
s,encoder.layer.10.attention.self.value.weight,encoder.layer.10.attention.self.value.bias,encoder.layer.10.attention.output.dense.weight,encoder.layer.10.attention.output.dense.bias,encoder.layer.10.attention.output.LayerNorm.weight,encoder.layer.10.attention.output.LayerNorm.bias,encoder.layer.10.intermediate.dense.weight,encoder.layer.10.intermediate.dense.bias,encoder.layer.10.output.dense.weight,encoder.layer.10.output.dense.bias,encoder.layer.10.output.LayerNorm.weight,encoder.layer.10.output.LayerNorm.bias,encoder.layer.11.attention.self.query.weight,encoder.layer.11.attention.self.query.bias,encoder.layer.11.attention.self.key.weight,encoder.layer.11.attention.self.key.bias,encoder.layer.11.attention.self.value.weight,encoder.layer.11.attention.self.value.bias,encoder.layer.11.attention.output.dense.weight,encoder.layer.11.attention.output.dense.bias,encoder.layer.11.attention.output.LayerNorm.weight,encoder.layer.11.attention.output.LayerNorm.bias,encoder.layer.11.intermediate.dense.weight,encoder.layer.11.intermediate.dense.bias,encoder.layer.11.output.dense.weight,encoder.layer.11.output.dense.bias,encoder.layer.11.output.LayerNorm.weight,encoder.layer.11.output.LayerNorm.bias,encoder.layer.12.attention.self.query.weight,encoder.layer.12.attention.self.query.bias,encoder.layer.12.attention.self.key.weight,encoder.layer.12.attention.self.key.bias,encoder.layer.12.attention.self.value.weight,encoder.layer.12.attention.self.value.bias,encoder.layer.12.attention.output.dense.weight,encoder.layer.12.attention.output.dense.bias,encoder.layer.12.attention.output.LayerNorm.weight,encoder.layer.12.attention.output.LayerNorm.bias,encoder.layer.12.intermediate.dense.weight,encoder.layer.12.intermediate.dense.bias,encoder.layer.12.output.dense.weight,encoder.layer.12.output.dense.bias,encoder.layer.12.output.LayerNorm.weight,encoder.layer.12.output.LayerNorm.bias,encoder.layer.13.attention.self.query.weight,encoder.layer.13.attention.self.query.bias,encoder.layer.13.attention.self.key.weight,encoder.layer.13.attention.self.key.bias,encoder.layer.13.attention.self.value.weight,encoder.layer.13.attention.self.value.bias,encoder.layer.13.attention.output.dense.weight,encoder.layer.13.attention.output.dense.bias,encoder.layer.13.attention.output.LayerNorm.weight,encoder.layer.13.attention.output.LayerNorm.bias,encoder.layer.13.intermediate.dense.weight,encoder.layer.13.intermediate.dense.bias,encoder.layer.13.output.dense.weight,encoder.layer.13.output.dense.bias,encoder.layer.13.output.LayerNorm.weight,encoder.layer.13.output.LayerNorm.bias,encoder.layer.14.attention.self.query.weight,encoder.layer.14.attention.self.query.bias,encoder.layer.14.attention.self.key.weight,encoder.layer.14.attention.self.key.bias,encoder.layer.14.attention.self.value.weight,encoder.layer.14.attention.self.value.bias,encoder.layer.14.attention.output.dense.weight,encoder.layer.14.attention.output.dense.bias,encoder.layer.14.attention.output.LayerNorm.weight,encoder.layer.14.attention.output.LayerNorm.bias,encoder.layer.14.intermediate.dense.weight,encoder.layer.14.intermediate.dense.bias,encoder.layer.14.output.dense.weight,encoder.layer.14.output.dense.bias,encoder.layer.14.output.LayerNorm.weight,encoder.layer.14.output.LayerNorm.bias,encoder.layer.15.attention.self.query.weight,encoder.layer.15.attention.self.query.bias,encoder.layer.15.attention.self.key.weight,encoder.layer.15.attention.self.key.bias,encoder.layer.15.attention.self.value.weight,encoder.layer.15.attention.self.value.bias,encoder.layer.15.attentio
n.output.dense.weight,encoder.layer.15.attention.output.dense.bias,encoder.layer.15.attention.output.LayerNorm.weight,encoder.layer.15.attention.output.LayerNorm.bias,encoder.layer.15.intermediate.dense.weight,encoder.layer.15.intermediate.dense.bias,encoder.layer.15.output.dense.weight,encoder.layer.15.output.dense.bias,encoder.layer.15.output.LayerNorm.weight,encoder.layer.15.output.LayerNorm.bias,encoder.layer.16.attention.self.query.weight,encoder.layer.16.attention.self.query.bias,encoder.layer.16.attention.self.key.weight,encoder.layer.16.attention.self.key.bias,encoder.layer.16.attention.self.value.weight,encoder.layer.16.attention.self.value.bias,encoder.layer.16.attention.output.dense.weight,encoder.layer.16.attention.output.dense.bias,encoder.layer.16.attention.output.LayerNorm.weight,encoder.layer.16.attention.output.LayerNorm.bias,encoder.layer.16.intermediate.dense.weight,encoder.layer.16.intermediate.dense.bias,encoder.layer.16.output.dense.weight,encoder.layer.16.output.dense.bias,encoder.layer.16.output.LayerNorm.weight,encoder.layer.16.output.LayerNorm.bias,encoder.layer.17.attention.self.query.weight,encoder.layer.17.attention.self.query.bias,encoder.layer.17.attention.self.key.weight,encoder.layer.17.attention.self.key.bias,encoder.layer.17.attention.self.value.weight,encoder.layer.17.attention.self.value.bias,encoder.layer.17.attention.output.dense.weight,encoder.layer.17.attention.output.dense.bias,encoder.layer.17.attention.output.LayerNorm.weight,encoder.layer.17.attention.output.LayerNorm.bias,encoder.layer.17.intermediate.dense.weight,encoder.layer.17.intermediate.dense.bias,encoder.layer.17.output.dense.weight,encoder.layer.17.output.dense.bias,encoder.layer.17.output.LayerNorm.weight,encoder.layer.17.output.LayerNorm.bias,encoder.layer.18.attention.self.query.weight,encoder.layer.18.attention.self.query.bias,encoder.layer.18.attention.self.key.weight,encoder.layer.18.attention.self.key.bias,encoder.layer.18.attention.self.value.weight,encoder.layer.18.attention.self.value.bias,encoder.layer.18.attention.output.dense.weight,encoder.layer.18.attention.output.dense.bias,encoder.layer.18.attention.output.LayerNorm.weight,encoder.layer.18.attention.output.LayerNorm.bias,encoder.layer.18.intermediate.dense.weight,encoder.layer.18.intermediate.dense.bias,encoder.layer.18.output.dense.weight,encoder.layer.18.output.dense.bias,encoder.layer.18.output.LayerNorm.weight,encoder.layer.18.output.LayerNorm.bias,encoder.layer.19.attention.self.query.weight,encoder.layer.19.attention.self.query.bias,encoder.layer.19.attention.self.key.weight,encoder.layer.19.attention.self.key.bias,encoder.layer.19.attention.self.value.weight,encoder.layer.19.attention.self.value.bias,encoder.layer.19.attention.output.dense.weight,encoder.layer.19.attention.output.dense.bias,encoder.layer.19.attention.output.LayerNorm.weight,encoder.layer.19.attention.output.LayerNorm.bias,encoder.layer.19.intermediate.dense.weight,encoder.layer.19.intermediate.dense.bias,encoder.layer.19.output.dense.weight,encoder.layer.19.output.dense.bias,encoder.layer.19.output.LayerNorm.weight,encoder.layer.19.output.LayerNorm.bias,encoder.layer.20.attention.self.query.weight,encoder.layer.20.attention.self.query.bias,encoder.layer.20.attention.self.key.weight,encoder.layer.20.attention.self.key.bias,encoder.layer.20.attention.self.value.weight,encoder.layer.20.attention.self.value.bias,encoder.layer.20.attention.output.dense.weight,encoder.layer.20.attention.output.dense.bias,encoder.layer.20.attention.output.LayerNorm.weig
ht,encoder.layer.20.attention.output.LayerNorm.bias,encoder.layer.20.intermediate.dense.weight,encoder.layer.20.intermediate.dense.bias,encoder.layer.20.output.dense.weight,encoder.layer.20.output.dense.bias,encoder.layer.20.output.LayerNorm.weight,encoder.layer.20.output.LayerNorm.bias,encoder.layer.21.attention.self.query.weight,encoder.layer.21.attention.self.query.bias,encoder.layer.21.attention.self.key.weight,encoder.layer.21.attention.self.key.bias,encoder.layer.21.attention.self.value.weight,encoder.layer.21.attention.self.value.bias,encoder.layer.21.attention.output.dense.weight,encoder.layer.21.attention.output.dense.bias,encoder.layer.21.attention.output.LayerNorm.weight,encoder.layer.21.attention.output.LayerNorm.bias,encoder.layer.21.intermediate.dense.weight,encoder.layer.21.intermediate.dense.bias,encoder.layer.21.output.dense.weight,encoder.layer.21.output.dense.bias,encoder.layer.21.output.LayerNorm.weight,encoder.layer.21.output.LayerNorm.bias,encoder.layer.22.attention.self.query.weight,encoder.layer.22.attention.self.query.bias,encoder.layer.22.attention.self.key.weight,encoder.layer.22.attention.self.key.bias,encoder.layer.22.attention.self.value.weight,encoder.layer.22.attention.self.value.bias,encoder.layer.22.attention.output.dense.weight,encoder.layer.22.attention.output.dense.bias,encoder.layer.22.attention.output.LayerNorm.weight,encoder.layer.22.attention.output.LayerNorm.bias,encoder.layer.22.intermediate.dense.weight,encoder.layer.22.intermediate.dense.bias,encoder.layer.22.output.dense.weight,encoder.layer.22.output.dense.bias,encoder.layer.22.output.LayerNorm.weight,encoder.layer.22.output.LayerNorm.bias,encoder.layer.23.attention.self.query.weight,encoder.layer.23.attention.self.query.bias,encoder.layer.23.attention.self.key.weight,encoder.layer.23.attention.self.key.bias,encoder.layer.23.attention.self.value.weight,encoder.layer.23.attention.self.value.bias,encoder.layer.23.attention.output.dense.weight,encoder.layer.23.attention.output.dense.bias,encoder.layer.23.attention.output.LayerNorm.weight,encoder.layer.23.attention.output.LayerNorm.bias,encoder.layer.23.intermediate.dense.weight,encoder.layer.23.intermediate.dense.bias,encoder.layer.23.output.dense.weight,encoder.layer.23.output.dense.bias,encoder.layer.23.output.LayerNorm.weight,encoder.layer.23.output.LayerNorm.bias,pooler.dense.weight,pooler.dense.bias]. All weights are initialized. 
Loaded weights of the model: [vision_model.embeddings.class_embedding, vision_model.embeddings.position_ids, vision_model.embeddings.patch_embedding.weight, vision_model.embeddings.position_embedding.weight, vision_model.pre_layrnorm.weight, vision_model.pre_layrnorm.bias, ... (per-layer self_attn / mlp / layer_norm weights and biases for vision_model.encoder.layers.0 through vision_model.encoder.layers.23 omitted for brevity) ..., vision_model.post_layernorm.weight, vision_model.post_layernorm.bias]. All weights are initialized.
[2022-07-20 17:07:48,396 INFO] Using SimplePredict to predict...
5it [00:00, 5.19it/s]
One-step execution
It is worth noting that all of the training/evaluation/prediction code above has already been integrated into EasyNLP/examples/appzoo_tutorials/text_vision/main.py, and several ready-to-run scripts are provided as well. You can therefore complete all of the training/evaluation/prediction steps above in one step, either by running main.py with the appropriate arguments or by executing the bash scripts directly from the command line.
One-step execution with main.py
By running main.py with the arguments shown below, you can train, evaluate, or run prediction with the model directly.
The training command is shown below. Among the arguments, tables specifies the paths of the training and validation tsv files, input_schema describes the tsv data format, and first_sequence and second_sequence indicate which fields in input_schema serve as the first and second columns. The model is saved under checkpoint_dir, while learning_rate, epoch_num, random_seed, save_checkpoint_steps, sequence_length, train_batch_size, etc. are training hyperparameters. In this example, the pretrained model is set to clip_chinese_roberta_large_with_vit_large. (A quick data-format check is sketched right after the command.)
! python main.py \
  --mode train \
  --worker_gpu=1 \
  --tables=MUGE_MR_train_base64_part.tsv,MUGE_MR_valid_base64_part.tsv \
  --input_schema=text:str:1,image:str:1 \
  --first_sequence=text \
  --second_sequence=image \
  --checkpoint_dir=./clip_model/ \
  --learning_rate=1e-4 \
  --epoch_num=1 \
  --random_seed=42 \
  --save_checkpoint_steps=200 \
  --sequence_length=32 \
  --train_batch_size=32 \
  --app_name=clip \
  --user_defined_parameters='pretrain_model_name_or_path=clip_chinese_roberta_large_with_vit_large fix_vision=True mode=finetune'
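To make the data format described by input_schema more concrete, the following minimal sketch (an illustrative addition, not part of the tutorial code) prints the columns of the first training row. It assumes MUGE_MR_train_base64_part.tsv sits in the current working directory and that, matching input_schema=text:str:1,image:str:1 above, each row holds a text query and a base64-encoded image separated by tabs.
# Illustrative sketch: inspect one row of the training tsv (assumed layout per input_schema above).
with open('MUGE_MR_train_base64_part.tsv', encoding='utf-8') as f:
    cols = f.readline().rstrip('\n').split('\t')
print('number of tab-separated columns:', len(cols))
print('text field:', cols[0])
print('length of the (assumed) base64 image field:', len(cols[-1]))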
The evaluation command is as follows; the arguments have the same meanings as in training.
! python main.py \
  --mode evaluate \
  --worker_gpu=1 \
  --tables=MUGE_MR_valid_base64_part.tsv \
  --input_schema=text:str:1,image:str:1 \
  --first_sequence=text \
  --second_sequence=image \
  --checkpoint_dir=./clip_model/ \
  --sequence_length=32 \
  --micro_batch_size=32 \
  --app_name=clip \
  --user_defined_parameters=''
The prediction command, i.e. the feature-vector extraction command, is shown below. The arguments are again consistent with those above. Taking text feature extraction as an example, the input is MUGE_MR_test_base64_part_text.tsv and the results can be inspected in text_feat.tsv. (A sketch of how the extracted features could be used for retrieval follows the command.)
! python main.py \
  --mode predict \
  --worker_gpu=1 \
  --tables=MUGE_MR_test_base64_part_text.tsv \
  --outputs=text_feat.tsv \
  --input_schema=text:str:1 \
  --output_schema=text_feat \
  --append_cols=text \
  --first_sequence=text \
  --second_sequence=image \
  --checkpoint_path=./clip_model/ \
  --micro_batch_size=32 \
  --sequence_length=32 \
  --app_name=clip \
  --user_defined_parameters=''
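Once text_feat.tsv has been exported (and an image feature file has been produced analogously from MUGE_MR_test_base64_part_image.tsv), the vectors can be used for text-to-image retrieval by cosine similarity. The sketch below is an illustrative addition rather than EasyNLP code: it assumes the feature column is the first tab-separated field of each output row and holds comma-separated floats, with the appended columns after it, and it assumes an image feature file named image_feat.tsv exists. Adjust the column indices and file names to the actual output layout if they differ.
# Illustrative sketch: rank images for each text query using the exported feature files.
import numpy as np

def load_feats(path, feat_col=0):
    """Parse a feature tsv; assumes the feature column holds comma-separated floats."""
    feats, extras = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            cols = line.rstrip('\n').split('\t')
            feats.append(np.array([float(x) for x in cols[feat_col].split(',')], dtype=np.float32))
            extras.append(cols[feat_col + 1:])  # appended columns, e.g. the raw text
    return np.stack(feats), extras

text_feats, texts = load_feats('text_feat.tsv')
image_feats, _ = load_feats('image_feat.tsv')  # hypothetical file, produced the same way for images

# L2-normalize so that dot products equal cosine similarities.
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)
image_feats /= np.linalg.norm(image_feats, axis=1, keepdims=True)
scores = text_feats @ image_feats.T
topk = np.argsort(-scores, axis=1)[:, :5]  # top-5 image indices per text query
print(topk)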
Running with bash scripts from the command line
Several ready-to-run bash scripts are provided under the EasyNLP/examples/appzoo_tutorials/text_vision/ folder, so you can also complete model training/evaluation/prediction in one step by running a bash script directly from the command line. The following takes the run_train_eval_predict_user_defined_local.sh script as an example. This script takes two arguments: the first is the GPU index to run on (usually 0), and the second is the mode, i.e. train, evaluate, or predict.
Model training:
! bash run_train_eval_predict_user_defined_local.sh 0 train
Model evaluation:
! bash run_train_eval_predict_user_defined_local.sh 0 evaluate
Model prediction:
! bash run_train_eval_predict_user_defined_local.sh 0 predict