Direct Use
Open the EasyNLP-based English Text Summarization notebook and click "Open in DSW" in the upper-right corner.
English Summary Generation with Pegasus
Text summarization aims to extract, distill, or condense the key information from long, repetitive text sequences. Pegasus is a sequence-to-sequence pre-trained model proposed by Google. It uses a deliberately difficult pre-training objective, generating the sentences that have been removed from a document, and with a relatively small number of parameters it matches or surpasses the state of the art on 12 text summarization datasets. It also retains strong summarization performance under low-resource conditions.
In EasyNLP, we provide a trained Pegasus model (see the model list for other available models) so that users can benefit from its strong modeling capability. Taking English news headline generation as an example, this tutorial uses Pegasus as the backbone of a headline generation model and shows how to build, train, evaluate, and run predictions with the model in EasyNLP.
Newly Added Available Models
hfl/brio-cnndm-uncased
Runtime Requirements
PAI-PyTorch 1.7/1.8 image, a P100 or V100 GPU instance, and 32 GB of memory.
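Before starting, you can optionally confirm that a CUDA GPU is visible to PyTorch. This is a minimal sanity check, not part of the original tutorial:

import torch

# Verify that a CUDA device (e.g. P100 or V100) is available and report its memory.
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print('Device:', props.name)
    print('Memory (GB): %.1f' % (props.total_memory / 1024 ** 3))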
Installing EasyNLP
We recommend downloading the EasyNLP source code from GitHub and installing it with the following commands:
! git clone https://github.com/alibaba/EasyNLP.git
! pip install -r EasyNLP/requirements.txt -i http://mirrors.aliyun.com/pypi/simple/
! cd EasyNLP && python setup.py install
You can verify that the installation succeeded with the following command:
! which easynlp
If the easynlp CLI tool is found on your system, the EasyNLP codebase has been installed successfully.
Data Preparation
First, download the training and development sets used in this example with the following commands:
! wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_train.tsv
! wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_dev.tsv
--2022-08-25 10:51:39-- http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_train.tsv Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27 Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 4127467 (3.9M) [text/tab-separated-values] Saving to: ‘en_train.tsv.1’ en_train.tsv.1 100%[===================>] 3.94M 13.2MB/s in 0.3s 2022-08-25 10:51:40 (13.2 MB/s) - ‘en_train.tsv.1’ saved [4127467/4127467] --2022-08-25 10:51:40-- http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_dev.tsv Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27 Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 2848282 (2.7M) [text/tab-separated-values] Saving to: ‘en_dev.tsv.1’ en_dev.tsv.1 100%[===================>] 2.72M 10.9MB/s in 0.3s 2022-08-25 10:51:41 (10.9 MB/s) - ‘en_dev.tsv.1’ saved [2848282/2848282]
After the data is downloaded, you can view the first record with the code below. In both the training and development sets, each line is one news record containing the news summary and the news body, separated by a tab character (\t); a small parsing sketch follows the data samples below.
print('Training data sample:')
! head -n 1 en_train.tsv
print('Development set data sample:')
! head -n 1 en_dev.tsv
Training data sample: President Joe Biden on Friday will present the Medal of Honor for the first time since he took office, honoring a U.S. veteran of the Korean War for his "conspicuous gallantry," the White House said. South Korean President Moon Jae-In, who is visiting the White House that day for talks with Biden, is scheduled to attend the ceremony, according to the White House. The recipient of the nation's highest military award, Col. Ralph Puckett Jr., is a resident of Columbus, Georgia, who served in the Korean War and the Vietnam War, the White House said in a press release. President Joe Biden on Friday will present the Medal of Honor for the first time since he took office, honoring a U.S. veteran of the Korean War for his "conspicuous gallantry," the White House said. South Korean President Moon Jae-In, who is visiting the White House that day for talks with Biden , is scheduled to attend the ceremony, according to the White House. The recipient of the nation's highest military award, Col. Ralph Puckett Jr., is a resident of Columbus, Georgia, who served in the Korean War and the Vietnam War, the White House said in a press release. Puckett "distinguished himself by acts of gallantry and intrepidity above and beyond the call of duty" in Korea in November 1950 by leading a unit of Army Rangers into a harrowing daylight attack on an enemy hill, the White House said. "To obtain supporting fire, First Lieutenant Puckett mounted the closest tank, exposing himself to the deadly enemy fire," the White House said. "Leaping from the tank, he shouted words of encouragement to his men and began to lead the Rangers in the attack." When enemy fire pinned down a platoon, Puckett left the relative safety of his position and "intentionally ran across an open area three times to draw enemy fire, thereby allowing the Rangers to locate and destroy the enemy positions and to seize Hill 205." "During the course of the night, the enemy launched a counterattack which lasted four hours," the press release said, adding that Puckett's leadership motivated the Rangers throughout. "As a result, five human wave attacks by a battalion strength enemy element were repulsed," the White House said. Puckett was injured by grenade fragments during the first of those waves, "but he refused evacuation and continually directed artillery support that decimated attacking enemy formations, repeatedly abandoned positions of relative safety to make his way from foxhole to foxhole to check the company's perimeter, and distributed ammunition amongst the Rangers," the White House said. On the sixth wave of attack, two mortar rounds landed in Puckett's foxhole, causing "grievous wounds," the press release said. "First Lieutenant Puckett commanded the Rangers to leave him behind and evacuate the area. Feeling a sense of duty to aid him, the Rangers refused the order and staged an effort to retrieve him from the foxhole while still under harassing fire from the enemy." "Ultimately, the Rangers succeeded in retrieving First Lieutenant Puckett and they moved to the bottom of the hill, where First Lieutenant Puckett called for devastating artillery fire on the top of the enemy controlled hill," the White House said. Puckett enlisted in the Army Enlisted Reserve Corps in 1943 and was discharged to the U.S. Military Academy in 1945. He served in the Korean War as a member of the 8th Army Ranger Company and in Vietnam as a member of the 101st Airborne Division. 
Development set data sample: Jeff Bezos' Blue Origin filed a protest with the Government Accountability Office against NASA on Monday. Blue Origin decried the award as "flawed" in a statement to CNBC, saying that NASA "moved the goalposts at the last minute." SpaceX was awarded $2.89 billion for NASA's Human Landing System program earlier this month, to build an astronaut lunar lander for the space agency. Jeff Bezos ' Blue Origin filed a protest with the Government Accountability Office against NASA on Monday, challenging the space agency's award of a nearly $3 billion moon lander contract to Elon Musk's SpaceX earlier this month. SpaceX, in a competition against Blue Origin and Leidos ' subsidiary Dynetics, was awarded $2.89 billion for NASA's Human Landing System program . The HLS program is focused on building a lunar lander that can carry astronauts to the moon's surface under NASA's Artemis missions. For HLS, SpaceX bid a variation of its Starship rocket, prototypes of which the company has been testing at its facility in Texas. NASA was previously expected to choose two of the three teams to competitively build lunar landers, making the sole selection of SpaceX a surprise given the agency's prior goals for the program to continue to be a competition. Blue Origin decried the award as "flawed" in a statement to CNBC, saying that NASA "moved the goalposts at the last minute." "In NASA's own words, it has made a 'high risk' selection. Their decision eliminates opportunities for competition, significantly narrows the supply base, and not only delays, but also endangers America's return to the Moon. Because of that, we've filed a protest with the GAO," Blue Origin said. Blue Origin revealed that NASA evaluated the company's HLS proposal to cost $5.99 billion, or roughly twice that of SpaceX. The company argued in its protest filing that NASA's cost for funding both proposals would have been under $9 billion – or near how much the agency spent for SpaceX and Boeing to develop competing astronaut capsules under the Commercial Crew program . "In failing to maintain two sources ... NASA's selection decision creates a number of issues for the HLS program and puts all of NASA's eggs in one basket," Blue Origin wrote in the protest. The New York Times first reported Blue Origin's GAO protest. Blue Origin based its protest around five objections. First, Bezos' company said NASA did not give SpaceX's competitors an opportunity to "meaningfully compete" after "the agency's requirements changed due to its undisclosed, perceived shortfall of funding" for the HLS program. Second and third, Blue Origin said that NASA's acquisition was flawed under the agency's acquisition rules and its evaluation of the company's proposal "unreasonable." Fourth, the company asserted that NASA "improperly and disparately" evaluated SpaceX's proposal. And finally, Blue Origin said that NASA's evaluation of the proposals changed the weight it gave to key criteria, making price "the most important factor because of perceived funding limitations." The company highlighted work done to develop its lunar lander, including an undisclosed amount of its own investment into the BE-7 rocket engine that it planned to use for the spacecraft. "Blue Origin's substantial commercial investment in the BE-7 engine program is direct evidence of its corporate commitment in lunar exploration," the company wrote in the GAO protest. 
The space agency announced the SpaceX contract on April 16, with a source selection document written by human spaceflight director Kathy Lueders outlining NASA's reasons for its decision. NASA's based its selection on three primary factors: Technical ability, price, and then management rating. SpaceX and Blue Origin both received "acceptable" technical ratings, with SpaceX's price the lowest "by a wide margin" and its management rating was "outstanding" – while Blue Origin's management was rated as "very good," the same as Dynetics. Notably, NASA's selection committee said it found "two instances of proposed advance payments within Blue Origin's proposal." "I concur with the ... assessment that these kickoff meeting-related payments are counter to the solicitation's instructions and render Blue Origin's proposal ineligible for award," Lueders wrote. NASA requested $3.4 billion for the HLS program in fiscal year 2021, but Congress approved only $850 million. In light of that lower-than-expected funding, Lueders acknowledged that picking only one company's proposal for the HLS program was "not NASA's optimal outcome" but within the agency's acquisition rules. Last week, Musk hailed the NASA selection as a "great honor" and said he thinks the agency's goal of landing astronauts on the moon by 2024 is "actually doable." "It's been now almost half a century since humans were last on the moon. That's too long, we need to get back there and have a permanent base on the moon — again, like a big permanently occupied base on the moon," Musk said.
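To make the format concrete, the following sketch (illustrative only, not part of the EasyNLP pipeline) parses the first training record by hand and confirms the "<summary>\t<article body>" layout described above:

# Read the first line of the training set and split it on the tab separator.
with open('en_train.tsv', encoding='utf-8') as f:
    title, content = f.readline().rstrip('\n').split('\t', 1)
print('Summary characters:', len(title))
print('Body characters:', len(content))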
Initialization
Under Python 3.6, we first import the libraries needed to run the model from the newly installed EasyNLP and perform some initialization. In this tutorial, we use pegasus-summary-generation-en as the pre-trained backbone.
# To avoid conflicts between EasyNLP's args and the Jupyter runtime, sys.argv must be set
# manually here; otherwise initialization will fail. This step can be skipped when running
# the code from the command line or a .py file.
import sys
sys.argv = ['main.py']
import imp
import sys
import os
import torch.cuda

sys.path.append('./')

from easynlp.core import Trainer
from easynlp.appzoo.sequence_generation.data import SequenceGenerationDataset
from easynlp.appzoo.sequence_generation.model import SequenceGeneration
from easynlp.appzoo.sequence_generation.evaluator import SequenceGenerationEvaluator
from easynlp.appzoo.sequence_generation.predictor import SequenceGenerationPredictor
from easynlp.appzoo import get_application_model_for_evaluation
from easynlp.utils import initialize_easynlp, get_args
from easynlp.utils.global_vars import parse_user_defined_parameters
from easynlp.core import PredictorManager
from easynlp.utils import get_pretrain_model_path

initialize_easynlp()
args = get_args()

user_defined_parameters = 'language=en pretrain_model_name_or_path=alibaba-pai/pegasus-summary-generation-en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'
user_defined_parameters = parse_user_defined_parameters(user_defined_parameters)

args.checkpoint_dir = "./finetuned_en_model"
[2022-08-25 11:31:07,183.183 dsw34730-66c85d4cdb-6v2c6:85473 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off. /home/pai/lib/python3.6/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release. from cryptography import x509 Please ignore the following import error if you are using tunnel table io. No module named '_common_io'
No module named 'easy_predict' ------------------------ arguments ------------------------ app_name ........................................ text_classify append_cols ..................................... None buckets ......................................... None checkpoint_dir .................................. None chief_hosts ..................................... data_threads .................................... 10 distributed_backend ............................. nccl do_lower_case ................................... False epoch_num ....................................... 3.0 export_tf_checkpoint_type ....................... easytransfer first_sequence .................................. None gradient_accumulation_steps ..................... 1 input_schema .................................... None is_chief ........................................ is_master_node .................................. True job_name ........................................ None label_enumerate_values .......................... None label_name ...................................... None learning_rate ................................... 5e-05 local_rank ...................................... None logging_steps ................................... 100 master_port ..................................... 23456 max_grad_norm ................................... 1.0 micro_batch_size ................................ 2 mode ............................................ train modelzoo_base_dir ............................... n_cpu ........................................... 1 n_gpu ........................................... 1 odps_config ..................................... None optimizer_type .................................. AdamW output_schema ................................... outputs ......................................... None predict_queue_size .............................. 1024 predict_slice_size .............................. 4096 predict_table_read_thread_num ................... 16 predict_thread_num .............................. 2 ps_hosts ........................................ random_seed ..................................... 1234 rank ............................................ 0 read_odps ....................................... False restore_works_dir ............................... ./.easynlp_predict_restore_works_dir resume_from_checkpoint .......................... None save_all_checkpoints ............................ False save_checkpoint_steps ........................... None second_sequence ................................. None sequence_length ................................. 16 skip_first_line ................................. False tables .......................................... None task_count ...................................... 1 task_index ...................................... 0 use_amp ......................................... False use_torchacc .................................... False user_defined_parameters ......................... None user_entry_file ................................. None user_script ..................................... None warmup_proportion ............................... 0.1 weight_decay .................................... 0.0001 worker_count .................................... 1 worker_cpu ...................................... -1 worker_gpu ...................................... -1 worker_hosts .................................... None world_size ...................................... 
1 -------------------- end of arguments --------------------- > initializing torch distributed ...
[2022-08-25 11:31:09,214.214 dsw34730-66c85d4cdb-6v2c6:85473 INFO distributed_c10d.py:195] Added key: store_based_barrier_key:1 to store for rank: 0
Init dist done. World size: 1, rank 0, l_rank 0 > setting random seeds to 1234 ...
Note: If the code above raises an "Address already in use" error, run the following commands to find and kill the process occupying the port (6000 by default).
netstat -tunlp|grep 6000
kill -9 PID  (replace PID with the process ID shown in the output of the previous command)
Loading the Data
We use EasyNLP's built-in SequenceGenerationDataset to load the training and development data. Its main parameters are as follows:
- language: EasyNLP defaults to Chinese, so a language parameter is passed here to specify English summarization training
- pretrained_model_name_or_path: the name or path of the pre-trained model; here we use the helper function get_pretrain_model_path to resolve the model name "pegasus-summary-generation-en" and download the model automatically
- max_seq_length: the maximum text length; longer texts are truncated and shorter texts are padded
- input_schema: the format of the input data; the comma-separated items correspond to the tab-separated fields in each line of the data file, and each item starts with its field name, e.g. label or sent1 (see the mapping sketch after this list)
- first_sequence, label_name: specify which fields in input_schema are used as the input sentence and the label column
- label_enumerate_values: the enumeration of label values
- is_training: whether this is the training phase; True for train_dataset and False for valid_dataset
- app_name: the task to run, e.g. text classification, sequence labeling, text matching, or text generation
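To make the schema concrete, the following sketch (illustrative only, not EasyNLP code) shows how input_schema = "title:str:1,content:str:1" names the tab-separated columns of a record, and how first_sequence and label_name then select the model input and the target:

# Each schema item is "<field_name>:<type>:<length>"; the order matches the tab-separated columns.
input_schema = "title:str:1,content:str:1"
field_names = [item.split(':')[0] for item in input_schema.split(',')]

line = open('en_dev.tsv', encoding='utf-8').readline().rstrip('\n')
record = dict(zip(field_names, line.split('\t')))

source_text = record['content']   # first_sequence: the article body fed to the encoder
target_text = record['title']     # label_name / second_sequence: the reference summary
print('Target:', target_text[:80])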
Below we manually set a few parameters for the experiment.
args.tables = "./en_train.tsv,./en_dev.tsv" args.input_schema = "title:str:1,content:str:1" args.first_sequence = "content" args.second_sequence = "title" args.label_name = "title" args.learning_rate = 3e-5 args.epoch_num = 1 args.save_checkpoint_steps = 500 args.sequence_length = 512 args.micro_batch_size = 8 args.export_tf_checkpoint_type = "none" args.app_name = "sequence_generation" args.pretrained_model_name_or_path = user_defined_parameters.get('pretrain_model_name_or_path', None) args.pretrained_model_name_or_path = get_pretrain_model_path(args.pretrained_model_name_or_path) train_dataset = SequenceGenerationDataset( pretrained_model_name_or_path=args.pretrained_model_name_or_path, data_file=args.tables.split(",")[0], max_seq_length=args.sequence_length, input_schema=args.input_schema, first_sequence=args.first_sequence, second_sequence=args.second_sequence, user_defined_parameters=user_defined_parameters, is_training=True) valid_dataset = SequenceGenerationDataset( pretrained_model_name_or_path=args.pretrained_model_name_or_path, data_file=args.tables.split(",")[-1], max_seq_length=args.sequence_length, input_schema=args.input_schema, first_sequence=args.first_sequence, second_sequence=args.second_sequence, user_defined_parameters=user_defined_parameters, is_training=False)
`/root/.easynlp/modelzoo/alibaba-pai/pegasus-summary-generation-en.tgz` already exists
Model Training
With the data processed and the model loaded, we can start training. We use the SequenceGeneration class provided by EasyNLP to build the model for training. Its parameters are as follows:
- pretrained_model_name_or_path: the name or path of the pre-trained model; as before, we use the helper function get_pretrain_model_path to resolve the model name "pegasus-summary-generation-en" and download the model automatically
- user_defined_parameters: user-defined parameters; pass in the user_defined_parameters dict parsed above
Build and load the model
model = SequenceGeneration(
    pretrained_model_name_or_path=args.pretrained_model_name_or_path,
    user_defined_parameters=user_defined_parameters)
Loaded weights of the model: [final_logits_bias,model.shared.weight,model.encoder.embed_tokens.weight,model.encoder.embed_positions.weight,model.encoder.layers.0.self_attn.k_proj.weight,model.encoder.layers.0.self_attn.k_proj.bias,model.encoder.layers.0.self_attn.v_proj.weight,model.encoder.layers.0.self_attn.v_proj.bias,model.encoder.layers.0.self_attn.q_proj.weight,model.encoder.layers.0.self_attn.q_proj.bias,model.encoder.layers.0.self_attn.out_proj.weight,model.encoder.layers.0.self_attn.out_proj.bias,model.encoder.layers.0.self_attn_layer_norm.weight,model.encoder.layers.0.self_attn_layer_norm.bias,model.encoder.layers.0.fc1.weight,model.encoder.layers.0.fc1.bias,model.encoder.layers.0.fc2.weight,model.encoder.layers.0.fc2.bias,model.encoder.layers.0.final_layer_norm.weight,model.encoder.layers.0.final_layer_norm.bias,model.encoder.layers.1.self_attn.k_proj.weight,model.encoder.layers.1.self_attn.k_proj.bias,model.encoder.layers.1.self_attn.v_proj.weight,model.encoder.layers.1.self_attn.v_proj.bias,model.encoder.layers.1.self_attn.q_proj.weight,model.encoder.layers.1.self_attn.q_proj.bias,model.encoder.layers.1.self_attn.out_proj.weight,model.encoder.layers.1.self_attn.out_proj.bias,model.encoder.layers.1.self_attn_layer_norm.weight,model.encoder.layers.1.self_attn_layer_norm.bias,model.encoder.layers.1.fc1.weight,model.encoder.layers.1.fc1.bias,model.encoder.layers.1.fc2.weight,model.encoder.layers.1.fc2.bias,model.encoder.layers.1.final_layer_norm.weight,model.encoder.layers.1.final_layer_norm.bias,model.encoder.layers.2.self_attn.k_proj.weight,model.encoder.layers.2.self_attn.k_proj.bias,model.encoder.layers.2.self_attn.v_proj.weight,model.encoder.layers.2.self_attn.v_proj.bias,model.encoder.layers.2.self_attn.q_proj.weight,model.encoder.layers.2.self_attn.q_proj.bias,model.encoder.layers.2.self_attn.out_proj.weight,model.encoder.layers.2.self_attn.out_proj.bias,model.encoder.layers.2.self_attn_layer_norm.weight,model.encoder.layers.2.self_attn_layer_norm.bias,model.encoder.layers.2.fc1.weight,model.encoder.layers.2.fc1.bias,model.encoder.layers.2.fc2.weight,model.encoder.layers.2.fc2.bias,model.encoder.layers.2.final_layer_norm.weight,model.encoder.layers.2.final_layer_norm.bias,model.encoder.layers.3.self_attn.k_proj.weight,model.encoder.layers.3.self_attn.k_proj.bias,model.encoder.layers.3.self_attn.v_proj.weight,model.encoder.layers.3.self_attn.v_proj.bias,model.encoder.layers.3.self_attn.q_proj.weight,model.encoder.layers.3.self_attn.q_proj.bias,model.encoder.layers.3.self_attn.out_proj.weight,model.encoder.layers.3.self_attn.out_proj.bias,model.encoder.layers.3.self_attn_layer_norm.weight,model.encoder.layers.3.self_attn_layer_norm.bias,model.encoder.layers.3.fc1.weight,model.encoder.layers.3.fc1.bias,model.encoder.layers.3.fc2.weight,model.encoder.layers.3.fc2.bias,model.encoder.layers.3.final_layer_norm.weight,model.encoder.layers.3.final_layer_norm.bias,model.encoder.layers.4.self_attn.k_proj.weight,model.encoder.layers.4.self_attn.k_proj.bias,model.encoder.layers.4.self_attn.v_proj.weight,model.encoder.layers.4.self_attn.v_proj.bias,model.encoder.layers.4.self_attn.q_proj.weight,model.encoder.layers.4.self_attn.q_proj.bias,model.encoder.layers.4.self_attn.out_proj.weight,model.encoder.layers.4.self_attn.out_proj.bias,model.encoder.layers.4.self_attn_layer_norm.weight,model.encoder.layers.4.self_attn_layer_norm.bias,model.encoder.layers.4.fc1.weight,model.encoder.layers.4.fc1.bias,model.encoder.layers.4.fc2.weight,model.encoder.layers.4.fc2.bias,model.encoder.layers.4.final
_layer_norm.weight,model.encoder.layers.4.final_layer_norm.bias,model.encoder.layers.5.self_attn.k_proj.weight,model.encoder.layers.5.self_attn.k_proj.bias,model.encoder.layers.5.self_attn.v_proj.weight,model.encoder.layers.5.self_attn.v_proj.bias,model.encoder.layers.5.self_attn.q_proj.weight,model.encoder.layers.5.self_attn.q_proj.bias,model.encoder.layers.5.self_attn.out_proj.weight,model.encoder.layers.5.self_attn.out_proj.bias,model.encoder.layers.5.self_attn_layer_norm.weight,model.encoder.layers.5.self_attn_layer_norm.bias,model.encoder.layers.5.fc1.weight,model.encoder.layers.5.fc1.bias,model.encoder.layers.5.fc2.weight,model.encoder.layers.5.fc2.bias,model.encoder.layers.5.final_layer_norm.weight,model.encoder.layers.5.final_layer_norm.bias,model.encoder.layers.6.self_attn.k_proj.weight,model.encoder.layers.6.self_attn.k_proj.bias,model.encoder.layers.6.self_attn.v_proj.weight,model.encoder.layers.6.self_attn.v_proj.bias,model.encoder.layers.6.self_attn.q_proj.weight,model.encoder.layers.6.self_attn.q_proj.bias,model.encoder.layers.6.self_attn.out_proj.weight,model.encoder.layers.6.self_attn.out_proj.bias,model.encoder.layers.6.self_attn_layer_norm.weight,model.encoder.layers.6.self_attn_layer_norm.bias,model.encoder.layers.6.fc1.weight,model.encoder.layers.6.fc1.bias,model.encoder.layers.6.fc2.weight,model.encoder.layers.6.fc2.bias,model.encoder.layers.6.final_layer_norm.weight,model.encoder.layers.6.final_layer_norm.bias,model.encoder.layers.7.self_attn.k_proj.weight,model.encoder.layers.7.self_attn.k_proj.bias,model.encoder.layers.7.self_attn.v_proj.weight,model.encoder.layers.7.self_attn.v_proj.bias,model.encoder.layers.7.self_attn.q_proj.weight,model.encoder.layers.7.self_attn.q_proj.bias,model.encoder.layers.7.self_attn.out_proj.weight,model.encoder.layers.7.self_attn.out_proj.bias,model.encoder.layers.7.self_attn_layer_norm.weight,model.encoder.layers.7.self_attn_layer_norm.bias,model.encoder.layers.7.fc1.weight,model.encoder.layers.7.fc1.bias,model.encoder.layers.7.fc2.weight,model.encoder.layers.7.fc2.bias,model.encoder.layers.7.final_layer_norm.weight,model.encoder.layers.7.final_layer_norm.bias,model.encoder.layers.8.self_attn.k_proj.weight,model.encoder.layers.8.self_attn.k_proj.bias,model.encoder.layers.8.self_attn.v_proj.weight,model.encoder.layers.8.self_attn.v_proj.bias,model.encoder.layers.8.self_attn.q_proj.weight,model.encoder.layers.8.self_attn.q_proj.bias,model.encoder.layers.8.self_attn.out_proj.weight,model.encoder.layers.8.self_attn.out_proj.bias,model.encoder.layers.8.self_attn_layer_norm.weight,model.encoder.layers.8.self_attn_layer_norm.bias,model.encoder.layers.8.fc1.weight,model.encoder.layers.8.fc1.bias,model.encoder.layers.8.fc2.weight,model.encoder.layers.8.fc2.bias,model.encoder.layers.8.final_layer_norm.weight,model.encoder.layers.8.final_layer_norm.bias,model.encoder.layers.9.self_attn.k_proj.weight,model.encoder.layers.9.self_attn.k_proj.bias,model.encoder.layers.9.self_attn.v_proj.weight,model.encoder.layers.9.self_attn.v_proj.bias,model.encoder.layers.9.self_attn.q_proj.weight,model.encoder.layers.9.self_attn.q_proj.bias,model.encoder.layers.9.self_attn.out_proj.weight,model.encoder.layers.9.self_attn.out_proj.bias,model.encoder.layers.9.self_attn_layer_norm.weight,model.encoder.layers.9.self_attn_layer_norm.bias,model.encoder.layers.9.fc1.weight,model.encoder.layers.9.fc1.bias,model.encoder.layers.9.fc2.weight,model.encoder.layers.9.fc2.bias,model.encoder.layers.9.final_layer_norm.weight,model.encoder.layers.9.final_layer_norm.bias,model.encod
er.layers.10.self_attn.k_proj.weight,model.encoder.layers.10.self_attn.k_proj.bias,model.encoder.layers.10.self_attn.v_proj.weight,model.encoder.layers.10.self_attn.v_proj.bias,model.encoder.layers.10.self_attn.q_proj.weight,model.encoder.layers.10.self_attn.q_proj.bias,model.encoder.layers.10.self_attn.out_proj.weight,model.encoder.layers.10.self_attn.out_proj.bias,model.encoder.layers.10.self_attn_layer_norm.weight,model.encoder.layers.10.self_attn_layer_norm.bias,model.encoder.layers.10.fc1.weight,model.encoder.layers.10.fc1.bias,model.encoder.layers.10.fc2.weight,model.encoder.layers.10.fc2.bias,model.encoder.layers.10.final_layer_norm.weight,model.encoder.layers.10.final_layer_norm.bias,model.encoder.layers.11.self_attn.k_proj.weight,model.encoder.layers.11.self_attn.k_proj.bias,model.encoder.layers.11.self_attn.v_proj.weight,model.encoder.layers.11.self_attn.v_proj.bias,model.encoder.layers.11.self_attn.q_proj.weight,model.encoder.layers.11.self_attn.q_proj.bias,model.encoder.layers.11.self_attn.out_proj.weight,model.encoder.layers.11.self_attn.out_proj.bias,model.encoder.layers.11.self_attn_layer_norm.weight,model.encoder.layers.11.self_attn_layer_norm.bias,model.encoder.layers.11.fc1.weight,model.encoder.layers.11.fc1.bias,model.encoder.layers.11.fc2.weight,model.encoder.layers.11.fc2.bias,model.encoder.layers.11.final_layer_norm.weight,model.encoder.layers.11.final_layer_norm.bias,model.encoder.layers.12.self_attn.k_proj.weight,model.encoder.layers.12.self_attn.k_proj.bias,model.encoder.layers.12.self_attn.v_proj.weight,model.encoder.layers.12.self_attn.v_proj.bias,model.encoder.layers.12.self_attn.q_proj.weight,model.encoder.layers.12.self_attn.q_proj.bias,model.encoder.layers.12.self_attn.out_proj.weight,model.encoder.layers.12.self_attn.out_proj.bias,model.encoder.layers.12.self_attn_layer_norm.weight,model.encoder.layers.12.self_attn_layer_norm.bias,model.encoder.layers.12.fc1.weight,model.encoder.layers.12.fc1.bias,model.encoder.layers.12.fc2.weight,model.encoder.layers.12.fc2.bias,model.encoder.layers.12.final_layer_norm.weight,model.encoder.layers.12.final_layer_norm.bias,model.encoder.layers.13.self_attn.k_proj.weight,model.encoder.layers.13.self_attn.k_proj.bias,model.encoder.layers.13.self_attn.v_proj.weight,model.encoder.layers.13.self_attn.v_proj.bias,model.encoder.layers.13.self_attn.q_proj.weight,model.encoder.layers.13.self_attn.q_proj.bias,model.encoder.layers.13.self_attn.out_proj.weight,model.encoder.layers.13.self_attn.out_proj.bias,model.encoder.layers.13.self_attn_layer_norm.weight,model.encoder.layers.13.self_attn_layer_norm.bias,model.encoder.layers.13.fc1.weight,model.encoder.layers.13.fc1.bias,model.encoder.layers.13.fc2.weight,model.encoder.layers.13.fc2.bias,model.encoder.layers.13.final_layer_norm.weight,model.encoder.layers.13.final_layer_norm.bias,model.encoder.layers.14.self_attn.k_proj.weight,model.encoder.layers.14.self_attn.k_proj.bias,model.encoder.layers.14.self_attn.v_proj.weight,model.encoder.layers.14.self_attn.v_proj.bias,model.encoder.layers.14.self_attn.q_proj.weight,model.encoder.layers.14.self_attn.q_proj.bias,model.encoder.layers.14.self_attn.out_proj.weight,model.encoder.layers.14.self_attn.out_proj.bias,model.encoder.layers.14.self_attn_layer_norm.weight,model.encoder.layers.14.self_attn_layer_norm.bias,model.encoder.layers.14.fc1.weight,model.encoder.layers.14.fc1.bias,model.encoder.layers.14.fc2.weight,model.encoder.layers.14.fc2.bias,model.encoder.layers.14.final_layer_norm.weight,model.encoder.layers.14.final_layer_norm.bias,model.
encoder.layers.15.self_attn.k_proj.weight,model.encoder.layers.15.self_attn.k_proj.bias,model.encoder.layers.15.self_attn.v_proj.weight,model.encoder.layers.15.self_attn.v_proj.bias,model.encoder.layers.15.self_attn.q_proj.weight,model.encoder.layers.15.self_attn.q_proj.bias,model.encoder.layers.15.self_attn.out_proj.weight,model.encoder.layers.15.self_attn.out_proj.bias,model.encoder.layers.15.self_attn_layer_norm.weight,model.encoder.layers.15.self_attn_layer_norm.bias,model.encoder.layers.15.fc1.weight,model.encoder.layers.15.fc1.bias,model.encoder.layers.15.fc2.weight,model.encoder.layers.15.fc2.bias,model.encoder.layers.15.final_layer_norm.weight,model.encoder.layers.15.final_layer_norm.bias,model.encoder.layer_norm.weight,model.encoder.layer_norm.bias,model.decoder.embed_tokens.weight,model.decoder.embed_positions.weight,model.decoder.layers.0.self_attn.k_proj.weight,model.decoder.layers.0.self_attn.k_proj.bias,model.decoder.layers.0.self_attn.v_proj.weight,model.decoder.layers.0.self_attn.v_proj.bias,model.decoder.layers.0.self_attn.q_proj.weight,model.decoder.layers.0.self_attn.q_proj.bias,model.decoder.layers.0.self_attn.out_proj.weight,model.decoder.layers.0.self_attn.out_proj.bias,model.decoder.layers.0.self_attn_layer_norm.weight,model.decoder.layers.0.self_attn_layer_norm.bias,model.decoder.layers.0.encoder_attn.k_proj.weight,model.decoder.layers.0.encoder_attn.k_proj.bias,model.decoder.layers.0.encoder_attn.v_proj.weight,model.decoder.layers.0.encoder_attn.v_proj.bias,model.decoder.layers.0.encoder_attn.q_proj.weight,model.decoder.layers.0.encoder_attn.q_proj.bias,model.decoder.layers.0.encoder_attn.out_proj.weight,model.decoder.layers.0.encoder_attn.out_proj.bias,model.decoder.layers.0.encoder_attn_layer_norm.weight,model.decoder.layers.0.encoder_attn_layer_norm.bias,model.decoder.layers.0.fc1.weight,model.decoder.layers.0.fc1.bias,model.decoder.layers.0.fc2.weight,model.decoder.layers.0.fc2.bias,model.decoder.layers.0.final_layer_norm.weight,model.decoder.layers.0.final_layer_norm.bias,model.decoder.layers.1.self_attn.k_proj.weight,model.decoder.layers.1.self_attn.k_proj.bias,model.decoder.layers.1.self_attn.v_proj.weight,model.decoder.layers.1.self_attn.v_proj.bias,model.decoder.layers.1.self_attn.q_proj.weight,model.decoder.layers.1.self_attn.q_proj.bias,model.decoder.layers.1.self_attn.out_proj.weight,model.decoder.layers.1.self_attn.out_proj.bias,model.decoder.layers.1.self_attn_layer_norm.weight,model.decoder.layers.1.self_attn_layer_norm.bias,model.decoder.layers.1.encoder_attn.k_proj.weight,model.decoder.layers.1.encoder_attn.k_proj.bias,model.decoder.layers.1.encoder_attn.v_proj.weight,model.decoder.layers.1.encoder_attn.v_proj.bias,model.decoder.layers.1.encoder_attn.q_proj.weight,model.decoder.layers.1.encoder_attn.q_proj.bias,model.decoder.layers.1.encoder_attn.out_proj.weight,model.decoder.layers.1.encoder_attn.out_proj.bias,model.decoder.layers.1.encoder_attn_layer_norm.weight,model.decoder.layers.1.encoder_attn_layer_norm.bias,model.decoder.layers.1.fc1.weight,model.decoder.layers.1.fc1.bias,model.decoder.layers.1.fc2.weight,model.decoder.layers.1.fc2.bias,model.decoder.layers.1.final_layer_norm.weight,model.decoder.layers.1.final_layer_norm.bias,model.decoder.layers.2.self_attn.k_proj.weight,model.decoder.layers.2.self_attn.k_proj.bias,model.decoder.layers.2.self_attn.v_proj.weight,model.decoder.layers.2.self_attn.v_proj.bias,model.decoder.layers.2.self_attn.q_proj.weight,model.decoder.layers.2.self_attn.q_proj.bias,model.decoder.layers.2.self_attn.out_proj.we
ight,model.decoder.layers.2.self_attn.out_proj.bias,model.decoder.layers.2.self_attn_layer_norm.weight,model.decoder.layers.2.self_attn_layer_norm.bias,model.decoder.layers.2.encoder_attn.k_proj.weight,model.decoder.layers.2.encoder_attn.k_proj.bias,model.decoder.layers.2.encoder_attn.v_proj.weight,model.decoder.layers.2.encoder_attn.v_proj.bias,model.decoder.layers.2.encoder_attn.q_proj.weight,model.decoder.layers.2.encoder_attn.q_proj.bias,model.decoder.layers.2.encoder_attn.out_proj.weight,model.decoder.layers.2.encoder_attn.out_proj.bias,model.decoder.layers.2.encoder_attn_layer_norm.weight,model.decoder.layers.2.encoder_attn_layer_norm.bias,model.decoder.layers.2.fc1.weight,model.decoder.layers.2.fc1.bias,model.decoder.layers.2.fc2.weight,model.decoder.layers.2.fc2.bias,model.decoder.layers.2.final_layer_norm.weight,model.decoder.layers.2.final_layer_norm.bias,model.decoder.layers.3.self_attn.k_proj.weight,model.decoder.layers.3.self_attn.k_proj.bias,model.decoder.layers.3.self_attn.v_proj.weight,model.decoder.layers.3.self_attn.v_proj.bias,model.decoder.layers.3.self_attn.q_proj.weight,model.decoder.layers.3.self_attn.q_proj.bias,model.decoder.layers.3.self_attn.out_proj.weight,model.decoder.layers.3.self_attn.out_proj.bias,model.decoder.layers.3.self_attn_layer_norm.weight,model.decoder.layers.3.self_attn_layer_norm.bias,model.decoder.layers.3.encoder_attn.k_proj.weight,model.decoder.layers.3.encoder_attn.k_proj.bias,model.decoder.layers.3.encoder_attn.v_proj.weight,model.decoder.layers.3.encoder_attn.v_proj.bias,model.decoder.layers.3.encoder_attn.q_proj.weight,model.decoder.layers.3.encoder_attn.q_proj.bias,model.decoder.layers.3.encoder_attn.out_proj.weight,model.decoder.layers.3.encoder_attn.out_proj.bias,model.decoder.layers.3.encoder_attn_layer_norm.weight,model.decoder.layers.3.encoder_attn_layer_norm.bias,model.decoder.layers.3.fc1.weight,model.decoder.layers.3.fc1.bias,model.decoder.layers.3.fc2.weight,model.decoder.layers.3.fc2.bias,model.decoder.layers.3.final_layer_norm.weight,model.decoder.layers.3.final_layer_norm.bias,model.decoder.layers.4.self_attn.k_proj.weight,model.decoder.layers.4.self_attn.k_proj.bias,model.decoder.layers.4.self_attn.v_proj.weight,model.decoder.layers.4.self_attn.v_proj.bias,model.decoder.layers.4.self_attn.q_proj.weight,model.decoder.layers.4.self_attn.q_proj.bias,model.decoder.layers.4.self_attn.out_proj.weight,model.decoder.layers.4.self_attn.out_proj.bias,model.decoder.layers.4.self_attn_layer_norm.weight,model.decoder.layers.4.self_attn_layer_norm.bias,model.decoder.layers.4.encoder_attn.k_proj.weight,model.decoder.layers.4.encoder_attn.k_proj.bias,model.decoder.layers.4.encoder_attn.v_proj.weight,model.decoder.layers.4.encoder_attn.v_proj.bias,model.decoder.layers.4.encoder_attn.q_proj.weight,model.decoder.layers.4.encoder_attn.q_proj.bias,model.decoder.layers.4.encoder_attn.out_proj.weight,model.decoder.layers.4.encoder_attn.out_proj.bias,model.decoder.layers.4.encoder_attn_layer_norm.weight,model.decoder.layers.4.encoder_attn_layer_norm.bias,model.decoder.layers.4.fc1.weight,model.decoder.layers.4.fc1.bias,model.decoder.layers.4.fc2.weight,model.decoder.layers.4.fc2.bias,model.decoder.layers.4.final_layer_norm.weight,model.decoder.layers.4.final_layer_norm.bias,model.decoder.layers.5.self_attn.k_proj.weight,model.decoder.layers.5.self_attn.k_proj.bias,model.decoder.layers.5.self_attn.v_proj.weight,model.decoder.layers.5.self_attn.v_proj.bias,model.decoder.layers.5.self_attn.q_proj.weight,model.decoder.layers.5.self_attn.q_proj.bias,model
.decoder.layers.5.self_attn.out_proj.weight,model.decoder.layers.5.self_attn.out_proj.bias,model.decoder.layers.5.self_attn_layer_norm.weight,model.decoder.layers.5.self_attn_layer_norm.bias,model.decoder.layers.5.encoder_attn.k_proj.weight,model.decoder.layers.5.encoder_attn.k_proj.bias,model.decoder.layers.5.encoder_attn.v_proj.weight,model.decoder.layers.5.encoder_attn.v_proj.bias,model.decoder.layers.5.encoder_attn.q_proj.weight,model.decoder.layers.5.encoder_attn.q_proj.bias,model.decoder.layers.5.encoder_attn.out_proj.weight,model.decoder.layers.5.encoder_attn.out_proj.bias,model.decoder.layers.5.encoder_attn_layer_norm.weight,model.decoder.layers.5.encoder_attn_layer_norm.bias,model.decoder.layers.5.fc1.weight,model.decoder.layers.5.fc1.bias,model.decoder.layers.5.fc2.weight,model.decoder.layers.5.fc2.bias,model.decoder.layers.5.final_layer_norm.weight,model.decoder.layers.5.final_layer_norm.bias,model.decoder.layers.6.self_attn.k_proj.weight,model.decoder.layers.6.self_attn.k_proj.bias,model.decoder.layers.6.self_attn.v_proj.weight,model.decoder.layers.6.self_attn.v_proj.bias,model.decoder.layers.6.self_attn.q_proj.weight,model.decoder.layers.6.self_attn.q_proj.bias,model.decoder.layers.6.self_attn.out_proj.weight,model.decoder.layers.6.self_attn.out_proj.bias,model.decoder.layers.6.self_attn_layer_norm.weight,model.decoder.layers.6.self_attn_layer_norm.bias,model.decoder.layers.6.encoder_attn.k_proj.weight,model.decoder.layers.6.encoder_attn.k_proj.bias,model.decoder.layers.6.encoder_attn.v_proj.weight,model.decoder.layers.6.encoder_attn.v_proj.bias,model.decoder.layers.6.encoder_attn.q_proj.weight,model.decoder.layers.6.encoder_attn.q_proj.bias,model.decoder.layers.6.encoder_attn.out_proj.weight,model.decoder.layers.6.encoder_attn.out_proj.bias,model.decoder.layers.6.encoder_attn_layer_norm.weight,model.decoder.layers.6.encoder_attn_layer_norm.bias,model.decoder.layers.6.fc1.weight,model.decoder.layers.6.fc1.bias,model.decoder.layers.6.fc2.weight,model.decoder.layers.6.fc2.bias,model.decoder.layers.6.final_layer_norm.weight,model.decoder.layers.6.final_layer_norm.bias,model.decoder.layers.7.self_attn.k_proj.weight,model.decoder.layers.7.self_attn.k_proj.bias,model.decoder.layers.7.self_attn.v_proj.weight,model.decoder.layers.7.self_attn.v_proj.bias,model.decoder.layers.7.self_attn.q_proj.weight,model.decoder.layers.7.self_attn.q_proj.bias,model.decoder.layers.7.self_attn.out_proj.weight,model.decoder.layers.7.self_attn.out_proj.bias,model.decoder.layers.7.self_attn_layer_norm.weight,model.decoder.layers.7.self_attn_layer_norm.bias,model.decoder.layers.7.encoder_attn.k_proj.weight,model.decoder.layers.7.encoder_attn.k_proj.bias,model.decoder.layers.7.encoder_attn.v_proj.weight,model.decoder.layers.7.encoder_attn.v_proj.bias,model.decoder.layers.7.encoder_attn.q_proj.weight,model.decoder.layers.7.encoder_attn.q_proj.bias,model.decoder.layers.7.encoder_attn.out_proj.weight,model.decoder.layers.7.encoder_attn.out_proj.bias,model.decoder.layers.7.encoder_attn_layer_norm.weight,model.decoder.layers.7.encoder_attn_layer_norm.bias,model.decoder.layers.7.fc1.weight,model.decoder.layers.7.fc1.bias,model.decoder.layers.7.fc2.weight,model.decoder.layers.7.fc2.bias,model.decoder.layers.7.final_layer_norm.weight,model.decoder.layers.7.final_layer_norm.bias,model.decoder.layers.8.self_attn.k_proj.weight,model.decoder.layers.8.self_attn.k_proj.bias,model.decoder.layers.8.self_attn.v_proj.weight,model.decoder.layers.8.self_attn.v_proj.bias,model.decoder.layers.8.self_attn.q_proj.weight,model.decod
er.layers.8.self_attn.q_proj.bias,model.decoder.layers.8.self_attn.out_proj.weight,model.decoder.layers.8.self_attn.out_proj.bias,model.decoder.layers.8.self_attn_layer_norm.weight,model.decoder.layers.8.self_attn_layer_norm.bias,model.decoder.layers.8.encoder_attn.k_proj.weight,model.decoder.layers.8.encoder_attn.k_proj.bias,model.decoder.layers.8.encoder_attn.v_proj.weight,model.decoder.layers.8.encoder_attn.v_proj.bias,model.decoder.layers.8.encoder_attn.q_proj.weight,model.decoder.layers.8.encoder_attn.q_proj.bias,model.decoder.layers.8.encoder_attn.out_proj.weight,model.decoder.layers.8.encoder_attn.out_proj.bias,model.decoder.layers.8.encoder_attn_layer_norm.weight,model.decoder.layers.8.encoder_attn_layer_norm.bias,model.decoder.layers.8.fc1.weight,model.decoder.layers.8.fc1.bias,model.decoder.layers.8.fc2.weight,model.decoder.layers.8.fc2.bias,model.decoder.layers.8.final_layer_norm.weight,model.decoder.layers.8.final_layer_norm.bias,model.decoder.layers.9.self_attn.k_proj.weight,model.decoder.layers.9.self_attn.k_proj.bias,model.decoder.layers.9.self_attn.v_proj.weight,model.decoder.layers.9.self_attn.v_proj.bias,model.decoder.layers.9.self_attn.q_proj.weight,model.decoder.layers.9.self_attn.q_proj.bias,model.decoder.layers.9.self_attn.out_proj.weight,model.decoder.layers.9.self_attn.out_proj.bias,model.decoder.layers.9.self_attn_layer_norm.weight,model.decoder.layers.9.self_attn_layer_norm.bias,model.decoder.layers.9.encoder_attn.k_proj.weight,model.decoder.layers.9.encoder_attn.k_proj.bias,model.decoder.layers.9.encoder_attn.v_proj.weight,model.decoder.layers.9.encoder_attn.v_proj.bias,model.decoder.layers.9.encoder_attn.q_proj.weight,model.decoder.layers.9.encoder_attn.q_proj.bias,model.decoder.layers.9.encoder_attn.out_proj.weight,model.decoder.layers.9.encoder_attn.out_proj.bias,model.decoder.layers.9.encoder_attn_layer_norm.weight,model.decoder.layers.9.encoder_attn_layer_norm.bias,model.decoder.layers.9.fc1.weight,model.decoder.layers.9.fc1.bias,model.decoder.layers.9.fc2.weight,model.decoder.layers.9.fc2.bias,model.decoder.layers.9.final_layer_norm.weight,model.decoder.layers.9.final_layer_norm.bias,model.decoder.layers.10.self_attn.k_proj.weight,model.decoder.layers.10.self_attn.k_proj.bias,model.decoder.layers.10.self_attn.v_proj.weight,model.decoder.layers.10.self_attn.v_proj.bias,model.decoder.layers.10.self_attn.q_proj.weight,model.decoder.layers.10.self_attn.q_proj.bias,model.decoder.layers.10.self_attn.out_proj.weight,model.decoder.layers.10.self_attn.out_proj.bias,model.decoder.layers.10.self_attn_layer_norm.weight,model.decoder.layers.10.self_attn_layer_norm.bias,model.decoder.layers.10.encoder_attn.k_proj.weight,model.decoder.layers.10.encoder_attn.k_proj.bias,model.decoder.layers.10.encoder_attn.v_proj.weight,model.decoder.layers.10.encoder_attn.v_proj.bias,model.decoder.layers.10.encoder_attn.q_proj.weight,model.decoder.layers.10.encoder_attn.q_proj.bias,model.decoder.layers.10.encoder_attn.out_proj.weight,model.decoder.layers.10.encoder_attn.out_proj.bias,model.decoder.layers.10.encoder_attn_layer_norm.weight,model.decoder.layers.10.encoder_attn_layer_norm.bias,model.decoder.layers.10.fc1.weight,model.decoder.layers.10.fc1.bias,model.decoder.layers.10.fc2.weight,model.decoder.layers.10.fc2.bias,model.decoder.layers.10.final_layer_norm.weight,model.decoder.layers.10.final_layer_norm.bias,model.decoder.layers.11.self_attn.k_proj.weight,model.decoder.layers.11.self_attn.k_proj.bias,model.decoder.layers.11.self_attn.v_proj.weight,model.decoder.layers.11.self_attn.v
_proj.bias,model.decoder.layers.11.self_attn.q_proj.weight,model.decoder.layers.11.self_attn.q_proj.bias,model.decoder.layers.11.self_attn.out_proj.weight,model.decoder.layers.11.self_attn.out_proj.bias,model.decoder.layers.11.self_attn_layer_norm.weight,model.decoder.layers.11.self_attn_layer_norm.bias,model.decoder.layers.11.encoder_attn.k_proj.weight,model.decoder.layers.11.encoder_attn.k_proj.bias,model.decoder.layers.11.encoder_attn.v_proj.weight,model.decoder.layers.11.encoder_attn.v_proj.bias,model.decoder.layers.11.encoder_attn.q_proj.weight,model.decoder.layers.11.encoder_attn.q_proj.bias,model.decoder.layers.11.encoder_attn.out_proj.weight,model.decoder.layers.11.encoder_attn.out_proj.bias,model.decoder.layers.11.encoder_attn_layer_norm.weight,model.decoder.layers.11.encoder_attn_layer_norm.bias,model.decoder.layers.11.fc1.weight,model.decoder.layers.11.fc1.bias,model.decoder.layers.11.fc2.weight,model.decoder.layers.11.fc2.bias,model.decoder.layers.11.final_layer_norm.weight,model.decoder.layers.11.final_layer_norm.bias,model.decoder.layers.12.self_attn.k_proj.weight,model.decoder.layers.12.self_attn.k_proj.bias,model.decoder.layers.12.self_attn.v_proj.weight,model.decoder.layers.12.self_attn.v_proj.bias,model.decoder.layers.12.self_attn.q_proj.weight,model.decoder.layers.12.self_attn.q_proj.bias,model.decoder.layers.12.self_attn.out_proj.weight,model.decoder.layers.12.self_attn.out_proj.bias,model.decoder.layers.12.self_attn_layer_norm.weight,model.decoder.layers.12.self_attn_layer_norm.bias,model.decoder.layers.12.encoder_attn.k_proj.weight,model.decoder.layers.12.encoder_attn.k_proj.bias,model.decoder.layers.12.encoder_attn.v_proj.weight,model.decoder.layers.12.encoder_attn.v_proj.bias,model.decoder.layers.12.encoder_attn.q_proj.weight,model.decoder.layers.12.encoder_attn.q_proj.bias,model.decoder.layers.12.encoder_attn.out_proj.weight,model.decoder.layers.12.encoder_attn.out_proj.bias,model.decoder.layers.12.encoder_attn_layer_norm.weight,model.decoder.layers.12.encoder_attn_layer_norm.bias,model.decoder.layers.12.fc1.weight,model.decoder.layers.12.fc1.bias,model.decoder.layers.12.fc2.weight,model.decoder.layers.12.fc2.bias,model.decoder.layers.12.final_layer_norm.weight,model.decoder.layers.12.final_layer_norm.bias,model.decoder.layers.13.self_attn.k_proj.weight,model.decoder.layers.13.self_attn.k_proj.bias,model.decoder.layers.13.self_attn.v_proj.weight,model.decoder.layers.13.self_attn.v_proj.bias,model.decoder.layers.13.self_attn.q_proj.weight,model.decoder.layers.13.self_attn.q_proj.bias,model.decoder.layers.13.self_attn.out_proj.weight,model.decoder.layers.13.self_attn.out_proj.bias,model.decoder.layers.13.self_attn_layer_norm.weight,model.decoder.layers.13.self_attn_layer_norm.bias,model.decoder.layers.13.encoder_attn.k_proj.weight,model.decoder.layers.13.encoder_attn.k_proj.bias,model.decoder.layers.13.encoder_attn.v_proj.weight,model.decoder.layers.13.encoder_attn.v_proj.bias,model.decoder.layers.13.encoder_attn.q_proj.weight,model.decoder.layers.13.encoder_attn.q_proj.bias,model.decoder.layers.13.encoder_attn.out_proj.weight,model.decoder.layers.13.encoder_attn.out_proj.bias,model.decoder.layers.13.encoder_attn_layer_norm.weight,model.decoder.layers.13.encoder_attn_layer_norm.bias,model.decoder.layers.13.fc1.weight,model.decoder.layers.13.fc1.bias,model.decoder.layers.13.fc2.weight,model.decoder.layers.13.fc2.bias,model.decoder.layers.13.final_layer_norm.weight,model.decoder.layers.13.final_layer_norm.bias,model.decoder.layers.14.self_attn.k_proj.weight,model.decode
r.layers.14.self_attn.k_proj.bias,model.decoder.layers.14.self_attn.v_proj.weight,model.decoder.layers.14.self_attn.v_proj.bias,model.decoder.layers.14.self_attn.q_proj.weight,model.decoder.layers.14.self_attn.q_proj.bias,model.decoder.layers.14.self_attn.out_proj.weight,model.decoder.layers.14.self_attn.out_proj.bias,model.decoder.layers.14.self_attn_layer_norm.weight,model.decoder.layers.14.self_attn_layer_norm.bias,model.decoder.layers.14.encoder_attn.k_proj.weight,model.decoder.layers.14.encoder_attn.k_proj.bias,model.decoder.layers.14.encoder_attn.v_proj.weight,model.decoder.layers.14.encoder_attn.v_proj.bias,model.decoder.layers.14.encoder_attn.q_proj.weight,model.decoder.layers.14.encoder_attn.q_proj.bias,model.decoder.layers.14.encoder_attn.out_proj.weight,model.decoder.layers.14.encoder_attn.out_proj.bias,model.decoder.layers.14.encoder_attn_layer_norm.weight,model.decoder.layers.14.encoder_attn_layer_norm.bias,model.decoder.layers.14.fc1.weight,model.decoder.layers.14.fc1.bias,model.decoder.layers.14.fc2.weight,model.decoder.layers.14.fc2.bias,model.decoder.layers.14.final_layer_norm.weight,model.decoder.layers.14.final_layer_norm.bias,model.decoder.layers.15.self_attn.k_proj.weight,model.decoder.layers.15.self_attn.k_proj.bias,model.decoder.layers.15.self_attn.v_proj.weight,model.decoder.layers.15.self_attn.v_proj.bias,model.decoder.layers.15.self_attn.q_proj.weight,model.decoder.layers.15.self_attn.q_proj.bias,model.decoder.layers.15.self_attn.out_proj.weight,model.decoder.layers.15.self_attn.out_proj.bias,model.decoder.layers.15.self_attn_layer_norm.weight,model.decoder.layers.15.self_attn_layer_norm.bias,model.decoder.layers.15.encoder_attn.k_proj.weight,model.decoder.layers.15.encoder_attn.k_proj.bias,model.decoder.layers.15.encoder_attn.v_proj.weight,model.decoder.layers.15.encoder_attn.v_proj.bias,model.decoder.layers.15.encoder_attn.q_proj.weight,model.decoder.layers.15.encoder_attn.q_proj.bias,model.decoder.layers.15.encoder_attn.out_proj.weight,model.decoder.layers.15.encoder_attn.out_proj.bias,model.decoder.layers.15.encoder_attn_layer_norm.weight,model.decoder.layers.15.encoder_attn_layer_norm.bias,model.decoder.layers.15.fc1.weight,model.decoder.layers.15.fc1.bias,model.decoder.layers.15.fc2.weight,model.decoder.layers.15.fc2.bias,model.decoder.layers.15.final_layer_norm.weight,model.decoder.layers.15.final_layer_norm.bias,model.decoder.layer_norm.weight,model.decoder.layer_norm.bias,lm_head.weight]. All weights are initialized.
Build the trainer and train
extra_para = {'pretrained_model_name_or_path': args.pretrained_model_name_or_path}
evaluator = SequenceGenerationEvaluator(
    valid_dataset=valid_dataset,
    user_defined_parameters=user_defined_parameters,
    **extra_para)
trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    user_defined_parameters=user_defined_parameters,
    evaluator=evaluator)
trainer.train()
[2022-08-25 10:55:53,630 INFO] ========== Initializing Tensorboard ========== [2022-08-25 10:55:53,668 INFO] ========== Training Start ========== [2022-08-25 10:55:53,669 INFO] Num of GPUs (all) = 1 [2022-08-25 10:55:53,671 INFO] Num of CPUs per worker = 1 [2022-08-25 10:55:53,672 INFO] Num dataset examples = 1000 [2022-08-25 10:55:53,674 INFO] Num training examples = 1000 [2022-08-25 10:55:53,675 INFO] Num validation examples = 700 [2022-08-25 10:55:53,676 INFO] Train. batch size = 8 [2022-08-25 10:55:53,677 INFO] Train. micro batch size = 8 [2022-08-25 10:55:53,678 INFO] Train. batch no. = 125 [2022-08-25 10:55:53,679 INFO] Evaluation batch size = 8 [2022-08-25 10:55:53,681 INFO] Total training steps = 125 [2022-08-25 10:55:53,681 INFO] Sequence length = 512 [2022-08-25 10:55:53,682 INFO] Saving steps = 500 [2022-08-25 10:55:53,683 INFO] Distributed_backend = nccl [2022-08-25 10:55:53,683 INFO] Worker Count = 1 [2022-08-25 10:55:53,684 INFO] Worker CPU = -1 [2022-08-25 10:55:53,684 INFO] Worker data threads = 10 [2022-08-25 10:55:53,688 INFO] num model params = 570,797,056 [2022-08-25 10:55:53,692 INFO] num trainable params = 568,699,904 [2022-08-25 10:55:53,692 INFO] [2022-08-25 10:55:53,693 INFO] ========== Model Config ========== [2022-08-25 10:55:53,694 INFO] { "activation_dropout": 0.1, "activation_function": "relu", "add_bias_logits": false, "add_final_layer_norm": true, "architectures": [ "PegasusForConditionalGeneration" ], "attention_dropout": 0.1, "bos_token_id": 0, "classif_dropout": 0.0, "classifier_dropout": 0.0, "d_model": 1024, "decoder_attention_heads": 16, "decoder_ffn_dim": 4096, "decoder_layerdrop": 0.0, "decoder_layers": 16, "decoder_start_token_id": 0, "dropout": 0.1, "easynlp_version": "0.0.3", "encoder_attention_heads": 16, "encoder_ffn_dim": 4096, "encoder_layerdrop": 0.0, "encoder_layers": 16, "eos_token_id": 1, "extra_pos_embeddings": 1, "forced_eos_token_id": 1, "gradient_checkpointing": false, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2" }, "init_std": 0.02, "is_encoder_decoder": true, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2 }, "length_penalty": 0.8, "max_length": 128, "max_position_embeddings": 1024, "min_length": 32, "model_type": "pegasus", "normalize_before": true, "normalize_embedding": false, "num_beams": 8, "num_hidden_layers": 16, "pad_token_id": 0, "scale_embedding": true, "static_position_embeddings": true, "use_cache": true, "vocab_size": 96103 }
optimizer type: AdamW
/home/pai/lib/python3.6/site-packages/pai_easynlp-0.0.7-py3.6.egg/easynlp/core/optimizers.py:441: UserWarning: This overload of add_ is deprecated: add_(Number alpha, Tensor other) Consider using one of the following signatures instead: add_(Tensor other, *, Number alpha) (Triggered internally at /workspace/artifacts/paipytorch1.8/dist/ubuntu18.04-py3.6-cuda10.1/build/src/torch/csrc/utils/python_arg_parser.cpp:1005.) exp_avg.mul_(beta1).add_(1.0 - beta1, grad) /home/pai/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`. warnings.warn("To get the last learning rate computed by the scheduler, " [2022-08-25 10:57:27,064 INFO] Epoch [ 0/ 1], step [100/125], lr 0.000007, 93.37 s [2022-08-25 10:57:27,065 INFO] loss : 0.8108
Training Time: 118.0245532989502, rank 0, gsteps 125
100%|██████████| 700/700 [22:31<00:00, 1.93s/it] [2022-08-25 11:20:24,646 INFO] Saving best model to ./finetuned_en_model/pytorch_model.bin...
Rouge 1/2/L: 37.78/18.57/35.34
[2022-08-25 11:21:13,500 INFO] Best score: 35.33964534239289 [2022-08-25 11:21:13,501 INFO] Training Time: 1520.4240629673004
Model Evaluation
After training, the model is saved to the checkpoint_dir specified at the beginning, i.e. the local path "./finetuned_en_model/". We can now evaluate the fine-tuned model: we initialize an evaluator with EasyNLP's SequenceGenerationEvaluator, move the model to the GPU, and run the evaluation.
args.tables = "en_dev.tsv" extra_para = {'pretrained_model_name_or_path':args.pretrained_model_name_or_path} evaluator = SequenceGenerationEvaluator(valid_dataset=valid_dataset, user_defined_parameters=user_defined_parameters, **extra_para) if args.n_gpu > 0: model.to(torch.cuda.current_device()) else: model.to("cpu") evaluator.evaluate(model=model)
100%|██████████| 700/700 [11:23<00:00, 1.02it/s]
Rouge 1/2/L: 33.76/16.46/31.70
[('rouge-l', 31.699629026817473), ('rouge-1', 33.756938153964946), ('rouge-2', 16.462475090936373)]
Model Prediction
We can also use the fine-tuned model to generate news headlines. We first create a predictor and then instantiate a PredictorManager with it. The predictions are written to en.preds.txt; a small sketch for inspecting that file follows the log output below.
args.tables = "en_dev.tsv" args.outputs = "en.preds.txt" args.input_schema = "title:str:1,content:str:1" args.output_schema = "predictions,beams" args.append_cols="title,content" args.micro_batch_size = 32 predictor = SequenceGenerationPredictor(model_dir=args.checkpoint_dir, model_cls=SequenceGeneration, first_sequence=args.first_sequence, user_defined_parameters=user_defined_parameters) predictor_manager = PredictorManager( predictor=predictor, input_file=args.tables.split(",")[0], input_schema=args.input_schema, output_file=args.outputs, output_schema=args.output_schema, append_cols=args.append_cols, batch_size=args.micro_batch_size ) predictor_manager.run()
Loaded weights of the model: [final_logits_bias, model.shared.weight, model.encoder.embed_tokens.weight, model.encoder.embed_positions.weight, ... (per-layer attention, feed-forward, and layer-norm parameters for all 16 encoder and 16 decoder layers omitted) ..., model.decoder.layer_norm.weight, model.decoder.layer_norm.bias, lm_head.weight]. All weights are initialized.
[2022-08-25 11:31:37,255 INFO] Using SimplePredict to predict...
22it [27:18, 74.47s/it]
print('Labeled samples:') ! tail -n 1 en_dev.tsv print('\n\nPredicted results:') ! tail -n 1 en.preds.txt
Labeled samples: Papa John's fell short of Wall Street's earnings estimates, but demand for its pizza remains high during the coronavirus pandemic. Worldwide, its same-store sales surged 15.5% in the quarter. Employee bonuses and higher commodity costs weighed on its profits in the latest period. Papa John's on Thursday reported quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. The company's stock fell more than 7% in premarket trading. Here's what the company reported compared with what Wall Street was expecting, based on a survey of analysts by Refinitiv: The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million, or 18 cents per share, a year earlier. It spent $6 million, or 12 cents per share, in the fourth quarter on its strategic reorganization, including opening a Georgia office. The company also paid out $2.7 million in end-of-year bonuses for its restaurant workers, shaving off 6 cents per share. Increased commodity costs also hit profits during the quarter. Excluding reorganization costs, Papa John's earned 40 cents per share, missing the 46 cents per share expected by analysts surveyed by Refinitiv. Net sales rose 12.5% to $469.8 million, beating expectations of $467.9 million. Worldwide, its same-store sales surged 15.5% in the quarter. North American same-store sales increased by 13.5%. Papa John's also raked in higher royalties from its franchisees because its operator assistance program, which began in the wake of the scandal that involved founder John Schnatter . International same-store sales climbed 21.4% in the quarter. Papa John's opened 40 net new locations, primarily due to international openings. As of Dec. 27, about 65 of the company's 5,400 locations were temporarily closed due to government restrictions, primarily in Latin America and Europe. The company also shared an update on its plans to open an office in Atlanta, saying it's on track to open by summer. Papa John's expects to spend $15 million to $20 million through 2021 related to the costs of adding the office, including employee severance, recruitment and relocation. Papa John's declined to provide an outlook for its financial targets during 2021, citing the uncertainty caused by the pandemic. Also Thursday, Domino's Pizza reported quarterly earnings that missed estimates . Predicted results: quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million a year earlier. quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million a year earlier.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. 
The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million in the fourth quarter of last year.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million last year.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million in the same period a year earlier. Papa John's fell short of Wall Street's earnings estimates, but demand for its pizza remains high during the coronavirus pandemic. Worldwide, its same-store sales surged 15.5% in the quarter. Employee bonuses and higher commodity costs weighed on its profits in the latest period. Papa John's on Thursday reported quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. The company's stock fell more than 7% in premarket trading. Here's what the company reported compared with what Wall Street was expecting, based on a survey of analysts by Refinitiv: The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million, or 18 cents per share, a year earlier. It spent $6 million, or 12 cents per share, in the fourth quarter on its strategic reorganization, including opening a Georgia office. The company also paid out $2.7 million in end-of-year bonuses for its restaurant workers, shaving off 6 cents per share. Increased commodity costs also hit profits during the quarter. Excluding reorganization costs, Papa John's earned 40 cents per share, missing the 46 cents per share expected by analysts surveyed by Refinitiv. Net sales rose 12.5% to $469.8 million, beating expectations of $467.9 million. Worldwide, its same-store sales surged 15.5% in the quarter. North American same-store sales increased by 13.5%. Papa John's also raked in higher royalties from its franchisees because its operator assistance program, which began in the wake of the scandal that involved founder John Schnatter . International same-store sales climbed 21.4% in the quarter. Papa John's opened 40 net new locations, primarily due to international openings. As of Dec. 27, about 65 of the company's 5,400 locations were temporarily closed due to government restrictions, primarily in Latin America and Europe. The company also shared an update on its plans to open an office in Atlanta, saying it's on track to open by summer. 
Papa John's expects to spend $15 million to $20 million through 2021 related to the costs of adding the office, including employee severance, recruitment and relocation. Papa John's declined to provide an outlook for its financial targets during 2021, citing the uncertainty caused by the pandemic. Also Thursday, Domino's Pizza reported quarterly earnings that missed estimates .
The above shows one example from the dataset along with the predictions of the fine-tuned model. The first column is the predicted title; the second column contains the five beam-search candidates, separated by ||. The output also includes the original news article, with the fields separated by tabs (\t).
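As a quick sanity check, a minimal Python sketch like the following (assuming the tab- and ||-separated layout just described) can be used to pull apart the first line of en.preds.txt:

# Minimal sketch: inspect the first prediction line, assuming the layout described above.
with open('en.preds.txt', encoding='utf-8') as f:
    first_line = f.readline().rstrip('\n')

fields = first_line.split('\t')
prediction = fields[0]            # best predicted title
beams = fields[1].split('||')     # the five beam-search candidates

print('Top prediction:', prediction)
print('Number of beam candidates:', len(beams))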
One-step execution
Note that all of the training/evaluation/prediction code above has been integrated into EasyNLP/examples/appzoo_tutorials/sequence_generation/main.py. In addition, we provide several ready-to-run scripts. You can complete all of the training/evaluation/prediction steps above in one go, either by running main.py with the appropriate arguments or by executing the script run_user_defined_local_en.sh directly.
One-step execution with main.py
You can train, evaluate, and predict with the model directly by running main.py with the arguments shown below. The training command comes first; the parameters are explained earlier in this document and are not repeated here.
The model training command is as follows:
! python main.py \
    --mode train \
    --app_name=sequence_generation \
    --worker_gpu=1 \
    --tables=./en_train.tsv,./en_dev.tsv \
    --input_schema=title:str:1,content:str:1 \
    --first_sequence=content \
    --second_sequence=title \
    --label_name=title \
    --checkpoint_dir=./finetuned_en_model \
    --micro_batch_size=1 \
    --sequence_length=512 \
    --epoch_num 1 \
    --save_checkpoint_steps=500 \
    --export_tf_checkpoint_type none \
    --user_defined_parameters 'language=en pretrain_model_name_or_path=alibaba-pai/pegasus-summary-generation-en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'
The model evaluation command is as follows:
! python main.py \
    --mode=evaluate \
    --app_name=sequence_generation \
    --worker_gpu=1 \
    --tables=./en_dev.tsv \
    --input_schema=title:str:1,content:str:1 \
    --output_schema=predictions,beams \
    --append_cols=title,content \
    --first_sequence=content \
    --second_sequence=title \
    --checkpoint_dir=./finetuned_en_model \
    --micro_batch_size 32 \
    --sequence_length 512 \
    --user_defined_parameters 'language=en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'
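The evaluate mode reports generation metrics computed by EasyNLP itself. If you want to spot-check ROUGE on a single prediction by hand, a minimal sketch using the third-party rouge-score package might look like the following. Note that rouge-score is not bundled with EasyNLP and is assumed to be installed separately (pip install rouge-score), and the two strings below are abridged illustrations rather than actual model output.

# Hypothetical spot-check of one prediction against its reference title.
# Assumes rouge-score has been installed; it is not part of EasyNLP.
from rouge_score import rouge_scorer

reference = "Papa John's fell short of Wall Street's earnings estimates, but demand for its pizza remains high."
prediction = "Papa John's reported quarterly earnings that missed estimates as higher costs weighed on profits."

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(reference, prediction)
for name, score in scores.items():
    print(name, round(score.fmeasure, 4))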
The model prediction command is as follows:
! python main.py \
    --mode=predict \
    --app_name=sequence_generation \
    --worker_gpu=1 \
    --tables=./en_dev.tsv \
    --outputs=./en.preds.txt \
    --input_schema=title:str:1,content:str:1 \
    --output_schema=predictions,beams \
    --append_cols=title,content \
    --first_sequence=content \
    --checkpoint_dir=./finetuned_en_model \
    --micro_batch_size 32 \
    --sequence_length 512 \
    --user_defined_parameters 'language=en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'
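If you would rather experiment with the same decoding settings (beam size 5, no-repeat n-gram size 2, decoder length between 64 and 128 tokens) directly in Hugging Face transformers, a rough sketch might look like the following. The checkpoint name google/pegasus-cnn_dailymail is only an illustrative stand-in, not the model fine-tuned above, and whether the EasyNLP checkpoint directory ./finetuned_en_model can be loaded this way depends on its export format and is not guaranteed.

# Illustrative only: Pegasus generation with the same decoding settings as the
# EasyNLP commands above, using a public checkpoint as a stand-in.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = 'google/pegasus-cnn_dailymail'  # stand-in, not the model fine-tuned above
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

article = "Papa John's on Thursday reported quarterly earnings that missed estimates ..."
inputs = tokenizer(article, truncation=True, max_length=512, return_tensors='pt')

summary_ids = model.generate(
    **inputs,
    num_beams=5,
    num_return_sequences=5,
    no_repeat_ngram_size=2,
    min_length=64,
    max_length=128,
)
for ids in summary_ids:
    print(tokenizer.decode(ids, skip_special_tokens=True))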
Command-line execution via the bash scripts
We also provide several ready-to-run scripts under EasyNLP/examples/appzoo_tutorials/sequence_generation, so you can likewise complete model training/evaluation/prediction in a single step by running a script with the appropriate arguments. The script run_user_defined_local_en.sh is used as an example below. It takes two arguments: the first is the index of the GPU to run on (usually 0), and the second specifies the mode (train/evaluate/predict).
Model training:
! bash examples/appzoo_tutorials/sequence_generation/run_user_defined_local_en.sh 0 train
Model evaluation:
! bash examples/appzoo_tutorials/sequence_generation/run_user_defined_local_en.sh 0 evaluate
Model prediction:
! bash examples/appzoo_tutorials/sequence_generation/run_user_defined_local_en.sh 0 predict