[DSW Gallery] English Text Summarization Based on EasyNLP

Overview: EasyNLP provides training and prediction capabilities for a variety of models, aiming to help NLP developers build models and apply them to production quickly and conveniently. Using English text summarization as an example, this article describes how to use EasyNLP in PAI-DSW.

Run It Directly

Open "English Text Summarization Based on EasyNLP" and click "Open in DSW" in the upper-right corner.

English Summary Generation Based on Pegasus

Text summarization (Text Summarization) aims to extract, distill, or condense the key information from long, repetitive text. Pegasus is a sequence-to-sequence pre-trained model proposed by Google. It is trained with a deliberately difficult pre-training objective, gap-sentence generation: sentences are removed from a document and the model must generate them. Despite its relatively small parameter count, Pegasus matches or surpasses the state of the art on 12 text summarization datasets, and it also performs well under low-resource conditions.

In EasyNLP, we provide a trained Pegasus model (see the list below for other available models) so that users can benefit from its strong modeling capability. Using English news headline generation as an example, this article builds a headline generation model with Pegasus as the backbone and shows how to use EasyNLP for model construction, training, evaluation, and prediction.

Newly Added Available Models

hfl/brio-cnndm-uncased

Runtime Requirements

PAI-PyTorch 1.7/1.8 image, P100 or V100 GPU instance, 32 GB of memory

Installing EasyNLP

We recommend downloading the EasyNLP source code from GitHub and installing it with the following commands:

! git clone https://github.com/alibaba/EasyNLP.git
! pip install -r EasyNLP/requirements.txt -i http://mirrors.aliyun.com/pypi/simple/
# In Jupyter, each "!" command runs in its own shell, so chain cd and the install in one line.
! cd EasyNLP && python setup.py install

You can verify that the installation succeeded with the following command:

! which easynlp

If the easynlp CLI tool is found on your system, the EasyNLP code base has been installed successfully.
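Optionally, you can also confirm that the Python package itself is importable from the current environment. The check below is a minimal illustrative sketch rather than part of the tutorial's commands; the printed path depends on where EasyNLP was installed.

# Illustrative sanity check: importing the package should succeed and print its install
# location; an ImportError means the installation above did not complete.
import easynlp
print(easynlp.__file__)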

Data Preparation

First, download the training and development sets used in this example and create a folder for saving the model:

! wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_train.tsv
! wget http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_dev.tsv
--2022-08-25 10:51:39--  http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_train.tsv
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4127467 (3.9M) [text/tab-separated-values]
Saving to: ‘en_train.tsv.1’
en_train.tsv.1      100%[===================>]   3.94M  13.2MB/s    in 0.3s    
2022-08-25 10:51:40 (13.2 MB/s) - ‘en_train.tsv.1’ saved [4127467/4127467]
--2022-08-25 10:51:40--  http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/generation/en_dev.tsv
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2848282 (2.7M) [text/tab-separated-values]
Saving to: ‘en_dev.tsv.1’
en_dev.tsv.1        100%[===================>]   2.72M  10.9MB/s    in 0.3s    
2022-08-25 10:51:41 (10.9 MB/s) - ‘en_dev.tsv.1’ saved [2848282/2848282]

After the download completes, you can view the first record with the code below. In both the training and development sets, each line is one news record containing the news summary and the original article, separated by a tab character (\t); a small parsing sketch follows the sample output.

print('Training data sample:')
! head -n 1 en_train.tsv
print('Development set data sample:')
! head -n 1 en_dev.tsv
Training data sample:
President Joe Biden on Friday will present the Medal of Honor for the first time since he took office, honoring a U.S. veteran of the Korean War for his "conspicuous gallantry," the White House said. South Korean President Moon Jae-In, who is visiting the White House that day for talks with Biden, is scheduled to attend the ceremony, according to the White House. The recipient of the nation's highest military award, Col. Ralph Puckett Jr., is a resident of Columbus, Georgia, who served in the Korean War and the Vietnam War, the White House said in a press release.  President Joe Biden on Friday will present the Medal of Honor for the first time since he took office, honoring a U.S. veteran of the Korean War for his "conspicuous gallantry," the White House said. South Korean President Moon Jae-In, who is visiting the White House that day for talks with Biden , is scheduled to attend the ceremony, according to the White House. The recipient of the nation's highest military award, Col. Ralph Puckett Jr., is a resident of Columbus, Georgia, who served in the Korean War and the Vietnam War, the White House said in a press release. Puckett "distinguished himself by acts of gallantry and intrepidity above and beyond the call of duty" in Korea in November 1950 by leading a unit of Army Rangers into a harrowing daylight attack on an enemy hill, the White House said. "To obtain supporting fire, First Lieutenant Puckett mounted the closest tank, exposing himself to the deadly enemy fire," the White House said. "Leaping from the tank, he shouted words of encouragement to his men and began to lead the Rangers in the attack." When enemy fire pinned down a platoon, Puckett left the relative safety of his position and "intentionally ran across an open area three times to draw enemy fire, thereby allowing the Rangers to locate and destroy the enemy positions and to seize Hill 205." "During the course of the night, the enemy launched a counterattack which lasted four hours," the press release said, adding that Puckett's leadership motivated the Rangers throughout. "As a result, five human wave attacks by a battalion strength enemy element were repulsed," the White House said. Puckett was injured by grenade fragments during the first of those waves, "but he refused evacuation and continually directed artillery support that decimated attacking enemy formations, repeatedly abandoned positions of relative safety to make his way from foxhole to foxhole to check the company's perimeter, and distributed ammunition amongst the Rangers," the White House said. On the sixth wave of attack, two mortar rounds landed in Puckett's foxhole, causing "grievous wounds," the press release said. "First Lieutenant Puckett commanded the Rangers to leave him behind and evacuate the area. Feeling a sense of duty to aid him, the Rangers refused the order and staged an effort to retrieve him from the foxhole while still under harassing fire from the enemy." "Ultimately, the Rangers succeeded in retrieving First Lieutenant Puckett and they moved to the bottom of the hill, where First Lieutenant Puckett called for devastating artillery fire on the top of the enemy controlled hill," the White House said. Puckett enlisted in the Army Enlisted Reserve Corps in 1943 and was discharged to the U.S. Military Academy in 1945. He served in the Korean War as a member of the 8th Army Ranger Company and in Vietnam as a member of the 101st Airborne Division.
Development set data sample:
Jeff Bezos' Blue Origin filed a protest with the Government Accountability Office against NASA on Monday. Blue Origin decried the award as "flawed" in a statement to CNBC, saying that NASA "moved the goalposts at the last minute." SpaceX was awarded $2.89 billion for NASA's Human Landing System program earlier this month, to build an astronaut lunar lander for the space agency.  Jeff Bezos ' Blue Origin filed a protest with the Government Accountability Office against NASA on Monday, challenging the space agency's award of a nearly $3 billion moon lander contract to Elon Musk's SpaceX earlier this month. SpaceX, in a competition against Blue Origin and Leidos ' subsidiary Dynetics, was awarded $2.89 billion for NASA's Human Landing System program . The HLS program is focused on building a lunar lander that can carry astronauts to the moon's surface under NASA's Artemis missions. For HLS, SpaceX bid a variation of its Starship rocket, prototypes of which the company has been testing at its facility in Texas. NASA was previously expected to choose two of the three teams to competitively build lunar landers, making the sole selection of SpaceX a surprise given the agency's prior goals for the program to continue to be a competition. Blue Origin decried the award as "flawed" in a statement to CNBC, saying that NASA "moved the goalposts at the last minute." "In NASA's own words, it has made a 'high risk' selection. Their decision eliminates opportunities for competition, significantly narrows the supply base, and not only delays, but also endangers America's return to the Moon. Because of that, we've filed a protest with the GAO," Blue Origin said. Blue Origin revealed that NASA evaluated the company's HLS proposal to cost $5.99 billion, or roughly twice that of SpaceX. The company argued in its protest filing that NASA's cost for funding both proposals would have been under $9 billion – or near how much the agency spent for SpaceX and Boeing to develop competing astronaut capsules under the Commercial Crew program . "In failing to maintain two sources ... NASA's selection decision creates a number of issues for the HLS program and puts all of NASA's eggs in one basket," Blue Origin wrote in the protest. The New York Times first reported Blue Origin's GAO protest. Blue Origin based its protest around five objections. First, Bezos' company said NASA did not give SpaceX's competitors an opportunity to "meaningfully compete" after "the agency's requirements changed due to its undisclosed, perceived shortfall of funding" for the HLS program. Second and third, Blue Origin said that NASA's acquisition was flawed under the agency's acquisition rules and its evaluation of the company's proposal "unreasonable." Fourth, the company asserted that NASA "improperly and disparately" evaluated SpaceX's proposal. And finally, Blue Origin said that NASA's evaluation of the proposals changed the weight it gave to key criteria, making price "the most important factor because of perceived funding limitations." The company highlighted work done to develop its lunar lander, including an undisclosed amount of its own investment into the BE-7 rocket engine that it planned to use for the spacecraft. "Blue Origin's substantial commercial investment in the BE-7 engine program is direct evidence of its corporate commitment in lunar exploration," the company wrote in the GAO protest. 
The space agency announced the SpaceX contract on April 16, with a source selection document written by human spaceflight director Kathy Lueders outlining NASA's reasons for its decision. NASA's based its selection on three primary factors: Technical ability, price, and then management rating. SpaceX and Blue Origin both received "acceptable" technical ratings, with SpaceX's price the lowest "by a wide margin" and its management rating was "outstanding" – while Blue Origin's management was rated as "very good," the same as Dynetics. Notably, NASA's selection committee said it found "two instances of proposed advance payments within Blue Origin's proposal." "I concur with the ... assessment that these kickoff meeting-related payments are counter to the solicitation's instructions and render Blue Origin's proposal ineligible for award," Lueders wrote. NASA requested $3.4 billion for the HLS program in fiscal year 2021, but Congress approved only $850 million. In light of that lower-than-expected funding, Lueders acknowledged that picking only one company's proposal for the HLS program was "not NASA's optimal outcome" but within the agency's acquisition rules. Last week, Musk hailed the NASA selection as a "great honor" and said he thinks the agency's goal of landing astronauts on the moon by 2024 is "actually doable." "It's been now almost half a century since humans were last on the moon. That's too long, we need to get back there and have a permanent base on the moon — again, like a big permanently occupied base on the moon," Musk said.
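Because every record stores the summary and the article body as two tab-separated fields, you can also split a line programmatically. The parse_record helper below is a hypothetical illustration and is not part of EasyNLP.

# Hypothetical helper: split one tab-separated record into (summary, body).
def parse_record(line):
    summary, body = line.rstrip("\n").split("\t", 1)
    return summary, body

with open("en_train.tsv", encoding="utf-8") as f:
    summary, body = parse_record(f.readline())
print("Summary starts with:", summary[:80])
print("Body starts with:", body[:80])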

Initialization

In a Python 3.6 environment, we first import the libraries required to run the model from the EasyNLP package installed above and perform some initialization. In this tutorial, we use pegasus-summary-generation-en as the pre-trained backbone.

# To avoid conflicts between EasyNLP's args and Jupyter's own arguments, sys.argv must be
# set manually here; otherwise initialization fails.
# If you run this code from the command line or in a .py file, this workaround is not needed.
import sys
sys.argv = ['main.py']
import os
import torch.cuda
sys.path.append('./')
from easynlp.core import Trainer
from easynlp.appzoo.sequence_generation.data import SequenceGenerationDataset
from easynlp.appzoo.sequence_generation.model import SequenceGeneration
from easynlp.appzoo.sequence_generation.evaluator import SequenceGenerationEvaluator
from easynlp.appzoo.sequence_generation.predictor import SequenceGenerationPredictor
from easynlp.appzoo import get_application_model_for_evaluation
from easynlp.utils import initialize_easynlp, get_args
from easynlp.utils.global_vars import parse_user_defined_parameters
from easynlp.core import PredictorManager
from easynlp.utils import get_pretrain_model_path
initialize_easynlp()
args = get_args()
user_defined_parameters = 'language=en pretrain_model_name_or_path=alibaba-pai/pegasus-summary-generation-en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'
user_defined_parameters = parse_user_defined_parameters(user_defined_parameters)
args.checkpoint_dir = "./finetuned_en_model"
[2022-08-25 11:31:07,183.183 dsw34730-66c85d4cdb-6v2c6:85473 INFO utils.py:30] NOTICE: PAIDEBUGGER is turned off.
/home/pai/lib/python3.6/site-packages/OpenSSL/crypto.py:12: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography import x509
Please ignore the following import error if you are using tunnel table io.
No module named '_common_io'
No module named 'easy_predict'
------------------------ arguments ------------------------
  app_name ........................................ text_classify
  append_cols ..................................... None
  buckets ......................................... None
  checkpoint_dir .................................. None
  chief_hosts ..................................... 
  data_threads .................................... 10
  distributed_backend ............................. nccl
  do_lower_case ................................... False
  epoch_num ....................................... 3.0
  export_tf_checkpoint_type ....................... easytransfer
  first_sequence .................................. None
  gradient_accumulation_steps ..................... 1
  input_schema .................................... None
  is_chief ........................................ 
  is_master_node .................................. True
  job_name ........................................ None
  label_enumerate_values .......................... None
  label_name ...................................... None
  learning_rate ................................... 5e-05
  local_rank ...................................... None
  logging_steps ................................... 100
  master_port ..................................... 23456
  max_grad_norm ................................... 1.0
  micro_batch_size ................................ 2
  mode ............................................ train
  modelzoo_base_dir ............................... 
  n_cpu ........................................... 1
  n_gpu ........................................... 1
  odps_config ..................................... None
  optimizer_type .................................. AdamW
  output_schema ................................... 
  outputs ......................................... None
  predict_queue_size .............................. 1024
  predict_slice_size .............................. 4096
  predict_table_read_thread_num ................... 16
  predict_thread_num .............................. 2
  ps_hosts ........................................ 
  random_seed ..................................... 1234
  rank ............................................ 0
  read_odps ....................................... False
  restore_works_dir ............................... ./.easynlp_predict_restore_works_dir
  resume_from_checkpoint .......................... None
  save_all_checkpoints ............................ False
  save_checkpoint_steps ........................... None
  second_sequence ................................. None
  sequence_length ................................. 16
  skip_first_line ................................. False
  tables .......................................... None
  task_count ...................................... 1
  task_index ...................................... 0
  use_amp ......................................... False
  use_torchacc .................................... False
  user_defined_parameters ......................... None
  user_entry_file ................................. None
  user_script ..................................... None
  warmup_proportion ............................... 0.1
  weight_decay .................................... 0.0001
  worker_count .................................... 1
  worker_cpu ...................................... -1
  worker_gpu ...................................... -1
  worker_hosts .................................... None
  world_size ...................................... 1
-------------------- end of arguments ---------------------
> initializing torch distributed ...
[2022-08-25 11:31:09,214.214 dsw34730-66c85d4cdb-6v2c6:85473 INFO distributed_c10d.py:195] Added key: store_based_barrier_key:1 to store for rank: 0
Init dist done. World size: 1, rank 0, l_rank 0
> setting random seeds to 1234 ...

Note: if the code above fails with an "Address already in use" error, run the following commands to clean up the process occupying the port (6000 by default).

netstat -tunlp|grep 6000

kill -9 PID (replace PID with the process ID shown in the output of the previous command)

Loading the Data

We use the built-in SequenceGenerationDataset in EasyNLP to load the training and test data. Its main parameters are:

  • language: EasyNLP defaults to Chinese, so a language parameter must be passed in to specify English text summarization training
  • pretrained_model_name_or_path: name or path of the pre-trained model; here we use the wrapped get_pretrain_model_path function to resolve the model name "pegasus-summary-generation-en" and download the model automatically
  • max_seq_length: maximum text length; longer texts are truncated and shorter texts are padded
  • input_schema: format of the input data; the comma-separated items correspond to the tab-separated fields of each line in the data file, and each item starts with its field name, such as label or sent1
  • first_sequence, label_name: specify which fields of input_schema are used as the input sentence and the label column
  • label_enumerate_values: enumeration of the label values
  • is_training: whether this is the training stage; True for train_dataset and False for valid_dataset
  • app_name: the task to run, such as text classification, sequence labeling, text matching, or text generation

Below we set some parameters manually for the experiment.

args.tables = "./en_train.tsv,./en_dev.tsv"
args.input_schema = "title:str:1,content:str:1"
args.first_sequence = "content"
args.second_sequence = "title" 
args.label_name = "title"
args.learning_rate = 3e-5
args.epoch_num = 1
args.save_checkpoint_steps = 500
args.sequence_length = 512
args.micro_batch_size = 8
args.export_tf_checkpoint_type = "none"
args.app_name = "sequence_generation"
args.pretrained_model_name_or_path = user_defined_parameters.get('pretrain_model_name_or_path', None)
args.pretrained_model_name_or_path = get_pretrain_model_path(args.pretrained_model_name_or_path)
train_dataset = SequenceGenerationDataset(
        pretrained_model_name_or_path=args.pretrained_model_name_or_path,
        data_file=args.tables.split(",")[0],
        max_seq_length=args.sequence_length,
        input_schema=args.input_schema,
        first_sequence=args.first_sequence,
        second_sequence=args.second_sequence,
        user_defined_parameters=user_defined_parameters,
        is_training=True)
valid_dataset = SequenceGenerationDataset(
        pretrained_model_name_or_path=args.pretrained_model_name_or_path,
        data_file=args.tables.split(",")[-1],
        max_seq_length=args.sequence_length,
        input_schema=args.input_schema,
        first_sequence=args.first_sequence,
        second_sequence=args.second_sequence,
        user_defined_parameters=user_defined_parameters,
        is_training=False)
`/root/.easynlp/modelzoo/alibaba-pai/pegasus-summary-generation-en.tgz` already exists
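As a quick sanity check, you can look at how many examples were loaded. The snippet below assumes that SequenceGenerationDataset exposes the standard PyTorch Dataset __len__ interface; with the files downloaded above it should report 1000 training and 700 development examples, matching the training log further down.

# Assumption: the dataset objects behave like standard PyTorch datasets and support len().
print('Number of training examples:', len(train_dataset))
print('Number of development examples:', len(valid_dataset))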

Model Training

After the data is processed and loaded, we start training the model. We use the SequenceGeneration class wrapped in EasyNLP to build the model for training. Its parameters are:

  • pretrained_model_name_or_path: name or path of the pre-trained model; here we use the wrapped get_pretrain_model_path function to resolve the model name "pegasus-summary-generation-en" and download the model automatically
  • user_defined_parameters: user-defined parameters; pass in the user_defined_parameters we just parsed

Build and Load the Model

model = SequenceGeneration(pretrained_model_name_or_path=args.pretrained_model_name_or_path, user_defined_parameters=user_defined_parameters)
Loaded weights of the model:
 [final_logits_bias,model.shared.weight,model.encoder.embed_tokens.weight,model.encoder.embed_positions.weight,model.encoder.layers.0.self_attn.k_proj.weight,model.encoder.layers.0.self_attn.k_proj.bias,model.encoder.layers.0.self_attn.v_proj.weight,model.encoder.layers.0.self_attn.v_proj.bias,model.encoder.layers.0.self_attn.q_proj.weight,model.encoder.layers.0.self_attn.q_proj.bias,model.encoder.layers.0.self_attn.out_proj.weight,model.encoder.layers.0.self_attn.out_proj.bias,model.encoder.layers.0.self_attn_layer_norm.weight,model.encoder.layers.0.self_attn_layer_norm.bias,model.encoder.layers.0.fc1.weight,model.encoder.layers.0.fc1.bias,model.encoder.layers.0.fc2.weight,model.encoder.layers.0.fc2.bias,model.encoder.layers.0.final_layer_norm.weight,model.encoder.layers.0.final_layer_norm.bias,model.encoder.layers.1.self_attn.k_proj.weight,model.encoder.layers.1.self_attn.k_proj.bias,model.encoder.layers.1.self_attn.v_proj.weight,model.encoder.layers.1.self_attn.v_proj.bias,model.encoder.layers.1.self_attn.q_proj.weight,model.encoder.layers.1.self_attn.q_proj.bias,model.encoder.layers.1.self_attn.out_proj.weight,model.encoder.layers.1.self_attn.out_proj.bias,model.encoder.layers.1.self_attn_layer_norm.weight,model.encoder.layers.1.self_attn_layer_norm.bias,model.encoder.layers.1.fc1.weight,model.encoder.layers.1.fc1.bias,model.encoder.layers.1.fc2.weight,model.encoder.layers.1.fc2.bias,model.encoder.layers.1.final_layer_norm.weight,model.encoder.layers.1.final_layer_norm.bias,model.encoder.layers.2.self_attn.k_proj.weight,model.encoder.layers.2.self_attn.k_proj.bias,model.encoder.layers.2.self_attn.v_proj.weight,model.encoder.layers.2.self_attn.v_proj.bias,model.encoder.layers.2.self_attn.q_proj.weight,model.encoder.layers.2.self_attn.q_proj.bias,model.encoder.layers.2.self_attn.out_proj.weight,model.encoder.layers.2.self_attn.out_proj.bias,model.encoder.layers.2.self_attn_layer_norm.weight,model.encoder.layers.2.self_attn_layer_norm.bias,model.encoder.layers.2.fc1.weight,model.encoder.layers.2.fc1.bias,model.encoder.layers.2.fc2.weight,model.encoder.layers.2.fc2.bias,model.encoder.layers.2.final_layer_norm.weight,model.encoder.layers.2.final_layer_norm.bias,model.encoder.layers.3.self_attn.k_proj.weight,model.encoder.layers.3.self_attn.k_proj.bias,model.encoder.layers.3.self_attn.v_proj.weight,model.encoder.layers.3.self_attn.v_proj.bias,model.encoder.layers.3.self_attn.q_proj.weight,model.encoder.layers.3.self_attn.q_proj.bias,model.encoder.layers.3.self_attn.out_proj.weight,model.encoder.layers.3.self_attn.out_proj.bias,model.encoder.layers.3.self_attn_layer_norm.weight,model.encoder.layers.3.self_attn_layer_norm.bias,model.encoder.layers.3.fc1.weight,model.encoder.layers.3.fc1.bias,model.encoder.layers.3.fc2.weight,model.encoder.layers.3.fc2.bias,model.encoder.layers.3.final_layer_norm.weight,model.encoder.layers.3.final_layer_norm.bias,model.encoder.layers.4.self_attn.k_proj.weight,model.encoder.layers.4.self_attn.k_proj.bias,model.encoder.layers.4.self_attn.v_proj.weight,model.encoder.layers.4.self_attn.v_proj.bias,model.encoder.layers.4.self_attn.q_proj.weight,model.encoder.layers.4.self_attn.q_proj.bias,model.encoder.layers.4.self_attn.out_proj.weight,model.encoder.layers.4.self_attn.out_proj.bias,model.encoder.layers.4.self_attn_layer_norm.weight,model.encoder.layers.4.self_attn_layer_norm.bias,model.encoder.layers.4.fc1.weight,model.encoder.layers.4.fc1.bias,model.encoder.layers.4.fc2.weight,model.encoder.layers.4.fc2.bias,model.encoder.layers.4.final_layer_norm.weight,model.enc
oder.layers.4.final_layer_norm.bias,model.encoder.layers.5.self_attn.k_proj.weight,model.encoder.layers.5.self_attn.k_proj.bias,model.encoder.layers.5.self_attn.v_proj.weight,model.encoder.layers.5.self_attn.v_proj.bias,model.encoder.layers.5.self_attn.q_proj.weight,model.encoder.layers.5.self_attn.q_proj.bias,model.encoder.layers.5.self_attn.out_proj.weight,model.encoder.layers.5.self_attn.out_proj.bias,model.encoder.layers.5.self_attn_layer_norm.weight,model.encoder.layers.5.self_attn_layer_norm.bias,model.encoder.layers.5.fc1.weight,model.encoder.layers.5.fc1.bias,model.encoder.layers.5.fc2.weight,model.encoder.layers.5.fc2.bias,model.encoder.layers.5.final_layer_norm.weight,model.encoder.layers.5.final_layer_norm.bias,model.encoder.layers.6.self_attn.k_proj.weight,model.encoder.layers.6.self_attn.k_proj.bias,model.encoder.layers.6.self_attn.v_proj.weight,model.encoder.layers.6.self_attn.v_proj.bias,model.encoder.layers.6.self_attn.q_proj.weight,model.encoder.layers.6.self_attn.q_proj.bias,model.encoder.layers.6.self_attn.out_proj.weight,model.encoder.layers.6.self_attn.out_proj.bias,model.encoder.layers.6.self_attn_layer_norm.weight,model.encoder.layers.6.self_attn_layer_norm.bias,model.encoder.layers.6.fc1.weight,model.encoder.layers.6.fc1.bias,model.encoder.layers.6.fc2.weight,model.encoder.layers.6.fc2.bias,model.encoder.layers.6.final_layer_norm.weight,model.encoder.layers.6.final_layer_norm.bias,model.encoder.layers.7.self_attn.k_proj.weight,model.encoder.layers.7.self_attn.k_proj.bias,model.encoder.layers.7.self_attn.v_proj.weight,model.encoder.layers.7.self_attn.v_proj.bias,model.encoder.layers.7.self_attn.q_proj.weight,model.encoder.layers.7.self_attn.q_proj.bias,model.encoder.layers.7.self_attn.out_proj.weight,model.encoder.layers.7.self_attn.out_proj.bias,model.encoder.layers.7.self_attn_layer_norm.weight,model.encoder.layers.7.self_attn_layer_norm.bias,model.encoder.layers.7.fc1.weight,model.encoder.layers.7.fc1.bias,model.encoder.layers.7.fc2.weight,model.encoder.layers.7.fc2.bias,model.encoder.layers.7.final_layer_norm.weight,model.encoder.layers.7.final_layer_norm.bias,model.encoder.layers.8.self_attn.k_proj.weight,model.encoder.layers.8.self_attn.k_proj.bias,model.encoder.layers.8.self_attn.v_proj.weight,model.encoder.layers.8.self_attn.v_proj.bias,model.encoder.layers.8.self_attn.q_proj.weight,model.encoder.layers.8.self_attn.q_proj.bias,model.encoder.layers.8.self_attn.out_proj.weight,model.encoder.layers.8.self_attn.out_proj.bias,model.encoder.layers.8.self_attn_layer_norm.weight,model.encoder.layers.8.self_attn_layer_norm.bias,model.encoder.layers.8.fc1.weight,model.encoder.layers.8.fc1.bias,model.encoder.layers.8.fc2.weight,model.encoder.layers.8.fc2.bias,model.encoder.layers.8.final_layer_norm.weight,model.encoder.layers.8.final_layer_norm.bias,model.encoder.layers.9.self_attn.k_proj.weight,model.encoder.layers.9.self_attn.k_proj.bias,model.encoder.layers.9.self_attn.v_proj.weight,model.encoder.layers.9.self_attn.v_proj.bias,model.encoder.layers.9.self_attn.q_proj.weight,model.encoder.layers.9.self_attn.q_proj.bias,model.encoder.layers.9.self_attn.out_proj.weight,model.encoder.layers.9.self_attn.out_proj.bias,model.encoder.layers.9.self_attn_layer_norm.weight,model.encoder.layers.9.self_attn_layer_norm.bias,model.encoder.layers.9.fc1.weight,model.encoder.layers.9.fc1.bias,model.encoder.layers.9.fc2.weight,model.encoder.layers.9.fc2.bias,model.encoder.layers.9.final_layer_norm.weight,model.encoder.layers.9.final_layer_norm.bias,model.encoder.layers.10.self_attn.k_pro
j.weight,model.encoder.layers.10.self_attn.k_proj.bias,model.encoder.layers.10.self_attn.v_proj.weight,model.encoder.layers.10.self_attn.v_proj.bias,model.encoder.layers.10.self_attn.q_proj.weight,model.encoder.layers.10.self_attn.q_proj.bias,model.encoder.layers.10.self_attn.out_proj.weight,model.encoder.layers.10.self_attn.out_proj.bias,model.encoder.layers.10.self_attn_layer_norm.weight,model.encoder.layers.10.self_attn_layer_norm.bias,model.encoder.layers.10.fc1.weight,model.encoder.layers.10.fc1.bias,model.encoder.layers.10.fc2.weight,model.encoder.layers.10.fc2.bias,model.encoder.layers.10.final_layer_norm.weight,model.encoder.layers.10.final_layer_norm.bias,model.encoder.layers.11.self_attn.k_proj.weight,model.encoder.layers.11.self_attn.k_proj.bias,model.encoder.layers.11.self_attn.v_proj.weight,model.encoder.layers.11.self_attn.v_proj.bias,model.encoder.layers.11.self_attn.q_proj.weight,model.encoder.layers.11.self_attn.q_proj.bias,model.encoder.layers.11.self_attn.out_proj.weight,model.encoder.layers.11.self_attn.out_proj.bias,model.encoder.layers.11.self_attn_layer_norm.weight,model.encoder.layers.11.self_attn_layer_norm.bias,model.encoder.layers.11.fc1.weight,model.encoder.layers.11.fc1.bias,model.encoder.layers.11.fc2.weight,model.encoder.layers.11.fc2.bias,model.encoder.layers.11.final_layer_norm.weight,model.encoder.layers.11.final_layer_norm.bias,model.encoder.layers.12.self_attn.k_proj.weight,model.encoder.layers.12.self_attn.k_proj.bias,model.encoder.layers.12.self_attn.v_proj.weight,model.encoder.layers.12.self_attn.v_proj.bias,model.encoder.layers.12.self_attn.q_proj.weight,model.encoder.layers.12.self_attn.q_proj.bias,model.encoder.layers.12.self_attn.out_proj.weight,model.encoder.layers.12.self_attn.out_proj.bias,model.encoder.layers.12.self_attn_layer_norm.weight,model.encoder.layers.12.self_attn_layer_norm.bias,model.encoder.layers.12.fc1.weight,model.encoder.layers.12.fc1.bias,model.encoder.layers.12.fc2.weight,model.encoder.layers.12.fc2.bias,model.encoder.layers.12.final_layer_norm.weight,model.encoder.layers.12.final_layer_norm.bias,model.encoder.layers.13.self_attn.k_proj.weight,model.encoder.layers.13.self_attn.k_proj.bias,model.encoder.layers.13.self_attn.v_proj.weight,model.encoder.layers.13.self_attn.v_proj.bias,model.encoder.layers.13.self_attn.q_proj.weight,model.encoder.layers.13.self_attn.q_proj.bias,model.encoder.layers.13.self_attn.out_proj.weight,model.encoder.layers.13.self_attn.out_proj.bias,model.encoder.layers.13.self_attn_layer_norm.weight,model.encoder.layers.13.self_attn_layer_norm.bias,model.encoder.layers.13.fc1.weight,model.encoder.layers.13.fc1.bias,model.encoder.layers.13.fc2.weight,model.encoder.layers.13.fc2.bias,model.encoder.layers.13.final_layer_norm.weight,model.encoder.layers.13.final_layer_norm.bias,model.encoder.layers.14.self_attn.k_proj.weight,model.encoder.layers.14.self_attn.k_proj.bias,model.encoder.layers.14.self_attn.v_proj.weight,model.encoder.layers.14.self_attn.v_proj.bias,model.encoder.layers.14.self_attn.q_proj.weight,model.encoder.layers.14.self_attn.q_proj.bias,model.encoder.layers.14.self_attn.out_proj.weight,model.encoder.layers.14.self_attn.out_proj.bias,model.encoder.layers.14.self_attn_layer_norm.weight,model.encoder.layers.14.self_attn_layer_norm.bias,model.encoder.layers.14.fc1.weight,model.encoder.layers.14.fc1.bias,model.encoder.layers.14.fc2.weight,model.encoder.layers.14.fc2.bias,model.encoder.layers.14.final_layer_norm.weight,model.encoder.layers.14.final_layer_norm.bias,model.encoder.layers.15.self_attn.
k_proj.weight,model.encoder.layers.15.self_attn.k_proj.bias,model.encoder.layers.15.self_attn.v_proj.weight,model.encoder.layers.15.self_attn.v_proj.bias,model.encoder.layers.15.self_attn.q_proj.weight,model.encoder.layers.15.self_attn.q_proj.bias,model.encoder.layers.15.self_attn.out_proj.weight,model.encoder.layers.15.self_attn.out_proj.bias,model.encoder.layers.15.self_attn_layer_norm.weight,model.encoder.layers.15.self_attn_layer_norm.bias,model.encoder.layers.15.fc1.weight,model.encoder.layers.15.fc1.bias,model.encoder.layers.15.fc2.weight,model.encoder.layers.15.fc2.bias,model.encoder.layers.15.final_layer_norm.weight,model.encoder.layers.15.final_layer_norm.bias,model.encoder.layer_norm.weight,model.encoder.layer_norm.bias,model.decoder.embed_tokens.weight,model.decoder.embed_positions.weight,model.decoder.layers.0.self_attn.k_proj.weight,model.decoder.layers.0.self_attn.k_proj.bias,model.decoder.layers.0.self_attn.v_proj.weight,model.decoder.layers.0.self_attn.v_proj.bias,model.decoder.layers.0.self_attn.q_proj.weight,model.decoder.layers.0.self_attn.q_proj.bias,model.decoder.layers.0.self_attn.out_proj.weight,model.decoder.layers.0.self_attn.out_proj.bias,model.decoder.layers.0.self_attn_layer_norm.weight,model.decoder.layers.0.self_attn_layer_norm.bias,model.decoder.layers.0.encoder_attn.k_proj.weight,model.decoder.layers.0.encoder_attn.k_proj.bias,model.decoder.layers.0.encoder_attn.v_proj.weight,model.decoder.layers.0.encoder_attn.v_proj.bias,model.decoder.layers.0.encoder_attn.q_proj.weight,model.decoder.layers.0.encoder_attn.q_proj.bias,model.decoder.layers.0.encoder_attn.out_proj.weight,model.decoder.layers.0.encoder_attn.out_proj.bias,model.decoder.layers.0.encoder_attn_layer_norm.weight,model.decoder.layers.0.encoder_attn_layer_norm.bias,model.decoder.layers.0.fc1.weight,model.decoder.layers.0.fc1.bias,model.decoder.layers.0.fc2.weight,model.decoder.layers.0.fc2.bias,model.decoder.layers.0.final_layer_norm.weight,model.decoder.layers.0.final_layer_norm.bias,model.decoder.layers.1.self_attn.k_proj.weight,model.decoder.layers.1.self_attn.k_proj.bias,model.decoder.layers.1.self_attn.v_proj.weight,model.decoder.layers.1.self_attn.v_proj.bias,model.decoder.layers.1.self_attn.q_proj.weight,model.decoder.layers.1.self_attn.q_proj.bias,model.decoder.layers.1.self_attn.out_proj.weight,model.decoder.layers.1.self_attn.out_proj.bias,model.decoder.layers.1.self_attn_layer_norm.weight,model.decoder.layers.1.self_attn_layer_norm.bias,model.decoder.layers.1.encoder_attn.k_proj.weight,model.decoder.layers.1.encoder_attn.k_proj.bias,model.decoder.layers.1.encoder_attn.v_proj.weight,model.decoder.layers.1.encoder_attn.v_proj.bias,model.decoder.layers.1.encoder_attn.q_proj.weight,model.decoder.layers.1.encoder_attn.q_proj.bias,model.decoder.layers.1.encoder_attn.out_proj.weight,model.decoder.layers.1.encoder_attn.out_proj.bias,model.decoder.layers.1.encoder_attn_layer_norm.weight,model.decoder.layers.1.encoder_attn_layer_norm.bias,model.decoder.layers.1.fc1.weight,model.decoder.layers.1.fc1.bias,model.decoder.layers.1.fc2.weight,model.decoder.layers.1.fc2.bias,model.decoder.layers.1.final_layer_norm.weight,model.decoder.layers.1.final_layer_norm.bias,model.decoder.layers.2.self_attn.k_proj.weight,model.decoder.layers.2.self_attn.k_proj.bias,model.decoder.layers.2.self_attn.v_proj.weight,model.decoder.layers.2.self_attn.v_proj.bias,model.decoder.layers.2.self_attn.q_proj.weight,model.decoder.layers.2.self_attn.q_proj.bias,model.decoder.layers.2.self_attn.out_proj.weight,model.decoder.layers.2.
self_attn.out_proj.bias,model.decoder.layers.2.self_attn_layer_norm.weight,model.decoder.layers.2.self_attn_layer_norm.bias,model.decoder.layers.2.encoder_attn.k_proj.weight,model.decoder.layers.2.encoder_attn.k_proj.bias,model.decoder.layers.2.encoder_attn.v_proj.weight,model.decoder.layers.2.encoder_attn.v_proj.bias,model.decoder.layers.2.encoder_attn.q_proj.weight,model.decoder.layers.2.encoder_attn.q_proj.bias,model.decoder.layers.2.encoder_attn.out_proj.weight,model.decoder.layers.2.encoder_attn.out_proj.bias,model.decoder.layers.2.encoder_attn_layer_norm.weight,model.decoder.layers.2.encoder_attn_layer_norm.bias,model.decoder.layers.2.fc1.weight,model.decoder.layers.2.fc1.bias,model.decoder.layers.2.fc2.weight,model.decoder.layers.2.fc2.bias,model.decoder.layers.2.final_layer_norm.weight,model.decoder.layers.2.final_layer_norm.bias,model.decoder.layers.3.self_attn.k_proj.weight,model.decoder.layers.3.self_attn.k_proj.bias,model.decoder.layers.3.self_attn.v_proj.weight,model.decoder.layers.3.self_attn.v_proj.bias,model.decoder.layers.3.self_attn.q_proj.weight,model.decoder.layers.3.self_attn.q_proj.bias,model.decoder.layers.3.self_attn.out_proj.weight,model.decoder.layers.3.self_attn.out_proj.bias,model.decoder.layers.3.self_attn_layer_norm.weight,model.decoder.layers.3.self_attn_layer_norm.bias,model.decoder.layers.3.encoder_attn.k_proj.weight,model.decoder.layers.3.encoder_attn.k_proj.bias,model.decoder.layers.3.encoder_attn.v_proj.weight,model.decoder.layers.3.encoder_attn.v_proj.bias,model.decoder.layers.3.encoder_attn.q_proj.weight,model.decoder.layers.3.encoder_attn.q_proj.bias,model.decoder.layers.3.encoder_attn.out_proj.weight,model.decoder.layers.3.encoder_attn.out_proj.bias,model.decoder.layers.3.encoder_attn_layer_norm.weight,model.decoder.layers.3.encoder_attn_layer_norm.bias,model.decoder.layers.3.fc1.weight,model.decoder.layers.3.fc1.bias,model.decoder.layers.3.fc2.weight,model.decoder.layers.3.fc2.bias,model.decoder.layers.3.final_layer_norm.weight,model.decoder.layers.3.final_layer_norm.bias,model.decoder.layers.4.self_attn.k_proj.weight,model.decoder.layers.4.self_attn.k_proj.bias,model.decoder.layers.4.self_attn.v_proj.weight,model.decoder.layers.4.self_attn.v_proj.bias,model.decoder.layers.4.self_attn.q_proj.weight,model.decoder.layers.4.self_attn.q_proj.bias,model.decoder.layers.4.self_attn.out_proj.weight,model.decoder.layers.4.self_attn.out_proj.bias,model.decoder.layers.4.self_attn_layer_norm.weight,model.decoder.layers.4.self_attn_layer_norm.bias,model.decoder.layers.4.encoder_attn.k_proj.weight,model.decoder.layers.4.encoder_attn.k_proj.bias,model.decoder.layers.4.encoder_attn.v_proj.weight,model.decoder.layers.4.encoder_attn.v_proj.bias,model.decoder.layers.4.encoder_attn.q_proj.weight,model.decoder.layers.4.encoder_attn.q_proj.bias,model.decoder.layers.4.encoder_attn.out_proj.weight,model.decoder.layers.4.encoder_attn.out_proj.bias,model.decoder.layers.4.encoder_attn_layer_norm.weight,model.decoder.layers.4.encoder_attn_layer_norm.bias,model.decoder.layers.4.fc1.weight,model.decoder.layers.4.fc1.bias,model.decoder.layers.4.fc2.weight,model.decoder.layers.4.fc2.bias,model.decoder.layers.4.final_layer_norm.weight,model.decoder.layers.4.final_layer_norm.bias,model.decoder.layers.5.self_attn.k_proj.weight,model.decoder.layers.5.self_attn.k_proj.bias,model.decoder.layers.5.self_attn.v_proj.weight,model.decoder.layers.5.self_attn.v_proj.bias,model.decoder.layers.5.self_attn.q_proj.weight,model.decoder.layers.5.self_attn.q_proj.bias,model.decoder.layers.5.self_attn.
out_proj.weight,model.decoder.layers.5.self_attn.out_proj.bias,model.decoder.layers.5.self_attn_layer_norm.weight,model.decoder.layers.5.self_attn_layer_norm.bias,model.decoder.layers.5.encoder_attn.k_proj.weight,model.decoder.layers.5.encoder_attn.k_proj.bias,model.decoder.layers.5.encoder_attn.v_proj.weight,model.decoder.layers.5.encoder_attn.v_proj.bias,model.decoder.layers.5.encoder_attn.q_proj.weight,model.decoder.layers.5.encoder_attn.q_proj.bias,model.decoder.layers.5.encoder_attn.out_proj.weight,model.decoder.layers.5.encoder_attn.out_proj.bias,model.decoder.layers.5.encoder_attn_layer_norm.weight,model.decoder.layers.5.encoder_attn_layer_norm.bias,model.decoder.layers.5.fc1.weight,model.decoder.layers.5.fc1.bias,model.decoder.layers.5.fc2.weight,model.decoder.layers.5.fc2.bias,model.decoder.layers.5.final_layer_norm.weight,model.decoder.layers.5.final_layer_norm.bias,model.decoder.layers.6.self_attn.k_proj.weight,model.decoder.layers.6.self_attn.k_proj.bias,model.decoder.layers.6.self_attn.v_proj.weight,model.decoder.layers.6.self_attn.v_proj.bias,model.decoder.layers.6.self_attn.q_proj.weight,model.decoder.layers.6.self_attn.q_proj.bias,model.decoder.layers.6.self_attn.out_proj.weight,model.decoder.layers.6.self_attn.out_proj.bias,model.decoder.layers.6.self_attn_layer_norm.weight,model.decoder.layers.6.self_attn_layer_norm.bias,model.decoder.layers.6.encoder_attn.k_proj.weight,model.decoder.layers.6.encoder_attn.k_proj.bias,model.decoder.layers.6.encoder_attn.v_proj.weight,model.decoder.layers.6.encoder_attn.v_proj.bias,model.decoder.layers.6.encoder_attn.q_proj.weight,model.decoder.layers.6.encoder_attn.q_proj.bias,model.decoder.layers.6.encoder_attn.out_proj.weight,model.decoder.layers.6.encoder_attn.out_proj.bias,model.decoder.layers.6.encoder_attn_layer_norm.weight,model.decoder.layers.6.encoder_attn_layer_norm.bias,model.decoder.layers.6.fc1.weight,model.decoder.layers.6.fc1.bias,model.decoder.layers.6.fc2.weight,model.decoder.layers.6.fc2.bias,model.decoder.layers.6.final_layer_norm.weight,model.decoder.layers.6.final_layer_norm.bias,model.decoder.layers.7.self_attn.k_proj.weight,model.decoder.layers.7.self_attn.k_proj.bias,model.decoder.layers.7.self_attn.v_proj.weight,model.decoder.layers.7.self_attn.v_proj.bias,model.decoder.layers.7.self_attn.q_proj.weight,model.decoder.layers.7.self_attn.q_proj.bias,model.decoder.layers.7.self_attn.out_proj.weight,model.decoder.layers.7.self_attn.out_proj.bias,model.decoder.layers.7.self_attn_layer_norm.weight,model.decoder.layers.7.self_attn_layer_norm.bias,model.decoder.layers.7.encoder_attn.k_proj.weight,model.decoder.layers.7.encoder_attn.k_proj.bias,model.decoder.layers.7.encoder_attn.v_proj.weight,model.decoder.layers.7.encoder_attn.v_proj.bias,model.decoder.layers.7.encoder_attn.q_proj.weight,model.decoder.layers.7.encoder_attn.q_proj.bias,model.decoder.layers.7.encoder_attn.out_proj.weight,model.decoder.layers.7.encoder_attn.out_proj.bias,model.decoder.layers.7.encoder_attn_layer_norm.weight,model.decoder.layers.7.encoder_attn_layer_norm.bias,model.decoder.layers.7.fc1.weight,model.decoder.layers.7.fc1.bias,model.decoder.layers.7.fc2.weight,model.decoder.layers.7.fc2.bias,model.decoder.layers.7.final_layer_norm.weight,model.decoder.layers.7.final_layer_norm.bias,model.decoder.layers.8.self_attn.k_proj.weight,model.decoder.layers.8.self_attn.k_proj.bias,model.decoder.layers.8.self_attn.v_proj.weight,model.decoder.layers.8.self_attn.v_proj.bias,model.decoder.layers.8.self_attn.q_proj.weight,model.decoder.layers.8.self_attn.q_proj
.bias,model.decoder.layers.8.self_attn.out_proj.weight,model.decoder.layers.8.self_attn.out_proj.bias,model.decoder.layers.8.self_attn_layer_norm.weight,model.decoder.layers.8.self_attn_layer_norm.bias,model.decoder.layers.8.encoder_attn.k_proj.weight,model.decoder.layers.8.encoder_attn.k_proj.bias,model.decoder.layers.8.encoder_attn.v_proj.weight,model.decoder.layers.8.encoder_attn.v_proj.bias,model.decoder.layers.8.encoder_attn.q_proj.weight,model.decoder.layers.8.encoder_attn.q_proj.bias,model.decoder.layers.8.encoder_attn.out_proj.weight,model.decoder.layers.8.encoder_attn.out_proj.bias,model.decoder.layers.8.encoder_attn_layer_norm.weight,model.decoder.layers.8.encoder_attn_layer_norm.bias,model.decoder.layers.8.fc1.weight,model.decoder.layers.8.fc1.bias,model.decoder.layers.8.fc2.weight,model.decoder.layers.8.fc2.bias,model.decoder.layers.8.final_layer_norm.weight,model.decoder.layers.8.final_layer_norm.bias,model.decoder.layers.9.self_attn.k_proj.weight,model.decoder.layers.9.self_attn.k_proj.bias,model.decoder.layers.9.self_attn.v_proj.weight,model.decoder.layers.9.self_attn.v_proj.bias,model.decoder.layers.9.self_attn.q_proj.weight,model.decoder.layers.9.self_attn.q_proj.bias,model.decoder.layers.9.self_attn.out_proj.weight,model.decoder.layers.9.self_attn.out_proj.bias,model.decoder.layers.9.self_attn_layer_norm.weight,model.decoder.layers.9.self_attn_layer_norm.bias,model.decoder.layers.9.encoder_attn.k_proj.weight,model.decoder.layers.9.encoder_attn.k_proj.bias,model.decoder.layers.9.encoder_attn.v_proj.weight,model.decoder.layers.9.encoder_attn.v_proj.bias,model.decoder.layers.9.encoder_attn.q_proj.weight,model.decoder.layers.9.encoder_attn.q_proj.bias,model.decoder.layers.9.encoder_attn.out_proj.weight,model.decoder.layers.9.encoder_attn.out_proj.bias,model.decoder.layers.9.encoder_attn_layer_norm.weight,model.decoder.layers.9.encoder_attn_layer_norm.bias,model.decoder.layers.9.fc1.weight,model.decoder.layers.9.fc1.bias,model.decoder.layers.9.fc2.weight,model.decoder.layers.9.fc2.bias,model.decoder.layers.9.final_layer_norm.weight,model.decoder.layers.9.final_layer_norm.bias,model.decoder.layers.10.self_attn.k_proj.weight,model.decoder.layers.10.self_attn.k_proj.bias,model.decoder.layers.10.self_attn.v_proj.weight,model.decoder.layers.10.self_attn.v_proj.bias,model.decoder.layers.10.self_attn.q_proj.weight,model.decoder.layers.10.self_attn.q_proj.bias,model.decoder.layers.10.self_attn.out_proj.weight,model.decoder.layers.10.self_attn.out_proj.bias,model.decoder.layers.10.self_attn_layer_norm.weight,model.decoder.layers.10.self_attn_layer_norm.bias,model.decoder.layers.10.encoder_attn.k_proj.weight,model.decoder.layers.10.encoder_attn.k_proj.bias,model.decoder.layers.10.encoder_attn.v_proj.weight,model.decoder.layers.10.encoder_attn.v_proj.bias,model.decoder.layers.10.encoder_attn.q_proj.weight,model.decoder.layers.10.encoder_attn.q_proj.bias,model.decoder.layers.10.encoder_attn.out_proj.weight,model.decoder.layers.10.encoder_attn.out_proj.bias,model.decoder.layers.10.encoder_attn_layer_norm.weight,model.decoder.layers.10.encoder_attn_layer_norm.bias,model.decoder.layers.10.fc1.weight,model.decoder.layers.10.fc1.bias,model.decoder.layers.10.fc2.weight,model.decoder.layers.10.fc2.bias,model.decoder.layers.10.final_layer_norm.weight,model.decoder.layers.10.final_layer_norm.bias,model.decoder.layers.11.self_attn.k_proj.weight,model.decoder.layers.11.self_attn.k_proj.bias,model.decoder.layers.11.self_attn.v_proj.weight,model.decoder.layers.11.self_attn.v_proj.bias,model.decoder.lay
ers.11.self_attn.q_proj.weight,model.decoder.layers.11.self_attn.q_proj.bias,model.decoder.layers.11.self_attn.out_proj.weight,model.decoder.layers.11.self_attn.out_proj.bias,model.decoder.layers.11.self_attn_layer_norm.weight,model.decoder.layers.11.self_attn_layer_norm.bias,model.decoder.layers.11.encoder_attn.k_proj.weight,model.decoder.layers.11.encoder_attn.k_proj.bias,model.decoder.layers.11.encoder_attn.v_proj.weight,model.decoder.layers.11.encoder_attn.v_proj.bias,model.decoder.layers.11.encoder_attn.q_proj.weight,model.decoder.layers.11.encoder_attn.q_proj.bias,model.decoder.layers.11.encoder_attn.out_proj.weight,model.decoder.layers.11.encoder_attn.out_proj.bias,model.decoder.layers.11.encoder_attn_layer_norm.weight,model.decoder.layers.11.encoder_attn_layer_norm.bias,model.decoder.layers.11.fc1.weight,model.decoder.layers.11.fc1.bias,model.decoder.layers.11.fc2.weight,model.decoder.layers.11.fc2.bias,model.decoder.layers.11.final_layer_norm.weight,model.decoder.layers.11.final_layer_norm.bias,model.decoder.layers.12.self_attn.k_proj.weight,model.decoder.layers.12.self_attn.k_proj.bias,model.decoder.layers.12.self_attn.v_proj.weight,model.decoder.layers.12.self_attn.v_proj.bias,model.decoder.layers.12.self_attn.q_proj.weight,model.decoder.layers.12.self_attn.q_proj.bias,model.decoder.layers.12.self_attn.out_proj.weight,model.decoder.layers.12.self_attn.out_proj.bias,model.decoder.layers.12.self_attn_layer_norm.weight,model.decoder.layers.12.self_attn_layer_norm.bias,model.decoder.layers.12.encoder_attn.k_proj.weight,model.decoder.layers.12.encoder_attn.k_proj.bias,model.decoder.layers.12.encoder_attn.v_proj.weight,model.decoder.layers.12.encoder_attn.v_proj.bias,model.decoder.layers.12.encoder_attn.q_proj.weight,model.decoder.layers.12.encoder_attn.q_proj.bias,model.decoder.layers.12.encoder_attn.out_proj.weight,model.decoder.layers.12.encoder_attn.out_proj.bias,model.decoder.layers.12.encoder_attn_layer_norm.weight,model.decoder.layers.12.encoder_attn_layer_norm.bias,model.decoder.layers.12.fc1.weight,model.decoder.layers.12.fc1.bias,model.decoder.layers.12.fc2.weight,model.decoder.layers.12.fc2.bias,model.decoder.layers.12.final_layer_norm.weight,model.decoder.layers.12.final_layer_norm.bias,model.decoder.layers.13.self_attn.k_proj.weight,model.decoder.layers.13.self_attn.k_proj.bias,model.decoder.layers.13.self_attn.v_proj.weight,model.decoder.layers.13.self_attn.v_proj.bias,model.decoder.layers.13.self_attn.q_proj.weight,model.decoder.layers.13.self_attn.q_proj.bias,model.decoder.layers.13.self_attn.out_proj.weight,model.decoder.layers.13.self_attn.out_proj.bias,model.decoder.layers.13.self_attn_layer_norm.weight,model.decoder.layers.13.self_attn_layer_norm.bias,model.decoder.layers.13.encoder_attn.k_proj.weight,model.decoder.layers.13.encoder_attn.k_proj.bias,model.decoder.layers.13.encoder_attn.v_proj.weight,model.decoder.layers.13.encoder_attn.v_proj.bias,model.decoder.layers.13.encoder_attn.q_proj.weight,model.decoder.layers.13.encoder_attn.q_proj.bias,model.decoder.layers.13.encoder_attn.out_proj.weight,model.decoder.layers.13.encoder_attn.out_proj.bias,model.decoder.layers.13.encoder_attn_layer_norm.weight,model.decoder.layers.13.encoder_attn_layer_norm.bias,model.decoder.layers.13.fc1.weight,model.decoder.layers.13.fc1.bias,model.decoder.layers.13.fc2.weight,model.decoder.layers.13.fc2.bias,model.decoder.layers.13.final_layer_norm.weight,model.decoder.layers.13.final_layer_norm.bias,model.decoder.layers.14.self_attn.k_proj.weight,model.decoder.layers.14.self_attn.k_proj
.bias,model.decoder.layers.14.self_attn.v_proj.weight,model.decoder.layers.14.self_attn.v_proj.bias,model.decoder.layers.14.self_attn.q_proj.weight,model.decoder.layers.14.self_attn.q_proj.bias,model.decoder.layers.14.self_attn.out_proj.weight,model.decoder.layers.14.self_attn.out_proj.bias,model.decoder.layers.14.self_attn_layer_norm.weight,model.decoder.layers.14.self_attn_layer_norm.bias,model.decoder.layers.14.encoder_attn.k_proj.weight,model.decoder.layers.14.encoder_attn.k_proj.bias,model.decoder.layers.14.encoder_attn.v_proj.weight,model.decoder.layers.14.encoder_attn.v_proj.bias,model.decoder.layers.14.encoder_attn.q_proj.weight,model.decoder.layers.14.encoder_attn.q_proj.bias,model.decoder.layers.14.encoder_attn.out_proj.weight,model.decoder.layers.14.encoder_attn.out_proj.bias,model.decoder.layers.14.encoder_attn_layer_norm.weight,model.decoder.layers.14.encoder_attn_layer_norm.bias,model.decoder.layers.14.fc1.weight,model.decoder.layers.14.fc1.bias,model.decoder.layers.14.fc2.weight,model.decoder.layers.14.fc2.bias,model.decoder.layers.14.final_layer_norm.weight,model.decoder.layers.14.final_layer_norm.bias,model.decoder.layers.15.self_attn.k_proj.weight,model.decoder.layers.15.self_attn.k_proj.bias,model.decoder.layers.15.self_attn.v_proj.weight,model.decoder.layers.15.self_attn.v_proj.bias,model.decoder.layers.15.self_attn.q_proj.weight,model.decoder.layers.15.self_attn.q_proj.bias,model.decoder.layers.15.self_attn.out_proj.weight,model.decoder.layers.15.self_attn.out_proj.bias,model.decoder.layers.15.self_attn_layer_norm.weight,model.decoder.layers.15.self_attn_layer_norm.bias,model.decoder.layers.15.encoder_attn.k_proj.weight,model.decoder.layers.15.encoder_attn.k_proj.bias,model.decoder.layers.15.encoder_attn.v_proj.weight,model.decoder.layers.15.encoder_attn.v_proj.bias,model.decoder.layers.15.encoder_attn.q_proj.weight,model.decoder.layers.15.encoder_attn.q_proj.bias,model.decoder.layers.15.encoder_attn.out_proj.weight,model.decoder.layers.15.encoder_attn.out_proj.bias,model.decoder.layers.15.encoder_attn_layer_norm.weight,model.decoder.layers.15.encoder_attn_layer_norm.bias,model.decoder.layers.15.fc1.weight,model.decoder.layers.15.fc1.bias,model.decoder.layers.15.fc2.weight,model.decoder.layers.15.fc2.bias,model.decoder.layers.15.final_layer_norm.weight,model.decoder.layers.15.final_layer_norm.bias,model.decoder.layer_norm.weight,model.decoder.layer_norm.bias,lm_head.weight].
All weights are initialized.

Build the Trainer and Train

extra_para = {'pretrained_model_name_or_path':args.pretrained_model_name_or_path}
evaluator = SequenceGenerationEvaluator(valid_dataset=valid_dataset, user_defined_parameters=user_defined_parameters, **extra_para)
trainer = Trainer(model=model, train_dataset=train_dataset, user_defined_parameters=user_defined_parameters,
                evaluator=evaluator)
trainer.train()
[2022-08-25 10:55:53,630 INFO] ========== Initializing Tensorboard ==========
[2022-08-25 10:55:53,668 INFO] ========== Training Start ==========
[2022-08-25 10:55:53,669 INFO]   Num of GPUs (all)       = 1
[2022-08-25 10:55:53,671 INFO]   Num of CPUs per worker  = 1
[2022-08-25 10:55:53,672 INFO]   Num dataset examples    = 1000
[2022-08-25 10:55:53,674 INFO]   Num training examples   = 1000
[2022-08-25 10:55:53,675 INFO]   Num validation examples = 700
[2022-08-25 10:55:53,676 INFO]   Train. batch size       = 8
[2022-08-25 10:55:53,677 INFO]   Train. micro batch size = 8
[2022-08-25 10:55:53,678 INFO]   Train. batch no.        = 125
[2022-08-25 10:55:53,679 INFO]   Evaluation batch size   = 8
[2022-08-25 10:55:53,681 INFO]   Total training steps    = 125
[2022-08-25 10:55:53,681 INFO]   Sequence length         = 512
[2022-08-25 10:55:53,682 INFO]   Saving steps            = 500
[2022-08-25 10:55:53,683 INFO]   Distributed_backend     = nccl
[2022-08-25 10:55:53,683 INFO]   Worker Count            = 1
[2022-08-25 10:55:53,684 INFO]   Worker CPU              = -1
[2022-08-25 10:55:53,684 INFO]   Worker data threads     = 10
[2022-08-25 10:55:53,688 INFO]   num model params        = 570,797,056
[2022-08-25 10:55:53,692 INFO]   num trainable params    = 568,699,904
[2022-08-25 10:55:53,692 INFO] 
[2022-08-25 10:55:53,693 INFO] ========== Model Config ==========
[2022-08-25 10:55:53,694 INFO] {
  "activation_dropout": 0.1,
  "activation_function": "relu",
  "add_bias_logits": false,
  "add_final_layer_norm": true,
  "architectures": [
    "PegasusForConditionalGeneration"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 16,
  "decoder_start_token_id": 0,
  "dropout": 0.1,
  "easynlp_version": "0.0.3",
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 16,
  "eos_token_id": 1,
  "extra_pos_embeddings": 1,
  "forced_eos_token_id": 1,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2
  },
  "length_penalty": 0.8,
  "max_length": 128,
  "max_position_embeddings": 1024,
  "min_length": 32,
  "model_type": "pegasus",
  "normalize_before": true,
  "normalize_embedding": false,
  "num_beams": 8,
  "num_hidden_layers": 16,
  "pad_token_id": 0,
  "scale_embedding": true,
  "static_position_embeddings": true,
  "use_cache": true,
  "vocab_size": 96103
}
optimizer type: AdamW
/home/pai/lib/python3.6/site-packages/pai_easynlp-0.0.7-py3.6.egg/easynlp/core/optimizers.py:441: UserWarning: This overload of add_ is deprecated:
  add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
  add_(Tensor other, *, Number alpha) (Triggered internally at  /workspace/artifacts/paipytorch1.8/dist/ubuntu18.04-py3.6-cuda10.1/build/src/torch/csrc/utils/python_arg_parser.cpp:1005.)
  exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
/home/pai/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
[2022-08-25 10:57:27,064 INFO] Epoch [ 0/ 1], step [100/125], lr 0.000007, 93.37 s
[2022-08-25 10:57:27,065 INFO]   loss      : 0.8108 
Training Time: 118.0245532989502, rank 0, gsteps 125
100%|██████████| 700/700 [22:31<00:00,  1.93s/it]
[2022-08-25 11:20:24,646 INFO] Saving best model to ./finetuned_en_model/pytorch_model.bin...
Rouge 1/2/L: 37.78/18.57/35.34
[2022-08-25 11:21:13,500 INFO] Best score: 35.33964534239289
[2022-08-25 11:21:13,501 INFO] Training Time: 1520.4240629673004

Model Evaluation

After training, the model is saved to the checkpoint_dir specified at the beginning, i.e. the local path "./finetuned_en_model/". We can now evaluate the trained model: we use SequenceGenerationEvaluator in EasyNLP to initialize the evaluator, move the model to the GPU, and run the evaluation.

args.tables = "en_dev.tsv"
extra_para = {'pretrained_model_name_or_path':args.pretrained_model_name_or_path}
evaluator = SequenceGenerationEvaluator(valid_dataset=valid_dataset, user_defined_parameters=user_defined_parameters, **extra_para)
if args.n_gpu > 0:
    model.to(torch.cuda.current_device())
else:
    model.to("cpu")
evaluator.evaluate(model=model)
100%|██████████| 700/700 [11:23<00:00,  1.02it/s]
Rouge 1/2/L: 33.76/16.46/31.70
[('rouge-l', 31.699629026817473),
 ('rouge-1', 33.756938153964946),
 ('rouge-2', 16.462475090936373)]

Model Prediction

We can also use the trained model to generate news headlines. We first create a predictor and use it to instantiate a PredictorManager. The prediction results are written to en.preds.txt.

args.tables = "en_dev.tsv"
args.outputs = "en.preds.txt"
args.input_schema = "title:str:1,content:str:1"
args.output_schema = "predictions,beams"
args.append_cols="title,content"
args.micro_batch_size = 32
predictor = SequenceGenerationPredictor(model_dir=args.checkpoint_dir, model_cls=SequenceGeneration,
                                      first_sequence=args.first_sequence, user_defined_parameters=user_defined_parameters)
predictor_manager = PredictorManager(
    predictor=predictor,
    input_file=args.tables.split(",")[0],
    input_schema=args.input_schema,
    output_file=args.outputs,
    output_schema=args.output_schema,
    append_cols=args.append_cols,
    batch_size=args.micro_batch_size
)
predictor_manager.run()
 Loaded weights of the model:
 [... the same per-layer Pegasus weight list as printed above during model construction ...]
j.weight,model.encoder.layers.10.self_attn.k_proj.bias,model.encoder.layers.10.self_attn.v_proj.weight,model.encoder.layers.10.self_attn.v_proj.bias,model.encoder.layers.10.self_attn.q_proj.weight,model.encoder.layers.10.self_attn.q_proj.bias,model.encoder.layers.10.self_attn.out_proj.weight,model.encoder.layers.10.self_attn.out_proj.bias,model.encoder.layers.10.self_attn_layer_norm.weight,model.encoder.layers.10.self_attn_layer_norm.bias,model.encoder.layers.10.fc1.weight,model.encoder.layers.10.fc1.bias,model.encoder.layers.10.fc2.weight,model.encoder.layers.10.fc2.bias,model.encoder.layers.10.final_layer_norm.weight,model.encoder.layers.10.final_layer_norm.bias,model.encoder.layers.11.self_attn.k_proj.weight,model.encoder.layers.11.self_attn.k_proj.bias,model.encoder.layers.11.self_attn.v_proj.weight,model.encoder.layers.11.self_attn.v_proj.bias,model.encoder.layers.11.self_attn.q_proj.weight,model.encoder.layers.11.self_attn.q_proj.bias,model.encoder.layers.11.self_attn.out_proj.weight,model.encoder.layers.11.self_attn.out_proj.bias,model.encoder.layers.11.self_attn_layer_norm.weight,model.encoder.layers.11.self_attn_layer_norm.bias,model.encoder.layers.11.fc1.weight,model.encoder.layers.11.fc1.bias,model.encoder.layers.11.fc2.weight,model.encoder.layers.11.fc2.bias,model.encoder.layers.11.final_layer_norm.weight,model.encoder.layers.11.final_layer_norm.bias,model.encoder.layers.12.self_attn.k_proj.weight,model.encoder.layers.12.self_attn.k_proj.bias,model.encoder.layers.12.self_attn.v_proj.weight,model.encoder.layers.12.self_attn.v_proj.bias,model.encoder.layers.12.self_attn.q_proj.weight,model.encoder.layers.12.self_attn.q_proj.bias,model.encoder.layers.12.self_attn.out_proj.weight,model.encoder.layers.12.self_attn.out_proj.bias,model.encoder.layers.12.self_attn_layer_norm.weight,model.encoder.layers.12.self_attn_layer_norm.bias,model.encoder.layers.12.fc1.weight,model.encoder.layers.12.fc1.bias,model.encoder.layers.12.fc2.weight,model.encoder.layers.12.fc2.bias,model.encoder.layers.12.final_layer_norm.weight,model.encoder.layers.12.final_layer_norm.bias,model.encoder.layers.13.self_attn.k_proj.weight,model.encoder.layers.13.self_attn.k_proj.bias,model.encoder.layers.13.self_attn.v_proj.weight,model.encoder.layers.13.self_attn.v_proj.bias,model.encoder.layers.13.self_attn.q_proj.weight,model.encoder.layers.13.self_attn.q_proj.bias,model.encoder.layers.13.self_attn.out_proj.weight,model.encoder.layers.13.self_attn.out_proj.bias,model.encoder.layers.13.self_attn_layer_norm.weight,model.encoder.layers.13.self_attn_layer_norm.bias,model.encoder.layers.13.fc1.weight,model.encoder.layers.13.fc1.bias,model.encoder.layers.13.fc2.weight,model.encoder.layers.13.fc2.bias,model.encoder.layers.13.final_layer_norm.weight,model.encoder.layers.13.final_layer_norm.bias,model.encoder.layers.14.self_attn.k_proj.weight,model.encoder.layers.14.self_attn.k_proj.bias,model.encoder.layers.14.self_attn.v_proj.weight,model.encoder.layers.14.self_attn.v_proj.bias,model.encoder.layers.14.self_attn.q_proj.weight,model.encoder.layers.14.self_attn.q_proj.bias,model.encoder.layers.14.self_attn.out_proj.weight,model.encoder.layers.14.self_attn.out_proj.bias,model.encoder.layers.14.self_attn_layer_norm.weight,model.encoder.layers.14.self_attn_layer_norm.bias,model.encoder.layers.14.fc1.weight,model.encoder.layers.14.fc1.bias,model.encoder.layers.14.fc2.weight,model.encoder.layers.14.fc2.bias,model.encoder.layers.14.final_layer_norm.weight,model.encoder.layers.14.final_layer_norm.bias,model.encoder.layers.15.self_attn.
k_proj.weight,model.encoder.layers.15.self_attn.k_proj.bias,model.encoder.layers.15.self_attn.v_proj.weight,model.encoder.layers.15.self_attn.v_proj.bias,model.encoder.layers.15.self_attn.q_proj.weight,model.encoder.layers.15.self_attn.q_proj.bias,model.encoder.layers.15.self_attn.out_proj.weight,model.encoder.layers.15.self_attn.out_proj.bias,model.encoder.layers.15.self_attn_layer_norm.weight,model.encoder.layers.15.self_attn_layer_norm.bias,model.encoder.layers.15.fc1.weight,model.encoder.layers.15.fc1.bias,model.encoder.layers.15.fc2.weight,model.encoder.layers.15.fc2.bias,model.encoder.layers.15.final_layer_norm.weight,model.encoder.layers.15.final_layer_norm.bias,model.encoder.layer_norm.weight,model.encoder.layer_norm.bias,model.decoder.embed_tokens.weight,model.decoder.embed_positions.weight,model.decoder.layers.0.self_attn.k_proj.weight,model.decoder.layers.0.self_attn.k_proj.bias,model.decoder.layers.0.self_attn.v_proj.weight,model.decoder.layers.0.self_attn.v_proj.bias,model.decoder.layers.0.self_attn.q_proj.weight,model.decoder.layers.0.self_attn.q_proj.bias,model.decoder.layers.0.self_attn.out_proj.weight,model.decoder.layers.0.self_attn.out_proj.bias,model.decoder.layers.0.self_attn_layer_norm.weight,model.decoder.layers.0.self_attn_layer_norm.bias,model.decoder.layers.0.encoder_attn.k_proj.weight,model.decoder.layers.0.encoder_attn.k_proj.bias,model.decoder.layers.0.encoder_attn.v_proj.weight,model.decoder.layers.0.encoder_attn.v_proj.bias,model.decoder.layers.0.encoder_attn.q_proj.weight,model.decoder.layers.0.encoder_attn.q_proj.bias,model.decoder.layers.0.encoder_attn.out_proj.weight,model.decoder.layers.0.encoder_attn.out_proj.bias,model.decoder.layers.0.encoder_attn_layer_norm.weight,model.decoder.layers.0.encoder_attn_layer_norm.bias,model.decoder.layers.0.fc1.weight,model.decoder.layers.0.fc1.bias,model.decoder.layers.0.fc2.weight,model.decoder.layers.0.fc2.bias,model.decoder.layers.0.final_layer_norm.weight,model.decoder.layers.0.final_layer_norm.bias,model.decoder.layers.1.self_attn.k_proj.weight,model.decoder.layers.1.self_attn.k_proj.bias,model.decoder.layers.1.self_attn.v_proj.weight,model.decoder.layers.1.self_attn.v_proj.bias,model.decoder.layers.1.self_attn.q_proj.weight,model.decoder.layers.1.self_attn.q_proj.bias,model.decoder.layers.1.self_attn.out_proj.weight,model.decoder.layers.1.self_attn.out_proj.bias,model.decoder.layers.1.self_attn_layer_norm.weight,model.decoder.layers.1.self_attn_layer_norm.bias,model.decoder.layers.1.encoder_attn.k_proj.weight,model.decoder.layers.1.encoder_attn.k_proj.bias,model.decoder.layers.1.encoder_attn.v_proj.weight,model.decoder.layers.1.encoder_attn.v_proj.bias,model.decoder.layers.1.encoder_attn.q_proj.weight,model.decoder.layers.1.encoder_attn.q_proj.bias,model.decoder.layers.1.encoder_attn.out_proj.weight,model.decoder.layers.1.encoder_attn.out_proj.bias,model.decoder.layers.1.encoder_attn_layer_norm.weight,model.decoder.layers.1.encoder_attn_layer_norm.bias,model.decoder.layers.1.fc1.weight,model.decoder.layers.1.fc1.bias,model.decoder.layers.1.fc2.weight,model.decoder.layers.1.fc2.bias,model.decoder.layers.1.final_layer_norm.weight,model.decoder.layers.1.final_layer_norm.bias,model.decoder.layers.2.self_attn.k_proj.weight,model.decoder.layers.2.self_attn.k_proj.bias,model.decoder.layers.2.self_attn.v_proj.weight,model.decoder.layers.2.self_attn.v_proj.bias,model.decoder.layers.2.self_attn.q_proj.weight,model.decoder.layers.2.self_attn.q_proj.bias,model.decoder.layers.2.self_attn.out_proj.weight,model.decoder.layers.2.
self_attn.out_proj.bias,model.decoder.layers.2.self_attn_layer_norm.weight,model.decoder.layers.2.self_attn_layer_norm.bias,model.decoder.layers.2.encoder_attn.k_proj.weight,model.decoder.layers.2.encoder_attn.k_proj.bias,model.decoder.layers.2.encoder_attn.v_proj.weight,model.decoder.layers.2.encoder_attn.v_proj.bias,model.decoder.layers.2.encoder_attn.q_proj.weight,model.decoder.layers.2.encoder_attn.q_proj.bias,model.decoder.layers.2.encoder_attn.out_proj.weight,model.decoder.layers.2.encoder_attn.out_proj.bias,model.decoder.layers.2.encoder_attn_layer_norm.weight,model.decoder.layers.2.encoder_attn_layer_norm.bias,model.decoder.layers.2.fc1.weight,model.decoder.layers.2.fc1.bias,model.decoder.layers.2.fc2.weight,model.decoder.layers.2.fc2.bias,model.decoder.layers.2.final_layer_norm.weight,model.decoder.layers.2.final_layer_norm.bias,model.decoder.layers.3.self_attn.k_proj.weight,model.decoder.layers.3.self_attn.k_proj.bias,model.decoder.layers.3.self_attn.v_proj.weight,model.decoder.layers.3.self_attn.v_proj.bias,model.decoder.layers.3.self_attn.q_proj.weight,model.decoder.layers.3.self_attn.q_proj.bias,model.decoder.layers.3.self_attn.out_proj.weight,model.decoder.layers.3.self_attn.out_proj.bias,model.decoder.layers.3.self_attn_layer_norm.weight,model.decoder.layers.3.self_attn_layer_norm.bias,model.decoder.layers.3.encoder_attn.k_proj.weight,model.decoder.layers.3.encoder_attn.k_proj.bias,model.decoder.layers.3.encoder_attn.v_proj.weight,model.decoder.layers.3.encoder_attn.v_proj.bias,model.decoder.layers.3.encoder_attn.q_proj.weight,model.decoder.layers.3.encoder_attn.q_proj.bias,model.decoder.layers.3.encoder_attn.out_proj.weight,model.decoder.layers.3.encoder_attn.out_proj.bias,model.decoder.layers.3.encoder_attn_layer_norm.weight,model.decoder.layers.3.encoder_attn_layer_norm.bias,model.decoder.layers.3.fc1.weight,model.decoder.layers.3.fc1.bias,model.decoder.layers.3.fc2.weight,model.decoder.layers.3.fc2.bias,model.decoder.layers.3.final_layer_norm.weight,model.decoder.layers.3.final_layer_norm.bias,model.decoder.layers.4.self_attn.k_proj.weight,model.decoder.layers.4.self_attn.k_proj.bias,model.decoder.layers.4.self_attn.v_proj.weight,model.decoder.layers.4.self_attn.v_proj.bias,model.decoder.layers.4.self_attn.q_proj.weight,model.decoder.layers.4.self_attn.q_proj.bias,model.decoder.layers.4.self_attn.out_proj.weight,model.decoder.layers.4.self_attn.out_proj.bias,model.decoder.layers.4.self_attn_layer_norm.weight,model.decoder.layers.4.self_attn_layer_norm.bias,model.decoder.layers.4.encoder_attn.k_proj.weight,model.decoder.layers.4.encoder_attn.k_proj.bias,model.decoder.layers.4.encoder_attn.v_proj.weight,model.decoder.layers.4.encoder_attn.v_proj.bias,model.decoder.layers.4.encoder_attn.q_proj.weight,model.decoder.layers.4.encoder_attn.q_proj.bias,model.decoder.layers.4.encoder_attn.out_proj.weight,model.decoder.layers.4.encoder_attn.out_proj.bias,model.decoder.layers.4.encoder_attn_layer_norm.weight,model.decoder.layers.4.encoder_attn_layer_norm.bias,model.decoder.layers.4.fc1.weight,model.decoder.layers.4.fc1.bias,model.decoder.layers.4.fc2.weight,model.decoder.layers.4.fc2.bias,model.decoder.layers.4.final_layer_norm.weight,model.decoder.layers.4.final_layer_norm.bias,model.decoder.layers.5.self_attn.k_proj.weight,model.decoder.layers.5.self_attn.k_proj.bias,model.decoder.layers.5.self_attn.v_proj.weight,model.decoder.layers.5.self_attn.v_proj.bias,model.decoder.layers.5.self_attn.q_proj.weight,model.decoder.layers.5.self_attn.q_proj.bias,model.decoder.layers.5.self_attn.
out_proj.weight,model.decoder.layers.5.self_attn.out_proj.bias,model.decoder.layers.5.self_attn_layer_norm.weight,model.decoder.layers.5.self_attn_layer_norm.bias,model.decoder.layers.5.encoder_attn.k_proj.weight,model.decoder.layers.5.encoder_attn.k_proj.bias,model.decoder.layers.5.encoder_attn.v_proj.weight,model.decoder.layers.5.encoder_attn.v_proj.bias,model.decoder.layers.5.encoder_attn.q_proj.weight,model.decoder.layers.5.encoder_attn.q_proj.bias,model.decoder.layers.5.encoder_attn.out_proj.weight,model.decoder.layers.5.encoder_attn.out_proj.bias,model.decoder.layers.5.encoder_attn_layer_norm.weight,model.decoder.layers.5.encoder_attn_layer_norm.bias,model.decoder.layers.5.fc1.weight,model.decoder.layers.5.fc1.bias,model.decoder.layers.5.fc2.weight,model.decoder.layers.5.fc2.bias,model.decoder.layers.5.final_layer_norm.weight,model.decoder.layers.5.final_layer_norm.bias,model.decoder.layers.6.self_attn.k_proj.weight,model.decoder.layers.6.self_attn.k_proj.bias,model.decoder.layers.6.self_attn.v_proj.weight,model.decoder.layers.6.self_attn.v_proj.bias,model.decoder.layers.6.self_attn.q_proj.weight,model.decoder.layers.6.self_attn.q_proj.bias,model.decoder.layers.6.self_attn.out_proj.weight,model.decoder.layers.6.self_attn.out_proj.bias,model.decoder.layers.6.self_attn_layer_norm.weight,model.decoder.layers.6.self_attn_layer_norm.bias,model.decoder.layers.6.encoder_attn.k_proj.weight,model.decoder.layers.6.encoder_attn.k_proj.bias,model.decoder.layers.6.encoder_attn.v_proj.weight,model.decoder.layers.6.encoder_attn.v_proj.bias,model.decoder.layers.6.encoder_attn.q_proj.weight,model.decoder.layers.6.encoder_attn.q_proj.bias,model.decoder.layers.6.encoder_attn.out_proj.weight,model.decoder.layers.6.encoder_attn.out_proj.bias,model.decoder.layers.6.encoder_attn_layer_norm.weight,model.decoder.layers.6.encoder_attn_layer_norm.bias,model.decoder.layers.6.fc1.weight,model.decoder.layers.6.fc1.bias,model.decoder.layers.6.fc2.weight,model.decoder.layers.6.fc2.bias,model.decoder.layers.6.final_layer_norm.weight,model.decoder.layers.6.final_layer_norm.bias,model.decoder.layers.7.self_attn.k_proj.weight,model.decoder.layers.7.self_attn.k_proj.bias,model.decoder.layers.7.self_attn.v_proj.weight,model.decoder.layers.7.self_attn.v_proj.bias,model.decoder.layers.7.self_attn.q_proj.weight,model.decoder.layers.7.self_attn.q_proj.bias,model.decoder.layers.7.self_attn.out_proj.weight,model.decoder.layers.7.self_attn.out_proj.bias,model.decoder.layers.7.self_attn_layer_norm.weight,model.decoder.layers.7.self_attn_layer_norm.bias,model.decoder.layers.7.encoder_attn.k_proj.weight,model.decoder.layers.7.encoder_attn.k_proj.bias,model.decoder.layers.7.encoder_attn.v_proj.weight,model.decoder.layers.7.encoder_attn.v_proj.bias,model.decoder.layers.7.encoder_attn.q_proj.weight,model.decoder.layers.7.encoder_attn.q_proj.bias,model.decoder.layers.7.encoder_attn.out_proj.weight,model.decoder.layers.7.encoder_attn.out_proj.bias,model.decoder.layers.7.encoder_attn_layer_norm.weight,model.decoder.layers.7.encoder_attn_layer_norm.bias,model.decoder.layers.7.fc1.weight,model.decoder.layers.7.fc1.bias,model.decoder.layers.7.fc2.weight,model.decoder.layers.7.fc2.bias,model.decoder.layers.7.final_layer_norm.weight,model.decoder.layers.7.final_layer_norm.bias,model.decoder.layers.8.self_attn.k_proj.weight,model.decoder.layers.8.self_attn.k_proj.bias,model.decoder.layers.8.self_attn.v_proj.weight,model.decoder.layers.8.self_attn.v_proj.bias,model.decoder.layers.8.self_attn.q_proj.weight,model.decoder.layers.8.self_attn.q_proj
.bias,model.decoder.layers.8.self_attn.out_proj.weight,model.decoder.layers.8.self_attn.out_proj.bias,model.decoder.layers.8.self_attn_layer_norm.weight,model.decoder.layers.8.self_attn_layer_norm.bias,model.decoder.layers.8.encoder_attn.k_proj.weight,model.decoder.layers.8.encoder_attn.k_proj.bias,model.decoder.layers.8.encoder_attn.v_proj.weight,model.decoder.layers.8.encoder_attn.v_proj.bias,model.decoder.layers.8.encoder_attn.q_proj.weight,model.decoder.layers.8.encoder_attn.q_proj.bias,model.decoder.layers.8.encoder_attn.out_proj.weight,model.decoder.layers.8.encoder_attn.out_proj.bias,model.decoder.layers.8.encoder_attn_layer_norm.weight,model.decoder.layers.8.encoder_attn_layer_norm.bias,model.decoder.layers.8.fc1.weight,model.decoder.layers.8.fc1.bias,model.decoder.layers.8.fc2.weight,model.decoder.layers.8.fc2.bias,model.decoder.layers.8.final_layer_norm.weight,model.decoder.layers.8.final_layer_norm.bias,model.decoder.layers.9.self_attn.k_proj.weight,model.decoder.layers.9.self_attn.k_proj.bias,model.decoder.layers.9.self_attn.v_proj.weight,model.decoder.layers.9.self_attn.v_proj.bias,model.decoder.layers.9.self_attn.q_proj.weight,model.decoder.layers.9.self_attn.q_proj.bias,model.decoder.layers.9.self_attn.out_proj.weight,model.decoder.layers.9.self_attn.out_proj.bias,model.decoder.layers.9.self_attn_layer_norm.weight,model.decoder.layers.9.self_attn_layer_norm.bias,model.decoder.layers.9.encoder_attn.k_proj.weight,model.decoder.layers.9.encoder_attn.k_proj.bias,model.decoder.layers.9.encoder_attn.v_proj.weight,model.decoder.layers.9.encoder_attn.v_proj.bias,model.decoder.layers.9.encoder_attn.q_proj.weight,model.decoder.layers.9.encoder_attn.q_proj.bias,model.decoder.layers.9.encoder_attn.out_proj.weight,model.decoder.layers.9.encoder_attn.out_proj.bias,model.decoder.layers.9.encoder_attn_layer_norm.weight,model.decoder.layers.9.encoder_attn_layer_norm.bias,model.decoder.layers.9.fc1.weight,model.decoder.layers.9.fc1.bias,model.decoder.layers.9.fc2.weight,model.decoder.layers.9.fc2.bias,model.decoder.layers.9.final_layer_norm.weight,model.decoder.layers.9.final_layer_norm.bias,model.decoder.layers.10.self_attn.k_proj.weight,model.decoder.layers.10.self_attn.k_proj.bias,model.decoder.layers.10.self_attn.v_proj.weight,model.decoder.layers.10.self_attn.v_proj.bias,model.decoder.layers.10.self_attn.q_proj.weight,model.decoder.layers.10.self_attn.q_proj.bias,model.decoder.layers.10.self_attn.out_proj.weight,model.decoder.layers.10.self_attn.out_proj.bias,model.decoder.layers.10.self_attn_layer_norm.weight,model.decoder.layers.10.self_attn_layer_norm.bias,model.decoder.layers.10.encoder_attn.k_proj.weight,model.decoder.layers.10.encoder_attn.k_proj.bias,model.decoder.layers.10.encoder_attn.v_proj.weight,model.decoder.layers.10.encoder_attn.v_proj.bias,model.decoder.layers.10.encoder_attn.q_proj.weight,model.decoder.layers.10.encoder_attn.q_proj.bias,model.decoder.layers.10.encoder_attn.out_proj.weight,model.decoder.layers.10.encoder_attn.out_proj.bias,model.decoder.layers.10.encoder_attn_layer_norm.weight,model.decoder.layers.10.encoder_attn_layer_norm.bias,model.decoder.layers.10.fc1.weight,model.decoder.layers.10.fc1.bias,model.decoder.layers.10.fc2.weight,model.decoder.layers.10.fc2.bias,model.decoder.layers.10.final_layer_norm.weight,model.decoder.layers.10.final_layer_norm.bias,model.decoder.layers.11.self_attn.k_proj.weight,model.decoder.layers.11.self_attn.k_proj.bias,model.decoder.layers.11.self_attn.v_proj.weight,model.decoder.layers.11.self_attn.v_proj.bias,model.decoder.lay
ers.11.self_attn.q_proj.weight,model.decoder.layers.11.self_attn.q_proj.bias,model.decoder.layers.11.self_attn.out_proj.weight,model.decoder.layers.11.self_attn.out_proj.bias,model.decoder.layers.11.self_attn_layer_norm.weight,model.decoder.layers.11.self_attn_layer_norm.bias,model.decoder.layers.11.encoder_attn.k_proj.weight,model.decoder.layers.11.encoder_attn.k_proj.bias,model.decoder.layers.11.encoder_attn.v_proj.weight,model.decoder.layers.11.encoder_attn.v_proj.bias,model.decoder.layers.11.encoder_attn.q_proj.weight,model.decoder.layers.11.encoder_attn.q_proj.bias,model.decoder.layers.11.encoder_attn.out_proj.weight,model.decoder.layers.11.encoder_attn.out_proj.bias,model.decoder.layers.11.encoder_attn_layer_norm.weight,model.decoder.layers.11.encoder_attn_layer_norm.bias,model.decoder.layers.11.fc1.weight,model.decoder.layers.11.fc1.bias,model.decoder.layers.11.fc2.weight,model.decoder.layers.11.fc2.bias,model.decoder.layers.11.final_layer_norm.weight,model.decoder.layers.11.final_layer_norm.bias,model.decoder.layers.12.self_attn.k_proj.weight,model.decoder.layers.12.self_attn.k_proj.bias,model.decoder.layers.12.self_attn.v_proj.weight,model.decoder.layers.12.self_attn.v_proj.bias,model.decoder.layers.12.self_attn.q_proj.weight,model.decoder.layers.12.self_attn.q_proj.bias,model.decoder.layers.12.self_attn.out_proj.weight,model.decoder.layers.12.self_attn.out_proj.bias,model.decoder.layers.12.self_attn_layer_norm.weight,model.decoder.layers.12.self_attn_layer_norm.bias,model.decoder.layers.12.encoder_attn.k_proj.weight,model.decoder.layers.12.encoder_attn.k_proj.bias,model.decoder.layers.12.encoder_attn.v_proj.weight,model.decoder.layers.12.encoder_attn.v_proj.bias,model.decoder.layers.12.encoder_attn.q_proj.weight,model.decoder.layers.12.encoder_attn.q_proj.bias,model.decoder.layers.12.encoder_attn.out_proj.weight,model.decoder.layers.12.encoder_attn.out_proj.bias,model.decoder.layers.12.encoder_attn_layer_norm.weight,model.decoder.layers.12.encoder_attn_layer_norm.bias,model.decoder.layers.12.fc1.weight,model.decoder.layers.12.fc1.bias,model.decoder.layers.12.fc2.weight,model.decoder.layers.12.fc2.bias,model.decoder.layers.12.final_layer_norm.weight,model.decoder.layers.12.final_layer_norm.bias,model.decoder.layers.13.self_attn.k_proj.weight,model.decoder.layers.13.self_attn.k_proj.bias,model.decoder.layers.13.self_attn.v_proj.weight,model.decoder.layers.13.self_attn.v_proj.bias,model.decoder.layers.13.self_attn.q_proj.weight,model.decoder.layers.13.self_attn.q_proj.bias,model.decoder.layers.13.self_attn.out_proj.weight,model.decoder.layers.13.self_attn.out_proj.bias,model.decoder.layers.13.self_attn_layer_norm.weight,model.decoder.layers.13.self_attn_layer_norm.bias,model.decoder.layers.13.encoder_attn.k_proj.weight,model.decoder.layers.13.encoder_attn.k_proj.bias,model.decoder.layers.13.encoder_attn.v_proj.weight,model.decoder.layers.13.encoder_attn.v_proj.bias,model.decoder.layers.13.encoder_attn.q_proj.weight,model.decoder.layers.13.encoder_attn.q_proj.bias,model.decoder.layers.13.encoder_attn.out_proj.weight,model.decoder.layers.13.encoder_attn.out_proj.bias,model.decoder.layers.13.encoder_attn_layer_norm.weight,model.decoder.layers.13.encoder_attn_layer_norm.bias,model.decoder.layers.13.fc1.weight,model.decoder.layers.13.fc1.bias,model.decoder.layers.13.fc2.weight,model.decoder.layers.13.fc2.bias,model.decoder.layers.13.final_layer_norm.weight,model.decoder.layers.13.final_layer_norm.bias,model.decoder.layers.14.self_attn.k_proj.weight,model.decoder.layers.14.self_attn.k_proj
.bias,model.decoder.layers.14.self_attn.v_proj.weight,model.decoder.layers.14.self_attn.v_proj.bias,model.decoder.layers.14.self_attn.q_proj.weight,model.decoder.layers.14.self_attn.q_proj.bias,model.decoder.layers.14.self_attn.out_proj.weight,model.decoder.layers.14.self_attn.out_proj.bias,model.decoder.layers.14.self_attn_layer_norm.weight,model.decoder.layers.14.self_attn_layer_norm.bias,model.decoder.layers.14.encoder_attn.k_proj.weight,model.decoder.layers.14.encoder_attn.k_proj.bias,model.decoder.layers.14.encoder_attn.v_proj.weight,model.decoder.layers.14.encoder_attn.v_proj.bias,model.decoder.layers.14.encoder_attn.q_proj.weight,model.decoder.layers.14.encoder_attn.q_proj.bias,model.decoder.layers.14.encoder_attn.out_proj.weight,model.decoder.layers.14.encoder_attn.out_proj.bias,model.decoder.layers.14.encoder_attn_layer_norm.weight,model.decoder.layers.14.encoder_attn_layer_norm.bias,model.decoder.layers.14.fc1.weight,model.decoder.layers.14.fc1.bias,model.decoder.layers.14.fc2.weight,model.decoder.layers.14.fc2.bias,model.decoder.layers.14.final_layer_norm.weight,model.decoder.layers.14.final_layer_norm.bias,model.decoder.layers.15.self_attn.k_proj.weight,model.decoder.layers.15.self_attn.k_proj.bias,model.decoder.layers.15.self_attn.v_proj.weight,model.decoder.layers.15.self_attn.v_proj.bias,model.decoder.layers.15.self_attn.q_proj.weight,model.decoder.layers.15.self_attn.q_proj.bias,model.decoder.layers.15.self_attn.out_proj.weight,model.decoder.layers.15.self_attn.out_proj.bias,model.decoder.layers.15.self_attn_layer_norm.weight,model.decoder.layers.15.self_attn_layer_norm.bias,model.decoder.layers.15.encoder_attn.k_proj.weight,model.decoder.layers.15.encoder_attn.k_proj.bias,model.decoder.layers.15.encoder_attn.v_proj.weight,model.decoder.layers.15.encoder_attn.v_proj.bias,model.decoder.layers.15.encoder_attn.q_proj.weight,model.decoder.layers.15.encoder_attn.q_proj.bias,model.decoder.layers.15.encoder_attn.out_proj.weight,model.decoder.layers.15.encoder_attn.out_proj.bias,model.decoder.layers.15.encoder_attn_layer_norm.weight,model.decoder.layers.15.encoder_attn_layer_norm.bias,model.decoder.layers.15.fc1.weight,model.decoder.layers.15.fc1.bias,model.decoder.layers.15.fc2.weight,model.decoder.layers.15.fc2.bias,model.decoder.layers.15.final_layer_norm.weight,model.decoder.layers.15.final_layer_norm.bias,model.decoder.layer_norm.weight,model.decoder.layer_norm.bias,lm_head.weight].
All weights are initialized.
[2022-08-25 11:31:37,255 INFO] Using SimplePredict to predict...
22it [27:18, 74.47s/it]
print('Labeled samples:')
! tail -n 1 en_dev.tsv
print('\n\nPredicted results:')
! tail -n 1 en.preds.txt
Labeled samples:
Papa John's fell short of Wall Street's earnings estimates, but demand for its pizza remains high during the coronavirus pandemic. Worldwide, its same-store sales surged 15.5% in the quarter. Employee bonuses and higher commodity costs weighed on its profits in the latest period.  Papa John's on Thursday reported quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. The company's stock fell more than 7% in premarket trading. Here's what the company reported compared with what Wall Street was expecting, based on a survey of analysts by Refinitiv: The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million, or 18 cents per share, a year earlier. It spent $6 million, or 12 cents per share, in the fourth quarter on its strategic reorganization, including opening a Georgia office. The company also paid out $2.7 million in end-of-year bonuses for its restaurant workers, shaving off 6 cents per share. Increased commodity costs also hit profits during the quarter. Excluding reorganization costs, Papa John's earned 40 cents per share, missing the 46 cents per share expected by analysts surveyed by Refinitiv. Net sales  rose 12.5% to $469.8 million, beating expectations of $467.9 million. Worldwide, its same-store sales surged 15.5% in the quarter. North American same-store sales increased by 13.5%. Papa John's also raked in higher royalties from its franchisees because its operator assistance program, which began in the wake of the scandal that involved founder John Schnatter . International same-store sales climbed 21.4% in the quarter. Papa John's opened 40 net new locations, primarily due to international openings. As of Dec. 27, about 65 of the company's 5,400 locations were temporarily closed due to government restrictions, primarily in Latin America and Europe. The company also shared an update on its plans to open an office in Atlanta, saying it's on track to open by summer. Papa John's expects to spend $15 million to $20 million through 2021 related to the costs of adding the office, including employee severance, recruitment and relocation. Papa John's declined to provide an outlook for its financial targets during 2021, citing the uncertainty caused by the pandemic. Also Thursday, Domino's Pizza reported quarterly earnings that missed estimates .
Predicted results:
quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million a year earlier.  quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million a year earlier.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million in the fourth quarter of last year.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million last year.||quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. Papa John's stock fell more than 7% in premarket trading. The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million in the same period a year earlier. Papa John's fell short of Wall Street's earnings estimates, but demand for its pizza remains high during the coronavirus pandemic. Worldwide, its same-store sales surged 15.5% in the quarter. Employee bonuses and higher commodity costs weighed on its profits in the latest period.  Papa John's on Thursday reported quarterly earnings that missed estimates as higher food costs, a new corporate office and employee bonuses weighed on profits despite high demand for its pizza during the pandemic. The company's stock fell more than 7% in premarket trading. Here's what the company reported compared with what Wall Street was expecting, based on a survey of analysts by Refinitiv: The pizza chain reported fiscal fourth-quarter net income of $13.2 million, or 28 cents per share, up from a net loss of $2.1 million, or 18 cents per share, a year earlier. It spent $6 million, or 12 cents per share, in the fourth quarter on its strategic reorganization, including opening a Georgia office. The company also paid out $2.7 million in end-of-year bonuses for its restaurant workers, shaving off 6 cents per share. Increased commodity costs also hit profits during the quarter. 
Excluding reorganization costs, Papa John's earned 40 cents per share, missing the 46 cents per share expected by analysts surveyed by Refinitiv. Net sales  rose 12.5% to $469.8 million, beating expectations of $467.9 million. Worldwide, its same-store sales surged 15.5% in the quarter. North American same-store sales increased by 13.5%. Papa John's also raked in higher royalties from its franchisees because its operator assistance program, which began in the wake of the scandal that involved founder John Schnatter . International same-store sales climbed 21.4% in the quarter. Papa John's opened 40 net new locations, primarily due to international openings. As of Dec. 27, about 65 of the company's 5,400 locations were temporarily closed due to government restrictions, primarily in Latin America and Europe. The company also shared an update on its plans to open an office in Atlanta, saying it's on track to open by summer. Papa John's expects to spend $15 million to $20 million through 2021 related to the costs of adding the office, including employee severance, recruitment and relocation. Papa John's declined to provide an outlook for its financial targets during 2021, citing the uncertainty caused by the pandemic. Also Thursday, Domino's Pizza reported quarterly earnings that missed estimates .

The output above shows one sample from the development set together with the prediction produced by the trained model. The first column is the predicted title; the second column contains the 5 beam-search candidates, separated by ||. The prediction output also carries the original news text as appended columns, with all columns separated by tab characters (\t).
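If you want to consume these predictions downstream, the file can be parsed programmatically. The following is a minimal sketch, assuming the layout described above and implied by --output_schema=predictions,beams and --append_cols=title,content (tab-separated columns: prediction, beams, reference title, original content, with beam candidates separated by ||); the helper name parse_predictions is only for illustration and is not part of EasyNLP.

# Hedged sketch: parse en.preds.txt produced by the predict step.
# Assumed column order: prediction \t beams \t title \t content.
def parse_predictions(path="en.preds.txt"):
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip malformed lines
            records.append({
                "prediction": fields[0],                      # top-1 generated summary
                "beams": fields[1].split("||"),               # beam-search candidates
                "reference": fields[2] if len(fields) > 2 else None,  # appended title column
                "article": fields[3] if len(fields) > 3 else None,    # appended content column
            })
    return records

records = parse_predictions()
print(f"{len(records)} predictions loaded")
print("Top-1 summary of the last sample:", records[-1]["prediction"][:200])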

One-step execution

It is worth noting that all of the training, evaluation, and prediction code above has already been integrated into EasyNLP/examples/appzoo_tutorials/sequence_generation/main.py, and several ready-to-run scripts are provided as well. You can either run main.py with the appropriate arguments or execute the script run_user_defined_local_en.sh directly to perform all of the training, evaluation, and prediction steps above in one go.

One-step execution with main.py

Running main.py with the arguments below lets you train, evaluate, and predict with the model directly. The individual parameters are explained earlier in this document and are not repeated here.

The model training command is as follows:

! python main.py \
    --mode train \
    --app_name=sequence_generation \
    --worker_gpu=1 \
    --tables=./en_train.tsv,./en_dev.tsv  \
    --input_schema=title:str:1,content:str:1 \
    --first_sequence=content \
    --second_sequence=title \
    --label_name=title \
    --checkpoint_dir=./finetuned_en_model \
    --micro_batch_size=1 \
    --sequence_length=512 \
    --epoch_num 1 \
    --save_checkpoint_steps=500 \
    --export_tf_checkpoint_type none \
    --user_defined_parameters 'language=en pretrain_model_name_or_path=alibaba-pai/pegasus-summary-generation-en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'

The model evaluation command is as follows:

! python main.py \
    --mode=evaluate \
    --app_name=sequence_generation \
    --worker_gpu=1 \
    --tables=./en_dev.tsv  \
    --input_schema=title:str:1,content:str:1 \
    --output_schema=predictions,beams \
    --append_cols=title,content \
    --first_sequence=content \
    --second_sequence=title \
    --checkpoint_dir=./finetuned_en_model \
    --micro_batch_size 32 \
    --sequence_length 512 \
    --user_defined_parameters 'language=en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'

The model prediction command is as follows:

! python main.py \
    --mode=predict \
    --app_name=sequence_generation \
    --worker_gpu=1 \
    --tables=./en_dev.tsv  \
    --outputs=./en.preds.txt \
    --input_schema=title:str:1,content:str:1 \
    --output_schema=predictions,beams \
    --append_cols=title,content \
    --first_sequence=content \
    --checkpoint_dir=./finetuned_en_model \
    --micro_batch_size 32 \
    --sequence_length 512 \
    --user_defined_parameters 'language=en copy=false max_encoder_length=512 min_decoder_length=64 max_decoder_length=128 no_repeat_ngram_size=2 num_beams=5 num_return_sequences=5'
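
Beyond spot-checking individual outputs, you can quantify summary quality with ROUGE. The sketch below is not part of EasyNLP: it assumes the third-party rouge-score package (pip install rouge-score) and the en.preds.txt layout described earlier (prediction in the first tab-separated column, appended reference title in the third).

from rouge_score import rouge_scorer

# Hedged sketch: average ROUGE F-measures of the generated summaries against the reference titles.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
count = 0
with open("en.preds.txt", encoding="utf-8") as f:
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need both a prediction and a reference title
        prediction, reference = fields[0], fields[2]
        scores = scorer.score(reference, prediction)
        for key in totals:
            totals[key] += scores[key].fmeasure
        count += 1
if count:
    for key, value in totals.items():
        print(f"{key}: {value / count:.4f}")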

Running the bash script from the command line

Several ready-to-run scripts are provided in the EasyNLP/examples/appzoo_tutorials/sequence_generation folder, so you can also complete model training, evaluation, and prediction in one step by running a script with arguments. The following takes the run_user_defined_local_en.sh script as an example. The script takes two arguments: the first is the index of the GPU to run on (usually 0), and the second specifies the mode, i.e. train, evaluate, or predict.

Model training:

! bash examples/appzoo_tutorials/sequence_generation/run_user_defined_local_en.sh 0 train

Model evaluation:

! bash examples/appzoo_tutorials/sequence_generation/run_user_defined_local_en.sh 0 evaluate

Model prediction:

! bash examples/appzoo_tutorials/sequence_generation/run_user_defined_local_en.sh 0 predict