I also created checkpoints that save the best model during training. Checkpointing saves time by letting you resume training from where you stopped instead of retraining from scratch. If you are satisfied with the output of the best model, no further fine-tuning is needed and you can use that model for inference.
import shutil

def load_ckp(checkpoint_fpath, model, optimizer):
    """
    checkpoint_fpath: path of the saved checkpoint to load
    model: model that we want to load checkpoint parameters into
    optimizer: optimizer we defined in previous training
    """
    # load checkpoint
    checkpoint = torch.load(checkpoint_fpath)
    # initialize state_dict from checkpoint to model
    model.load_state_dict(checkpoint['state_dict'])
    # initialize optimizer from checkpoint to optimizer
    optimizer.load_state_dict(checkpoint['optimizer'])
    # initialize valid_loss_min from checkpoint to valid_loss_min
    valid_loss_min = checkpoint['valid_loss_min']
    # return model, optimizer, epoch value, min validation loss
    return model, optimizer, checkpoint['epoch'], valid_loss_min


def save_ckp(state, is_best, checkpoint_path, best_model_path):
    """
    state: checkpoint we want to save
    is_best: is this the best checkpoint (min validation loss)
    checkpoint_path: path to save checkpoint
    best_model_path: path to save best model
    """
    f_path = checkpoint_path
    # save checkpoint data to the path given, checkpoint_path
    torch.save(state, f_path)
    # if it is the best model (min validation loss),
    # copy the checkpoint file to the best path given, best_model_path
    if is_best:
        best_fpath = best_model_path
        shutil.copyfile(f_path, best_fpath)


def train_model(start_epochs, n_epochs, valid_loss_min_input,
                training_loader, validation_loader, model,
                optimizer, checkpoint_path, best_model_path):
    # device, loss_fn, val_targets and val_outputs are defined
    # earlier in the notebook
    # initialize tracker for minimum validation loss
    valid_loss_min = valid_loss_min_input

    for epoch in range(start_epochs, n_epochs + 1):
        train_loss = 0
        valid_loss = 0

        ####################### train the model #######################
        model.train()
        print('############# Epoch {}: Training Start #############'.format(epoch))
        for batch_idx, data in enumerate(training_loader):
            ids = data['ids'].to(device, dtype=torch.long)
            mask = data['mask'].to(device, dtype=torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
            targets = data['targets'].to(device, dtype=torch.float)

            outputs = model(ids, mask, token_type_ids)
            loss = loss_fn(outputs, targets)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # running average of the training loss
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))

        print('############# Epoch {}: Training End #############'.format(epoch))

        ####################### validate the model #######################
        print('############# Epoch {}: Validation Start #############'.format(epoch))
        model.eval()
        with torch.no_grad():
            for batch_idx, data in enumerate(validation_loader, 0):
                ids = data['ids'].to(device, dtype=torch.long)
                mask = data['mask'].to(device, dtype=torch.long)
                token_type_ids = data['token_type_ids'].to(device, dtype=torch.long)
                targets = data['targets'].to(device, dtype=torch.float)

                outputs = model(ids, mask, token_type_ids)
                loss = loss_fn(outputs, targets)

                # running average of the validation loss
                valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))
                val_targets.extend(targets.cpu().detach().numpy().tolist())
                val_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())

        print('############# Epoch {}: Validation End #############'.format(epoch))

        # calculate average losses
        train_loss = train_loss / len(training_loader)
        valid_loss = valid_loss / len(validation_loader)

        # print training/validation statistics
        print('Epoch: {} \tAverage Training Loss: {:.6f} \tAverage Validation Loss: {:.6f}'.format(
            epoch, train_loss, valid_loss))

        # create checkpoint variable and add important data
        checkpoint = {
            'epoch': epoch + 1,
            'valid_loss_min': valid_loss,
            'state_dict': model.state_dict(),
            'optimizer': optimizer.state_dict()
        }
        # save checkpoint
        save_ckp(checkpoint, False, checkpoint_path, best_model_path)

        # save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                valid_loss_min, valid_loss))
            # save checkpoint as best model
            save_ckp(checkpoint, True, checkpoint_path, best_model_path)
            valid_loss_min = valid_loss

        print('############# Epoch {} Done #############\n'.format(epoch))

    return model
The train_model function trains the model; checkpoint_path is where the model's parameters are saved after every epoch, and best_model is where the best model is saved.
checkpoint_path = '/content/drive/My Drive/NLP/ResearchArticlesClassification/checkpoint/current_checkpoint.pt'
best_model = '/content/drive/My Drive/NLP/ResearchArticlesClassification/best_model/best_model.pt'

trained_model = train_model(1, 4, np.Inf, training_loader, validation_loader,
                            model, optimizer, checkpoint_path, best_model)
The training results are as follows:
############# Epoch 1: Training Start #############
############# Epoch 1: Training End #############
############# Epoch 1: Validation Start #############
############# Epoch 1: Validation End #############
Epoch: 1 	Average Training Loss: 0.000347 	Average Validation Loss: 0.001765
Validation loss decreased (inf --> 0.001765). Saving model ...
############# Epoch 1 Done #############

############# Epoch 2: Training Start #############
############# Epoch 2: Training End #############
############# Epoch 2: Validation Start #############
############# Epoch 2: Validation End #############
Epoch: 2 	Average Training Loss: 0.000301 	Average Validation Loss: 0.001831
############# Epoch 2 Done #############

############# Epoch 3: Training Start #############
############# Epoch 3: Training End #############
############# Epoch 3: Validation Start #############
############# Epoch 3: Validation End #############
Epoch: 3 	Average Training Loss: 0.000263 	Average Validation Loss: 0.001896
############# Epoch 3 Done #############

############# Epoch 4: Training Start #############
############# Epoch 4: Training End #############
############# Epoch 4: Validation Start #############
############# Epoch 4: Validation End #############
Epoch: 4 	Average Training Loss: 0.000228 	Average Validation Loss: 0.002048
############# Epoch 4 Done #############
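Because a checkpoint is written after every epoch, an interrupted run can be resumed from the last completed epoch instead of starting over. Here is a minimal sketch using the load_ckp function defined above, with the same model and optimizer objects as before:

# restore model/optimizer state and the tracked minimum validation loss
model, optimizer, start_epoch, valid_loss_min = load_ckp(checkpoint_path, model, optimizer)

# continue training from the restored epoch up to epoch 4
trained_model = train_model(start_epoch, 4, valid_loss_min, training_loader,
                            validation_loader, model, optimizer,
                            checkpoint_path, best_model)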
Since I only ran 4 epochs, training finished quickly. Note that the validation loss starts rising after epoch 1 while the training loss keeps falling, a sign the model begins to overfit. I set the prediction threshold to 0.5; you can experiment with this value to see whether it improves the results (a sketch of such a sweep follows the predictions below).
val_preds = (np.array(val_outputs) > 0.5).astype(int)
val_preds

array([[0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       ...,
       [0, 0, 1, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0]])
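The 0.5 cutoff is not necessarily optimal. One simple experiment is to sweep a range of thresholds over the validation outputs and keep whichever maximizes micro-F1; the grid below is an arbitrary choice, not part of the original run:

import numpy as np
from sklearn import metrics

best_t, best_f1 = 0.5, 0.0
for t in np.arange(0.30, 0.71, 0.05):
    # binarize the sigmoid outputs at the candidate threshold
    preds = (np.array(val_outputs) > t).astype(int)
    f1 = metrics.f1_score(val_targets, preds, average='micro')
    if f1 > best_f1:
        best_t, best_f1 = t, f1
print(f"Best threshold = {best_t:.2f}, micro F1 = {best_f1:.4f}")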
Let's define accuracy and the F1 score as the metrics of model performance; F1 will be used for the evaluation. Micro-F1 aggregates the contributions of all classes into a single score, while macro-F1 averages the per-class scores, which makes it more sensitive to rare classes.
from sklearn import metrics

accuracy = metrics.accuracy_score(val_targets, val_preds)
f1_score_micro = metrics.f1_score(val_targets, val_preds, average='micro')
f1_score_macro = metrics.f1_score(val_targets, val_preds, average='macro')
print(f"Accuracy Score = {accuracy}")
print(f"F1 Score (Micro) = {f1_score_micro}")
print(f"F1 Score (Macro) = {f1_score_macro}")
Use a multilabel confusion matrix and a classification report to visualize how correctly (or incorrectly) our model predicts each individual target.
from sklearn.metrics import multilabel_confusion_matrix as mcm, classification_report

cm_labels = ['Computer Science', 'Physics', 'Mathematics',
             'Statistics', 'Quantitative Biology', 'Quantitative Finance']
cm = mcm(val_targets, val_preds)
print(classification_report(val_targets, val_preds))
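Since multilabel_confusion_matrix returns one 2x2 matrix per class, cm_labels can be used to print each label's matrix next to its name, for example:

# each entry of cm is [[tn, fp], [fn, tp]] for one label
for label, matrix in zip(cm_labels, cm):
    print(f"\n{label}:")
    print(matrix)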
The model predicts with 76% accuracy. The F1 score is on the low side because there are six classes to predict; it can be improved by training on "TITLE" combined with "ABSTRACT", or on "ABSTRACT" alone. I trained both cases and found that the F1 score from the "ABSTRACT" feature alone is much better than from the title, or from the title combined with the abstract. Without any hyperparameter optimization, I ran inference on the test data and got a private score of 0.82.
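Switching between the two input variants only changes which column feeds the tokenizer. A hypothetical sketch, assuming the raw dataframe is named df and carries the competition's "TITLE" and "ABSTRACT" columns (neither name is shown in this excerpt):

# variant 1: abstract only (scored best in my runs)
df['text'] = df['ABSTRACT']

# variant 2: title concatenated with the abstract
# df['text'] = df['TITLE'] + '. ' + df['ABSTRACT']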
There are a few things that could be done to improve the F1 score further. One is tuning the model's hyperparameters: you may want to experiment with the learning rate, the dropout rate, and the number of epochs, as sketched below. Once you are satisfied with the fine-tuned results, you can train on the entire dataset instead of splitting it into training and validation sets, so that the model sees all the available examples, which can help it perform better.
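A hypothetical sketch of such an experiment; BERTClass stands in for the model class defined earlier in the notebook (its actual name is not shown in this excerpt), and each run re-creates the model so every learning rate starts from the same pretrained weights:

for lr in (1e-05, 2e-05, 3e-05):
    model = BERTClass()  # hypothetical name for the model class defined earlier
    model.to(device)
    optimizer = torch.optim.Adam(params=model.parameters(), lr=lr)
    # in practice, give each run its own checkpoint_path/best_model paths
    # so the runs do not overwrite each other's checkpoints
    train_model(1, 4, np.Inf, training_loader, validation_loader,
                model, optimizer, checkpoint_path, best_model)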
You can view the source code for this project on Google Colab:
https://colab.research.google.com/drive/1SPxxEW9okgnbMdk1ORlfSQI4rjV2tVW_#scrollTo=EJQRHd7VVMap