1. The datasets package
Official GitHub repository of the datasets package: huggingface/datasets: 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
The design of datasets drew on the TFDS project: tensorflow/datasets: TFDS is a collection of datasets ready to use with TensorFlow, Jax, …
1.1 Installing datasets
pip install datasets
1.2 Quick start with datasets
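- List all available datasets: all_available_datasets = datasets.list_datasets()
  Output (excerpt): ['assin', 'ar_res_reviews', 'ambig_qa', 'bianet', 'ag_news']
- Load a dataset: datasets.load_dataset(dataset_name, **kwargs)
- List all available metrics: datasets.list_metrics()
- Load a metric: datasets.load_metric(metric_name, **kwargs)
- Inspect a dataset:
  - Get one sample: dataset['train'][123456]
  - View the features (column schema): dataset['train'].features
(In newer datasets releases, list_datasets, list_metrics and load_metric are deprecated: dataset listing is provided by huggingface_hub, and metrics by the separate evaluate library.)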
1.3 Loading and preprocessing the Yelp Reviews dataset
Official page of the dataset on Hugging Face: yelp_review_full · Datasets at Hugging Face
Extracted from the Yelp Dataset Challenge 2015 data. Source paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015)
from datasets import load_dataset
dataset = load_dataset("yelp_review_full")
dataset["train"][100]
{'label': 0, 'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. I\'ve worked at more than one location. I expect bad days, bad moods, and the occasional mistake. But I have yet to have a decent experience at this store. It will remain a place I avoid unless someone in my party needs to avoid illness from low blood sugar. Perhaps I should go back to the racially biased service of Steak n Shake instead!'}
Copyright notice: this is an original article by CSDN blogger 「诸神缄默不语」, licensed under CC 4.0 BY-SA. When reposting, please include the link to the original and this notice. Original link: https://blog.csdn.net/PolarisRisingWar/article/details/123939061
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mypath/bert-base-cased")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
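Without the max_length argument, i.e. calling tokenizer(examples["text"], padding="max_length", truncation=True), the locally loaded tokenizer printed these warnings:
Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.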
Training then crashed in the data collator, because the unpadded sequences in a batch have different lengths:
Traceback (most recent call last):
  File "mypath/huggingfacedatasets1.py", line 47, in <module>
    trainer.train()
  File "myenv/lib/python3.8/site-packages/transformers/trainer.py", line 1396, in train
    for step, inputs in enumerate(epoch_iterator):
  File "myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "myenv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "myenv/lib/python3.8/site-packages/transformers/data/data_collator.py", line 66, in default_data_collator
    return torch_default_data_collator(features)
  File "myenv/lib/python3.8/site-packages/transformers/data/data_collator.py", line 130, in torch_default_data_collator
    batch[k] = torch.tensor([f[k] for f in features])
ValueError: expected sequence of length 72 at dim 1 (got 118)
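I did not rerun the whole experiment after adding max_length, because I don't think it is necessary: the two cases are equivalent. (See section 2.2 of my earlier post huggingface.transformers术语表_诸神缄默不语的博客-CSDN博客: tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=512) pads or truncates every sequence to exactly 512 tokens, while tokenizer(batch_sentences, padding='max_length', truncation=True) pads or truncates every sequence to the model's max_length; when the model's max_length is 512, the two are identical.) In fact, I think passing max_length explicitly is better, since it makes the code's behavior easier to control.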
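To make the equivalence and the ValueError above concrete, here is a minimal pure-Python sketch of what padding="max_length" plus truncation=True does to lists of token ids. It is not the tokenizer's implementation: the helper names pad_or_truncate and is_rectangular, the pad id 0 and the toy ids are assumptions, and the real tokenizer also keeps special tokens such as [SEP] in place when truncating. Without a known maximum length nothing is padded, so a batch keeps sequences of different lengths (72 and 118 in the traceback), and torch.tensor in default_data_collator cannot stack them.

```python
from typing import List

def pad_or_truncate(ids: List[int], max_length: int, pad_id: int = 0) -> List[int]:
    """Mimic padding='max_length' + truncation=True for one sequence (illustrative only)."""
    ids = ids[:max_length]                            # truncation: drop tokens past max_length
    return ids + [pad_id] * (max_length - len(ids))   # padding: append pad ids up to max_length

def is_rectangular(batch: List[List[int]]) -> bool:
    """True if all rows have the same length, i.e. the batch can become a 2-D tensor."""
    return len({len(row) for row in batch}) <= 1

# Two hypothetical tokenized reviews (stand-ins for the 72 vs 118 lengths in the traceback).
short_review = list(range(1, 73))    # 72 token ids
long_review = list(range(1, 119))    # 118 token ids

unpadded = [short_review, long_review]
print(is_rectangular(unpadded))      # False: the shape that makes torch.tensor fail

padded = [pad_or_truncate(r, max_length=100) for r in unpadded]
print([len(r) for r in padded])      # [100, 100]
print(is_rectangular(padded))        # True
```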
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
1.4 Converting a custom dataset into the datasets format
example_dataset = datasets.Dataset.from_dict(example_dict)  # example_dict: {column name: list of values}
example_dataset
Dataset({ features: ['label', 'text'], num_rows: 1000 })
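Dataset.from_dict expects a column-oriented dict: each key is a column name and each value is a list with one entry per row. If your custom data is instead a list of (text, label) records, a small conversion like the sketch below produces that shape. The function name records_to_columns and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple, Union

def records_to_columns(records: List[Tuple[str, int]]) -> Dict[str, List[Union[str, int]]]:
    """Turn row-oriented (text, label) pairs into the column-oriented dict Dataset.from_dict takes."""
    columns: Dict[str, list] = {"label": [], "text": []}
    for text, label in records:
        columns["text"].append(text)
        columns["label"].append(label)
    return columns

# Hypothetical records; Yelp Review Full has 5 classes, labelled 0-4.
records = [("Great crepes, awful prices.", 2), ("Never coming back.", 0)]
example_dict = records_to_columns(records)
print(example_dict)
# {'label': [2, 0], 'text': ['Great crepes, awful prices.', 'Never coming back.']}
```

With datasets installed, datasets.Dataset.from_dict(example_dict) then gives a Dataset with features ['label', 'text'] and one row per record.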
2. Fine-tuning with Trainer (PyTorch backend)
2.1 Defining the classification model
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("mypath/bert-base-cased", num_labels=5)
Some weights of the model checkpoint at mypath/bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at mypath/bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
(For an explanation of this output, see my earlier post: Some weights of the model checkpoint at mypath/bert-base-chinese were not used when initializing Ber_诸神缄默不语的博客-CSDN博客)
Alternatively, num_labels can be set through the model config:
from transformers import AutoConfig, AutoModelForSequenceClassification
model_path = "mypath/bert-base-cased"
config = AutoConfig.from_pretrained(model_path, num_labels=5)
model = AutoModelForSequenceClassification.from_pretrained(model_path, config=config)
2.2 Training hyperparameters
from transformers import TrainingArguments
training_args = TrainingArguments(output_dir="test_trainer")
2.3 Metrics
Official Hugging Face page of the accuracy metric: Hugging Face – The AI community building the future.
import datasets
import numpy as np
metric = datasets.load_metric("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # index of the highest logit = predicted class
    return metric.compute(predictions=predictions, references=labels)
# evaluate on eval_dataset (and call compute_metrics) at the end of every epoch
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
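datasets.load_metric is deprecated in newer datasets releases (metrics moved to the separate evaluate library). If you only need accuracy, the sketch below is an equivalent compute_metrics written with plain NumPy; it returns the same {'accuracy': ...} key, which Trainer logs as eval_accuracy. The toy logits and labels are made up for the demo.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from (logits, labels) without datasets.load_metric."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)           # highest-scoring class per example
    return {"accuracy": float((predictions == labels).mean())}

# Toy evaluation output: 3 examples, 5 classes (as in Yelp Review Full).
logits = np.array([
    [0.1, 2.0, 0.3, 0.0, 0.0],   # predicts class 1
    [3.0, 0.2, 0.1, 0.0, 0.0],   # predicts class 0
    [0.0, 0.0, 0.0, 0.5, 1.5],   # predicts class 4
])
labels = np.array([1, 0, 3])
print(compute_metrics((logits, labels)))   # {'accuracy': 0.6666666666666666}
```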
2.4 Trainer
from transformers import Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
Output when run as a script:
The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
myenv/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
***** Running training *****
  Num examples = 1000
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 96
  0%| | 0/96 [00:00<?, ?it/s]
myenv/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
 33%|████████████████████████▎ | 32/96 [00:19<00:23, 2.73it/s]
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
{'eval_loss': 1.219325304031372, 'eval_accuracy': 0.487, 'eval_runtime': 5.219, 'eval_samples_per_second': 191.609, 'eval_steps_per_second': 6.131, 'epoch': 1.0}
 33%|████████████████████████▎ | 32/96 [00:24<00:23, 2.73it/s
myenv/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
 67%|████████████████████████████████████████████████▋ | 64/96 [00:37<00:11, 2.87it/s]
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
{'eval_loss': 1.0443027019500732, 'eval_accuracy': 0.57, 'eval_runtime': 5.1937, 'eval_samples_per_second': 192.539, 'eval_steps_per_second': 6.161, 'epoch': 2.0}
 67%|████████████████████████████████████████████████▋ | 64/96 [00:42<00:11, 2.87it/s
myenv/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
100%|█████████████████████████████████████████████████████████████████████████| 96/96 [00:55<00:00, 2.87it/s]
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
{'eval_loss': 0.9776290655136108, 'eval_accuracy': 0.598, 'eval_runtime': 5.2137, 'eval_samples_per_second': 191.803, 'eval_steps_per_second': 6.138, 'epoch': 3.0}
100%|█████████████████████████████████████████████████████████████████████████| 96/96 [01:00<00:00, 2.87it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 60.8009, 'train_samples_per_second': 49.341, 'train_steps_per_second': 1.579, 'train_loss': 1.0931960741678874, 'epoch': 3.0}
100%|█████████████████████████████████████████████████████████████████████████| 96/96 [01:00<00:00, 1.58it/s]
The output in a Jupyter notebook, which looks somewhat cleaner than the script output:
The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
myenv/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
***** Running training *****
  Num examples = 1000
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 96
myenv/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
myenv/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
myenv/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
Training completed. Do not forget to share your model on huggingface.co/models =)
TrainOutput(global_step=96, training_loss=1.1009167830149333, metrics={'train_runtime': 60.9212, 'train_samples_per_second': 49.244, 'train_steps_per_second': 1.576, 'total_flos': 789354427392000.0, 'train_loss': 1.1009167830149333, 'epoch': 3.0})
On Colab, the same code produced:
The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
***** Running training *****
  Num examples = 1000
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 375
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
Training completed. Do not forget to share your model on huggingface.co/models =)
TrainOutput(global_step=375, training_loss=1.2140440266927084, metrics={'train_runtime': 780.671, 'train_samples_per_second': 3.843, 'train_steps_per_second': 0.48, 'total_flos': 789354427392000.0, 'train_loss': 1.2140440266927084, 'epoch': 3.0})
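(Note also the torch.nn.parallel warning: it did not appear on Colab. I suspected this was either because Colab has only one GPU or because of the PyTorch version (PyTorch 1.8.1 locally, PyTorch 1.10 on Colab), but I did not verify it. The logs point to the first explanation: locally the total train batch size is 32 with 8 per device, i.e. 4 GPUs under nn.DataParallel, whose gather step emits this warning when each GPU returns a scalar loss; on Colab the total batch size is 8, i.e. a single GPU, so nothing is gathered.)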
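The step counts in the two logs follow from the batch sizes. Assuming each epoch has ceil(num_examples / batch size across devices) steps (the last partial batch is kept) and the total is that times the number of epochs, the sketch below reproduces both logs. The device counts 4 and 1 are inferred from "Total train batch size" divided by the per-device batch size of 8; the logs do not state them directly. The function name is hypothetical.

```python
import math

def total_optimization_steps(num_examples: int, per_device_batch: int, num_devices: int,
                             num_epochs: int, grad_accum: int = 1) -> int:
    """Reproduce the log's 'Total optimization steps' (sketch; last partial batch kept)."""
    # Under DataParallel each step consumes per_device_batch * num_devices examples.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * num_epochs

# Local run: total train batch size 32 = 8 per device x 4 devices (inferred).
print(total_optimization_steps(1000, 8, 4, 3))   # 96
# Colab run: total train batch size 8 = 8 per device x 1 device.
print(total_optimization_steps(1000, 8, 1, 3))   # 375
```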
2.5 Complete script
import datasets
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
# dataset and model were saved locally beforehand
dataset = datasets.load_from_disk("datasets/yelp_full_review_disk")
tokenizer = AutoTokenizer.from_pretrained("pretrained_models/bert-base-cased")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
# 1000-example subsets so the run finishes quickly
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
model = AutoModelForSequenceClassification.from_pretrained("pretrained_models/bert-base-cased", num_labels=5)
training_args = TrainingArguments(output_dir="pt_save_pretrained", evaluation_strategy="epoch")
metric = datasets.load_metric('datasets/accuracy.py')  # local copy of the accuracy metric script
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
3. Fine-tuning in native PyTorch
For background on this part, see my earlier post 60分钟闪击速成PyTorch(Deep Learning with PyTorch: A 60 Minute Blitz)学习笔记_诸神缄默不语的博客-CSDN博客 (my study notes on the PyTorch 60 Minute Blitz).
A training loop:
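import torch
# first free the GPU memory held by the model and Trainer from section 2 (e.g. in the same notebook)
del model
del trainer
torch.cuda.empty_cache()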
3.1 Dataset
from torch.utils.data import DataLoader
tokenized_datasets = tokenized_datasets.remove_columns(["text"])  # drop the text column, which the model does not use
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")  # the model's forward() takes the labels as `labels`
# (With Trainer the column could stay `label`: its default data collator turns a `label` key into `labels`.)
tokenized_datasets.set_format("torch")  # return values as torch.Tensor objects
# sample a subset of the data so the tutorial runs quickly
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
# wrap the datasets in DataLoaders; each batch is a dict of column name -> tensor,
# which is later passed to the model as **batch
train_dataloader = DataLoader(small_train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(small_eval_dataset, batch_size=8)
print(type(example_dict['labels']))
print(example_dict['labels'][12345])
print(type(example_dict['text']))
print(example_dict['text'][12345])
<class 'list'>
2
<class 'list'>
I went here in search of a crepe with Nutella and I got a really good crepe. I wouldn't exactly say this place is authentic French because you've got Americans cooking the food, but my crepe was still good. \n\nIt doesn't taste like the ones I had in France, Carmon's puts a twist on (or maybe it was just overcooked) theirs by making the crepe more firm. \n\nThe whipped cream was also made fresh and delightful. The prices were horrid though.\n\nCrepes don't cost that much to make, so they're clearly overpricing here. Price is the only reason I won't come back so often.
- Use torch's Dataset and DataLoader classes (the result is equivalent to what the datasets.Dataset route above produces):
import torch
from torch.utils.data import Dataset, DataLoader
# Define the Dataset
class YelpDataset(Dataset):
    def __init__(self, dict_data) -> None:
        """
        dict_data: data as a dict; key 'labels' maps to a list of labels (numbers), key 'text' maps to a list of texts
        """
        super(YelpDataset, self).__init__()
        self.data = dict_data
    def __getitem__(self, index):
        # return a list: the first element is the text, the second is the label
        return [self.data['text'][index], self.data['labels'][index]]
    def __len__(self):
        return len(self.data['text'])
# Define the collate function (tokenizer is the one loaded earlier)
def collate_fn(batch):
    pt_batch = tokenizer([b[0] for b in batch], padding=True, truncation=True, max_length=512,
                         return_tensors='pt')
    labels = torch.tensor([b[1] for b in batch])
    return {'labels': labels, 'input_ids': pt_batch['input_ids'], 'token_type_ids': pt_batch['token_type_ids'],
            'attention_mask': pt_batch['attention_mask']}
train_data = YelpDataset(example_dict)
train_dataloader = DataLoader(train_data, batch_size=8, shuffle=True, collate_fn=collate_fn)
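Unlike section 1.3, collate_fn above calls the tokenizer with padding=True, so each batch is padded only to its own longest sequence (capped at 512) instead of always to 512. A minimal pure-Python sketch of that dynamic padding on already-tokenized ids follows; the helper name pad_batch, pad id 0 and the toy ids are assumptions, and the real tokenizer additionally produces token_type_ids and handles special tokens.

```python
from typing import Dict, List

def pad_batch(batch_ids: List[List[int]], max_length: int = 512,
              pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad a batch to its longest (truncated) sequence and build the matching attention mask."""
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)   # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

out = pad_batch([[101, 7, 8, 102], [101, 9, 102]])
print(out["input_ids"])       # [[101, 7, 8, 102], [101, 9, 102, 0]]
print(out["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Padding per batch keeps short batches short, which usually saves compute compared with always padding to 512.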
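- Or, without any Dataset/DataLoader, iterate like this in each training loop (most variable names are self-explanatory, so I won't explain them in detail):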
# Training part (the evaluation part is similar)
train_data_length = len(example_dict['labels'])
if train_data_length % batch_size == 0:
    batch_num = train_data_length // batch_size
else:
    batch_num = train_data_length // batch_size + 1
for b in range(batch_num):
    index_begin = b * batch_size
    index_end = min(train_data_length, index_begin + batch_size)
    this_batch_text = example_dict['text'][index_begin:index_end]
    this_batch_labels = example_dict['labels'][index_begin:index_end]
    pt_batch = tokenizer(this_batch_text, padding=True, truncation=True, max_length=512, return_tensors='pt')
    # I didn't bother splitting pt_batch by key; the rest of the training step
    # mirrors the DataLoader version and is omitted here
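The index arithmetic in the manual loop above (ceiling division for the number of batches, min() for the last partial batch) can be packaged as a small generator. This is a sketch; the name iter_batches and the toy data are hypothetical.

```python
from typing import Iterator, List, Tuple

def iter_batches(texts: List[str], labels: List[int],
                 batch_size: int) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield (texts, labels) slices of at most batch_size items, like the manual loop above."""
    n = len(labels)
    batch_num = -(-n // batch_size)            # ceiling division, same as the if/else above
    for b in range(batch_num):
        index_begin = b * batch_size
        index_end = min(n, index_begin + batch_size)
        yield texts[index_begin:index_end], labels[index_begin:index_end]

texts = [f"review {i}" for i in range(10)]
labels = [i % 5 for i in range(10)]
print([len(t) for t, _ in iter_batches(texts, labels, 4)])   # [4, 4, 2]
```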
3.2 Model
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("mypath/bert-base-cased", num_labels=5)
3.3 Optimizer and learning rate scheduler
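The AdamW implementation in transformers emits the following warning, so torch.optim.AdamW is used here instead:
FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning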
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=5e-5)
Create the default learning rate scheduler that Trainer uses:
from transformers import get_scheduler
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    name="linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
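With name="linear", the scheduler multiplies the base learning rate by a factor that rises linearly during warmup and then decays linearly to 0 at num_training_steps. The sketch below re-derives that multiplier in plain Python to show the schedule; it is an illustration, not the transformers source. With 1000 training examples and batch_size=8, len(train_dataloader) is 125, so num_training_steps = 3 × 125 = 375.

```python
def linear_schedule_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """LR multiplier for linear warmup followed by linear decay to 0 (illustrative re-derivation)."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)                     # warmup: 0 -> 1
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))  # decay: 1 -> 0

base_lr = 5e-5
num_training_steps = 3 * 125   # num_epochs * len(train_dataloader) for 1000 examples, batch_size=8
for step in (0, 125, 250, 375):
    print(step, base_lr * linear_schedule_factor(step, 0, num_training_steps))
```

With num_warmup_steps=0, as above, the learning rate starts at 5e-5 and reaches 0 at the last step.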
3.4 Device
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
3.5 Training Loop
from tqdm.auto import tqdm
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
3.6 Metrics
from datasets import load_metric
metric = load_metric("accuracy")
model.eval()
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])
metric.compute()
Output: {'accuracy': 0.588}
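metric.add_batch() accumulates predictions and references across batches, and metric.compute() evaluates once at the end. The hypothetical class StreamingAccuracy below mimics that add_batch/compute shape with running counts (it does not store predictions). It shows why accumulating first gives the dataset-level accuracy, which differs from averaging per-batch accuracies when batch sizes differ.

```python
from typing import Dict, Sequence

class StreamingAccuracy:
    """Hypothetical accumulator mirroring the add_batch()/compute() pattern for accuracy."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def add_batch(self, predictions: Sequence[int], references: Sequence[int]) -> None:
        if len(predictions) != len(references):
            raise ValueError("predictions and references must have the same length")
        self.correct += sum(int(p == r) for p, r in zip(predictions, references))
        self.total += len(references)

    def compute(self) -> Dict[str, float]:
        return {"accuracy": self.correct / self.total if self.total else 0.0}

acc = StreamingAccuracy()
acc.add_batch(predictions=[1, 0, 2, 2], references=[1, 0, 2, 3])   # 3 of 4 correct
acc.add_batch(predictions=[4], references=[0])                     # 0 of 1 correct
print(acc.compute())   # {'accuracy': 0.6}
```

Averaging the two per-batch accuracies would give (0.75 + 0.0) / 2 = 0.375 instead of the correct 3 / 5 = 0.6.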
3.7 Complete script
from tqdm.auto import tqdm
import torch
from torch.utils.data import DataLoader
from torch.optim import AdamW
import datasets
from transformers import AutoTokenizer, AutoModelForSequenceClassification, get_scheduler
dataset = datasets.load_from_disk("datasets/yelp_full_review_disk")
tokenizer = AutoTokenizer.from_pretrained("pretrained_models/bert-base-cased")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
# Postprocess dataset
tokenized_datasets = tokenized_datasets.remove_columns(["text"])  # drop the text column, which the model does not use
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")  # the model's forward() takes the labels as `labels`
tokenized_datasets.set_format("torch")  # return values as torch.Tensor objects
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
train_dataloader = DataLoader(small_train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(small_eval_dataset, batch_size=8)
model = AutoModelForSequenceClassification.from_pretrained("pretrained_models/bert-base-cased", num_labels=5)
optimizer = AdamW(model.parameters(), lr=5e-5)
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    name="linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
# Training
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
# Evaluation
metric = datasets.load_metric('datasets/accuracy.py')
model.eval()
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])
print(metric.compute())