Fine-tuning the text classification module of the nlp_mt5_zero-shot-augment_chinese-base model

Solved

I am fine-tuning the nlp_mt5_zero-shot-augment_chinese-base model, but the pytorch_model.bin file is never written out, and training fails with the following error:
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\study\graduationProject\vue2_mt5\vue_flask\finetune.py", line 61, in <module>
    trainer.train()
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\trainer.py", line 711, in train
    self.train_loop(self.train_dataloader)
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\trainer.py", line 1243, in train_loop
    self.invoke_hook(TrainerStages.after_train_epoch)
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\trainer.py", line 1395, in invoke_hook
    getattr(hook, fn_name)(self)
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\hooks\checkpoint\checkpoint_hook.py", line 177, in after_train_epoch
    self._do_save(trainer, CheckpointStrategy.by_epoch)
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\hooks\checkpoint\checkpoint_hook.py", line 160, in _do_save
    self._save_checkpoint(trainer, prefix)
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\hooks\checkpoint\checkpoint_hook.py", line 224, in _save_checkpoint
    self.processor.save_checkpoints(trainer, checkpoint_path_prefix,
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\hooks\checkpoint\checkpoint_processor.py", line 126, in save_checkpoints
    self.save_trainer_state(trainer, model, _train_state_file, meta,
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\trainers\hooks\checkpoint\checkpoint_processor.py", line 192, in save_trainer_state
    save_checkpoint(
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\modelscope\utils\checkpoint.py", line 114, in save_checkpoint
    torch.save(checkpoint, f)
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\torch\serialization.py", line 620, in save
    return
  File "D:\tool\Anaconda\anaconda3\envs\modelscope\lib\site-packages\torch\serialization.py", line 482, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:424] . unexpected pos 1237267904 vs 1237267856

Asked by guest user ymliuhcefik54, 2024-06-07 08:28:50
1 answer
  • 北京阿里云ACE会长 (Beijing Alibaba Cloud ACE President)
    Accepted answer

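    Reduce the batch size: try lowering the training batch size to cut memory consumption. Note that TrainingArguments, used in the snippet below, is the Hugging Face transformers API, while the traceback in the question comes from ModelScope's trainer.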

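    from transformers import TrainingArguments

    training_args = TrainingArguments(
        # ... keep your other arguments unchanged ...
        per_device_train_batch_size=8,  # smaller batch size, less memory per step
    )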
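    Because the traceback comes from ModelScope's trainer, the same idea there is usually expressed through a cfg_modify_fn callback that edits the loaded configuration before training starts. This is a minimal sketch, assuming the usual ModelScope config layout in which the per-GPU batch size lives at cfg.train.dataloader.batch_size_per_gpu; the value 8 is only an example, and the SimpleNamespace object is a stand-in so the function can be tried without modelscope installed.

```python
from types import SimpleNamespace


def cfg_modify_fn(cfg):
    """Lower the per-GPU training batch size before the trainer is built.

    Assumes the usual ModelScope config layout:
    cfg.train.dataloader.batch_size_per_gpu
    """
    cfg.train.dataloader.batch_size_per_gpu = 8  # example value: smaller batch, less memory per step
    return cfg


# Wiring (sketch, requires modelscope): add the callback to the default_args
# that finetune.py already passes to build_trainer, for example
#   kwargs = dict(..., cfg_modify_fn=cfg_modify_fn)
#   trainer = build_trainer(name=..., default_args=kwargs)

if __name__ == "__main__":
    # Stand-in config object with the assumed nesting.
    demo_cfg = SimpleNamespace(
        train=SimpleNamespace(dataloader=SimpleNamespace(batch_size_per_gpu=32))
    )
    print(cfg_modify_fn(demo_cfg).train.dataloader.batch_size_per_gpu)  # prints 8
```

    The function returns the same config object it was given, which is the form ModelScope's trainers expect from this callback.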
    

    File system or path issues. Check disk space: make sure the disk the checkpoints are written to has enough free space.

    df -h  # show disk space usage (Unix-like systems)
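    The paths in the traceback (F:\..., D:\...) show the training ran on Windows, where df is not normally available. A cross-platform alternative is Python's standard shutil.disk_usage. In this sketch the work directory is a placeholder for wherever your trainer writes checkpoints, the write position from the RuntimeError (about 1.24 GB) is used only as a lower bound on one checkpoint's size, and the safety factor of 2 is an arbitrary assumption, since the checkpoint hook in the traceback saves after every epoch.

```python
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the drive/filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_for(path: str, needed_bytes: int, safety_factor: float = 2.0) -> bool:
    """True if at least safety_factor * needed_bytes are free where `path` lives.

    safety_factor is an assumed margin, e.g. for several per-epoch checkpoints.
    """
    return shutil.disk_usage(path).free >= needed_bytes * safety_factor


if __name__ == "__main__":
    work_dir = "."  # placeholder: the directory your trainer writes checkpoints to
    # The RuntimeError reports the write failing near byte 1_237_267_904,
    # so a single checkpoint is at least that large.
    checkpoint_bytes = 1_237_267_904
    print(f"free space: {free_gib(work_dir):.1f} GiB")
    print("room for checkpoints:", has_room_for(work_dir, checkpoint_bytes))
```

    On Windows, pass a drive root such as "F:\\" to check the drive that holds the project.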

    2024-06-07 09:11:24