Error: FloatingPointError: Loss became infinite or NaN at iteration=88!

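Project scenario:

Training CondInst (train_net.py, running on detectron2) aborts at iteration 88 with the following traceback: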


Traceback (most recent call last):
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 255, in <module>
    args=(args,),
  File "/home/yuan/anaconda3/envs/AdelaiNet/lib/python3.7/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 235, in main
    return trainer.train()
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 118, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 107, in train_loop
    self.run_step()
  File "/home/yuan/anaconda3/envs/AdelaiNet/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
    self._detect_anomaly(losses, loss_dict)
  File "/home/yuan/anaconda3/envs/AdelaiNet/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 245, in _detect_anomaly
    self.iter, loss_dict
FloatingPointError: Loss became infinite or NaN at iteration=88!
loss_dict = {'loss_fcos_cls': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_fcos_loc': tensor(0.5552, device='cuda:0', grad_fn=<DivBackward0>), 'loss_fcos_ctr': tensor(0.7676, device='cuda:0', grad_fn=<DivBackward0>), 'loss_mask': tensor(0.8649, device='cuda:0', grad_fn=<DivBackward0>), 'data_time': 0.0022056670004531043}
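
The loss_dict printed with the error shows which term went wrong. The sketch below is a minimal illustration, not detectron2's own code: `non_finite_losses` is a hypothetical helper, and the tensor values from the traceback are copied in as plain floats (for real tensors you would call `.item()` first; the `data_time` entry is left out because it is not a loss).

```python
import math


def non_finite_losses(loss_dict):
    """Return the names of all entries whose value is inf or NaN."""
    return [name for name, value in loss_dict.items()
            if not math.isfinite(value)]


# Loss values copied from the traceback above, as plain Python floats.
loss_dict = {
    "loss_fcos_cls": float("nan"),
    "loss_fcos_loc": 0.5552,
    "loss_fcos_ctr": 0.7676,
    "loss_mask": 0.8649,
}

print(non_finite_losses(loss_dict))  # ['loss_fcos_cls']
```

Here only the FCOS classification loss is NaN; the localization, centerness, and mask losses are still finite at iteration 88.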




Root cause analysis:


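The learning rate is the problem: it is too high, so the loss exploded. In the loss_dict above, loss_fcos_cls is already NaN while the other losses are still finite. Try lowering the learning rate.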
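
To see why a learning rate that is too high makes the loss blow up, here is a toy sketch in plain Python. It is not the CondInst training loop. It assumes a one-parameter loss, loss(w) = w * w, minimized by plain gradient descent; `first_bad_iteration` is a hypothetical helper and both learning rates are illustrative values only.

```python
import math


def first_bad_iteration(lr, steps=500):
    """Run gradient descent on loss(w) = w * w.

    Returns the first iteration whose loss is inf or NaN,
    or None if every loss stays finite.
    """
    w = 1.0
    for iteration in range(steps):
        loss = w * w  # float multiplication overflows to inf, it does not raise
        if not math.isfinite(loss):
            return iteration
        grad = 2.0 * w
        w -= lr * grad  # equivalent to w <- w * (1 - 2 * lr)
    return None


# |1 - 2 * lr| = 10 > 1: |w| grows every step until the loss overflows.
too_high = first_bad_iteration(lr=5.5)

# |1 - 2 * lr| = 0.8 < 1: w shrinks toward 0 and the loss stays finite.
lowered = first_bad_iteration(lr=0.1)

print(too_high, lowered)
```

With the large step size the run stops at a non-finite loss, and with the smaller one it never does. In a detectron2-based project the corresponding setting is SOLVER.BASE_LR in the config. Which value is stable depends on the batch size and the dataset, so it may take a few tries.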
