The reason is simple: I was using too much GPU memory:
2018-09-26 18:50:05.489980: W T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:279] ********__*_____________________*_**_____________________*____*_**********************************xx
2018-09-26 18:50:05.490391: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at conv_ops.cc:636 : Resource exhausted: OOM when allocating tensor with shape[32,32,417,417] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):

    callbacks=[logging, checkpoint])
  File "D:\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "D:\Anaconda3\lib\site-packages\keras\engine\training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "D:\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "D:\Anaconda3\lib\site-packages\keras\engine\training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "D:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2666, in __call__
    return self._call(inputs)
  File "D:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2636, in _call
    fetched = self._callable_fn(*array_vals)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1382, in __call__
    run_metadata_ptr)
  File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,32,417,417] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: conv2d_2/convolution = Conv2D[T=DT_FLOAT, _class=["loc:@batch_normalization_2/cond/FusedBatchNorm/Switch"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](zero_padding2d_1/Pad, conv2d_2/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Node: yolo_loss/while_1/LoopCond/_2963 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_6607_yolo_loss/while_1/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_1/strided_slice_1/stack_2/_2805)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
2018-09-26 18:50:05.482286: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 16384 totalling 16.0KiB
2018-09-26 18:50:05.482594: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 8 Chunks of size 21504 totalling 168.0KiB
2018-09-26 18:50:05.482884: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 2 Chunks of size 32768 totalling 64.0KiB
2018-09-26 18:50:05.483090: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 8 Chunks of size 43008 totalling 336.0KiB
2018-09-26 18:50:05.483276: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 5 Chunks of size 65024 totalling 317.5KiB
2018-09-26 18:50:05.483457: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 2 Chunks of size 73728 totalling 144.0KiB
2018-09-26 18:50:05.483656: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 8 Chunks of size 86016 totalling 672.0KiB
2018-09-26 18:50:05.483844: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 3 Chunks of size 129792 totalling 380.3KiB
2018-09-26 18:50:05.484411: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 11 Chunks of size 131072 totalling 1.38MiB
2018-09-26 18:50:05.484719: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 196608 totalling 192.0KiB
2018-09-26 18:50:05.484902: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 5 Chunks of size 259584 totalling 1.24MiB
2018-09-26 18:50:05.485216: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 3 Chunks of size 294912 totalling 864.0KiB
2018-09-26 18:50:05.485494: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 454400 totalling 443.8KiB
2018-09-26 18:50:05.485748: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 3 Chunks of size 519168 totalling 1.49MiB
2018-09-26 18:50:05.486063: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 11 Chunks of size 524288 totalling 5.50MiB
2018-09-26 18:50:05.486245: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 786432 totalling 768.0KiB
2018-09-26 18:50:05.486419: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 4 Chunks of size 1038336 totalling 3.96MiB
2018-09-26 18:50:05.486590: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 12 Chunks of size 1179648 totalling 13.50MiB
2018-09-26 18:50:05.486764: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 1817088 totalling 1.73MiB
2018-09-26 18:50:05.486934: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 3 Chunks of size 2076672 totalling 5.94MiB
2018-09-26 18:50:05.487432: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 7 Chunks of size 2097152 totalling 14.00MiB
2018-09-26 18:50:05.487719: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 12 Chunks of size 4718592 totalling 54.00MiB
2018-09-26 18:50:05.487982: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 7268352 totalling 6.93MiB
2018-09-26 18:50:05.488284: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 8 Chunks of size 18874368 totalling 144.00MiB
2018-09-26 18:50:05.488560: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 431485952 totalling 411.50MiB
2018-09-26 18:50:05.488842: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 712249344 totalling 679.25MiB
2018-09-26 18:50:05.489097: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:678] Sum Total of in-use chunks: 1.32GiB
2018-09-26 18:50:05.489374: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:680] Stats:
Limit:        3211594956
InUse:        1415122432
MaxInUse:     2420054016
NumAllocs:          1707
MaxAllocSize:  712249344
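Before changing anything, the hint at the end of the traceback is worth following: passing report_tensor_allocations_upon_oom through RunOptions makes TensorFlow list the live tensors at the moment the OOM fires, which helps confirm what is eating the memory. Here is a minimal sketch, assuming Keras 2.x on the TensorFlow 1.x backend (where extra compile() keyword arguments are forwarded to tf.Session.run); `model` stands in for whatever model the training script builds, and the optimizer/loss are placeholders:

import tensorflow as tf

# Ask TensorFlow to dump the list of allocated tensors if an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)
run_metadata = tf.RunMetadata()

# With the TensorFlow backend, extra compile() kwargs are passed on to
# tf.Session.run(), so these options take effect during training.
model.compile(optimizer='adam',
              loss='mse',                 # placeholder loss
              options=run_options,
              run_metadata=run_metadata)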
The most expedient fix, though, is simply to reduce the batch size. Training will run more slowly, but it will use less GPU memory.
I changed the batch_size from 128 to 32, and the problem was resolved!
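For reference, this is roughly where the change lands in a keras-yolo3-style train.py; the names below (model, data_generator_wrapper, lines, num_train, num_val, input_shape, anchors, num_classes, logging, checkpoint) are placeholders for the objects that script already defines:

# Minimal sketch of the fix, assuming a keras-yolo3-style training script.
batch_size = 32  # reduced from 128 so the intermediate activations fit in GPU memory

model.fit_generator(
    data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
    steps_per_epoch=max(1, num_train // batch_size),
    validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
    validation_steps=max(1, num_val // batch_size),
    epochs=50,
    initial_epoch=0,
    callbacks=[logging, checkpoint])

If 32 still runs out of memory, halving again (16, 8, ...) trades more training speed for a smaller memory footprint.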