[Deepin 20] Fixing "Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error"

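Summary: This article describes how to resolve a GPU out-of-memory error when running TensorFlow 2 on an Nvidia RTX 2070. The options are killing the processes that hold the memory, resetting the GPU, and rebooting the machine.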

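Problem

Environment

Nvidia RTX 2070
TensorFlow 2

I ran a program on the GPU, interrupted it, and ran it again. The second run failed with an out-of-memory error, so the GPU memory was evidently still occupied: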
F tensorflow/stream_executor/cuda/cuda_driver.cc:175] Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error: out of memory

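Solution

Although the program is no longer running in the foreground, its process is still alive in the background and still holds GPU memory. Killing that process frees the memory for the GPU; rebooting the machine also works.
(1) Solution 1: kill the processes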

sudo fuser -v /dev/nvidia*

This lists the processes currently using the GPU. In my case, two Python processes were holding the Nvidia card:

[Screenshot: `fuser` output listing the two Python processes]
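As an alternative to reading the PIDs off the `fuser` output by eye, they can also be taken from `nvidia-smi`. The following is a minimal sketch, not part of the original fix: `extract_gpu_pids` is a hypothetical helper name, and it assumes the driver's `nvidia-smi` supports the `--query-compute-apps` CSV output.

```shell
#!/usr/bin/env bash
# Sketch: print the PIDs of processes that hold GPU compute memory.

# extract_gpu_pids (hypothetical helper): reads CSV lines such as
# "9388, python, 7011 MiB" on stdin and prints the first column (the PID).
extract_gpu_pids() {
  awk -F',' '{ gsub(/[^0-9]/, "", $1); if ($1 != "") print $1 }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-compute-apps=pid,process_name,used_memory \
             --format=csv,noheader | extract_gpu_pids
fi
```

Each printed number is a candidate PID for the kill step. It is worth checking with `ps -p <pid>` that the process really is the stale Python run before terminating it.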


Kill the processes (replace the PIDs with the ones reported on your machine):
sudo kill -9 9388
sudo kill -9 5944
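`kill -9` gives the process no chance to clean up. A gentler variant, sketched below, sends SIGTERM first and falls back to SIGKILL only if the process is still alive after a timeout. The helper name `stop_pid` and the 5-second default are assumptions made for illustration, not something from the original procedure.

```shell
#!/usr/bin/env bash
# Sketch: ask a process to exit, then force it only if it does not.

# stop_pid (hypothetical helper): stop_pid <pid> [timeout_seconds]
stop_pid() {
  local pid="$1" timeout="${2:-5}" waited=0

  # Ask politely first; if the signal cannot be delivered there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || return 0

  # Poll once per second until the process is gone or the timeout expires.
  while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
  done

  # Still there after the timeout: force-kill it.
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid" 2>/dev/null
  fi
  return 0
}
```

For example, `stop_pid 9388` would cover the first process above. Run the script as root if the process belongs to another user.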

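(2) Solution 2: on a machine with multiple GPUs, reset the occupied GPU from the command line (add `-i <index>` to target a single GPU; the reset generally needs root privileges)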

nvidia-smi --gpu-reset

(3) Solution 3: reboot the machine
