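As many readers know, PaddlePaddle did not support Ubuntu 22.04 until recently, only older releases such as Ubuntu 18.04 and 16.04.
Today I set up an Ubuntu machine for deployment. Only after installing Ubuntu 18.04 did I notice that PaddlePaddle now supports CUDA 11.7, and it occurred to me that CUDA 11.7 supports Ubuntu 22.04. So I immediately downloaded the newer Ubuntu release and installed the system all over again.
In short, I installed Ubuntu twice today.
- Ubuntu 18.04 does not ship a driver for the card, and being stuck at 800*600 on a 3060 is painful.
- Ubuntu 22.04 does not support my Broadcom wireless card out of the box (I bought it years ago specifically for a Hackintosh).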
0. Hardware
- One 3060 graphics card
- One 9400 CPU
- One 500 GB SSD partition
- Other hardware
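1. System preparation before installing
- After installing the system the wireless card did not work, so I got online through USB tethering from my phone (built into Android).
- Open Software & Updates and set the download source to the Aliyun mirror, which should make updates faster.
- Open “Additional Drivers” and select nvidia-driver-520 (proprietary), the NVIDIA driver for the graphics card.
- Also select Broadcom Wireless Network Adapter, the wireless card driver.
The update then ran for more than four hours, which hurt on my slow, metered connection.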
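After the driver update and a reboot, it is worth confirming that the NVIDIA driver is actually loaded before going further. Below is a minimal sketch that calls nvidia-smi from Python. The function name gpu_driver_info is my own, and I am assuming nvidia-smi is on the PATH once the proprietary driver is installed.

```python
import shutil
import subprocess


def gpu_driver_info():
    """Return a list of 'name, driver_version' strings, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # the driver's CLI tool is not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # nvidia-smi exists but could not talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    info = gpu_driver_info()
    print(info if info else "nvidia-smi not found or failed; is the driver loaded?")
```

If this prints the fallback message, the driver selection in "Additional Drivers" probably did not take effect yet.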
2. Install miniconda
- Open tuna.moe, find miniconda, then download and install it.
- Open tuna.moe/oh-my-tuna and follow the instructions to switch the conda and pip sources to the mirror.
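3. Install PaddlePaddle-GPU
Open the official site www.paddlepaddle.org.cn/ and select conda, linux, 2.4, and cuda 11.7.
3.1 Create a virtual environment
First create an Anaconda virtual environment for the Python version you want. The Anaconda install of PaddlePaddle supports Python 3.6 - 3.10.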
conda create -n paddle_env python=YOUR_PY_VER
3.2 Enter the Anaconda virtual environment
conda activate paddle_env
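With the environment active, a quick check that its interpreter falls inside the 3.6 - 3.10 range stated above can save a failed install later. This is only a sketch; the constant and function names are my own, and the bounds are simply the ones quoted in this post for PaddlePaddle 2.4.

```python
import sys

# Supported range as quoted in this post for the PaddlePaddle 2.4 conda packages.
MIN_PY = (3, 6)
MAX_PY = (3, 10)


def python_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


if __name__ == "__main__":
    status = "supported" if python_supported() else "outside the supported range"
    print("Python {}.{} is {}".format(sys.version_info[0], sys.version_info[1], status))
```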
3.3 Install paddlepaddle-gpu
- For CUDA 11.7, cuDNN 8.4.1 is required (plus NCCL >= 2.7 in a multi-GPU setup). The install command is:
conda install paddlepaddle-gpu==2.4.0 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
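Important: do not run the command exactly as the official site gives it:
conda install paddlepaddle-gpu==2.4.0 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
Remove -c conda-forge. Otherwise the install is extremely slow and may even be interrupted partway through.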
4. Troubleshooting
4.1 ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory
>>> import paddle
Error: Can not import paddle core while this file exists: /home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/libpaddle.so
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/__init__.py", line 25, in <module>
    from .framework import monkey_patch_variable
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/framework/__init__.py", line 17, in <module>
    from . import random  # noqa: F401
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/framework/random.py", line 16, in <module>
    import paddle.fluid as fluid
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/__init__.py", line 36, in <module>
    from . import framework
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/framework.py", line 37, in <module>
    from . import core
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/core.py", line 304, in <module>
    raise e
  File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/core.py", line 249, in <module>
    from . import libpaddle
ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory
Fix:
Copy the library into the system library directories as shown below, adjusting the paths to match your own environment.
(p2) livingbody@gaint:~/miniconda3/envs/p2/lib$ sudo cp libpython3.9.so.1.0 /usr/lib
[sudo] password for livingbody:
(p2) livingbody@gaint:~/miniconda3/envs/p2/lib$ sudo cp libpython3.9.so.1.0 /usr/lib64
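The copy above is what I did. If you would rather not write into /usr/lib, a possible alternative (an assumption on my part, not something I tested on this machine) is to point LD_LIBRARY_PATH at the environment's own lib directory. The sketch below only works out where libpython should be and prints a suggested export line. The helper names are hypothetical, and the file name pattern is inferred from the libpython3.9.so.1.0 name in the error.

```python
import os
import sys


def expected_libpython(prefix=None):
    """Path where the active environment's shared libpython is expected."""
    prefix = prefix or sys.prefix
    abiflags = getattr(sys, "abiflags", "")  # 'm' on Python 3.6/3.7, '' on 3.8+
    name = "libpython{}.{}{}.so.1.0".format(
        sys.version_info.major, sys.version_info.minor, abiflags
    )
    return os.path.join(prefix, "lib", name)


def ld_library_path_hint(prefix=None):
    """Return a shell export line for the library's directory, or None if absent."""
    path = expected_libpython(prefix)
    if not os.path.exists(path):
        return None
    lib_dir = os.path.dirname(path)
    # ${VAR:+:$VAR} appends the old value only when it is already set.
    return 'export LD_LIBRARY_PATH="' + lib_dir + '${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"'


if __name__ == "__main__":
    print(ld_library_path_hint() or "shared libpython not found under " + sys.prefix)
```

Run it inside the activated environment. Keep in mind that putting a conda lib directory on LD_LIBRARY_PATH can also change which libraries other programs pick up, so treat it as a trade-off rather than a strictly better fix.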
4.2 PreconditionNotMetError: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 269, in run_check
    _run_static_single(use_cuda, use_xpu, use_npu)
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 173, in _run_static_single
    exe.run(startup_prog)
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1463, in run
    six.reraise(*sys.exc_info())
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1450, in run
    res = self._run_impl(program=program,
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1661, in _run_impl
    return new_exe.run(scope, list(feed.keys()), fetch_list,
  File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 631, in run
    tensors = self._new_exe.run(scope, feed_names,
RuntimeError: In user code:

    File "<stdin>", line 1, in <module>
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 269, in run_check
      _run_static_single(use_cuda, use_xpu, use_npu)
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 159, in _run_static_single
      input, out, weight = _simple_network()
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 33, in _simple_network
      weight = paddle.create_parameter(
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/layers/tensor.py", line 152, in create_parameter
      return helper.create_parameter(attr, shape, convert_dtype(dtype), is_bias,
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/layer_helper_base.py", line 381, in create_parameter
      self.startup_program.global_block().create_parameter(
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/framework.py", line 3965, in create_parameter
      initializer(param, self)
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/initializer.py", line 56, in __call__
      return self.forward(param, block)
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/initializer.py", line 184, in forward
      op = block.append_op(type="fill_constant",
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/framework.py", line 4017, in append_op
      op = Operator(
    File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/framework.py", line 2858, in __init__
      for frame in traceback.extract_stack():

    PreconditionNotMetError: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
      [Hint: cudnn_d_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)
      [operator < fill_constant > error]
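Fix: the conda install command already pulled in the required CUDA and cuDNN packages, so they are installed. The error means the dynamic loader cannot find the corresponding shared libraries, and that is what the following steps address.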
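To confirm that this really is a loader problem, you can ask the loader directly whether it can open the libraries. The sketch below uses ctypes for that. The function name can_load is my own, and the two sonames are what I would expect for CUDA 11.x and cuDNN 8.x, so adjust them if your versions differ.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Sonames assumed for CUDA 11.7 and cuDNN 8.4.1; change them for other versions.
    for soname in ("libcudart.so.11.0", "libcudnn.so.8"):
        state = "loadable" if can_load(soname) else "NOT found by the loader"
        print(soname + ": " + state)
```

LD_LIBRARY_PATH is read when a process starts, so rerun this from a fresh terminal after changing the variable.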
4.2.1 mkdir
Create the directory that will hold the shared libraries:
sudo mkdir -p /usr/local/cuda/lib64
4.2.2 Copy the CUDA libraries
Copy the shared libraries into the lib64 directory:
~/miniconda3/pkgs/cudatoolkit-11.7.0-hd8887f6_10/lib$ sudo cp * /usr/local/cuda/lib64 -rf
4.2.3 Copy the cuDNN libraries
Copy over the existing files, the same as when installing cuDNN by hand:
~/miniconda3/pkgs/cudnn-8.4.1.50-hed8a83a_0/lib$ sudo cp * /usr/local/cuda/lib64/ -rf
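The package directory names in the two copy steps (cudatoolkit-11.7.0-hd8887f6_10, cudnn-8.4.1.50-hed8a83a_0) contain build strings that will likely differ on your machine. Here is a small sketch for locating the right directories. find_library_dirs is a hypothetical helper, and it assumes the conda package cache lives under ~/miniconda3/pkgs as it does on my machine.

```python
from pathlib import Path


def find_library_dirs(pkgs_root, pattern="libcudnn.so*"):
    """List '<pkgs_root>/<package>/lib' directories holding files that match pattern."""
    root = Path(pkgs_root).expanduser()
    if not root.is_dir():
        return []
    return sorted({str(p.parent) for p in root.glob("*/lib/" + pattern)})


if __name__ == "__main__":
    for pattern in ("libcudart.so*", "libcudnn.so*"):
        print(pattern, "->", find_library_dirs("~/miniconda3/pkgs", pattern))
```

Change into the directory it reports before running the cp commands.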
4.2.4 Set the LD_LIBRARY_PATH environment variable
Edit .bashrc:
gedit ~/.bashrc
Append at the end:
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"
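One caveat: the export line as written replaces any LD_LIBRARY_PATH value that was already set. That was fine on my fresh install. If your shell already sets the variable, you probably want to prepend instead. The sketch below shows the prepend logic; prepend_path is a name I made up for illustration.

```python
import os


def prepend_path(new_dir, current):
    """Put new_dir first in a colon-separated path list, dropping duplicates of it."""
    parts = [p for p in (current or "").split(":") if p and p != new_dir]
    return ":".join([new_dir] + parts)


if __name__ == "__main__":
    value = prepend_path("/usr/local/cuda/lib64", os.environ.get("LD_LIBRARY_PATH"))
    # Print the line you could put in ~/.bashrc for the current shell's value.
    print('export LD_LIBRARY_PATH="' + value + '"')
```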
5. Installation complete
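To confirm everything works, open a new terminal (so the updated .bashrc is loaded), activate the environment, and rerun the check that failed in 4.2. The wrapper below is only a sketch: verify_paddle is my own name, while paddle.utils.run_check() is the same call that appears in the traceback in 4.2.

```python
import importlib.util


def verify_paddle():
    """Run PaddlePaddle's built-in install check and return a short status string."""
    if importlib.util.find_spec("paddle") is None:
        return "paddle is not installed in this environment"
    try:
        import paddle
    except ImportError as exc:  # for example the libpython error from 4.1
        return "import failed: {}".format(exc)
    try:
        paddle.utils.run_check()
    except Exception as exc:  # for example the cuDNN error from 4.2
        return "run_check failed: {}".format(exc)
    return "paddle {} OK (compiled with CUDA: {})".format(
        paddle.__version__, paddle.device.is_compiled_with_cuda()
    )


if __name__ == "__main__":
    print(verify_paddle())
```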
If you found this useful, a like is always welcome.