ChatGLM3-6B fine-tuning
Environment:
Using the model provided by modelscope; fine-tuning uses the official ChatGLM fine-tuning code ChatGLM3/finetune_demo/lora_finetune.ipynb.
Error: CUDA driver not found.
However, CUDA appears to be installed; nvcc reports the following:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
The detailed error output from fine-tuning is as follows:
2024-02-02 10:41:22.129186: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-02 10:41:22.176394: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-02 10:41:22.176431: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-02 10:41:22.176460: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-02 10:41:22.185307: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-02 10:41:22.185657: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-02 10:41:23.415865: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/opt/conda/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Thanks to everyone for the helpful answers; as a beginner I had far too many questions. I have since slowly figured some of this out on my own: https://www.zhihu.com/column/c_1758152537000435713. Comments and corrections are welcome.
Based on the error message, the NVIDIA driver on your system is too old (version 11040 was found). Please update your GPU driver: either download and install a newer driver from NVIDIA's download page, or install a PyTorch build compiled against a CUDA version that your current driver supports.
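A minimal way to confirm that mismatch, assuming PyTorch is installed in this environment (as the warning above indicates), is to compare the CUDA version PyTorch was built with against what the driver will actually accept:

import torch
# CUDA version this PyTorch build was compiled against, e.g. "12.1"
print("PyTorch built with CUDA:", torch.version.cuda)
# False when the installed driver is too old for that CUDA version
print("CUDA available to PyTorch:", torch.cuda.is_available())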
Your environment either does not have a compatible NVIDIA CUDA driver installed correctly, or the system cannot find it. The CUDA driver is the key component for GPU-accelerated computing and is essential for running large language models such as ChatGLM3-6B.
Although the CUDA compiler version you show is 12.3, TensorFlow reports that it cannot find a CUDA driver. Note that nvcc reports the toolkit (compiler) version, not the driver version, so the two can disagree. The problem may also be that environment variables are set incorrectly and the CUDA library path has not been added to the system PATH. Please check the following:
To check whether TensorFlow correctly recognizes the CUDA device, run the following code in Python:
import tensorflow as tf
# Should list at least one GPU device if CUDA is set up correctly
print(tf.config.list_physical_devices('GPU'))
If the problem persists, make sure your TensorFlow and PyTorch versions are compatible with your CUDA version, and configure the environment according to the official installation guides.
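If the device list is empty, a quick way to see whether the CUDA paths are even visible to the process is to inspect the relevant environment variables from Python (the /usr/local/cuda paths in the comments are the typical default install location and may differ on your system):

import os
# PATH should include the CUDA bin directory, e.g. /usr/local/cuda/bin
print("PATH:", os.environ.get("PATH", ""))
# LD_LIBRARY_PATH should include the CUDA library directory, e.g. /usr/local/cuda/lib64
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<not set>"))
# If set, this restricts which GPUs the frameworks are allowed to see
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))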
When trying to run ChatGLM3-6B or another GPU-based deep learning model, the error "Could not find cuda drivers" means that the runtime environment failed to detect the CUDA driver, or that the installed driver version is incompatible with the deep learning framework or model being used. Possible troubleshooting steps:
1. Confirm the CUDA installation: verify that the CUDA toolkit is actually installed, for example with nvcc --version as shown above.
2. Check the driver version: the driver must support the CUDA version your framework was built with; here the driver only supports CUDA 11.4 (reported as version 11040).
3. Set the environment variables: add /usr/local/cuda/bin to PATH (and, typically, the matching library directory to LD_LIBRARY_PATH).
4. Check CUDA device visibility: run the nvidia-smi command to confirm that the operating system recognizes the GPU and that the CUDA driver is working (see the sketch after this list).
5. Check compatibility with the deep learning framework: the installed TensorFlow/PyTorch builds must match the CUDA driver and toolkit versions.
6. Check CUDA support in the container environment: if you are running inside a container, start it with a flag such as --runtime=nvidia so that NVIDIA GPU support is enabled.
7. Restart the service: after updating the driver or changing environment variables, restart the service (or the notebook kernel) so the changes take effect.
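For the device-visibility check in step 4, the same nvidia-smi call can be run from inside the notebook; a minimal sketch, assuming the nvidia-smi binary is on PATH (it is installed together with the driver):

import shutil
import subprocess
# nvidia-smi prints the installed driver version and the GPUs it can see;
# that driver version must support the CUDA version your framework was built with.
if shutil.which("nvidia-smi"):
    subprocess.run(["nvidia-smi"], check=False)
else:
    print("nvidia-smi not found -- the NVIDIA driver may not be installed")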