马春杰杰

[mcj] Installing the NVIDIA Driver + CUDA 10.1 + cuDNN 7.5.0 on Ubuntu 16.04, with Detailed Steps

When I first wrote this article I installed driver 390.25 + CUDA 9.0 + cuDNN 7.0.5. I have now updated it to the versions I currently use: driver 418.56 + CUDA 10.1 + cuDNN 7.6.5.

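CUDA download: https://developer.nvidia.com/cuda-toolkit-archive

cuDNN download: https://developer.nvidia.com/rdp/cudnn-archive

NVIDIA driver download: https://www.nvidia.cn/Download/Find.aspx?lang=cn

Alternatively, use Tianyi Cloud (天翼云盘), which is faster; the download link is at the end of this article.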

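1 Installing the NVIDIA GPU driver

Before installing the GPU driver, work out which driver versions are compatible with the CUDA version you plan to install. You can refer to the table below:

[Table: driver versions required by each CUDA version]

I recommend installing a fairly recent driver; otherwise upgrading later is a hassle. I use 418.56 as the example throughout.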
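As a quick sanity check for the version matching described above, the sketch below compares a driver version against the minimum a CUDA release needs. The `version_ge` helper is my own illustrative name, and the 418.39 minimum is simply read off the CUDA 10.1 installer file name used in this post (cuda_10.1.105_418.39_linux); check NVIDIA's release notes for other releases. It relies on GNU `sort -V`, which ships with Ubuntu.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A is >= version B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

driver="418.56"      # the driver version used in this post
required="418.39"    # assumed minimum, taken from the CUDA 10.1 installer file name

if version_ge "$driver" "$required"; then
    echo "driver $driver is new enough for CUDA 10.1"
else
    echo "driver $driver is too old; need at least $required"
fi
```

Swap in your own driver version to see whether it clears the bar before you start downloading anything.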

The file I installed is NVIDIA-Linux-x86_64-418.56.run.

First pick a directory, for example /home/ubuntu/nvidia/, put the .run file there, and make it executable:

cd /home/ubuntu/nvidia
chmod +x NVIDIA-Linux-x86_64-418.56.run

Next, press Ctrl+Alt+F1 (any of F1 through F6) to switch to a text console, log in, and stop lightdm:

sudo service lightdm stop

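If your system language is Chinese, the prompt here will be garbled. That is fine; just type your password.

Then disable nouveau. Open /etc/modprobe.d/blacklist.conf and append the following at the end: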

blacklist nouveau
options nouveau modeset=0
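If you would rather script this edit than open an editor, here is a minimal sketch. The `add_line_once` helper is a hypothetical name of mine, and the demo writes to a temporary file so it is safe to try; on a real system you would run it as root against /etc/modprobe.d/blacklist.conf.

```shell
#!/bin/sh
# add_line_once FILE LINE: append LINE to FILE unless an identical line exists.
add_line_once() {
    file=$1
    line=$2
    # -x matches the whole line, -F treats the text literally, -q stays quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file; point conf at /etc/modprobe.d/blacklist.conf
# (as root) to apply it for real.
conf=$(mktemp)
add_line_once "$conf" "blacklist nouveau"
add_line_once "$conf" "options nouveau modeset=0"
add_line_once "$conf" "blacklist nouveau"   # repeated call changes nothing
cat "$conf"
```

Because the helper skips lines that are already present, rerunning it does not pile up duplicate entries.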

Then rebuild the initramfs so the change takes effect:

sudo update-initramfs -u

Reboot, then run lsmod | grep nouveau to check that nouveau is really disabled. If the command prints nothing, nouveau is off.
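The same check can be wrapped in a small script that states the result explicitly. This is only a sketch: `nouveau_loaded` is an illustrative helper that reads `lsmod`-style text from standard input and looks for a module named exactly nouveau in the first column.

```shell
#!/bin/sh
# nouveau_loaded: reads lsmod-style output on stdin and succeeds if a
# module named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

if command -v lsmod >/dev/null 2>&1 && lsmod 2>/dev/null | nouveau_loaded; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded (or lsmod is unavailable)"
fi
```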

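Then switch to runlevel 3 (init 3) and install the driver:

sudo init 3 # or: sudo service lightdm stop
sudo ./NVIDIA-Linux-x86_64-418.56.run
# This command may fail: if you are already in init 3 and the installer still complains that an X server is running, run this instead:
sudo ./NVIDIA-Linux-x86_64-418.56.run --no-x-check --no-nouveau-check --no-opengl-files

--no-x-check: do not check for a running X server during installation
--no-nouveau-check: do not check for nouveau during installation
--no-opengl-files: install only the driver files, not the OpenGL files

To learn more about init runlevels, see:

The installer then asks a series of questions; accept them all. When the installation finishes, reboot.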

Now run nvidia-smi to check the driver:

(base) mcj@mcj2080:~$ nvidia-smi
Mon Jan 18 16:42:28 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:04:00.0  On |                  N/A |
| 27%   34C    P8    21W / 250W |     53MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4591      G   /usr/lib/xorg/Xorg                            51MiB |
+-----------------------------------------------------------------------------+
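For scripts, it can be handy to pull just the driver version out of this report. A minimal sketch follows; `smi_driver_version` is a name I made up, and it parses the banner text shown above. `nvidia-smi` can also report the value directly through its `--query-gpu=driver_version --format=csv,noheader` options.

```shell
#!/bin/sh
# smi_driver_version: reads nvidia-smi's default report on stdin and
# prints the value that follows "Driver Version:" in the banner line.
smi_driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Banner line copied from the report above.
banner='| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |'
printf '%s\n' "$banner" | smi_driver_version
```

On a machine with the driver installed, `nvidia-smi | smi_driver_version` does the same against the live report.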

The GPU driver is now installed. To uninstall it, run:

sudo ./NVIDIA-Linux-x86_64-418.56.run --uninstall

2 Installing CUDA 10.1

First download CUDA. I used cuda_10.1.105_418.39_linux.

Put it in the same directory as before (purely for convenience) and install it the same way:

cd /home/ubuntu/nvidia
chmod +x cuda_10.1.105_418.39_linux
sudo ./cuda_10.1.105_418.39_linux

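Before CUDA 10.x:

After a few seconds a license agreement appears with a percentage indicator. Press q to skip it, then choose accept. When asked whether to install the driver, answer N, because we already installed it. Answer yes to everything else and keep the default installation directories.

CUDA 10.x and later:

Note that 10.x behaves differently from earlier versions. Here we use 10.1. After you run the command above, the installer's first screen appears.

Type accept and press Enter.

Deselect the first option here (the driver, which is already installed); the space bar toggles the selection.

Then select Install and press Enter.

Select Yes and press Enter.

Once the installer reports that it has finished, you are done.

Next, add the environment variables; otherwise nvcc -V will not work: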

sudo nano /etc/profile
# Append at the end:
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
# Press Ctrl+X, then Y and Enter, to save and exit
source /etc/profile
sudo nano /etc/ld.so.conf
# Append at the end:
/usr/local/cuda-10.1/lib64
# Press Ctrl+X, then Y and Enter, to save and exit
sudo ldconfig
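To confirm the new entries took effect in the current shell, a small check like the following can help. `list_has` is a hypothetical helper that tests whether a colon-separated list contains a directory; the CUDA paths are the ones from the steps above.

```shell
#!/bin/sh
# list_has LIST DIR: succeeds if DIR is one of the colon-separated
# entries in LIST (exact match, not a substring match).
list_has() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

cuda=/usr/local/cuda-10.1
if list_has "$PATH" "$cuda/bin"; then
    echo "PATH contains $cuda/bin"
else
    echo "PATH is missing $cuda/bin"
fi
if list_has "${LD_LIBRARY_PATH:-}" "$cuda/lib64"; then
    echo "LD_LIBRARY_PATH contains $cuda/lib64"
else
    echo "LD_LIBRARY_PATH is missing $cuda/lib64"
fi
```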

OK, let's check. Run ldconfig -v|grep cuda and you should see:

ubuntu@mcj:~$ ldconfig -v|grep cuda
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /lib32: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.27.so is the dynamic linker, ignoring

        libcuda.so.1 -> libcuda.so.418.56
        libicudata.so.55 -> libicudata.so.55.1
        libcuda.so.1 -> libcuda.so.418.56
/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
/usr/local/cuda-10.1/lib64:
        libcudart.so.10.1 -> libcudart.so.10.1
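The listing above is noisy because `ldconfig -v` rescans every directory, and a plain grep for cuda also matches unrelated names such as libicudata. A quieter check is to query the existing cache with `ldconfig -p` and look for the CUDA runtime. A sketch, where `cache_has_lib` is an illustrative helper that reads the cache listing from standard input:

```shell
#!/bin/sh
# cache_has_lib NAME: reads `ldconfig -p` style output on stdin and
# succeeds if some entry's library name starts with NAME.
cache_has_lib() {
    awk -v name="$1" 'index($1, name) == 1 { found = 1 } END { exit (found ? 0 : 1) }'
}

if ldconfig -p 2>/dev/null | cache_has_lib "libcudart.so"; then
    echo "libcudart is in the linker cache"
else
    echo "libcudart is not in the linker cache"
fi
```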

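This shows the dynamic library path is set up. Note: if you are only upgrading CUDA, you are done at this point; skip the remaining steps in this section and go straight to the cuDNN installation in section 3. If you want to compile the samples, you also need to install a few required tools: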

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install freeglut3-dev libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

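Watch the last command: freeglut3-dev and libglu1-mesa-dev may fail to install. Don't worry; install those two packages separately, which works fine. After that, go to the samples directory and test whether CUDA was installed successfully.

You can build all the samples or just one. Here we test the NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery sample.

cd ~/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery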
sudo make

After a short wait, the build completes. Below is the earlier result from 9.0; 10.x looks similar.

ubuntu@mcj:~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ sudo make 
"/usr/local/cuda-9.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-9.0"/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o deviceQuery deviceQuery.o 
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release

Then run it:

ubuntu@mcj:~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11175 MBytes (11718230016 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          9.1 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from GeForce GTX 1080 Ti (GPU0) -> GeForce GTX 1080 Ti (GPU1) : Yes
> Peer access from GeForce GTX 1080 Ti (GPU1) -> GeForce GTX 1080 Ti (GPU0) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 2
Result = PASS

If you see this output, CUDA has been installed successfully.
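If you want to check this from a script rather than by eye, the sketch below looks for the final `Result = PASS` line. `device_query_passed` is a name of my own; it reads the program's output from standard input.

```shell
#!/bin/sh
# device_query_passed: reads deviceQuery output on stdin and succeeds
# only if it contains the exact line "Result = PASS".
device_query_passed() {
    grep -qx 'Result = PASS'
}

# Final lines of the run shown above, used here as sample input.
tail_of_run='deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.0, NumDevs = 2
Result = PASS'

if printf '%s\n' "$tail_of_run" | device_query_passed; then
    echo "CUDA sample passed"
else
    echo "CUDA sample failed"
fi
```

On a real machine, pipe the program straight in: `./deviceQuery | device_query_passed`.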

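3 Installing cuDNN 7.6.5

The CUDA and cuDNN versions must match. I chose cudnn-10.1-linux-x64-v7.6.5.32.tgz. Put it in the same directory as before, extract it, and copy the header and libraries into the CUDA directory: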

cd /home/ubuntu/nvidia
tar -xvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
cd cuda
sudo cp ./include/cudnn.h /usr/local/cuda-10.1/include
# For cuDNN 8.x, change the command above to: sudo cp ./include/cudnn* /usr/local/cuda-11.1/include
sudo cp -a ./lib64/libcudnn* /usr/local/cuda-10.1/lib64

OK, cuDNN is now installed.

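Note, though, that some of the libcudnn files copied this way are symbolic links (cp -a preserves them). To avoid breaking them by accidentally deleting something later, I recommend keeping the files directly in the corresponding CUDA directories.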

To check that the installation succeeded and which version is installed:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
# For 8.x, verify with this command instead:
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

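The output shows the CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL macros; values of 7, 0, and 5 mean cuDNN 7.0.5 is installed. For 7.6.5 the result is similar.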
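To get the version as a single string instead of reading three macro lines, something like this sketch works. `cudnn_version` is a hypothetical helper; it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` on separate `#define` lines, which is what the grep command above relies on as well. The demo builds a stand-in header in a temporary file so it runs without cuDNN installed.

```shell
#!/bin/sh
# cudnn_version HEADER: prints MAJOR.MINOR.PATCH from the #define lines
# in a cuDNN header (cudnn.h for 7.x, cudnn_version.h for 8.x).
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Stand-in header with the three macros, for demonstration only.
header=$(mktemp)
printf '#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 5\n' > "$header"
cudnn_version "$header"
rm -f "$header"
```

Against a real installation you would call `cudnn_version /usr/local/cuda/include/cudnn.h`.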

Alternatively, verify by compiling a test example; see:

https://blog.csdn.net/caicaiatnbu/article/details/87626491

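4 Other versions

Installing other versions of CUDA and cuDNN works the same way. A simpler approach is to extract everything into directories of your choice and then add the paths to /etc/profile and the ldconfig configuration.

For example, in /etc/profile:

# CUDA configuration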
export PATH=$PATH:/A-pool/cuda/cuda-9.0/bin:/A-pool/cuda/cuda-9.0/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/A-pool/cuda/cuda-9.0/lib64
export CUDA_HOME=/A-pool/cuda/cuda-9.0
# cuDNN configuration
export PATH=$PATH:/A-pool/cudnn/cuda-9.0-cudnn-7.0/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/A-pool/cudnn/cuda-9.0-cudnn-7.0/lib64

/etc/ld.so.conf:

/A-pool/cuda/cuda-9.0/lib64
/A-pool/cudnn/cuda-9.0-cudnn-7.0/lib64
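With several toolkits unpacked side by side, you can also switch per shell session instead of editing system files. A minimal sketch: `use_cuda` is an illustrative function (not part of CUDA) that points `CUDA_HOME`, `PATH`, and `LD_LIBRARY_PATH` at one installation; the /A-pool path is the example location used above.

```shell
#!/bin/sh
# use_cuda DIR: make the toolkit under DIR the first CUDA found by
# this shell. Earlier entries stay in the lists but lose priority.
use_cuda() {
    CUDA_HOME=$1
    PATH="$CUDA_HOME/bin:$PATH"
    # Only add a separator when LD_LIBRARY_PATH already has content.
    LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export CUDA_HOME PATH LD_LIBRARY_PATH
}

use_cuda /A-pool/cuda/cuda-9.0
echo "CUDA_HOME=$CUDA_HOME"
```

Defining the function in ~/.bashrc would make it available in every new terminal; the change only affects the shell in which it is called.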

Tianyi Cloud download


This article was last updated on April 4, 2022.