马春杰杰

[mcj] CUDA samples fail to run in an LXD container: fixing "cudaGetDeviceCount returned 30"

I wanted to use CUDA, so I installed the NVIDIA 390 driver and CUDA 9.0 on the host machine, and then installed the same versions in the container. When I ran the CUDA samples inside the container, I got this error:

root@jty:~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery# ./deviceQuery   
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

After a lot of searching, I found that most of the published solutions address this error when it occurs on the host itself, while very few mention LXD containers. The cause in my case turned out to be that the host had no /dev/nvidia-uvm device node.

Solution:

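# Load the nvidia-uvm kernel module
sudo /sbin/modprobe nvidia-uvm
# Look up the major number assigned to nvidia-uvm
D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
# Create the device node (major $D, minor 0), readable and writable by all users
sudo mknod -m 666 /dev/nvidia-uvm c "$D" 0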
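
The commands above assume that grep matches exactly one line and that the node does not exist yet. As a rough sketch of a more defensive variant, the helper below matches the device name exactly and creates the node only when it is missing. The function names device_major and ensure_uvm_node are made up for this illustration and do not come from any NVIDIA or LXD tool.

```shell
#!/bin/sh
# Hypothetical helper: print the major number registered for an exact device name.
# $1: device name, $2: devices file to read (defaults to /proc/devices)
device_major() {
    awk -v name="$1" '$2 == name { print $1; exit }' "${2:-/proc/devices}"
}

# Hypothetical helper: create /dev/nvidia-uvm only if it is missing.
# Assumes the nvidia-uvm module has already been loaded with modprobe.
ensure_uvm_node() {
    uvm_major=$(device_major nvidia-uvm)
    if [ -z "$uvm_major" ]; then
        echo "nvidia-uvm is not registered; is the module loaded?" >&2
        return 1
    fi
    # -c tests for an existing character device, so mknod is skipped on reruns
    [ -c /dev/nvidia-uvm ] || sudo mknod -m 666 /dev/nvidia-uvm c "$uvm_major" 0
}
```

On the host, this would be used by running sudo /sbin/modprobe nvidia-uvm first and then calling ensure_uvm_node.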

The steps above are performed on the host machine. Once they are done, attach the device to the container as follows:

lxc config device add yourContainerName nvidia-uvm unix-char path=/dev/nvidia-uvm

That's all there is to it.
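
Before rerunning deviceQuery, it can help to confirm that the device node is actually visible inside the container. The following is only a sketch: check_char_devices is a hypothetical helper written for this post, and the extra paths in the usage note (/dev/nvidiactl, /dev/nvidia0) are assumptions about which NVIDIA nodes the container also has.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given path is a character device.
# Returns 0 only if every path passes the check.
check_char_devices() {
    check_rc=0
    for node in "$@"; do
        if [ -c "$node" ]; then
            echo "ok: $node"
        else
            echo "missing: $node"
            check_rc=1
        fi
    done
    return "$check_rc"
}
```

Inside the container, a call such as check_char_devices /dev/nvidia-uvm /dev/nvidiactl /dev/nvidia0 prints one line per path, and a "missing" line points at a node that still needs to be attached.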

Alternatively, create a script named auto.sh with the following contents:

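# Load the nvidia-uvm kernel module
sudo /sbin/modprobe nvidia-uvm
# Look up the major number assigned to nvidia-uvm
D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
# Create the device node (major $D, minor 0), readable and writable by all users
sudo mknod -m 666 /dev/nvidia-uvm c "$D" 0
# List the NVIDIA device nodes to confirm the result
ls /dev/nvidia*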

Running this script does the same job.

Recently, some readers have reported that it still fails after running these steps. There is one thing to note here:

If it still does not work, stop the container and try again.
This article was last updated on May 15, 2020.