Restarting an LXC container failed with the following output:
    Name: pang
    Remote: unix://
    Architecture: x86_64
    Created: 2022/11/02 06:47 UTC
    Status: Stopped
    Type: persistent
    Profiles: default

    Log:

    lxc pangyindong 20221105051756.396 ERROR    conf - conf.c:mount_entry:2019 - No such device or address - Failed to mount "/var/lib/lxd/devices/pangyindong/unix.gpu0.dev-nvidia-caps" on "/usr/lib/x86_64-linux-gnu/lxc/dev/nvidia-caps"
    lxc pangyindong 20221105051756.396 ERROR    conf - conf.c:lxc_setup:3611 - Failed to setup mount entries
    lxc pangyindong 20221105051756.396 ERROR    start - start.c:do_start:1263 - Failed to setup container "pangyindong"
    lxc pangyindong 20221105051756.396 ERROR    sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 5)
    lxc pangyindong 20221105051756.396 WARN     network - network.c:lxc_delete_network_priv:2589 - Operation not permitted - Failed to remove interface "eth0" with index 18
    lxc pangyindong 20221105051756.396 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state "ABORTING" instead of "RUNNING"
    lxc pangyindong 20221105051756.396 ERROR    start - start.c:__lxc_start:1939 - Failed to spawn container "pangyindong"
    lxc 20221105051756.397 WARN     commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command "get_state"
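The cause: at startup the container tries to mount unix.gpu0.dev-nvidia-caps, but that entry cannot be mounted, so the mount fails with "No such device or address" and the container aborts.
Listing /dev/nvidia* on the host with ls shows: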
    /dev/nvidia0  /dev/nvidia1  /dev/nvidiactl  /dev/nvidia-modeset  /dev/nvidia-uvm  /dev/nvidia-uvm-tools

    /dev/nvidia-caps:
    nvidia-cap1  nvidia-cap2
The current workaround is to manually remove the /dev/nvidia-caps directory. To be on the safe side, rename the directory rather than deleting it.
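The rename can be sketched as a small shell helper. This is only an illustration of the workaround above, not a verified fix. The function name `rename_nvidia_caps` and the backup name `nvidia-caps.bak` are made up for this example, and the directory to operate on is passed as an argument (defaulting to /dev) so the helper can be tried on a scratch directory first.

```shell
#!/bin/sh
# Rename the nvidia-caps directory inside a /dev-like directory so that the
# container no longer tries to mount it.
# Usage: rename_nvidia_caps [DEV_DIR]   (DEV_DIR defaults to /dev)
# Note: the function name and the ".bak" suffix are hypothetical choices.
rename_nvidia_caps() {
    dev_dir="${1:-/dev}"
    src="$dev_dir/nvidia-caps"
    dst="$dev_dir/nvidia-caps.bak"

    # Nothing to rename if the directory is absent.
    if [ ! -d "$src" ]; then
        echo "nothing to do: $src is not a directory"
        return 0
    fi
    # Never clobber an earlier backup.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    mv "$src" "$dst" && echo "renamed $src -> $dst"
}

# On the host (as root), followed by starting the container again:
#   rename_nvidia_caps /dev
#   lxc start pangyindong
```

Renaming instead of deleting keeps the original entries around, so the change can be undone with a single `mv` if something else on the host turns out to need them.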
This article was last updated on November 5, 2022.
		 马春杰杰