马春杰杰

AutoDL private cloud fails to start an instance: OCI runtime create failed

I run a self-hosted AutoDL private cloud, and starting an instance occasionally fails with this error:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown

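This happens because the NVIDIA Container Toolkit tries to use cgroup/eBPF for device filtering, but the current runtime environment (Docker running inside an LXD container) does not permit it, so container startup fails at the prestart hook stage.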

The fix is therefore simply to turn off the NVIDIA cgroup/eBPF device filtering.

sudo vi /etc/nvidia-container-runtime/config.toml

Set the following field in the [nvidia-container-cli] section:

[nvidia-container-cli]
no-cgroups = true
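
For anyone who would rather script the edit than open an editor, here is a minimal sketch. The function name set_no_cgroups is made up for illustration. It assumes GNU sed and that the config file already contains a no-cgroups line, possibly commented out. If no such line exists, the function returns a non-zero status and the file needs to be edited by hand as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NVIDIA tooling): force no-cgroups = true
# in an nvidia-container-runtime config file.
set_no_cgroups() {
    # Default to the path used in this post; a different path can be passed as $1.
    cfg="${1:-/etc/nvidia-container-runtime/config.toml}"

    # Keep a backup next to the original before touching it.
    cp "$cfg" "$cfg.bak" || return 1

    # Rewrite an existing no-cgroups line, whether or not it is commented out.
    # Assumption: GNU sed (for -i without a suffix argument).
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$cfg" || return 1

    # Succeed only if the setting is now actually present.
    grep -q '^no-cgroups = true$' "$cfg"
}
```

Run as root with no argument, set_no_cgroups edits the default path and leaves the previous version in config.toml.bak. Docker still has to be restarted afterwards.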

Then restart Docker: sudo systemctl restart docker

After that, instances start normally again~
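
To confirm the fix independently of AutoDL, one option is to start a throwaway GPU container by hand. The sketch below is only an illustration: the function name gpu_container_ok and the DOCKER override variable are made up here, and it assumes the ubuntu image can be pulled on the host and that the NVIDIA runtime injects nvidia-smi into the container.

```shell
#!/bin/sh
# Hypothetical check (names are illustrative): returns 0 if a GPU container starts
# and nvidia-smi runs inside it, non-zero otherwise.
gpu_container_ok() {
    # Image to test with; assumed to be pullable on this host.
    image="${1:-ubuntu}"

    # DOCKER can point at another docker-compatible binary; defaults to docker.
    "${DOCKER:-docker}" run --rm --gpus all "$image" nvidia-smi
}
```

If the OCI runtime create failed error from the start of this post comes back, the function returns a non-zero status and Docker prints the same message to the terminal.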