[mcj] Fixing "cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version"

马春杰杰

6 years ago

I currently manage all my packages with Anaconda. It is very convenient, but a few small problems still come up in practice.

On managing CUDA with Anaconda:

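Anaconda's strength is how easily it manages packages. For example, when different projects need different CUDA versions, we can create a separate environment for each one.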

Once, after setting up an environment, I ran a GPU test with TensorFlow and got the following error:

(keras) ubuntu@mcj:~$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-04-22 08:41:25.041984: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-22 08:41:25.049150: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3606595000 Hz
2019-04-22 08:41:25.050151: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55bbf1a285d0 executing computations on platform Host. Devices:
2019-04-22 08:41:25.050197: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-22 08:41:26.633021: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55bbf3ff98b0 executing computations on platform CUDA. Devices:
2019-04-22 08:41:26.633075: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-22 08:41:26.633091: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-22 08:41:26.634210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.33GiB
2019-04-22 08:41:26.635144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-04-22 08:41:26.636197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ubuntu/anaconda3/envs/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

Cause:

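This error occurs because the CUDA version used in the environment is higher than the CUDA version that the driver installed on the system supports.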

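To explain: although Anaconda can install any version of cudatoolkit, it does not install the most important part of CUDA, the kernel-level NVIDIA driver. Whichever cudatoolkit version is installed, it calls into the system's driver, so the CUDA version in the environment cannot exceed the version the system driver supports. For example, the driver on my system is at the CUDA 9.0 level, so the highest CUDA version I can install in an environment is 9.0. To use a higher CUDA version, the system's CUDA driver has to be upgraded first.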
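The rule above can be checked directly. The following is a minimal sketch, not something from my original setup: it asks the driver library and the runtime library for their version numbers through ctypes and compares them. `cuDriverGetVersion` and `cudaRuntimeGetVersion` are real CUDA entry points, but the library file names (`libcuda.so.1`, `libcudart.so`) are assumptions that vary by platform and install, and the helper names are made up for illustration. The "driver >= runtime" comparison is a simplification of NVIDIA's compatibility rules.

```python
import ctypes


def decode_cuda_version(value):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return value // 1000, (value % 1000) // 10


def driver_supports_runtime(driver_version, runtime_version):
    """Simplified rule: the driver's CUDA version must be >= the runtime's."""
    return driver_version >= runtime_version


def query_version(library_name, function_name):
    """Call a version query of the form `int f(int *version)` in a shared library.

    Returns the encoded version, or None if the library or symbol is missing
    or the call reports a non-zero status.
    """
    try:
        func = getattr(ctypes.CDLL(library_name), function_name)
    except (OSError, AttributeError):
        return None
    version = ctypes.c_int(0)
    status = func(ctypes.byref(version))
    return version.value if status == 0 else None


if __name__ == "__main__":
    # Library file names are assumptions; adjust them for your system.
    driver = query_version("libcuda.so.1", "cuDriverGetVersion")
    runtime = query_version("libcudart.so", "cudaRuntimeGetVersion")
    if driver is None or runtime is None:
        print("Could not query both versions; check the library names.")
    else:
        print("driver supports CUDA %d.%d" % decode_cuda_version(driver))
        print("runtime is CUDA %d.%d" % decode_cuda_version(runtime))
        print("compatible:", driver_supports_runtime(driver, runtime))
```

If the last line prints `compatible: False`, the environment is in the situation described here: the cudatoolkit in the environment is newer than what the system driver can serve.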

Solution:

Once the cause is known, the fix is simple: downgrade the CUDA version in the environment to one the system driver supports.

conda install cudatoolkit=8.0
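Picking the version to install can be sketched as a small helper. This is only an illustration under stated assumptions: the function names are hypothetical, and both the driver limit ("9.0") and the candidate list are example values, not the output of any real conda query.

```python
def parse_version(text):
    """Turn a dotted version string such as "9.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pick_cudatoolkit(driver_max, candidates):
    """Return the highest candidate that does not exceed driver_max, or None."""
    limit = parse_version(driver_max)
    usable = [c for c in candidates if parse_version(c) <= limit]
    return max(usable, key=parse_version) if usable else None


def conda_install_command(version):
    """Build the conda command used in this post for a given toolkit version."""
    return "conda install cudatoolkit=%s" % version


if __name__ == "__main__":
    # Example values only: a driver at the CUDA 9.0 level and a made-up candidate list.
    chosen = pick_cudatoolkit("9.0", ["8.0", "9.0", "9.2", "10.0"])
    if chosen is not None:
        print(conda_install_command(chosen))  # prints: conda install cudatoolkit=9.0
```

Any version at or below the driver's level satisfies the rule, which is why the 8.0 install above also works on a driver at the 9.0 level.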

After the install, I tested again and it ran successfully.

(keras) ubuntu@mcj:~$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-04-22 08:55:08.274263: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-22 08:55:09.114223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.33GiB
2019-04-22 08:55:10.083110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-04-22 08:55:10.084350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2019-04-22 08:55:10.084428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2019-04-22 08:55:10.084447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y 
2019-04-22 08:55:10.084458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y 
2019-04-22 08:55:10.084492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-22 08:55:10.084508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
2019-04-22 08:55:10.319426: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1

This article was last updated on May 21, 2019.
