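I currently use Anaconda to manage all kinds of packages. It is very convenient, but a few small problems still come up in practice.
On Anaconda managing CUDA:
Anaconda's strength is that it makes package management easy. For example, when different projects need different CUDA versions, we can create a separate environment for each one.
Once, after setting up an environment, I ran a GPU test on TensorFlow and got this error: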
(keras) ubuntu@mcj:~$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-04-22 08:41:25.041984: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-22 08:41:25.049150: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3606595000 Hz
2019-04-22 08:41:25.050151: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55bbf1a285d0 executing computations on platform Host. Devices:
2019-04-22 08:41:25.050197: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-04-22 08:41:26.633021: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55bbf3ff98b0 executing computations on platform CUDA. Devices:
2019-04-22 08:41:26.633075: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-22 08:41:26.633091: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-22 08:41:26.634210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.33GiB
2019-04-22 08:41:26.635144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-04-22 08:41:26.636197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ubuntu/anaconda3/envs/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
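Cause:
This error occurs because the CUDA version used in the environment is newer than what the system supports.
To explain: although Anaconda can install any cudatoolkit version, it does not install the most important piece of CUDA, the NVIDIA kernel driver. Whichever cudatoolkit version is installed, it runs on top of the driver installed on the system, so the CUDA version in an environment cannot exceed the version the system supports. For example, my system is set up for CUDA 9.0, so the highest CUDA version I can use in an environment is 9.0. To use a newer CUDA version, the system installation (in practice, the NVIDIA driver) has to be upgraded first.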
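The relationship described above can be checked directly. The following is a minimal sketch, assuming a Linux machine where the NVIDIA driver library is exposed as libcuda.so.1; the helper names and the runtime value 10000 are hypothetical and not from the original setup. It relies on CUDA encoding versions as 1000 * major + 10 * minor, and it uses the simple "driver must be at least as new as the runtime" rule that matches the error message, ignoring the minor-version compatibility that newer CUDA releases add.

```python
import ctypes


def decode_cuda_version(raw):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def runtime_is_supported(driver_raw, runtime_raw):
    """Simplified rule: the driver must support at least the runtime's CUDA version."""
    return driver_raw >= runtime_raw


def query_driver_cuda_version(lib_name="libcuda.so.1"):
    """Ask the system NVIDIA driver for the newest CUDA version it supports.

    Returns the raw encoded integer, or None if the driver library cannot be
    loaded or the call fails (for example on a machine without the driver).
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
        get_version = libcuda.cuDriverGetVersion
    except (OSError, AttributeError):
        return None
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    get_version.restype = ctypes.c_int
    version = ctypes.c_int(0)
    if get_version(ctypes.byref(version)) != 0:  # 0 means CUDA_SUCCESS
        return None
    return version.value


if __name__ == "__main__":
    runtime_raw = 10000  # hypothetical: CUDA 10.0 installed in the conda environment
    driver_raw = query_driver_cuda_version()
    if driver_raw is None:
        print("NVIDIA driver library not found or not usable")
    else:
        print("driver supports up to CUDA %d.%d" % decode_cuda_version(driver_raw))
        print("runtime supported:", runtime_is_supported(driver_raw, runtime_raw))
```

The same upper bound is also shown in the header of the nvidia-smi output on recent drivers, so the script is mainly useful when the check needs to be automated.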
Solution:
Once the cause is known, the fix is simple: lower the CUDA version in the environment.
conda install cudatoolkit=8.0
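The command above pins a specific version by hand. As a rough sketch of how that choice could be made, the snippet below picks the newest cudatoolkit version that does not exceed what the system supports and prints the matching install command. The function names and the candidate list are hypothetical; the versions actually available can be listed with conda search cudatoolkit.

```python
def pick_cudatoolkit(system_supported, candidates):
    """Return the newest (major, minor) candidate not exceeding system_supported.

    Returns None when every candidate is newer than the system supports.
    """
    usable = [version for version in candidates if version <= system_supported]
    return max(usable) if usable else None


def conda_install_command(version):
    """Build the conda command that pins cudatoolkit to the given (major, minor)."""
    return "conda install cudatoolkit={}.{}".format(*version)


if __name__ == "__main__":
    # Hypothetical candidate list; a system that supports up to CUDA 9.0.
    candidates = [(8, 0), (9, 0), (9, 2), (10, 0)]
    chosen = pick_cudatoolkit((9, 0), candidates)
    if chosen is None:
        print("no usable cudatoolkit version; upgrade the system driver first")
    else:
        print(conda_install_command(chosen))
```

Keep in mind that conda may also change the TensorFlow build to one compiled for the chosen CUDA version, so it is worth reading the transaction summary before confirming.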
After installing, I tested again and it ran successfully.
(keras) ubuntu@mcj:~$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-04-22 08:55:08.274263: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-22 08:55:09.114223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.33GiB
2019-04-22 08:55:10.083110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-04-22 08:55:10.084350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2019-04-22 08:55:10.084428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2019-04-22 08:55:10.084447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y
2019-04-22 08:55:10.084458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y
2019-04-22 08:55:10.084492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-22 08:55:10.084508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
2019-04-22 08:55:10.319426: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
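Reading the device mapping out of the log works, but the same check can be scripted. Here is a small sketch, assuming a TensorFlow 1.x install like the one in this post; it uses device_lib.list_local_devices() from tensorflow.python.client, and the two helper function names are my own, not part of TensorFlow.

```python
def gpu_device_names(devices):
    """Keep only the names of entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def list_tensorflow_gpus():
    """Return the GPU device names TensorFlow can see.

    Returns None when TensorFlow is not importable in the current environment.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_device_names(device_lib.list_local_devices())
```

Calling list_tensorflow_gpus() inside the keras environment should return one entry per visible GPU. An empty list means TensorFlow only sees the CPU, and a driver/runtime mismatch like the one above would likely surface as an exception from the call.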
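This article was last updated on May 21, 2019, more than a year ago. If any content or images no longer work, please leave a comment and we will fix it promptly. Thank you!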
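My system has 11.0 installed and conda installed 10.1. Running gives this error: libcublas.so.10: cannot open shared object file: No such file or directory
Hello, if I don't install CUDA on the system and only install it inside an Anaconda virtual environment, what effect does that have? Will my programs run slower?
@ganas That will not work, because the CUDA in Anaconda relies on the system's CUDA to run. The CUDA version in Anaconda can be lower than the system's, but the system cannot be without it.
Hello, I installed pytorch-gpu directly with conda in an environment, and conda automatically installed CUDA and cuDNN. The CUDA version is 10.1, but when I run nvcc -V it says it is not an internal command. The NVIDIA Corporation folder on my C drive contains v10.1, but my NVIDIA GPU COMPUTING Toolkits folder contains v11, and that folder only holds some cuDNN files. What is going on? Also, if I now download CUDA from the official site and install it on the system again, will it conflict with the existing CUDA?
@wsg You only installed CUDA in conda, so is it perhaps not installed in the system environment?
CUDA must first be installed on the system before the nvcc command can be used. For installation, see the earlier posts on this blog.
@马春杰杰 Right now CUDA already works when I run openpose.exe directly from cmd. I'm not sure whether installing it on the system as well would cause a conflict?
@wsg Oh, you are on Windows? Sorry, I'm not very familiar with configuring Windows environments 😐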