
Building a Deep Learning Server from Scratch: Four 1080 Ti GPUs in Parallel (Ubuntu16.04+CUDA9.2+cuDNN7.1+TensorFlow+Keras)

 5 years ago
source link: http://www.52nlp.cn/深度学习服务器-1080ti-ubuntu16-04-cuda9-2-cudnn7-1-tensorflow-keras?amp%3Butm_medium=referral


I have written several articles in this series; here is an index of the related articles, for reference:

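Recently the company bought another four-card 1080 Ti machine. The configuration is essentially the same as the previous one, except that the graphics cards are now Gigabyte's quasi-reference (blower-style) 1080 Ti: GIGABYTE GTX1080Ti turbo fan 108TTURBO-11GD.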

Component	Model	Price (CNY)	Link
CPU	Intel Core i7-6850K six-core boxed processor	4599	http://item.jd.com/11814000696.html
Cooler	Corsair H55 liquid cooler	449	https://item.jd.com/10850633518.html
Motherboard	ASUS X99-E WS/USB 3.1 workstation motherboard	4759	
Memory	Corsair (USCORSAIR) Vengeance LPX DDR4 3000 32GB (16G x 4 sticks)	2799 × 2	https://item.jd.com/1990572.html
SSD	Samsung 960 EVO 250GB M.2 NVMe SSD	599	https://item.jd.com/3739097.html
HDD	Seagate BarraCuda 4TB 5900 RPM desktop hard drive × 2	629 × 2	https://item.jd.com/4220257.html
PSU	Corsair AX1500i fully modular power supply, 80 Plus Gold	3699	https://item.jd.com/10783917878.html
Case	Corsair AIR540 USB3.0	949	http://item.jd.com/12173900062.html
GPU	GIGABYTE GTX1080Ti 11GB non-reference high-end gaming card, blower cooler for deep learning × 4	7400 × 4	https://item.jd.com/10583752777.html
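As a quick sanity check on the budget, the table's prices can be totaled with a short Python 3 script. This is only a sketch: it multiplies each listed unit price (CNY, at the time of writing) by its quantity and ignores shipping or anything not in the table.

```python
# Unit prices (CNY) and quantities copied from the parts table above.
PARTS = [
    ("CPU", 4599, 1),
    ("Cooler", 449, 1),
    ("Motherboard", 4759, 1),
    ("Memory", 2799, 2),
    ("SSD", 599, 1),
    ("HDD", 629, 2),
    ("PSU", 3699, 1),
    ("Case", 949, 1),
    ("GPU", 7400, 4),
]


def total_cost(parts):
    """Sum of unit price x quantity over all parts."""
    return sum(price * qty for _, price, qty in parts)


def gpu_share(parts):
    """Fraction of the total spent on graphics cards."""
    gpu = sum(price * qty for name, price, qty in parts if name == "GPU")
    return float(gpu) / total_cost(parts)


if __name__ == "__main__":
    print(total_cost(PARTS))           # 51510
    print(round(gpu_share(PARTS), 3))  # 0.575
```

By this arithmetic the four cards account for a bit more than half of the total.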

This is roughly what the deep learning machine looks like:

[Photo: the assembled deep learning machine]

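After installing Ubuntu 16.04, I once again set out to install CUDA, cuDNN and the rest of the deep learning environment. More than half a year has passed since the last build and a lot has changed; in particular, CUDA 9.x and cuDNN 7.x are now the standard. I am recording the process here.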

Installing CUDA 9.x

As before, I downloaded the current CUDA release from NVIDIA's official site and chose the latest, CUDA 9.2:

[Screenshot: CUDA 9.2 download page]

After selecting the CUDA 9.2 deb package for Ubuntu 16.04, NVIDIA's download page shows these installation instructions:

Installation Instructions:

`sudo dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb`

`sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub`

`sudo apt-get update`

`sudo apt-get install cuda`

After downloading the roughly 1.2 GB CUDA deb package, the actual installation commands were:

sudo dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

The official CUDA download page also offers a cuBLAS 9.2 patch, which NVIDIA strongly recommends installing:

This update includes fix to cublas GEMM APIs on V100 Tensor Core GPUs when used with default algorithm CUBLAS_GEMM_DEFAULT_TENSOR_OP. We strongly recommend installing this update as part of CUDA Toolkit 9.2 installation.

The patch can be installed as follows:

sudo dpkg -i cuda-repo-ubuntu1604-9-2-local-cublas-update-1_1.0-1_amd64.deb 
sudo apt-get update  
sudo apt-get upgrade cuda
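Before moving on, it can help to confirm which toolkit version the deb packages actually installed. The sketch below assumes nvcc sits at /usr/local/cuda/bin/nvcc (the default for the deb install; it is not on PATH until the ~/.bashrc step) and that `nvcc --version` includes a line of the form `Cuda compilation tools, release 9.2, V9.2.88`. The helper names are hypothetical.

```python
import re
import subprocess

# Default location for the deb install; nvcc is not on PATH until ~/.bashrc is updated.
NVCC = "/usr/local/cuda/bin/nvcc"

# Matches e.g. "Cuda compilation tools, release 9.2, V9.2.88"
_RELEASE_RE = re.compile(r"release (\d+)\.(\d+), V([\d.]+)")


def parse_nvcc_version(text):
    """Return (major, minor, full_version) from `nvcc --version` output, or None."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def installed_cuda_version(nvcc=NVCC):
    """Run nvcc and parse its version banner (requires CUDA to be installed)."""
    output = subprocess.check_output([nvcc, "--version"])
    return parse_nvcc_version(output.decode("utf-8", "replace"))
```

On this machine `installed_cuda_version()` would be expected to report release 9.2.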

Installing CUDA 9.2 also installs the driver for the 1080 Ti cards. Reboot the machine, then check the GPUs with the nvidia-smi command:

[Screenshot: nvidia-smi output]
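Besides the interactive table, nvidia-smi can print selected fields as CSV, which is handier for scripts. A minimal Python 3 sketch, assuming the GPU names contain no commas, that queries index, name and total memory (in MiB) and parses the result; the function names are hypothetical.

```python
import subprocess

# Ask nvidia-smi for machine-readable output: one CSV line per GPU, no header, no units.
QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_query(text):
    """Turn lines like '0, GeForce GTX 1080 Ti, 11178' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, memory = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory_mib": int(memory)})
    return gpus


def list_gpus():
    """Run the query on this machine (requires the NVIDIA driver)."""
    return parse_gpu_query(subprocess.check_output(QUERY).decode("utf-8"))
```

For this build, `len(list_gpus()) == 4` is a quick check that all four cards are visible to the driver.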

Finally, set the environment variables in ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

Run source ~/.bashrc for the changes to take effect.
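The `${PATH:+:${PATH}}` form in these lines is a standard shell parameter expansion: `${VAR:+word}` yields `word` only when VAR is set and non-empty. It prepends the CUDA directory and adds the `:` separator only when there is an existing value, so no stray empty entry appears (for PATH, an empty entry means the current directory). The Python sketch below mimics that logic; the function names are hypothetical and only illustrate the expansion.

```python
def prepend_path(directory, current):
    """Mimic the shell idiom  DIR${VAR:+:${VAR}}.

    ${VAR:+word} expands to `word` only when VAR is set and non-empty, so the
    ':' separator is added only if there is something to put after it.
    """
    return directory + (":" + current if current else "")


def cuda_env(environ):
    """Return a copy of environ with the three ~/.bashrc settings applied."""
    env = dict(environ)
    env["PATH"] = prepend_path("/usr/local/cuda/bin", env.get("PATH", ""))
    env["LD_LIBRARY_PATH"] = prepend_path("/usr/local/cuda/lib64",
                                          env.get("LD_LIBRARY_PATH", ""))
    env["CUDA_HOME"] = "/usr/local/cuda"
    return env
```

For example, with LD_LIBRARY_PATH unset, `cuda_env` yields exactly `/usr/local/cuda/lib64` rather than `/usr/local/cuda/lib64:`.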

Installing cuDNN 7.x

Likewise, go to NVIDIA's cuDNN download page: https://developer.nvidia.com/rdp/cudnn-download . The latest version is cuDNN 7.1.4, which comes in three builds, for CUDA 8.0, CUDA 9.0 and CUDA 9.2:

[Screenshot: cuDNN 7.1.4 download options]

After downloading the cuDNN 7.1 archive, extract it and copy the relevant files into the CUDA system paths:

tar -zxvf cudnn-9.2-linux-x64-v7.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d 
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
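To check which cuDNN version actually landed in /usr/local/cuda, one option is to read the version macros from the header. This sketch assumes, as in cuDNN 7.x, that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers (later cuDNN releases moved these into a separate header). The helper names are hypothetical.

```python
import re

CUDNN_HEADER = "/usr/local/cuda/include/cudnn.h"


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the #define lines of cudnn.h, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # e.g. "#define CUDNN_MAJOR 7": require a plain integer after the macro name
        match = re.search(r"#define\s+" + macro + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(path=CUDNN_HEADER):
    """Read the copied header and return its version tuple."""
    with open(path) as f:
        return parse_cudnn_version(f.read())
```

For the cuDNN 7.1.4 build described here, the expected result would be `(7, 1, 4)`.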

Installing TensorFlow 1.8

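Installing TensorFlow keeps getting easier, and the TensorFlow website now has installation docs in Chinese: https://www.tensorflow.org/install/install_linux?hl=zh-cn . Following that guide, we install TensorFlow with Virtualenv, but first the base environment needs to be set up.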

First, install the libcupti-dev library on Ubuntu 16.04:

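This is the NVIDIA CUDA Profiling Tools Interface. This library provides advanced profiling support. To install it for CUDA Toolkit 8.0 or later, issue the following command: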

$ sudo apt-get install cuda-command-line-tools

and add its path to your LD_LIBRARY_PATH environment variable:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64

For CUDA Toolkit 7.5 or earlier, issue the following command:

$ sudo apt-get install libcupti-dev

However, running "sudo apt-get install cuda-command-line-tools" gave me:

E: Unable to locate package cuda-command-line-tools

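After some googling, it turned out that this package had already been installed along with CUDA 9.2; the library is already present under the CUDA path: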

/usr/local/cuda/extras/CUPTI/lib64$ ls
libcupti.so  libcupti.so.9.2  libcupti.so.9.2.88

So all that remains is to add it to the environment: put the following line in ~/.bashrc and run source ~/.bashrc to apply it:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
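A rough way to confirm that the CUPTI directory is now on the search path is to look for libcupti.so in each LD_LIBRARY_PATH entry. This is only a sketch of the LD_LIBRARY_PATH part of the lookup (the real dynamic loader also consults /etc/ld.so.cache and the default system directories), and the helper name is hypothetical.

```python
import os


def find_in_library_path(libname, ld_library_path):
    """Return the first directory/libname found along an LD_LIBRARY_PATH-style string, or None."""
    for directory in ld_library_path.split(":"):
        if not directory:
            continue  # ignore empty entries for this check
        candidate = os.path.join(directory, libname)
        if os.path.exists(candidate):  # follows symlinks such as libcupti.so -> libcupti.so.9.2
            return candidate
    return None
```

After sourcing ~/.bashrc, `find_in_library_path("libcupti.so", os.environ.get("LD_LIBRARY_PATH", ""))` should point into /usr/local/cuda/extras/CUPTI/lib64.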

The rest is even simpler:

sudo apt-get install python-pip python-dev python-virtualenv 
virtualenv --system-site-packages tensorflow1.8
source tensorflow1.8/bin/activate
easy_install -U pip
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade tensorflow-gpu
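Because the environment is created with --system-site-packages, it is easy to lose track of which interpreter is active. A small check that runs under both Python 2.7 and 3, assuming classic virtualenv (the kind used here) sets sys.real_prefix, while venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix; the function name is hypothetical.

```python
import sys


def in_virtualenv():
    """True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv (the pre-20.0 releases current in 2018) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv keep sys.base_prefix pointing at the base interpreter.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active: %s" % in_virtualenv())
```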

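I strongly recommend writing the Tsinghua pip mirror into your pip configuration file, which makes installing packages more convenient and faster.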
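One way to make the Tsinghua mirror the default is an `index-url` entry under `[global]` in a pip configuration file. The sketch below writes that entry with Python 3's configparser; the target path `~/.config/pip/pip.conf` is an assumption (pip on Linux also reads the legacy `~/.pip/pip.conf`), and `write_pip_conf` is a hypothetical helper name.

```python
import configparser
import os

TUNA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def write_pip_conf(path, index_url=TUNA_INDEX):
    """Create or update a pip config file so that pip uses index_url by default."""
    config = configparser.ConfigParser()
    config.read(path)  # keeps any existing settings; a missing file is simply skipped
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "index-url", index_url)
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as f:
        config.write(f)
```

Calling `write_pip_conf(os.path.expanduser("~/.config/pip/pip.conf"))` once should make the `-i` option in the pip command above unnecessary.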

Finally, test TensorFlow 1.8:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2018-06-17 12:15:34.158680: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-17 12:15:34.381812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:05:00.0
totalMemory: 10.91GiB freeMemory: 5.53GiB
2018-06-17 12:15:34.551451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:06:00.0
totalMemory: 10.92GiB freeMemory: 5.80GiB
2018-06-17 12:15:34.780350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:09:00.0
totalMemory: 10.92GiB freeMemory: 5.80GiB
2018-06-17 12:15:34.959199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 3 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:0a:00.0
totalMemory: 10.92GiB freeMemory: 5.80GiB
2018-06-17 12:15:34.966403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-06-17 12:15:36.373745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-17 12:15:36.373785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1 2 3 
2018-06-17 12:15:36.373798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y Y Y 
2018-06-17 12:15:36.373804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N Y Y 
2018-06-17 12:15:36.373808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2:   Y Y N Y 
2018-06-17 12:15:36.373814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3:   Y Y Y N 
2018-06-17 12:15:36.374516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5307 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-06-17 12:15:36.444426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 5582 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-06-17 12:15:36.506340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 5582 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-06-17 12:15:36.614736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 5582 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1
2018-06-17 12:15:36.689345: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1
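When scripting a check across all four cards, the key facts can be pulled out of this log. The regular expressions below are assumptions matched against the exact TensorFlow 1.8 lines shown above; log wording may differ in other versions, and the function names are hypothetical.

```python
import re

# Patterns based on the TensorFlow 1.8 log lines shown above.
_VISIBLE_RE = re.compile(r"Adding visible gpu devices: ([\d, ]+)")
_CREATED_RE = re.compile(r"device:GPU:(\d+) with (\d+) MB memory")


def visible_gpus(log_text):
    """GPU indices from the 'Adding visible gpu devices' line, e.g. [0, 1, 2, 3]."""
    match = _VISIBLE_RE.search(log_text)
    if match is None:
        return []
    return [int(i) for i in match.group(1).split(",") if i.strip()]


def gpu_memory_mb(log_text):
    """Map GPU index -> MB that TensorFlow reported for the device it created."""
    return {int(i): int(mb) for i, mb in _CREATED_RE.findall(log_text)}
```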

Installing Keras 2.1.x

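Keras supports TensorFlow, Theano and CNTK as backends. With the GPU build of TensorFlow already installed, installing Keras is very simple: inside the TensorFlow virtualenv, just run "pip install keras". The version installed was Keras 2.1.6: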

Installing collected packages: h5py, scipy, pyyaml, keras
Successfully installed h5py-2.7.1 keras-2.1.6 pyyaml-3.12 scipy-1.1.0

Test it:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
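Keras prints `Using TensorFlow backend.` because of how it picks its backend. As I understand Keras 2.1.x, the `KERAS_BACKEND` environment variable takes precedence, then the `backend` field of `~/.keras/keras.json`, with TensorFlow as the default. The function below is a simplified, hypothetical re-implementation of that precedence for illustration only (real Keras also validates the name); inside the virtualenv, `keras.backend.backend()` reports the backend actually in use.

```python
import json
import os


def resolve_keras_backend(environ, config_path):
    """Sketch of the backend precedence: env var > keras.json > 'tensorflow'."""
    backend = "tensorflow"  # default when nothing is configured
    if os.path.exists(config_path):
        with open(config_path) as f:
            backend = json.load(f).get("backend", backend)
    return environ.get("KERAS_BACKEND", backend)
```

For example, `resolve_keras_backend(os.environ, os.path.expanduser("~/.keras/keras.json"))` would be expected to return `"tensorflow"` on this setup.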

Note: This is an original article. If you repost it, please credit the source and keep the link to "我爱自然语言处理" (52nlp): http://www.52nlp.cn

Permalink: Building a Deep Learning Server from Scratch: Four 1080 Ti GPUs in Parallel (Ubuntu16.04+CUDA9.2+cuDNN7.1+TensorFlow+Keras) http://www.52nlp.cn/?p=10334

