A Simplified Guide to Setting Up an Ubuntu Deep Learning GPU Environment
source link: http://www.flyml.net/2018/08/10/simple-deep-learning-gpu-env-setup/?amp%3Butm_medium=referral
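Preface:
An earlier article described in detail how to install and configure the environment manually, step by step.
That covered:
- the driver
- cuda
- cudnn
- nvidia-docker
Installation is now far simpler than it used to be. The first three items can be handled with a single apt command. Without further ado, let's get started.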
Remove anything previously installed that is related to NVIDIA
    sudo apt purge 'nvidia*'
This removes the driver as well as the nvidia-docker command.
Install the graphics driver
At the time of writing, the current major driver version is 390. The corresponding commands are:
    sudo add-apt-repository ppa:graphics-drivers
    sudo apt-get update
    sudo apt install nvidia-390
    # Untested alternative: apt install nvidia-current
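The old approach of manually downloading the driver and CUDA and then installing them step by step is now completely obsolete!
Once installation finishes, run nvidia-smi.
You should see output like this: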
    Fri Aug 10 10:54:49 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    |  0%   40C    P5    16W / 120W |      0MiB /  6077MiB |      2%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
After installation, it is best to reboot once to make sure the driver takes effect.
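If you want to script the driver check instead of reading the table by eye, the version can be pulled out of the nvidia-smi header line. The following is only a minimal sketch: the function name `parse_driver_version` is hypothetical, and it assumes the header keeps the "Driver Version: X.Y" wording shown in the sample output above.

```python
import re
from typing import Optional


def parse_driver_version(smi_output: str) -> Optional[str]:
    """Return the driver version found in nvidia-smi output, or None.

    Assumes the header contains "Driver Version: <digits and dots>",
    as in the sample output in this article.
    """
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    return match.group(1) if match else None


# Header line copied from the sample output in this article.
sample = "| NVIDIA-SMI 390.77                 Driver Version: 390.77                    |"
print(parse_driver_version(sample))  # 390.77
```

On a real machine, the input could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`; a return value of None would suggest the driver is not loaded or the output format differs.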
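Install the latest version of Docker CE
Because we will install the nvidia-docker2 command-line tool later, a recent version of docker-ce is required. The docker.io package that used to be installed directly through APT is no longer compatible.
Note that you must first uninstall any older versions:
    sudo apt-get remove docker docker-engine docker.io
Then simply enter the commands from the official documentation one by one:
Official documentation: https://docs.docker.com/v17.12/install/linux/docker-ce/ubuntu/#install-docker-ce-1
They are not repeated here.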
Install nvidia-docker2
We install this because we need to use the GPU and the CUDA Toolkit inside Docker.
Official site: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
- First, remove the old version
    docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
    sudo apt-get purge nvidia-docker
- Add the APT repository
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
      sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
- Install
    sudo apt-get install nvidia-docker2
- After installation, check the version
    $ nvidia-docker version
    NVIDIA Docker: 2.0.3
    Client:
     Version:           18.06.0-ce
     API version:       1.38
     Go version:        go1.10.3
     Git commit:        0ffa825
     Built:             Wed Jul 18 19:11:02 2018
     OS/Arch:           linux/amd64
     Experimental:      false

    Server:
     Engine:
      Version:          18.06.0-ce
      API version:      1.38 (minimum version 1.12)
      Go version:       go1.10.3
      Git commit:       0ffa825
      Built:            Wed Jul 18 19:09:05 2018
      OS/Arch:          linux/amd64
      Experimental:     false
- After installation, check that the nvidia runtime is registered with Docker
    sudo cat /etc/docker/daemon.json
    {
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
- Restart the Docker service so the change takes effect
    sudo systemctl daemon-reload
    sudo systemctl restart docker
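The daemon.json check in the steps above can also be done programmatically. This is a rough sketch, not part of the official tooling: the helper name `has_nvidia_runtime` is hypothetical, and it only tests for the "nvidia" runtime entry with the "nvidia-container-runtime" path shown in this article.

```python
import json


def has_nvidia_runtime(daemon_json_text: str) -> bool:
    """Return True if the config registers an "nvidia" runtime whose path
    is nvidia-container-runtime, matching the daemon.json shown above."""
    config = json.loads(daemon_json_text)
    runtime = config.get("runtimes", {}).get("nvidia")
    return isinstance(runtime, dict) and runtime.get("path") == "nvidia-container-runtime"


# Content copied from the daemon.json shown in this article.
sample = """
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""
print(has_nvidia_runtime(sample))  # True
```

To check a real host, read the text from /etc/docker/daemon.json (this may require root) and pass it to the function.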
Start a Docker image and use the GPU
Start with a simple test:
    # Sample 1:
    nvidia-docker run --rm nvidia/cuda nvidia-smi
    # Sample 2:
    docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Output:
    Fri Aug 10 04:50:35 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    |  0%   39C    P0    25W / 120W |      0MiB /  6077MiB |      2%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
Use the TensorFlow GPU image
    # Pull the TensorFlow GPU image
    docker pull tensorflow/tensorflow:latest-gpu-py3
    # Start the container
    sudo nvidia-docker run --name wenjun-tf-gpu-py3 \
        -it -p 8888:8888 -p 6006:6006 \
        -v /data/notebooks:/notebooks \
        -v /data/jupyter-notebook-dataset:/dataset \
        tensorflow/tensorflow:latest-gpu-py3
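The run command maps two ports (8888 is Jupyter's default port, 6006 is TensorBoard's) and two host directories into the container. If you launch containers from a script, the same argument list can be assembled in Python. This is just an illustrative sketch: `build_run_command` is a hypothetical helper, and the values are the ones used in this article.

```python
import shlex


def build_run_command(name, image, ports, volumes):
    """Assemble an nvidia-docker run argument list.

    ports:   (host_port, container_port) pairs, emitted as -p flags
    volumes: (host_dir, container_dir) pairs, emitted as -v flags
    """
    cmd = ["nvidia-docker", "run", "--name", name, "-it"]
    for host_port, container_port in ports:
        cmd += ["-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes:
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    cmd.append(image)
    return cmd


cmd = build_run_command(
    name="wenjun-tf-gpu-py3",
    image="tensorflow/tensorflow:latest-gpu-py3",
    ports=[(8888, 8888), (6006, 6006)],
    volumes=[
        ("/data/notebooks", "/notebooks"),
        ("/data/jupyter-notebook-dataset", "/dataset"),
    ],
)
# Print the command as a single shell-safe line.
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run` without going through a shell.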
Confirm inside the Docker container that the GPU works correctly
    # Note: tf.Session and tf.ConfigProto are TensorFlow 1.x APIs.
    import tensorflow as tf

    # Creates a graph.
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
Here is the output on my machine:
    MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
    2018-08-10 05:08:25.640901: I tensorflow/core/common_runtime/placer.cc:935] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
    a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    2018-08-10 05:08:25.640934: I tensorflow/core/common_runtime/placer.cc:935] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
    b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    2018-08-10 05:08:25.640950: I tensorflow/core/common_runtime/placer.cc:935] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
    [[22. 28.]
     [49. 64.]]
As the log shows, every op is placed on GPU:0, so the GPU is being used.
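For an automated check, the placement lines in that log can be parsed to confirm every op landed on a GPU. The sketch below is an assumption-laden helper, not a TensorFlow API: `device_placements` is a hypothetical name, and the regular expression only targets the "name: (Type): /job:.../device:KIND:N" lines in the format shown above.

```python
import re

# Matches lines such as:
#   MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
# Timestamped duplicates of the same information are skipped on purpose.
PLACEMENT = re.compile(r"^(\w+): \(\w+\): \S*/device:(\w+):(\d+)$", re.MULTILINE)


def device_placements(log_text):
    """Map each op name to its device, e.g. {"MatMul": "GPU:0"}."""
    return {op: f"{kind}:{index}" for op, kind, index in PLACEMENT.findall(log_text)}


# Placement lines copied from the log output in this article.
sample = "\n".join([
    "MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0",
    "a: (Const): /job:localhost/replica:0/task:0/device:GPU:0",
    "b: (Const): /job:localhost/replica:0/task:0/device:GPU:0",
])
placements = device_placements(sample)
print(placements)
print(all(device.startswith("GPU:") for device in placements.values()))  # True
```

If any op reports a CPU device instead, the container is probably not seeing the GPU, and the nvidia runtime setup is worth rechecking.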
This is an original article; please credit the source when reposting.