deepin 15.11: installing the GPU version of TensorFlow or PyTorch

Source: https://www.jansora.com/notes/288

Machine: desktop

CPU: AMD 3600  GPU: RTX 2060

The driver installation steps may differ on other machines (especially dual-GPU laptops), but everything after the driver install should apply generally.

This article is for reference only.

The final versions after a successful install, for reference:

NVIDIA driver: 430.50
tensorflow:    2.0
CUDA:          10.1
cuDNN:         7.6.4

Reference links:

A painful log of configuring deepin 15.8 + NVIDIA 390.87 + CUDA 9.0 + cuDNN 7.4 + tensorflow-gpu 1.9
Installing Python 3.6.9 on deepin 15.10.2
Installing Jupyter Notebook on deepin 15.10.2

Installing the graphics driver

This section references: Installing the Nvidia driver on Deepin.

Find a suitable driver at https://www.geforce.cn/drivers and download it into your home directory (here: NVIDIA-Linux-x86_64-430.50.run).

Disable the nouveau driver

# Install the pluma editor first, or edit the file by hand
sudo apt-get install pluma
sudo pluma /etc/modprobe.d/blacklist.conf

## Alternatively, right-click the folder, open it as administrator, and edit the
## file manually (you may need to create blacklist.conf).
## Then write the following content into the file:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

Next, make the change take effect:

sudo update-initramfs -u

Reboot and log back into the system.

Install the driver

Stop the display manager (this shuts down the graphical session):

sudo service lightdm stop

Log in with your account at the console, then switch to text mode:

sudo init 3

Make the target NVIDIA installer executable (make sure the path is correct):

chmod 755 ./NVI.............run

Install the driver. The installer shows many prompts along the way; if you understand them, answer accordingly, otherwise choosing YES throughout is fine:

sudo ./NVI.............run

If all goes well, the install succeeds here. If it fails, don't worry: re-enable the GUI below and look for another driver guide; the steps after the driver section still apply. :)

Re-enable the GUI

sudo service lightdm start

Verify the driver installation

First check: after a successful install, the desktop should come up at your monitor's maximum supported resolution. Second check: run nvidia-smi; output similar to the following should appear:

jansora@jansora-PC:~$ nvidia-smi 
Fri Oct 18 15:37:06 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:08:00.0  On |                  N/A |
| 34%   33C    P8    21W / 165W |     91MiB /  5931MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4279      G   /usr/lib/xorg/Xorg                            60MiB |
|    0      4733      G   kwin_x11                                      17MiB |
+-----------------------------------------------------------------------------+
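Beyond eyeballing the table above, the driver can also be queried non-interactively. A minimal sketch using nvidia-smi's machine-readable query flags, guarded so it degrades gracefully on a machine without the driver:

```shell
# Query the GPU name and driver version in machine-readable form.
# Falls back to a message when no NVIDIA driver is present.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver_info=$(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader)
else
  driver_info="nvidia-smi not found: driver not installed"
fi
echo "$driver_info"
```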

Installing cuda 10.1

Make sure your driver supports CUDA 10.1 (CUDA 10.1 requires 418.x or higher).

Download cuda 10.1

Download the runfile installer from the CUDA 10.1 archive page. If you fetch it with wget, quote the URL so the shell does not interpret the & characters:

wget 'https://developer.nvidia.com/cuda-10.1-download-archive-base?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal'

Grant execute permission:

chmod 755 cuda_10.1.243_418.87.00_linux.run

Install cuda 10.1

On deepin 15.11, cuda 10.1 cannot be installed as root via sudo; doing so throws an error related to /var/log/nvidia/.uninstallManifests (the underlying cause is not covered here).

The error can be bypassed by installing into your home directory first and then moving the result to /usr/local, as the following steps show.

Create the target directory:

cd ~
mkdir cuda-10.1

Run the installer, targeting the ~/cuda-10.1 directory.

A license text appears first; press q to skip it, then type accept to start the installation:

 ./cuda_10.1.243_418.87.00_linux.run  --toolkitpath=$HOME/cuda-10.1 --defaultroot=$HOME/cuda-10.1

Select only CUDA Toolkit 10.1 and clear the [X] from everything else:

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 418.87.00                                                           │
│ + [X] CUDA Toolkit 10.1                                                      │
│   [ ] CUDA Samples 10.1                                                      │
│   [ ] CUDA Demo Suite 10.1                                                   │
│   [ ] CUDA Documentation 10.1                                                │
│   Options                                                                    │
│   Install                                                                    │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

If all goes well, the installation succeeds here.

Move it under /usr/local:

 sudo mv cuda-10.1 /usr/local/

Create the symlink:

 sudo ln -sv /usr/local/cuda-10.1/ /usr/local/cuda
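The /usr/local/cuda symlink is what keeps the PATH entries below version-independent; re-pointing it is how you would later switch CUDA versions. A minimal illustration of that layout in a temporary directory (the real commands above operate on /usr/local):

```shell
# Demonstrate the versioned-directory + stable-symlink layout used above.
tmp=$(mktemp -d)
mkdir "$tmp/cuda-10.1"
ln -sv "$tmp/cuda-10.1" "$tmp/cuda"
# The stable name resolves to the versioned directory:
readlink "$tmp/cuda"
```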

Configure the CUDA environment variables

Either ~/.bashrc or /etc/profile works; /etc/profile is recommended. Run sudo vim /etc/profile and add the following:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64/:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH

Make the variables take effect:

source /etc/profile

Verify the installation:

nvcc -V

Output like the following means success:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Installing cuDNN 7.6

Download cuDNN 7.6

An NVIDIA account is required to download; logging in via QQ works fine.

Download page: https://developer.nvidia.com/rdp/cudnn-download. Choose "cuDNN Library for Linux".

tar xvf cudnn-*.tgz
cd cuda
# Copy the headers and libraries into the CUDA install
sudo cp include/* /usr/local/cuda/include/
sudo cp lib64/libcudnn.so.7.6.4 lib64/libcudnn_static.a /usr/local/cuda/lib64/
# Create the version symlinks next to the copied library
cd /usr/local/cuda/lib64
sudo ln -s libcudnn.so.7.6.4 libcudnn.so.7
sudo ln -s libcudnn.so.7 libcudnn.so
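To check which cuDNN version actually landed in the CUDA tree, you can read the version macros out of the copied header. A sketch assuming the copy commands above, guarded so it degrades when cuDNN is absent:

```shell
# Print the cuDNN major/minor/patch version macros from the installed header.
header=/usr/local/cuda/include/cudnn.h
if [ -f "$header" ]; then
  cudnn_version=$(grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' "$header")
else
  cudnn_version="cudnn.h not found at $header"
fi
echo "$cudnn_version"
```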

Configure environment variables

Either ~/.bashrc or /etc/profile works. Run sudo vim /etc/profile and add the following:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
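Applied to the current shell, the exports above put the CUDA binaries first on PATH; a quick sanity check:

```shell
# Same exports as above, applied to the current shell session.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}:$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64"
# The first PATH entry should now be the CUDA bin directory:
echo "${PATH%%:*}"   # -> /usr/local/cuda/bin
```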

Installing NCCL 2.4.8

Download NCCL 2.4.8

Download it from https://developer.nvidia.com/nccl/nccl-download, then:

tar xvf nccl_2.4.8-1+cuda10.1_x86_64.txz
cd nccl_2.4.8-1+cuda10.1_x86_64
sudo mkdir -p /usr/local/cuda/nccl/lib /usr/local/cuda/nccl/include 
sudo cp *.txt /usr/local/cuda/nccl 
sudo cp include/*.h /usr/include/ 
sudo cp lib/libnccl.so.2.4.8 lib/libnccl_static.a /usr/lib/x86_64-linux-gnu/ 
sudo ln -s /usr/include/nccl.h /usr/local/cuda/nccl/include/nccl.h 
cd /usr/lib/x86_64-linux-gnu 
sudo ln -s libnccl.so.2.4.8 libnccl.so.2 
sudo ln -s libnccl.so.2 libnccl.so 
for i in libnccl*; do sudo ln -s /usr/lib/x86_64-linux-gnu/$i /usr/local/cuda/nccl/lib/$i; done

If you do not need to compile tensorflow by hand, JDK and Bazel are not required.


Install JDK 8

sudo apt install openjdk-8-jdk

Installing Bazel 0.26.1

The Bazel version must not be higher than 0.26.1, otherwise the TensorFlow build will report:

Please downgrade your bazel installation to version 0.26.1 or lower to build TensorFlow! To downgrade: download the installer for the old version (from https://github.com/bazelbuild/bazel/releases) then run the installer.

Download Bazel 0.26.1

https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-installer-linux-x86_64.sh

Install Bazel 0.26.1

Do not run the Bazel installer from a path containing non-ASCII (e.g. Chinese) characters:

sudo chmod 755 ./bazel-0.26.1-installer-linux-x86_64.sh
./bazel-0.26.1-installer-linux-x86_64.sh --user

Configure environment variables

  1. Edit the script: vim ~/.bashrc
  2. Append the following at the end of the file:
export PATH="$PATH:$HOME/bin"
  3. Apply the change: source ~/.bashrc

Verify the Bazel installation

bazel version

Output like the following means success:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".

Build label: 0.26.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Jun 6 11:05:05 2019 (1559819105)
Build timestamp: 1559819105
Build timestamp as int: 1559819105

Compiling and installing tensorflow 2.0

Building the pip package by hand is not recommended: with mainland-China network conditions, downloading files from GitHub during the build usually fails.

Download tensorflow 2.0:

https://github.com/tensorflow/tensorflow/archive/r2.0.zip

You may also need a tool to extract zip archives; install one with sudo apt install unzip.

unzip tensorflow-r2.0.zip
cd tensorflow-r2.0

./configure

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.26.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/local/bin/python3

Found possible Python library paths:
  /usr/local/lib/python3.8/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.8/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.

Found CUDA 10.1 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include
Found cuDNN 7 in:
    /usr/local/cuda/lib64
    /usr/local/cuda/include


Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 7.5


Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=gdr            # Build with GDR support.
        --config=verbs          # Build with libverbs support.
        --config=ngraph         # Build with Intel nGraph support.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=noignite       # Disable Apache Ignite support.
        --config=nokafka        # Disable Apache Kafka support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
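The 7.5 entered at the compute-capability prompt above is specific to the RTX 2060 (Turing). A hypothetical lookup helper for a few common cards, with values taken from https://developer.nvidia.com/cuda-gpus:

```shell
# Map a GPU model to the compute capability string for the configure prompt.
compute_capability() {
  case "$1" in
    "GeForce RTX 2060"|"GeForce RTX 2070"|"GeForce RTX 2080") echo "7.5" ;;
    "GeForce GTX 1060"|"GeForce GTX 1070"|"GeForce GTX 1080") echo "6.1" ;;
    "Tesla V100") echo "7.0" ;;
    *) echo "unknown" ;;
  esac
}
compute_capability "GeForce RTX 2060"   # -> 7.5
```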

Build the pip package manually

bazel build --config=opt --config=cuda --config=v2 //tensorflow/tools/pip_package:build_pip_package

Installing tensorflow-gpu with pip

As of the publication date of this article, tensorflow 2.0 does not yet support the GPU version:

pip3 install tensorflow-gpu

Installing pytorch with pip

pip3 install torch torchvision

The GPU versions of tensorflow and pytorch are now installed.
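A quick way to confirm both frameworks can see the GPU. This sketch assumes the pip installs above succeeded; tf.test.is_gpu_available() is the TF 2.0-era check and torch.cuda.is_available() the PyTorch one, and each call is guarded so the check reports rather than fails when a framework is missing:

```shell
# Report GPU visibility for each framework, or note that it is not importable.
tf_check=$(python3 -c "import tensorflow as tf; print('tf gpu:', tf.test.is_gpu_available())" 2>/dev/null \
  || echo "tensorflow not importable")
torch_check=$(python3 -c "import torch; print('torch cuda:', torch.cuda.is_available())" 2>/dev/null \
  || echo "torch not importable")
echo "$tf_check"
echo "$torch_check"
```

On a working setup both lines should end in True.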

