
Benchmarking GPU Performance with the Rodinia Benchmark and Simulating with GPGPU-Sim

source link: https://wu-kan.cn/2022/01/29/%E9%80%9A%E8%BF%87-Rodinia-Benchmark-%E6%B5%8B%E8%AF%95-GPU-%E6%80%A7%E8%83%BD%E5%B9%B6%E4%BD%BF%E7%94%A8-GPGPU-Sim-%E4%BB%BF%E7%9C%9F/

29 Jan 2022, 2498 characters, 9 min read
CC BY 4.0 (except where otherwise stated or for reposted articles)

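The Rodinia Benchmark is a classic suite for evaluating GPU performance. This post shows how to get it running quickly with the spack package manager, and how to run it under the GPGPU-Sim simulator.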

Installing with spack

spack already ships an install script for Rodinia, but unfortunately that script has several problems:

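  1. With cuda@11: (CUDA 11 and later), compiling cfd fails because helper_cuda.h cannot be found.
  2. Some benchmarks end up linking libcudart dynamically and others statically, which makes testing inconvenient.
  3. Some benchmarks are compiled without the correct cuda_arch parameter, which gets in the way of simulators that work from PTX.
  4. The build output does not include the datasets, so you have to generate them yourself.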
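Problem 3 in the list above can be checked on an installed binary. The sketch below assumes the CUDA toolkit's cuobjdump is on PATH and that its --list-ptx and --list-elf listings name each embedded image with an sm_XX tag. The helper name embedded_archs is made up for illustration.

```shell
# Hypothetical helper: read a cuobjdump listing on stdin and print
# the distinct sm_XX architecture tags it mentions, one per line.
embedded_archs() {
    grep -oE 'sm_[0-9]+' | sort -u
}

# Intended use (assumes cuobjdump from the CUDA toolkit is on PATH):
#   cuobjdump --list-ptx "$(command -v gaussian)" | embedded_archs
#   cuobjdump --list-elf "$(command -v gaussian)" | embedded_archs
```

For a build with cuda_arch=70, one would expect sm_70 to appear in at least one of the two listings.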

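I wrote a new install script that fixes the problems above and adds an option for choosing how libcudart is linked. The commented-out commands below add my repo and install the gpgpu-sim simulator; the last command installs Rodinia in one step.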

# git clone https://github.com/SYSU-SCC/sysu-scc-spack-repo
# spack repo add --scope=site sysu-scc-spack-repo
# spack install gpgpu-sim%gcc@7.5.0 ^ mesa~llvm ^ cuda@11.0

spack install rodinia%gcc@7.5.0 cuda_arch=70 cudart=shared ^ mesa~llvm ^ cuda@11.0 # the GPU on this machine is a V100, hence cuda_arch=70
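Whether the cudart=shared variant took effect can be checked from the dynamic dependencies of a benchmark binary. A minimal sketch, assuming ldd is available. The helper name cudart_linkage is made up here, and a binary that does not list libcudart.so is reported as static even though it might simply not use the CUDA runtime.

```shell
# Hypothetical helper: read `ldd` output on stdin and report
# whether libcudart shows up as a dynamic dependency.
cudart_linkage() {
    if grep -q 'libcudart\.so'; then
        echo shared
    else
        echo 'static (or not linked at all)'
    fi
}

# Intended use, after `spack load rodinia`:
#   ldd "$(command -v gaussian)" | cudart_linkage
```

This matters for simulation, because LD_PRELOAD can only substitute a libcudart.so that is resolved at run time.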

The example below runs gaussian; run ls $(spack location -i rodinia)/bin to see the other benchmarks.

$ spack load rodinia
$ gaussian -f $(spack location -i rodinia)/data/gaussian/matrix3.txt | head -n 19
WG size of kernel 1 = 512, WG size of kernel 2= 4 X 4
Read file from /GPUFS/sysu_hpcedu_302/admin/env/v5/spack/opt/spack/linux-centos7-skylake_avx512/gcc-7.5.0/rodinia-3.1-7ukl3wapv7tw3rjnymhqpan27cywuhs6/data/gaussian/matrix3.txt 
Matrix m is: 
    0.00     0.00     0.00 
    1.00     0.00     0.00 
    1.00    -0.33     0.00 

Matrix a is: 
    1.00     1.00     1.00 
    0.00    -3.00     1.00 
    0.00    -0.00    -1.67 

Array b is: 
0.00 4.00 3.33 

The final solution is: 
4.00 -2.00 -2.00 


Time total (including memory transfers) 0.537338 sec
Time for CUDA kernels:  0.000157 sec
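When running several benchmarks, it can be handy to pull just the kernel time out of the output. A minimal sketch, assuming the output contains a line of the form "Time for CUDA kernels: <seconds> sec" as gaussian prints above. The helper name kernel_seconds is made up, and other Rodinia benchmarks may report timing differently.

```shell
# Hypothetical helper: read benchmark output on stdin and print the
# number of seconds from the "Time for CUDA kernels:" line.
kernel_seconds() {
    # The value is the second-to-last field; the last field is "sec".
    awk '/^Time for CUDA kernels:/ { print $(NF-1) }'
}

# Intended use:
#   gaussian -f "$(spack location -i rodinia)/data/gaussian/matrix3.txt" | kernel_seconds
```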

Simulating with GPGPU-Sim

Load the GPGPU-Sim environment.

spack load rodinia cudart=shared
spack load [email protected]
spack load gpgpu-sim
cd $(spack location -i gpgpu-sim)/gpgpu-sim_distribution
source setup_environment release

Setting the LD_PRELOAD environment variable makes the program use GPGPU-Sim's version of libcudart.so (make sure the library actually exists at this path).

cp -r $(spack location -i gpgpu-sim)/gpgpu-sim_distribution/configs/tested-cfgs/SM7_QV100 ~
cd ~/SM7_QV100
LD_PRELOAD=$(spack location -i gpgpu-sim)/gpgpu-sim_distribution/lib/gcc-7.5.0/cuda-11000/release/libcudart.so gaussian -f $(spack location -i rodinia)/data/gaussian/matrix3.txt > simulate.log
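The existence check is worth automating, because the dynamic loader only prints a warning and carries on when a preloaded object cannot be opened, so the benchmark would quietly run on the real GPU instead. A minimal sketch of a guard; the wrapper name run_with_preload is made up for illustration.

```shell
# Hypothetical wrapper: run a command with LD_PRELOAD set to the given
# library, but refuse to run at all if the library file does not exist.
run_with_preload() {
    lib=$1
    shift
    if [ ! -f "$lib" ]; then
        echo "preload library not found: $lib" >&2
        return 1
    fi
    LD_PRELOAD=$lib "$@"
}

# Intended use (same library path as in the command above):
#   run_with_preload "$(spack location -i gpgpu-sim)/gpgpu-sim_distribution/lib/gcc-7.5.0/cuda-11000/release/libcudart.so" \
#       gaussian -f "$(spack location -i rodinia)/data/gaussian/matrix3.txt" > simulate.log
```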

Take a look at the last 32 lines of the output: the simulation succeeded, hooray~

$ cat simulate.log | tail -n 32
Reply_Network_out_buffer_full_per_cycle =       0.0000
Reply_Network_out_buffer_avg_util =       0.0000
----------------------------END-of-Interconnect-DETAILS-------------------------


gpgpu_simulation_time = 0 days, 0 hrs, 0 min, 12 sec (12 sec)
gpgpu_simulation_rate = 1176 (inst/sec)
gpgpu_simulation_rate = 1942 (cycle/sec)
gpgpu_silicon_slowdown = 582904x
GPGPU-Sim: synchronize waiting for inactive GPU simulation
GPGPU-Sim API: Stream Manager State
GPGPU-Sim: detected inactive GPU simulation thread
Matrix m is: 
    0.00     0.00     0.00 
    1.00     0.00     0.00 
    1.00    -0.33     0.00 

Matrix a is: 
    1.00     1.00     1.00 
    0.00    -3.00     1.00 
    0.00     0.00    -1.67 

Array b is: 
0.00 4.00 3.33 

The final solution is: 
4.00 -2.00 -2.00 


Time total (including memory transfers) 11.100054 sec
Time for CUDA kernels:  11.036677 sec
GPGPU-Sim: *** exit detected ***
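Comparing the two runs gives a feel for the cost of simulation: the reported kernel time went from 0.000157 sec on the V100 to 11.036677 sec under GPGPU-Sim. The sketch below works out that ratio with awk. The helper name time_ratio is made up, and this wall-clock ratio is not the same quantity as the gpgpu_silicon_slowdown value the simulator prints.

```shell
# Hypothetical helper: print how many times longer the simulated run took,
# given the native and simulated times in seconds, rounded to an integer.
time_ratio() {
    awk -v native="$1" -v sim="$2" 'BEGIN { printf "%.0f\n", sim / native }'
}

# Using the two "Time for CUDA kernels" values reported in this post:
#   time_ratio 0.000157 11.036677
```

For this tiny 3x3 system the simulated kernels take roughly 70,000 times longer than on the hardware.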
