MKL BLAS in Go vs. gonum

source link: https://blog.csdn.net/oqqYuan1234567890/article/details/106628041

皿小草 · 2020-06-08 20:55:51

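The comparison target here is mainly gonum's gonum.org/v1/gonum/floats package. It is fairly fast for scientific computing because it calls assembly routines directly, which makes it more efficient than the same computation written in plain Go.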

First, install MKL (Intel Math Kernel Library) from its download page.

Then install the gosl package, github.com/cpmech/gosl, whose la/mkl subpackage is used below to call MKL from Go.

In BLAS, level 1 consists of vector-vector operations. Here we benchmark one of them, the dot product.
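To make the level-1 operation concrete, here is a plain-Go reference of what DDOT computes, with the same parameter order as the `mkl.Ddot(n, x, incx, y, incy)` call in the benchmark (`incx` and `incy` are strides). The `ddot` helper is hypothetical and for illustration only; it is not gosl's or MKL's code, and as a simplifying assumption it rejects non-positive strides, which reference BLAS does define.

```go
package main

import "fmt"

// ddot is an illustrative plain-Go version of BLAS level-1 DDOT:
// it returns the sum over i in [0, n) of x[i*incx] * y[i*incy].
// Assumption: only positive strides are supported in this sketch.
func ddot(n int, x []float64, incx int, y []float64, incy int) float64 {
	if n <= 0 {
		return 0
	}
	if incx <= 0 || incy <= 0 {
		panic("ddot sketch: only positive strides are supported")
	}
	var sum float64
	for i := 0; i < n; i++ {
		sum += x[i*incx] * y[i*incy]
	}
	return sum
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{5, 6, 7, 8}
	// Contiguous vectors (stride 1), as in the benchmark: 1*5 + 2*6 + 3*7 + 4*8.
	fmt.Println(ddot(4, x, 1, y, 1))
	// Every second element of x (x[0], x[2]) against y[0], y[1]: 1*5 + 3*6.
	fmt.Println(ddot(2, x, 2, y, 1))
}
```

The benchmark always uses stride 1, which is the case both gonum's `floats.Dot` and the MKL call are optimized for.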

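```go
package main

import (
	"fmt"
	"time"

	"github.com/cpmech/gosl/la/mkl"
	"gonum.org/v1/gonum/floats"
)

func main() {
	// Vector lengths: 1e5 up to 4e8 elements (4e8 float64 values take 3.2 GB).
	sizes := []int{1e5, 1e6, 1e7, 1e8, 4 * 1e8}

	for _, n := range sizes {
		// All-ones vector, so the expected dot product equals n.
		x := make([]float64, n)
		for i := range x {
			x[i] = 1
		}
		rawDot(x, x)
		// The same MKL operator with 1 to 4 threads.
		for threads := 1; threads <= 4; threads++ {
			mklDot(threads, x, x)
		}
	}
}

// mklDot times MKL's level-1 BLAS ddot (through gosl) using threadNum threads.
func mklDot(threadNum int, x, y []float64) {
	n, incx, incy := len(x), 1, 1
	mkl.SetNumThreads(threadNum) // set the thread count before starting the timer
	start := time.Now()
	res := mkl.Ddot(n, x, incx, y, incy)
	fmt.Printf("mkl threadNum:%v len:%v duration:%v res:%v\n", threadNum, len(x), time.Since(start), res)
}

// rawDot times gonum's floats.Dot on the same data.
func rawDot(x, y []float64) {
	start := time.Now()
	res := floats.Dot(x, y)
	fmt.Printf("raw len:%v duration:%v res:%v\n", len(x), time.Since(start), res)
}
```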

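The code is simple: "raw" denotes gonum's operator, which is compared against MKL. Because MKL follows a fork-join programming model and can use multiple threads, the same MKL operator is also run with 1 to 4 threads to see how the thread count affects it.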
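As an illustration of the fork-join idea (not of how MKL is actually implemented), here is a sketch of a parallel dot product with goroutines: the vectors are split into contiguous chunks (fork), each goroutine computes a partial sum, and the partial sums are added once every goroutine has finished (join). The function name `parallelDot` is hypothetical, and it assumes `y` is at least as long as `x`.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelDot splits the work into `workers` contiguous chunks (fork), computes
// one partial dot product per goroutine, then sums the partials (join).
func parallelDot(workers int, x, y []float64) float64 {
	n := len(x)
	if workers < 1 {
		workers = 1
	}
	partial := make([]float64, workers) // one slot per worker, so no locking is needed
	chunk := (n + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if hi > n {
			hi = n
		}
		if lo >= hi {
			continue // more workers than elements: nothing left for this one
		}
		wg.Add(1)
		go func(w, lo, hi int) { // fork
			defer wg.Done()
			var s float64
			for i := lo; i < hi; i++ {
				s += x[i] * y[i]
			}
			partial[w] = s
		}(w, lo, hi)
	}
	wg.Wait() // join
	var sum float64
	for _, s := range partial {
		sum += s
	}
	return sum
}

func main() {
	x := make([]float64, 1e6)
	for i := range x {
		x[i] = 1
	}
	for workers := 1; workers <= 4; workers++ {
		fmt.Println(workers, parallelDot(workers, x, x))
	}
}
```

Splitting into contiguous chunks keeps each goroutine streaming through its own memory range. Note that with floating-point data the summation order changes with the worker count, so results can differ in the last bits for non-integer inputs. The log below is the output of the gosl/gonum benchmark program, not of this sketch.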

```
raw len:100000 duration:34.479µs res:100000
mkl threadNum:1 len:100000 duration:663.039µs res:100000
mkl threadNum:2 len:100000 duration:105.323µs res:100000
mkl threadNum:3 len:100000 duration:40.387µs res:100000
mkl threadNum:4 len:100000 duration:45.641µs res:100000
raw len:1000000 duration:538.054µs res:1e+06
mkl threadNum:1 len:1000000 duration:513.524µs res:1e+06
mkl threadNum:2 len:1000000 duration:319.566µs res:1e+06
mkl threadNum:3 len:1000000 duration:253.524µs res:1e+06
mkl threadNum:4 len:1000000 duration:185.843µs res:1e+06
raw len:10000000 duration:5.57392ms res:1e+07
mkl threadNum:1 len:10000000 duration:3.649214ms res:1e+07
mkl threadNum:2 len:10000000 duration:2.811791ms res:1e+07
mkl threadNum:3 len:10000000 duration:2.545393ms res:1e+07
mkl threadNum:4 len:10000000 duration:2.385943ms res:1e+07
raw len:100000000 duration:59.743135ms res:1e+08
mkl threadNum:1 len:100000000 duration:44.084916ms res:1e+08
mkl threadNum:2 len:100000000 duration:33.835634ms res:1e+08
mkl threadNum:3 len:100000000 duration:32.16158ms res:1e+08
mkl threadNum:4 len:100000000 duration:30.032124ms res:1e+08
raw len:400000000 duration:1.610460541s res:4e+08
mkl threadNum:1 len:400000000 duration:194.624908ms res:4e+08
mkl threadNum:2 len:400000000 duration:141.571815ms res:4e+08
mkl threadNum:3 len:400000000 duration:126.421845ms res:4e+08
mkl threadNum:4 len:400000000 duration:118.767545ms res:4e+08
```

The log is a bit hard to read; as a table it is much clearer:

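| len | 1e5 | 1e6 | 1e7 | 1e8 | 4 * 1e8 |
|---|---|---|---|---|---|
| gonum (raw) | 34 µs | 538 µs | 5.5 ms | 59.7 ms | 1610 ms |
| mkl 1 thread | 663 µs | 513 µs | 3.6 ms | 44 ms | 194 ms |
| mkl 2 threads | 105 µs | 319 µs | 2.8 ms | 33.8 ms | 141 ms |
| mkl 3 threads | 40 µs | 253 µs | 2.5 ms | 32.1 ms | 126 ms |
| mkl 4 threads | 45 µs | 185 µs | 2.3 ms | 30 ms | 118 ms |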
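The speedups and growth factors discussed below can be recomputed from the raw log values. A small sketch; the helper names `speedup` and `growth` are illustrative and not from any library.

```go
package main

import "fmt"

// speedup returns slow[i]/fast[i]: how many times faster the second timing is.
func speedup(slow, fast []float64) []float64 {
	out := make([]float64, len(slow))
	for i := range slow {
		out[i] = slow[i] / fast[i]
	}
	return out
}

// growth returns t[i]/t[i-1]: the factor by which the time grows between
// consecutive sizes. Under linear scaling it should match the size ratio.
func growth(t []float64) []float64 {
	out := make([]float64, 0, len(t))
	for i := 1; i < len(t); i++ {
		out = append(out, t[i]/t[i-1])
	}
	return out
}

func main() {
	// Timings from the log above, in milliseconds,
	// for lengths 1e5, 1e6, 1e7, 1e8 and 4e8.
	gonum := []float64{0.034479, 0.538054, 5.57392, 59.743135, 1610.460541}
	mkl1 := []float64{0.663039, 0.513524, 3.649214, 44.084916, 194.624908}
	mkl4 := []float64{0.045641, 0.185843, 2.385943, 30.032124, 118.767545}

	fmt.Printf("gonum vs MKL 1 thread: %.2f\n", speedup(gonum, mkl1))
	fmt.Printf("MKL 1 vs 4 threads:    %.2f\n", speedup(mkl1, mkl4))
	// Size ratios between consecutive lengths are 10, 10, 10 and 4.
	fmt.Printf("gonum growth:          %.1f\n", growth(gonum))
	fmt.Printf("MKL 1 thread growth:   %.1f\n", growth(mkl1))
}
```

Comparing each growth factor with the size ratio (10, 10, 10, 4) is a quick way to check whether a curve is still linear.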

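From the table:
With a single thread:

  • At around 1e5 elements, gonum is faster than every MKL configuration (34 µs vs 40 to 663 µs). However, single-threaded MKL took longer at 1e5 than at 1e6 (663 µs vs 513 µs), and each entry is a single timed call, so that first MKL measurement likely includes one-time warm-up cost.
  • At 4 * 1e8 elements, MKL is roughly 8× faster than gonum (194 ms vs 1610 ms), and its time still grows about linearly with size (44 ms at 1e8, 194 ms at 4 * 1e8).
  • gonum scales linearly up to 1e8, but at 4 * 1e8 its time grows non-linearly: 4× more data takes about 27× longer (59.7 ms → 1610 ms).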
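Each number in the log comes from a single timed call, so one-time costs (page faults on freshly allocated memory, lazy library initialization, cold caches) can inflate a measurement. A minimal sketch of a warm-up-then-repeat timer using only the standard library; `minDuration` and the plain-Go `dot` are hypothetical stand-ins, and in the real benchmark `dot` would be replaced by `floats.Dot` or `mkl.Ddot`.

```go
package main

import (
	"fmt"
	"time"
)

// minDuration calls f once without timing it (warm-up), then times reps
// further calls and returns the fastest. Keeping the minimum is one common
// way to suppress noise; reporting the median is another.
func minDuration(reps int, f func()) time.Duration {
	f() // warm-up, not timed
	best := time.Duration(1<<63 - 1)
	for i := 0; i < reps; i++ {
		start := time.Now()
		f()
		if d := time.Since(start); d < best {
			best = d
		}
	}
	return best
}

// dot is a hypothetical plain-Go stand-in for the operator under test.
func dot(x, y []float64) float64 {
	var s float64
	for i := range x {
		s += x[i] * y[i]
	}
	return s
}

// sink stores each result so the timed work has an observable effect.
var sink float64

func main() {
	x := make([]float64, 1e5)
	for i := range x {
		x[i] = 1
	}
	best := minDuration(10, func() { sink = dot(x, x) })
	fmt.Printf("len:%v best:%v res:%v\n", len(x), best, sink)
}
```

Go's standard `testing` package (`testing.B`, `go test -bench`) is the more conventional way to do this, since it picks the iteration count automatically.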

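With multiple threads:
MKL does not scale linearly with the thread count. From 1e7 upward, 4 threads are only about 1.5 to 1.6× faster than 1 thread (for example 118 ms vs 194 ms at 4 * 1e8).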
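One plausible explanation for the weak thread scaling, stated here as an assumption rather than something this benchmark measures, is that a dot product is memory-bandwidth bound: it does very little arithmetic per byte it reads. For vectors of n float64 values (8 bytes each), counting one multiply and one add per element, and noting that the benchmark passes the same slice as both x and y (so memory traffic may be closer to 8n bytes than 16n):

```latex
\begin{aligned}
\text{FLOPs} &= 2n \\
\text{bytes read (distinct } x, y) &= 2 \times 8n = 16n \\
\text{arithmetic intensity} &= \frac{2n}{16n} = 0.125\ \text{FLOP/byte} \\
\text{observed rate (4 threads, } n = 4\times10^{8}) &\approx \frac{8 \times 4\times10^{8}\ \text{B}}{0.1188\ \text{s}} \approx 2.7\times10^{10}\ \text{B/s}
\end{aligned}
```

If about 27 GB/s is close to the test machine's memory bandwidth, additional threads have little left to speed up, which would match the table.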

