

MKL BLAS in Go compared with gonum
source link: https://blog.csdn.net/oqqYuan1234567890/article/details/106628041

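The comparison target here is mainly gonum's `gonum.org/v1/gonum/floats` package. It is fairly fast for scientific computing because it calls directly into assembly routines, which makes it somewhat more efficient than equivalent computation code written in plain Go.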
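First, install MKL from its download page.
Then install the gosl package, github.com/cpmech/gosl.
In BLAS, level 1 covers vector-vector operations; here we test one of them, the dot product.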
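```go
package main

import (
	"fmt"
	"time"

	"github.com/cpmech/gosl/la/mkl"
	"gonum.org/v1/gonum/floats"
)

func main() {
	// Vector lengths tested (matching the log: 1e5, 1e6, 1e7, 1e8, 4e8).
	sizes := []int{1e5, 1e6, 1e7, 1e8, 4e8}
	for _, n := range sizes {
		// x is all ones, so x·x should equal n.
		x := make([]float64, n)
		for i := range x {
			x[i] = 1
		}
		rawDot(x, x)
		// Run the same MKL dot product with 1 to 4 threads.
		for threads := 1; threads < 5; threads++ {
			mklDot(threads, x, x)
		}
	}
}

// mklDot computes x·y with MKL's level-1 BLAS Ddot using threadNum threads.
// The timed interval also includes the SetNumThreads call.
func mklDot(threadNum int, x, y []float64) {
	n, incx, incy := len(x), 1, 1
	start := time.Now()
	mkl.SetNumThreads(threadNum)
	res := mkl.Ddot(n, x, incx, y, incy)
	fmt.Printf("mkl threadNum:%v len:%v duration:%v res:%v\n", threadNum, len(x), time.Since(start), res)
}

// rawDot computes x·y with gonum's floats.Dot (labelled "raw" in the output).
func rawDot(x, y []float64) {
	start := time.Now()
	res := floats.Dot(x, y)
	fmt.Printf("raw len:%v duration:%v res:%v\n", len(x), time.Since(start), res)
}
```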
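The code is straightforward: `raw` denotes the gonum operator (`floats.Dot`), which is compared against MKL's `Ddot`. Because MKL follows a fork-join programming model and can run multithreaded, the same MKL operator is measured with 1 to 4 threads.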
```
raw len:100000 duration:34.479µs res:100000
mkl threadNum:1 len:100000 duration:663.039µs res:100000
mkl threadNum:2 len:100000 duration:105.323µs res:100000
mkl threadNum:3 len:100000 duration:40.387µs res:100000
mkl threadNum:4 len:100000 duration:45.641µs res:100000
raw len:1000000 duration:538.054µs res:1e+06
mkl threadNum:1 len:1000000 duration:513.524µs res:1e+06
mkl threadNum:2 len:1000000 duration:319.566µs res:1e+06
mkl threadNum:3 len:1000000 duration:253.524µs res:1e+06
mkl threadNum:4 len:1000000 duration:185.843µs res:1e+06
raw len:10000000 duration:5.57392ms res:1e+07
mkl threadNum:1 len:10000000 duration:3.649214ms res:1e+07
mkl threadNum:2 len:10000000 duration:2.811791ms res:1e+07
mkl threadNum:3 len:10000000 duration:2.545393ms res:1e+07
mkl threadNum:4 len:10000000 duration:2.385943ms res:1e+07
raw len:100000000 duration:59.743135ms res:1e+08
mkl threadNum:1 len:100000000 duration:44.084916ms res:1e+08
mkl threadNum:2 len:100000000 duration:33.835634ms res:1e+08
mkl threadNum:3 len:100000000 duration:32.16158ms res:1e+08
mkl threadNum:4 len:100000000 duration:30.032124ms res:1e+08
raw len:400000000 duration:1.610460541s res:4e+08
mkl threadNum:1 len:400000000 duration:194.624908ms res:4e+08
mkl threadNum:2 len:400000000 duration:141.571815ms res:4e+08
mkl threadNum:3 len:400000000 duration:126.421845ms res:4e+08
mkl threadNum:4 len:400000000 duration:118.767545ms res:4e+08
```
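The log is hard to read; arranged as a table it is much clearer (durations copied from the log above; every run returned the expected result, equal to the vector length):

| Length | gonum (raw) | MKL 1 thread | MKL 2 threads | MKL 3 threads | MKL 4 threads |
|---|---|---|---|---|---|
| 1e5 | 34.479 µs | 663.039 µs | 105.323 µs | 40.387 µs | 45.641 µs |
| 1e6 | 538.054 µs | 513.524 µs | 319.566 µs | 253.524 µs | 185.843 µs |
| 1e7 | 5.574 ms | 3.649 ms | 2.812 ms | 2.545 ms | 2.386 ms |
| 1e8 | 59.743 ms | 44.085 ms | 33.836 ms | 32.162 ms | 30.032 ms |
| 4e8 | 1610.461 ms | 194.625 ms | 141.572 ms | 126.422 ms | 118.768 ms |

From this we can see: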
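With a single thread:
- At 1e5 elements gonum is faster: 34.479 µs versus 663.039 µs for one-thread MKL, and it even beats the fastest MKL run at that size (40.387 µs with 3 threads).
- From 1e6 to 1e8 elements, single-thread MKL is only about 1.05x to 1.5x faster than gonum, but at 4e8 it is roughly 8x faster (194.6 ms vs 1.61 s), and its time still grows about linearly with length (44.1 ms at 1e8, 194.6 ms at 4e8).
- gonum's time turns nonlinear at this scale: 59.7 ms at 1e8 but 1.61 s at 4e8, about 27x the time for 4x the data.
With multiple threads:
- MKL does not scale linearly with the thread count, and 4 threads are not much faster than 1: for lengths of 1e6 and above the speedup is only about 1.5x to 2.8x, plausibly because a dot product does so little arithmetic per element loaded that memory bandwidth, not core count, is the bottleneck.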