
High-performance ahead-of-time compiler and optimizer for ML


triNNity DNN tools

The triNNity DNN toolkit (compiler, optimizer, and primitive library)

triNNity primitive library

triNNity is a header-only C++17 template library with over 80 DNN convolution algorithms. It’s a collaborative effort with several other people in our research group to collect as many DNN convolution algorithms as possible in one place, and give them clean, simple, and performant implementations. It is also a testbed for algorithm design for DNN convolution.

The library implements standard dense convolution (both direct and GEMM-based), strided convolution, dilated convolution, group convolution, sparse convolution, Winograd convolution, FFT convolution, and more, including specialized high-performance algorithms for cases like 1x1 convolution.
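To make the baseline concrete, here is a minimal sketch of the direct algorithm for the unpadded, unit-stride case. This is illustrative code only, not the library's API; the names and the NCHW layout are our choices for the example.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: a naive direct 2D convolution (NCHW layout, single
// image, no padding, unit stride). Real implementations are templated
// and heavily optimized; this just shows the baseline algorithm.
void direct_conv2d(const std::vector<float>& input,   // C x H x W
                   const std::vector<float>& kernel,  // M x C x K x K
                   std::vector<float>& output,        // M x (H-K+1) x (W-K+1)
                   std::size_t C, std::size_t H, std::size_t W,
                   std::size_t M, std::size_t K) {
  const std::size_t OH = H - K + 1, OW = W - K + 1;
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t oh = 0; oh < OH; ++oh)
      for (std::size_t ow = 0; ow < OW; ++ow) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < C; ++c)
          for (std::size_t kh = 0; kh < K; ++kh)
            for (std::size_t kw = 0; kw < K; ++kw)
              acc += input[(c * H + oh + kh) * W + ow + kw] *
                     kernel[((m * C + c) * K + kh) * K + kw];
        output[(m * OH + oh) * OW + ow] = acc;
      }
}
```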

Many libraries and frameworks present algorithms like im2col, fft, and others as monolithic operations, but there are in fact dozens of algorithmic variants of these approaches, each better suited to some kinds of convolutions than others. Our paper in ASAP 2017 details many of these algorithms.
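As one example of such a variant family, here is a sketch of the classic im2col lowering for the unpadded, unit-stride case (again illustrative code, not the library's implementation): each KxK input patch becomes a column, so the whole convolution reduces to a single GEMM of the M x (C*K*K) kernel matrix with the patch matrix.

```cpp
#include <cstddef>
#include <vector>

// Illustrative im2col: gather every KxK patch of a C x H x W input into
// a (C*K*K) x (OH*OW) column matrix (no padding, unit stride).
std::vector<float> im2col(const std::vector<float>& input,
                          std::size_t C, std::size_t H, std::size_t W,
                          std::size_t K) {
  const std::size_t OH = H - K + 1, OW = W - K + 1;
  std::vector<float> cols(C * K * K * OH * OW);
  for (std::size_t c = 0; c < C; ++c)
    for (std::size_t kh = 0; kh < K; ++kh)
      for (std::size_t kw = 0; kw < K; ++kw)
        for (std::size_t oh = 0; oh < OH; ++oh)
          for (std::size_t ow = 0; ow < OW; ++ow)
            cols[(((c * K + kh) * K + kw) * OH + oh) * OW + ow] =
                input[(c * H + oh + kh) * W + ow + kw];
  return cols;
}
```

Variants arise from choices such as gathering patches into rows rather than columns, or lowering the kernel instead of the image, each with different memory footprint and locality trade-offs.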

Under the hood, the library uses BLAS, OpenMP multithreading, SIMD vectorization, and more, without any programmer intervention required. It can also run completely standalone, with none, or only a subset, of these components enabled. We currently support x86_64 and aarch64, with support for more platforms planned. Since the library is released as header-only C++, all that's really required to bring up a new platform is a working compiler supporting the C++17 standard.
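As a sketch of how optional components can degrade gracefully in header-only code (illustrative, not triNNity source): OpenMP pragmas are simply ignored by a compiler built without OpenMP support, so the same source works in both configurations.

```cpp
#include <cstddef>

// Illustrative: one source tree that uses OpenMP threading and SIMD when
// available, and compiles to a plain serial loop otherwise. The guard is
// optional, since unknown pragmas are ignored, but makes the intent explicit.
void scale(float* data, std::size_t n, float alpha) {
#if defined(_OPENMP)
  #pragma omp parallel for simd
#endif
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= alpha;
}
```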

We have working, well-tested integration with the Intel MKL, OpenBLAS, ARM Compute Library, FFTW, and libxsmm, among others, as back-end libraries providing specific functionality (such as optimized GEMM routines).
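A minimal sketch of how such a backend can be selected at compile time follows. The TRINNITY_DEMO_USE_CBLAS macro is hypothetical, invented for this example; the cblas_sgemm interface it dispatches to is the standard CBLAS API that MKL and OpenBLAS both export.

```cpp
#include <cstddef>

#if defined(TRINNITY_DEMO_USE_CBLAS)  // hypothetical toggle for this sketch
#include <cblas.h>                    // provided by MKL, OpenBLAS, etc.
#endif

// Illustrative: row-major SGEMM (C = A*B) routed either to a CBLAS
// backend or to a naive fallback loop, chosen at compile time.
void sgemm(const float* A, const float* B, float* C,
           int M, int N, int K) {
#if defined(TRINNITY_DEMO_USE_CBLAS)
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              M, N, K, 1.0f, A, K, B, N, 0.0f, C, N);
#else
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k)
        acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
#endif
}
```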

The library is released under the BSD 3-clause license, and is accompanied by an extensive performance benchmark suite.

triNNity DNN compiler and optimizer

We’ve developed a sophisticated ahead-of-time optimization framework for DNNs based on the PBQP (Partitioned Boolean Quadratic Problem) formulation. It uses profiled layer timings from performance benchmarking to build a cost model, then statically chooses from among the 70+ convolution algorithms in the primitive library to produce a provably optimal instantiation of a full CNN.
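To give a flavour of the optimization problem: on a purely chain-shaped network, this kind of selection degenerates to a Viterbi-style dynamic program over per-layer algorithm costs plus pairwise costs (such as data-layout conversions) between adjacent layers. Below is a minimal sketch under that simplifying assumption, with invented cost tables; the real optimizer solves the general PBQP problem, which handles arbitrary network graphs.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative only: pick one algorithm per layer of a chain network to
// minimize total cost.
//   cost[l][a]     : profiled time of algorithm a on layer l
//   trans[l][a][b] : cost of converting layer l's output (under a) to the
//                    input format layer l+1 needs (under b)
double best_total_cost(
    const std::vector<std::vector<double>>& cost,
    const std::vector<std::vector<std::vector<double>>>& trans) {
  const std::size_t L = cost.size();
  std::vector<double> best = cost[0];  // best cost ending in each choice
  for (std::size_t l = 1; l < L; ++l) {
    std::vector<double> next(cost[l].size(),
                             std::numeric_limits<double>::infinity());
    for (std::size_t b = 0; b < cost[l].size(); ++b)
      for (std::size_t a = 0; a < cost[l - 1].size(); ++a)
        next[b] = std::min(next[b],
                           best[a] + trans[l - 1][a][b] + cost[l][b]);
    best = next;
  }
  return *std::min_element(best.begin(), best.end());
}
```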

Our compiler turns your Caffe deploy.prototxt directly into highly efficient native code, which can be run standalone to perform inference.

You can obtain the compiler and optimizer from our public BitBucket, and there is also a demonstration project with benchmarking workflows: demos.

Our paper on the DNN optimizer appeared at CGO 2018.

Performance

We’ve run some performance comparisons with Intel’s native MKL-DNN framework:

[Figure: performance comparison with Intel MKL-DNN]

