
Speed-up inference with Batch Normalization Folding

source link: https://towardsdatascience.com/speed-up-inference-with-batch-normalization-folding-8a45a83a89d8?gi=3e0ad04ee894

Introduction

Batch Normalization is a technique that normalizes the input of each layer to make the training process faster and more stable. In practice, it is an extra layer that we generally add after the computation layer (such as a convolution or a fully-connected layer) and before the non-linearity.

It consists of 2 steps:

  • Normalize the batch by first subtracting its mean μ, then dividing it by its standard deviation σ.
  • Further scale the result by a factor γ and shift it by a factor β. These are the learnable parameters of the batch normalization layer; they are needed in case the network does not want the data to have a mean of 0 and a standard deviation of 1.

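Written out, for an activation x coming from the previous layer (ε being the small constant that implementations add to the variance for numerical stability), the batch normalization layer computes:

y = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta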

Because it is so effective at training neural networks, batch normalization is now widely used. But how useful is it at inference time?

Once training has ended, each batch normalization layer holds a specific set of γ and β, but also of μ and σ, the latter two being estimated with an exponentially weighted average over the batches seen during training. This means that, at inference time, batch normalization acts as a simple linear transformation of what comes out of the previous layer, often a convolution.

Since a convolution is also a linear transformation, this means the two operations can be merged into a single linear transformation!
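To see why, write the output of the previous layer as Wx + b and apply the frozen batch normalization on top; collecting terms yields a single affine map whose merged parameters are denoted here (a naming convention for this sketch) W_fold and b_fold:

\text{BN}(Wx + b) = \gamma \cdot \frac{Wx + b - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta = W_{\text{fold}}\, x + b_{\text{fold}}

W_{\text{fold}} = \frac{\gamma}{\sqrt{\sigma^{2} + \epsilon}} \cdot W, \qquad b_{\text{fold}} = \frac{\gamma \, (b - \mu)}{\sqrt{\sigma^{2} + \epsilon}} + \beta

For a convolution, γ, β, μ and σ are defined per output channel, so the scaling is applied channel-wise to the kernel and the bias.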

This would not only remove some unnecessary parameters, but also reduce the number of operations performed at inference time.
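As a minimal sketch of what such a folding can look like in PyTorch (assuming a Conv2d followed by a BatchNorm2d; the helper name fold_conv_bn is ours for illustration, not an existing API), the merged layer can be built like this:

```python
import torch
import torch.nn as nn

def fold_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to bn(conv(x)) at inference time."""
    fused = nn.Conv2d(
        conv.in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        groups=conv.groups,
        bias=True,
    )

    with torch.no_grad():
        # Parameters and statistics frozen at the end of training
        gamma, beta = bn.weight, bn.bias
        mean, var, eps = bn.running_mean, bn.running_var, bn.eps

        # Per-output-channel scale: gamma / sqrt(var + eps)
        scale = gamma / torch.sqrt(var + eps)

        # W_fold = scale * W (broadcast over the output-channel dimension)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))

        # b_fold = scale * (b - mean) + beta, with b = 0 if the conv has no bias
        bias = conv.bias if conv.bias is not None else torch.zeros_like(mean)
        fused.bias.copy_(scale * (bias - mean) + beta)

    return fused

# Quick sanity check: in eval mode, BN uses its running statistics,
# so the folded convolution must reproduce bn(conv(x)).
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False).eval()
bn = nn.BatchNorm2d(16).eval()
fused = fold_conv_bn(conv, bn)

x = torch.randn(1, 3, 32, 32)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-6))  # True
```

The folded convolution replaces the conv + batch norm pair wherever it appears, which is exactly where the saving in parameters and operations comes from.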

