
Cyclical Learning Rates — The ultimate guide for choosing learning rates for Neural Networks


In this quick yet important post, we will discuss a phenomenal technique for choosing learning rates, described by Leslie N. Smith in his paper Cyclical Learning Rates for Training Neural Networks.

Learning Rate

It is one of the most important hyper-parameters for training a neural network and is key to effective, fast training. The learning rate decides how much of the loss gradient is applied to the current weights to move them in the direction of lower loss:

new_weight = current_weight - learning_rate * gradient
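As a quick concrete illustration, here is that update rule as a minimal NumPy sketch; the weight and gradient values are hypothetical:

import numpy as np

# Hypothetical current weights and the gradient of the loss w.r.t. them
current_weight = np.array([0.5, -1.2, 3.0])
gradient = np.array([0.1, -0.4, 0.8])
learning_rate = 0.01

# One gradient-descent step: move the weights against the gradient
new_weight = current_weight - learning_rate * gradient
print(new_weight)  # [ 0.499 -1.196  2.992]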

NOTE: For the rest of the article, I will use LR instead of learning rate.

[Figure: effect of the learning rate on the loss. Source: Jeremy Jordan's blog post]

What is Cyclical Learning Rate?

A technique to set and cyclically vary the LR during training.

This methodology aims to train the neural network with an LR that changes cyclically on every batch, instead of a non-cyclic LR that is either constant or changes only once per epoch. The learning rate schedule varies between two bounds.

When using a cyclical LR, we have to set two things:

1) The bounds between which the learning rate will vary — base_lr and max_lr.

2) The step_size — the number of iterations it takes the learning rate to go from one bound to the other. A minimal sketch of the resulting schedule follows.
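To make this concrete, here is the triangular policy from Smith's paper written as a plain Python function; the bounds and step_size in the usage line are hypothetical:

import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular CLR: the LR climbs linearly from base_lr to max_lr over
    step_size iterations, then descends back, repeating every cycle."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# With step_size=2 the LR peaks at iteration 2 and is back down by 4
print([round(triangular_lr(i, 0.001, 0.006, 2), 4) for i in range(5)])
# [0.001, 0.0035, 0.006, 0.0035, 0.001]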

Why does it work?

We have always learnt that we should keep decreasing the LR as training progresses, so that the network converges over time.

In CLR, we vary the LR between a lower and an upper threshold. The logic is that periodically raising the learning rate within each epoch helps the network come out of any saddle points or local minima it encounters. If a saddle point is an extended plateau, low learning rates will probably never generate gradients large enough to escape it, resulting in difficulty minimising the loss.

Objective

Pick a learning rate and change it on each iteration (batch) to make the training process performant, which means:

  1. Achieve the maximum possible accuracy in order to get the best prediction results.
  2. Speed up the training process by achieving the above in the minimum number of epochs.

Important Terms

Epoch

One epoch is completed when the entire dataset has been passed forward and backward through the neural network exactly once.

Batch Size

Number of training examples to utilise in one iteration.

Batch or Iteration

A training set of 1000 examples, with a batch size of 20, will take 50 iterations (batches) to complete one epoch.

Cycle

The number of iterations we want the learning rate to take to go from the lower bound to the upper bound, and then back to the lower bound.

Step size

Number of iterations to complete half of a cycle.
[Figure: the triangular CLR policy, with cycle and step size annotated]
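Tying these terms to the triangular_lr sketch above: one full cycle spans 2 * step_size iterations. The numbers below are hypothetical:

# One full cycle = 2 * step_size iterations: step_size up, step_size down
step_size = 4
lrs = [triangular_lr(i, 0.001, 0.009, step_size) for i in range(2 * step_size + 1)]
print([round(lr, 4) for lr in lrs])
# [0.001, 0.003, 0.005, 0.007, 0.009, 0.007, 0.005, 0.003, 0.001]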

Setting base_lr and max_lr

To find these bounds, train the model for a few epochs while increasing the learning rate after every batch, and record the loss. The loss will decrease as the learning rate grows, but at some point it will start increasing again. Note the LR at which the loss starts to decrease, and also the LR at which it starts stagnating; these are good values to set as base_lr and max_lr.

Alternatively, you can note the LR at which accuracy peaks, and use that as max_lr. Set base_lr to 1/3 or 1/4 of this.
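A hedged sketch of this "LR range test" in PyTorch follows; model, loss_fn and train_loader are assumed to already exist, and the bounds and step count are placeholders:

import math
import torch

# Assumed to already exist: model, loss_fn, train_loader
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)

min_lr, max_lr_test, num_steps = 1e-5, 1.0, 100
# Multiplicative factor that walks the LR from min_lr up to max_lr_test
factor = (max_lr_test / min_lr) ** (1 / num_steps)

lr, lrs, losses = min_lr, [], []
for step, (inputs, targets) in enumerate(train_loader):
    for group in optimizer.param_groups:
        group["lr"] = lr
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    lrs.append(lr)
    losses.append(loss.item())
    lr *= factor
    if step >= num_steps or math.isnan(losses[-1]):
        break

# Plot losses against lrs on a log x-axis, then read off base_lr and max_lr.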

Variations of CLR

Other than the triangular profile used above, Leslie Smith also suggested some other forms of CLR.

Triangular2: the same triangular profile, except that the difference between max_lr and base_lr is halved after every cycle, so each peak is half as high as the previous one.

[Figure: the triangular2 CLR policy]

Exponential Range: the difference between max_lr and base_lr shrinks by an exponential factor (gamma^iteration) with each iteration, so the peaks decay smoothly rather than halving once per cycle.

[Figure: the exp_range CLR policy]
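All three profiles are available off the shelf; for example, PyTorch's torch.optim.lr_scheduler.CyclicLR implements them as modes. A sketch assuming model, loss_fn, train_loader and num_epochs already exist; the bounds and step size are hypothetical:

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# mode can be 'triangular', 'triangular2' or 'exp_range'
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=0.001,
    max_lr=0.006,
    step_size_up=2000,   # iterations to climb from base_lr to max_lr
    mode="triangular2",  # amplitude halves after every cycle
)

for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()  # CLR updates the LR every batch, not every epoch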

Conclusion

Cyclical Learning Rate is an amazing technique for setting and controlling learning rates when training a neural network, helping it reach maximum accuracy in a very efficient way.
