
Tuning Neural Networks


Practical considerations for setting up and getting neural networks to perform how you want them to

Nov 4 · 5 min read

This guide obviously is not applicable to *all* architectures. This is a general guide for tuning neural networks to perform how you want them to. This guide assumes there are no model-breaking bugs in your code and that your errors or poor performance come from things like data imbalance, model choice, overfitting, underfitting, etc. It also assumes you know a thing or two about neural networks. The intended reader can probably quickly identify the neural network architecture below… If not, this primer might be useful.

[Image: LeNet-style architecture]

Starting Point

A good start before you begin tuning or attempting to solve problems in your data is to benchmark your model against known results. Use CIFAR-10, MNIST, Fashion MNIST, or some other common benchmark to make sure there aren’t bugs or major mistakes in the construction of your model. Once you’ve got it working on a benchmark and are getting similar results to researchers, reapply it to your own data and take steps such as the ones explained below…
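
To make this concrete, here is a minimal benchmarking sketch (in Keras, my choice of framework rather than anything the article prescribes) that trains a small LeNet-style network on MNIST so you can sanity-check your pipeline against well-known published results:

```python
# A minimal benchmarking sketch: small LeNet-style CNN on MNIST.
# Framework, layer sizes, and epoch count are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load a well-known benchmark so results can be compared to published numbers.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # scale to [0, 1], add channel dimension
x_test = x_test[..., None] / 255.0

# A small LeNet-style CNN, roughly matching the architecture pictured above.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# If a model like this can't get near the accuracies researchers report on
# MNIST, suspect a bug in your pipeline before blaming the data or the tuning.
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)
```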

A quick summary of the content so you can skip ahead as necessary:

  • Solving Underfitting
  • Solving Overfitting
  • Balancing Underfitting & Overfitting

Bias-Variance Tradeoff

As we navigate the training and testing of our neural network, it’s important we understand the bias-variance tradeoff and are very purposeful about how we proceed.

Bias: high bias will cause us to miss the relevant relationships between features and outputs; this is underfitting. When underfitting, even our training error will be far too high for our goal error.

Variance: high variance will cause us to model noise and overfit to our training set, generalizing poorly. The model will perform just fine on new data very similar to our training set, but new data with minor changes from our training data may not work well; our model will not generalize.

[Image: a cat lying on its side]

Can our overfit model tell this is a cat? Or does it only work with cats in the upright position?

Think of overfitting like a classification problem for animals. Maybe your model correctly classifies small cats, but any time it sees a plump house cat it calls it a lion. Or when it sees a cat on its side, like in the image above, it has no clue. This is likely a result of overfitting. Our overfit image classification model also might struggle with different backgrounds, colors, skew, blurriness, etc.

Solving Underfitting

Goal: 3% error

Training error: 19%

Validation error: 25%

Test error: 27%

In this case, our training error is much higher than our desired error and quite similar to our validation error. It is very clear we are in an underfitting scenario.

What are our options? We need to reduce bias…

  • Beefier model: increase the number of layers and neurons to get more expressive power and reduce bias (a short sketch follows this list)
  • Model architecture: upgrade to a more state-of-the-art model
  • Increase the learning rate: but not too much!
  • Revisit weight initialization
  • Increase the batch size
  • Experiment with different optimizers
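
As a rough sketch of the first two ideas, here is one way (again in Keras, with layer sizes and learning rates that are purely illustrative) to parameterize how "beefy" the model is and how fast it learns, so you can compare training error across configurations:

```python
# A hedged sketch of the "beefier model" and learning-rate ideas above.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(width=256, depth=3, learning_rate=1e-3):
    # More width/depth means more expressive power (less bias); a higher
    # learning rate can also help escape underfitting, but too high and
    # training diverges.
    model = models.Sequential([layers.Input(shape=(28, 28, 1)),
                               layers.Flatten()])
    for _ in range(depth):
        model.add(layers.Dense(width, activation="relu"))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# If the small model underfits, try a wider/deeper one and compare training error.
small = build_model(width=64, depth=1)
bigger = build_model(width=512, depth=4, learning_rate=3e-3)
```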

Solving Overfitting

Goal: 3% error

Training error: 12%

Validation error: 25%

Test error: 27%

In this case, we’re still underfitting a bit, but our big issue is clear overfitting, as evidenced by the huge difference between our training error and our validation error.

What are our options to solve overfitting?

More data is almost always the best solution to overfitting

Let’s stay on this point for a second because it’s important. If at all possible, you need to acquire more quality data to solve your overfitting problem. With more training data we can improve the generalization of our model and cover more cases for each class or necessary output mapping.

  • Data Augmentation is another possible solution. Augmentation gives us more generalization because we manipulate our dataset to effectively train on new data throughout the training process. It’s typically not as good a solution as acquiring more relevant data, but it can help in many cases. Make sure you only do useful augmentation: different forms of augmentation are applicable to different problems. If the only distinction between two classes is color, you don’t want to augment a specific class with different colors; that is a clear situation where augmentation is not the solution. (A short augmentation sketch follows the list below.)

Here are some examples of augmentation for images:

[Images of the same photo augmented in several ways: original, flipped vertically, flipped horizontally, skewed, recolored, with transparency, and blurred]

And these are just some of the possible forms of augmentation. Explore the relevant augmentation for your specific problem…

  • Regularization is another solution: dropout, weight decay, etc. (see the sketches after this list)
  • Normalization (batch or layer norm)
  • Reduce the number of layers and neurons
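
Here is a minimal augmentation sketch using Keras’ ImageDataGenerator; the specific transforms and ranges are illustrative only, and as noted above you should keep only the ones that make sense for your problem:

```python
# A minimal data-augmentation sketch with Keras' ImageDataGenerator.
# The transforms and ranges below are illustrative, not prescriptive.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # mild rotation/skew
    width_shift_range=0.1,    # small horizontal translations
    height_shift_range=0.1,   # small vertical translations
    horizontal_flip=True,     # flip horizontally (skip if orientation matters)
    zoom_range=0.1,
)

# Feeds randomly transformed batches during training, so the model
# effectively sees "new" data every epoch. x_train/y_train and the held-out
# x_val/y_val split are assumed to exist already.
# model.fit(augmenter.flow(x_train, y_train, batch_size=32),
#           validation_data=(x_val, y_val), epochs=20)
```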
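
And a hedged sketch of the regularization and normalization ideas: dropout, L2 weight decay, and batch normalization combined in one small Keras model (layer sizes and rates are again illustrative):

```python
# Dropout, L2 weight decay, and batch normalization in one small model.
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(256, kernel_regularizer=regularizers.l2(1e-4)),  # weight decay
    layers.BatchNormalization(),   # batch norm stabilizes training, mildly regularizes
    layers.Activation("relu"),
    layers.Dropout(0.5),           # randomly drops units to reduce co-adaptation
    layers.Dense(10, activation="softmax"),
])
```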

Balancing Overfitting and Underfitting

What we’ll notice is that as we solve our overfitting problem, our underfitting problem will pop back up! This is the tradeoff and we must continually refine until we hit a sweet spot relevant to our problem.

Keep tuning and keep adding data as you can! If you’re unable to solve this problem and really need more data, try deploying your model to production with systems in place to check its performance while adding the new data it encounters to your dataset. Obviously don’t rely on an insufficient model for some essential task, but allow your model to test its performance and gather data as it lives in its new environment.

Keep adding good data to reduce overfitting and then return to the underfitting steps. Repeat this process until you’re in a sweet spot of low training error and a low enough difference between training, val, and testing error.
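
One simple way to watch the tradeoff while you iterate (building on the benchmark sketch earlier, so `model`, `x_train`, and `y_train` are assumed from there) is to stop training when validation error stops improving and then inspect the train/validation gap:

```python
# Stop when validation loss stops improving, then check the train/val gap.
# Assumes a compiled `model` (with an accuracy metric) and MNIST-style
# x_train/y_train as in the earlier benchmarking sketch.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)
history = model.fit(x_train, y_train, validation_split=0.1,
                    epochs=100, callbacks=[early_stop])

train_err = 1 - history.history["accuracy"][-1]
val_err = 1 - history.history["val_accuracy"][-1]
print(f"train error: {train_err:.3f}, "
      f"val error: {val_err:.3f}, gap: {val_err - train_err:.3f}")
```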

Happy building!

