
Tuning Neural Networks


Practical considerations for setting up and getting neural networks to perform how you want them to

Nov 4 · 5 min read

This guide obviously is not applicable to *all* architectures. This is a general guide for tuning neural networks to perform how you want them to. This guide assumes there are no model-breaking bugs in your code and that your errors or poor performance come from things like data imbalance, model choice, overfitting, underfitting, etc. It also assumes you know a thing or two about neural networks. The intended reader can probably quickly identify the neural network architecture below… If not, this primer might be useful.

[Image: LeNet-style architecture]

Starting Point

A good start before you begin tuning or attempting to solve problems in your data is to benchmark your model against known results. Use CIFAR-10, MNIST, Fashion MNIST, or some other common benchmark to make sure there aren’t bugs or major mistakes in the construction of your model. Once you’ve got it working on a benchmark and are getting similar results to researchers, reapply it to your own data and take steps such as the ones explained below…
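
To make this concrete, here is a minimal benchmarking sketch (in Keras, my choice of framework rather than anything the article prescribes) that trains a small LeNet-style network on MNIST so you can sanity-check your pipeline against well-known published results:

```python
# A minimal benchmarking sketch: small LeNet-style CNN on MNIST.
# Framework, layer sizes, and epoch count are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load a well-known benchmark so results can be compared to published numbers.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # scale to [0, 1], add channel dimension
x_test = x_test[..., None] / 255.0

# A small LeNet-style CNN, roughly matching the architecture pictured above.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# If a model like this can't get near the accuracies researchers report on
# MNIST, suspect a bug in your pipeline before blaming the data or the tuning.
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)
```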

A quick summary of the content so you can skip ahead as necessary:

  • Solving Underfitting
  • Solving Overfitting
  • Balancing Underfitting & Overfitting

Bias-Variance Tradeoff

As we navigate the training and testing of our neural network, it’s important we understand the bias-variance tradeoff and are very purposeful about how we proceed.

Bias: high bias will cause us to miss the relevant relationships between features and outputs; this is underfitting. When underfitting, even our training error will be far too high for our goal error.

Variance: high variance will cause us to model noise and overfit to our training set, generalizing poorly. The model will perform just fine on new data very similar to our training set, but new data with minor changes from our training data may not work well; our model will not generalize.

[Image: a cat lying on its side]

Can our overfit model tell this is a cat? Or does it only work with cats in the upright position?

Think of overfitting like a classification problem for animals. Maybe your model correctly classifies small cats, but any time it sees a plump house cat it calls it a lion. Or when it sees a cat on its side, like in the image above, it has no clue. This is likely a result of overfitting. Our overfit image classification model also might struggle with different backgrounds, colors, skew, blurriness, etc.

Solving Underfitting

Goal: 3% error

Training error: 19%

Validation error: 25%

Test error: 27%

In this case, our training error is much higher than our desired error and quite similar to our validation error. It is very clear we are in an underfitting scenario.

What are our options? We need to reduce bias…

  • Beefier model: increase the number of layers and neurons to get more expressive power and reduce bias (a short sketch follows this list)
  • Model architecture: upgrade to a more state-of-the-art model
  • Increase the learning rate: but not too much!
  • Revisit weight initialization
  • Increase the batch size
  • Experiment with different optimizers
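
As a rough sketch of the first two ideas, here is one way (again in Keras, with layer sizes and learning rates that are purely illustrative) to parameterize how "beefy" the model is and how fast it learns, so you can compare training error across configurations:

```python
# A hedged sketch of the "beefier model" and learning-rate ideas above.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(width=256, depth=3, learning_rate=1e-3):
    # More width/depth means more expressive power (less bias); a higher
    # learning rate can also help escape underfitting, but too high and
    # training diverges.
    model = models.Sequential([layers.Input(shape=(28, 28, 1)),
                               layers.Flatten()])
    for _ in range(depth):
        model.add(layers.Dense(width, activation="relu"))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# If the small model underfits, try a wider/deeper one and compare training error.
small = build_model(width=64, depth=1)
bigger = build_model(width=512, depth=4, learning_rate=3e-3)
```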

Solving Overfitting

Goal: 3% error

Training error: 12%

Validation error: 25%

Test error: 27%

In this case, we’re still underfitting a bit, but our big issue is clear overfitting, as evidenced by the huge difference between our training error and our validation error.

What are our options to solve overfitting?

More data is almost always the best solution to overfitting

Let’s stay on this point for a second because it’s important. If at all possible, you need to acquire more quality data to solve your overfitting problem. With more training data we can improve the generalization of our model and cover more cases for each class or necessary output mapping.

  • Data Augmentation is another possible solution. Augmentation gives us more generalization because we manipulate our dataset to effectively train on new data throughout the training process. It’s typically not as good a solution as acquiring more relevant data, but it can help in many cases. Make sure you only do useful augmentation: different forms of augmentation are applicable to different problems. If the only distinction between two classes is color, you don’t want to augment a specific class with different colors; that is a clear situation where augmentation is not the solution. (A short augmentation sketch follows the list below.)

Here are some examples of augmentation for images:

[Images of the same photo augmented in several ways: original, flipped vertically, flipped horizontally, skewed, recolored, with transparency, and blurred]

And these are just some of the possible forms of augmentation. Explore the relevant augmentation for your specific problem…

  • Regularization is another solution: dropout, weight decay, etc. (see the sketches after this list)
  • Normalization (batch or layer norm)
  • Reduce the number of layers and neurons
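
Here is a minimal augmentation sketch using Keras’ ImageDataGenerator; the specific transforms and ranges are illustrative only, and as noted above you should keep only the ones that make sense for your problem:

```python
# A minimal data-augmentation sketch with Keras' ImageDataGenerator.
# The transforms and ranges below are illustrative, not prescriptive.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # mild rotation/skew
    width_shift_range=0.1,    # small horizontal translations
    height_shift_range=0.1,   # small vertical translations
    horizontal_flip=True,     # flip horizontally (skip if orientation matters)
    zoom_range=0.1,
)

# Feeds randomly transformed batches during training, so the model
# effectively sees "new" data every epoch. x_train/y_train and the held-out
# x_val/y_val split are assumed to exist already.
# model.fit(augmenter.flow(x_train, y_train, batch_size=32),
#           validation_data=(x_val, y_val), epochs=20)
```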
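
And a hedged sketch of the regularization and normalization ideas: dropout, L2 weight decay, and batch normalization combined in one small Keras model (layer sizes and rates are again illustrative):

```python
# Dropout, L2 weight decay, and batch normalization in one small model.
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(256, kernel_regularizer=regularizers.l2(1e-4)),  # weight decay
    layers.BatchNormalization(),   # batch norm stabilizes training, mildly regularizes
    layers.Activation("relu"),
    layers.Dropout(0.5),           # randomly drops units to reduce co-adaptation
    layers.Dense(10, activation="softmax"),
])
```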

Balancing Overfitting and Underfitting

What we’ll notice is that as we solve our overfitting problem, our underfitting problem will pop back up! This is the tradeoff and we must continually refine until we hit a sweet spot relevant to our problem.

Keep tuning and keep adding data as you can! If you’re unable to solve this problem and really need more data, try deploying your model to production with systems in place to check its performance while adding the new data it encounters to your dataset. Obviously don’t rely on an insufficient model for some essential task, but allow your model to test its performance and gather data as it lives in its new environment.

Keep adding good data to reduce overfitting and then return to the underfitting steps. Repeat this process until you’re in a sweet spot of low training error and a low enough difference between training, val, and testing error.
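
One simple way to watch the tradeoff while you iterate (building on the benchmark sketch earlier, so `model`, `x_train`, and `y_train` are assumed from there) is to stop training when validation error stops improving and then inspect the train/validation gap:

```python
# Stop when validation loss stops improving, then check the train/val gap.
# Assumes a compiled `model` (with an accuracy metric) and MNIST-style
# x_train/y_train as in the earlier benchmarking sketch.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)
history = model.fit(x_train, y_train, validation_split=0.1,
                    epochs=100, callbacks=[early_stop])

train_err = 1 - history.history["accuracy"][-1]
val_err = 1 - history.history["val_accuracy"][-1]
print(f"train error: {train_err:.3f}, "
      f"val error: {val_err:.3f}, gap: {val_err - train_err:.3f}")
```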

Happy building!

