
Understanding Deep Learning Requires Rethinking Generalization

source link: https://mc.ai/understanding-deep-learning-requires-rethinking-generalization/

The results are quite interesting: the model perfectly fits the noisy Gaussian samples, and it also perfectly fits the training data with completely random labels, although that takes somewhat longer. This shows that a deep neural network with enough parameters can completely memorize random inputs. The result is counter-intuitive: the widely accepted view is that deep learning discovers a hierarchy of lower-, middle-, and higher-level features, but if a model can memorize arbitrary random inputs, what guarantees that it will learn constructive features instead of simply memorizing the training data?
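
As a concrete illustration of this randomization test, here is a toy sketch (my own minimal PyTorch setup, not the authors' original experiments): a small over-parameterized MLP is trained on pure Gaussian noise with randomly assigned labels, and its training accuracy still climbs toward 100%.

```python
# Toy sketch of the randomization test (not the authors' original setup):
# an over-parameterized MLP trained on Gaussian noise with random labels
# still drives its training accuracy toward 100%, i.e. it memorizes the data.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, num_classes = 1000, 64, 10
X = torch.randn(n, d)                       # pure noise "inputs"
y = torch.randint(0, num_classes, (n,))     # completely random labels

model = nn.Sequential(
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(1, 2001):
    opt.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
    if epoch % 500 == 0:
        acc = (logits.argmax(dim=1) == y).float().mean().item()
        print(f"epoch {epoch}: loss={loss.item():.4f} train_acc={acc:.2f}")
```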

Results of Regularization Tests

The first diagram shows the effect of different explicit regularizers on training and testing accuracy. The key takeaway is that there is no dramatic difference in generalization performance between training with and without explicit regularization.

The second diagram shows the effect of batch normalization (an implicit regularizer) on training and testing accuracy. Training with batch normalization is noticeably smoother, but it does not improve test accuracy.

From the experiments, the authors conclude that both explicit and implicit regularizers can help improve generalization performance. However, it is unlikely that regularizers are the fundamental reason for generalization.
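
For reference, the regularizers being compared here are ordinary training knobs. The sketch below is an illustrative PyTorch configuration (my own, not the paper's code) showing where weight decay, dropout, and batch normalization would typically be switched on or off for such a comparison.

```python
# Illustrative sketch (not the paper's code): the "knobs" compared in the
# regularization experiments are typically just hyperparameters like these.
import torch
import torch.nn as nn

def make_model(d_in=64, num_classes=10, use_batchnorm=False, dropout_p=0.0):
    layers = [nn.Linear(d_in, 512)]
    if use_batchnorm:
        layers.append(nn.BatchNorm1d(512))     # implicit regularizer
    layers.append(nn.ReLU())
    if dropout_p > 0:
        layers.append(nn.Dropout(dropout_p))   # explicit regularizer
    layers.append(nn.Linear(512, num_classes))
    return nn.Sequential(*layers)

# Explicit regularization via weight decay is just an optimizer argument;
# setting weight_decay=0.0 turns it off for the "no regularization" run.
model = make_model(use_batchnorm=True, dropout_p=0.5)
opt = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=5e-4)
```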

FINITE-SAMPLE EXPRESSIVITY

The authors also proved the following theorem:

There exists a two-layer neural network with ReLU activations and 2n+d weights that can represent any function on a sample of size n in d dimensions.

which is essentially a finite-sample counterpart of the Universal Approximation Theorem. The proof is fairly involved; if interested, refer to Section C in the appendix of the paper.
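
To give a flavor of why so few weights suffice, here is a simplified numerical sketch in the spirit of the construction (my own reading, not the paper's exact proof): all n hidden ReLU units share one projection vector a (d weights), each unit has its own bias (n weights), and the output layer has n weights, for about 2n + d parameters in total. Choosing the biases to interleave the sorted projections turns the fitting problem into a triangular linear system.

```python
# Sketch of a finite-sample expressivity construction (simplified reading of
# the idea, not the paper's proof): a width-n ReLU layer with a shared
# projection vector a, n biases b, and n output weights w (about 2n + d
# parameters) can fit any n real-valued labels exactly.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))            # n arbitrary points in d dimensions
y = rng.standard_normal(n)                 # arbitrary real-valued targets

a = rng.standard_normal(d)                 # random projection direction
order = np.argsort(X @ a)
X, y = X[order], y[order]                  # sort so the projections increase
z = X @ a

# Biases interleave the sorted projections: b_1 < z_1 < b_2 < z_2 < ...
b = np.empty(n)
b[0] = z[0] - 1.0
b[1:] = (z[:-1] + z[1:]) / 2.0

# Hidden activations H[i, j] = ReLU(a.x_i - b_j) form a lower-triangular
# matrix with a positive diagonal, so the output weights solve H w = y.
H = np.maximum(z[:, None] - b[None, :], 0.0)
w = np.linalg.solve(H, y)

# Forward pass of the two-layer ReLU network on the sample:
pred = np.maximum(X @ a[:, None] - b[None, :], 0.0) @ w
print("max interpolation error:", np.abs(pred - y).max())  # ~ machine precision
```

With a continuous random projection, the projected samples are distinct with probability one, so the triangular system is invertible and the interpolation is exact.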

IMPLICIT REGULARIZATION: AN APPEAL TO LINEAR MODELS

In the final section, the authors argue that SGD-based learning itself imparts a regularization effect: for linear models, SGD converges to the solution with minimum L2 norm. However, their experiments also show that a minimum-norm solution does not necessarily generalize better.
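
The linear-model intuition is easy to reproduce: in an over-parameterized least-squares problem, gradient descent started at zero never leaves the row span of the data and therefore converges to the minimum-L2-norm interpolating solution. The sketch below uses plain full-batch gradient descent on synthetic data (a simplification of the SGD argument) and checks the result against the pseudoinverse solution.

```python
# Sketch of the linear-model argument: for an underdetermined least-squares
# problem, gradient descent started at zero stays in the row span of the data
# and converges to the minimum-L2-norm interpolating solution, i.e. the same
# one the pseudoinverse gives.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                       # more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                      # initialization in the row span of X
lr = 0.01
for _ in range(5000):
    grad = X.T @ (X @ w - y) / n     # gradient of 0.5 * mean squared error
    w -= lr * grad

w_min_norm = np.linalg.pinv(X) @ y   # minimum-norm interpolating solution
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))
print("norm of GD solution:", np.linalg.norm(w))
```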

Final Conclusion

  • The effective capacity of several successful neural network architectures is large enough to shatter the training data.
  • Traditional measures of model complexity are not sufficient for deep neural networks.
  • Optimization continues to be easy even when generalization is poor.
  • SGD may be performing implicit regularization by converging to solutions with minimum L2-norm.

A subsequent paper, “A Closer Look at Memorization in Deep Networks”, has challenged some of the views put forward in this paper. It convincingly demonstrates qualitative differences between learning random noise and learning actual data.

Their experiments show that a deep neural net takes significantly longer to fit random noise than to fit the actual dataset. They also show that fitting random noise results in a more complex function (more hidden units per layer are needed).

They further show that regularizers do control the speed at which DNNs memorize.

To conclude, a deep neural network first tries to discover patterns, rather than resorting to brute-force memorization, when fitting real data. However, if it does not find any patterns (as in the case of random noise), the network is capable of simply optimizing in a way that memorizes the training data. As both papers suggest, we need better tools to control the degree of generalization and memorization; regularization, batch normalization, and dropout are not perfect.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK