
The Many Uses of Input Gradient Regularization

source link: https://mc.ai/the-many-uses-of-input-gradient-regularization/

Steering network attention

The most natural way of restricting the input gradient is to tell the model which areas of the input are important and which to ignore. Ross et al. implemented a simple penalty preventing a neural network from looking at certain parts of the input.

Penalty term added to the loss. A is the mask marking the input regions whose gradients are penalized; it is applied elementwise to the gradient of the loss with respect to x. There are N data points of D dimensions each, mapped to a K-dimensional output y.
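As a rough sketch of how such a masked penalty can be computed in practice (a minimal PyTorch sketch with illustrative names; the mask convention, the weighting factor lam and the helper name are assumptions, not the authors' exact code):

import torch
import torch.nn.functional as F

def attention_mask_penalty(model, x, mask, lam=1.0):
    # mask has the same shape as x: 1 marks input regions the model
    # should ignore, 0 marks regions it is free to use
    x = x.detach().clone().requires_grad_(True)
    log_probs = F.log_softmax(model(x), dim=-1)  # shape (N, K)
    # gradient of the summed log-probabilities with respect to the input
    grads, = torch.autograd.grad(log_probs.sum(), x, create_graph=True)
    # squared gradient mass inside the masked (irrelevant) regions
    return lam * (mask * grads).pow(2).sum()

During training, this penalty is simply added to the usual classification loss.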

In image processing applications, noise or spurious correlations can potentially be avoided by training the network to attend only to foreground objects, i.e. to have zero gradient on background elements.

If successfully applied, this regularization can potentially improve generalization (by ignoring spurious correlations) and speed up learning (by using only the important parts of the input).

Adversarial robustness

Another interesting application is to use input gradient regularization for adversarial robustness. It has been shown that an L2 penalty on the input gradient discourages the large input gradients that are precisely what makes networks vulnerable to adversarial attacks.

This regularization term is similar to the previous one, but no mask is applied and the penalized gradient is that of the cross-entropy loss itself.
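A minimal sketch of this L2 variant (again PyTorch, with illustrative names; the weighting factor is an assumption):

import torch
import torch.nn.functional as F

def input_grad_l2_penalty(model, x, y, lam=0.1):
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # double backpropagation: the gradient of the loss with respect to the
    # input stays in the graph so the penalty itself can be differentiated
    grads, = torch.autograd.grad(loss, x, create_graph=True)
    # squared L2 norm of the input gradient
    return lam * grads.pow(2).sum()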

One can think of this regularizer as changing the network's perception of the input so that it has to pay a little attention to many features rather than relying heavily on a few. In that sense, this regularization has a similar motivation to dropout.

Interestingly, adversarial attacks crafted against an input-gradient-regularized network transfer well to other networks trained on the same dataset. This makes the simple regularizer not only a defensive tool but potentially also a way to create attacks without knowing the target network.

Input sparsity

After looking at the uses of an L2 regularizer on the input gradient, it is only natural to explore the utility of L1 regularization. Luckily, this has already been done for us. And indeed, using the L1 norm shifts the network's perception of the data in such a way that it deems only a few inputs relevant at any time.
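Switching to the L1 norm changes only the final reduction; a sketch under the same assumptions as above:

import torch
import torch.nn.functional as F

def input_grad_l1_penalty(model, x, y, lam=0.01):
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grads, = torch.autograd.grad(loss, x, create_graph=True)
    # the L1 norm pushes most input gradients toward zero, so that only a
    # few features remain relevant at any time
    return lam * grads.abs().sum()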

This can be useful when feature acquisition is costly and one would prefer models that work with only a few features. Perhaps it could also serve as a tool for analyzing feature relevance, similar to conventional neural attention mechanisms.

The beauty of regularization methods is their simplicity. You can apply input gradient regularization to any network with little effort (but some computational cost).

I hope you learned something useful!

