
VGGNet vs ResNet (The Vanishing Gradient Problem)

Source: https://towardsdatascience.com/vggnet-vs-resnet-924e9573ca5c?gi=9900448ee5f8

VGGNet vs ResNet

A lucid answer to the Vanishing Gradient Problem!

Dec 29 · 3 min read


Photo by Bench Accounting on Unsplash

“Can you explain the difference between VGGNet and ResNet?” is a popular interview question in the field of AI and Machine Learning. While the answer exists on the internet, I have not been able to find one that is clear, concise, and to the point. We will begin with what VGGNet is, the problem it encountered, and how ResNet came in to solve it.

VGGNet

VGG stands for Visual Geometry Group, the group of researchers at Oxford who developed this architecture. The VGG architecture consists of blocks, where each block is composed of 2D Convolution and Max Pooling layers. VGGNet comes in two flavors, VGG16 and VGG19, where 16 and 19 are the number of weight layers in each of them, respectively.
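For concreteness, here is a minimal sketch of one such block, written in PyTorch (which this article does not otherwise use); the channel counts below are illustrative rather than the exact VGG16 configuration.

```python
import torch.nn as nn

def vgg_block(in_channels, out_channels, num_convs):
    # One VGG-style block: stacked 3x3 convolutions (each followed by ReLU),
    # then a 2x2 max pooling layer that halves the spatial resolution.
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Stacking several such blocks, followed by fully connected layers,
# gives the VGG16/VGG19 architectures.
block = vgg_block(in_channels=3, out_channels=64, num_convs=2)
```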


Fig. 1 VGGNet architecture

In a Convolutional Neural Network (CNN), as the number of layers increases, so does the ability of the model to fit more complex functions. Therefore, having more layers is better (not to be confused with an artificial neural network, which does not necessarily give a significant improvement in performance as the number of hidden layers increases). So now you could argue: why not use VGG20, or VGG50, or VGG100, and so on?

Well, there is a problem.

The weights of a neural network are updated using the backpropagation algorithm. Backpropagation makes a small change to each weight in such a way that the loss of the model decreases: each weight takes a step in the direction along which the loss decreases, which is the direction opposite to the gradient of the loss with respect to that weight.
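In code, a single plain gradient descent step looks like the sketch below; the weight and gradient values are purely illustrative.

```python
# Toy example: a few weights and their gradients dLoss/dw (illustrative values).
weights = [0.5, -1.2, 0.3]
gradients = [0.1, -0.4, 0.05]

# One gradient descent step: each weight moves a small step opposite to its
# gradient, i.e. in the direction along which the loss decreases.
learning_rate = 0.01
weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
```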

Using the chain rule, we can find this gradient for each weight. It is equal to (local gradient) x (gradient flowing from ahead), as shown in Fig. 2.


Fig. 2 Flow of Gradients through a Neuron
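As a small numeric sketch of the chain rule at a single neuron (the values are chosen purely for illustration):

```python
# Gradient passed backward = local gradient * gradient flowing from ahead.
upstream_gradient = 0.2   # gradient arriving from the layers ahead (illustrative)
local_gradient = 0.5      # derivative of the neuron's own function (illustrative)
downstream_gradient = local_gradient * upstream_gradient
print(downstream_gradient)  # 0.1
```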

Here comes the problem. As this gradient flows backward toward the initial layers, it keeps getting multiplied by each local gradient. Since these local gradients are typically smaller than 1 (the derivative of a sigmoid, for instance, is at most 0.25), the gradient becomes smaller and smaller, making the updates to the initial layers vanishingly small and increasing the training time considerably. This is the vanishing gradient problem.
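A quick sketch of how badly this compounds in a deep network (0.25 is the sigmoid derivative's maximum, used here only for illustration):

```python
# If every layer contributes a local gradient of about 0.25, the gradient
# reaching the first layer of a 20-layer network is vanishingly small.
upstream_gradient = 1.0
for _ in range(20):
    upstream_gradient *= 0.25
print(upstream_gradient)  # ~9.1e-13
```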

We could solve our problem if the local gradient somehow became 1.

Voila! Enter ResNet.

ResNet

How can the local gradient be 1, i.e., which function has a derivative that is always 1? The identity function!


Fig. 3 Mathematics behind solving the Vanishing Gradient problem

A residual block adds its input back to its output, y = F(x) + x, so its local gradient is dF/dx + 1. The identity term contributes a constant 1, so as the gradient is backpropagated through the skip connection, it does not decrease in value, even when dF/dx is tiny.
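A minimal autograd sketch of this effect (PyTorch, with an illustrative stand-in for F that deliberately has a very small slope):

```python
import torch

def f(x):
    # Stand-in for the residual branch F(x); its slope is deliberately tiny (illustrative).
    return 0.01 * x ** 2

x = torch.tensor(2.0, requires_grad=True)

# Without a skip connection: y = F(x), local gradient dF/dx = 0.02 * x = 0.04 (tiny).
grad_plain = torch.autograd.grad(f(x), x)[0]

# With a skip connection: y = F(x) + x, local gradient dF/dx + 1 = 1.04.
grad_residual = torch.autograd.grad(f(x) + x, x)[0]

print(grad_plain.item(), grad_residual.item())  # ~0.04 vs ~1.04
```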

With this in mind, the ResNet architecture, shown below, should make perfect sense as to how it prevents the vanishing gradient problem from occurring. ResNet stands for Residual Network.


Fig. 4 ResNet architecture

These skip connections act as gradient superhighways, allowing the gradient to flow backward unhindered. And now you can understand why ResNet can come in much deeper flavors like ResNet50, ResNet101, and ResNet152.
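As a final sketch, here is a simplified residual block in PyTorch (the actual ResNet blocks also use batch normalization and, in the deeper variants, bottleneck layers, which are omitted here for brevity):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # A simplified residual block: two 3x3 convolutions plus the identity skip connection.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the skip connection: output = F(x) + x
```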

I hope that this article was of benefit to you.

References:

[1] CS231n Convolutional Neural Networks for Visual Recognition by Andrej Karpathy.

[2] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[3] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778.

[4] draw.io for diagrams.
