
Creating DenseNet 121 with TensorFlow

Figure 1: Various blocks and layers in DenseNet (Source: Original DenseNet paper)

DenseNet paper link: https://arxiv.org/pdf/1608.06993.pdf

DenseNet (Dense Convolutional Network) is an architecture that focuses on making deep learning networks go even deeper, while at the same time making them more efficient to train, by using shorter connections between the layers. DenseNet is a convolutional neural network in which each layer is connected to all layers that are deeper in the network: the first layer is connected to the 2nd, 3rd, 4th and so on, the second layer is connected to the 3rd, 4th, 5th and so on. This is done to enable maximum information flow between the layers of the network. To preserve the feed-forward nature, each layer obtains inputs from all previous layers and passes its own feature maps on to all layers that come after it. Unlike ResNets, it does not combine features through summation; it combines them by concatenation. So the i-th layer has i inputs, consisting of the feature maps of all its preceding convolutional blocks, and its own feature maps are passed on to all of the L−i subsequent layers. This introduces L(L+1)/2 connections in a network with L layers, rather than just L connections as in traditional deep learning architectures. DenseNet therefore requires fewer parameters than a traditional convolutional neural network, as there is no need to re-learn redundant feature maps.
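As a toy illustration of this connectivity pattern (a minimal sketch only; the tensor names and the 32-filter growth rate here are chosen for demonstration, not taken from the implementation below), each layer receives the concatenation of all earlier feature maps:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, concatenate

x0 = tf.random.normal((1, 56, 56, 64))                          # initial feature maps
x1 = Conv2D(32, 3, padding='same')(x0)                          # layer 1 sees x0
x2 = Conv2D(32, 3, padding='same')(concatenate([x0, x1]))       # layer 2 sees x0 and x1
x3 = Conv2D(32, 3, padding='same')(concatenate([x0, x1, x2]))   # layer 3 sees x0, x1 and x2
print(x3.shape)  # (1, 56, 56, 32): each layer adds 32 new feature maps to the shared collection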

DenseNet consists of two important building blocks besides the basic convolutional and pooling layers: the dense blocks and the transition layers.

Next, we look at what these blocks and layers look like and how to implement them in Python.

Figure 2: The DenseNet121 framework (Source: Original DenseNet paper, edited by author)

DenseNet starts with a basic convolution and pooling layer. This is followed by three pairs of a dense block and a transition layer, and finally a fourth dense block followed by the classification layer.

The first convolution block has 64 filters of size 7×7 and a stride of 2. It is followed by a max pooling layer with a 3×3 window and a stride of 2. With a 224×224 input, these two layers reduce the spatial size to 112×112 and then 56×56. They can be written in Python as follows.

input = Input(input_shape)
x = Conv2D(64, 7, strides=2, padding='same')(input)
x = MaxPool2D(3, strides=2, padding='same')(x)

Defining the convolutional block: each convolutional block after the input follows the sequence BatchNormalization, then ReLU activation, then the actual Conv2D layer. To implement this, we can write the following function.

# batch norm + relu + conv
def bn_rl_conv(x, filters, kernel=1, strides=1):
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = Conv2D(filters, kernel, strides=strides, padding='same')(x)
    return x
Figure 3: Dense blocks (Source: DenseNet paper, edited by author)

Defining the dense block: as seen in Figure 3, every dense block has two convolutions, with 1×1 and 3×3 kernel sizes. In dense block 1 this pair is repeated 6 times, in dense block 2 it is repeated 12 times, in dense block 3, 24 times, and finally in dense block 4, 16 times.

Inside a dense block, each 1×1 convolution has 4 times the base number of filters, so we use 4*filters, while each 3×3 convolution uses the base number, filters (the growth rate). Also, we have to concatenate the input with the output tensor.

Each dense block is run for 6, 12, 24, and 16 repetitions respectively, using a for loop.

def dense_block(x, repetition):
    for _ in range(repetition):
        y = bn_rl_conv(x, 4 * filters)
        y = bn_rl_conv(y, filters, 3)
        x = concatenate([y, x])
    return x
Figure 4: Transition layers (Source: DenseNet paper, edited by author)

Defining the transition layer: in the transition layer, we reduce the number of channels to half of the existing channels. There is a 1×1 convolutional layer and a 2×2 average pooling layer with a stride of 2. A kernel size of 1×1 is already the default in the function bn_rl_conv, so we do not need to define it explicitly again.

To halve the channels, we need to know how many channels the input tensor x has. We can use the Keras backend (K) to get the shape of x as a tuple. We only need the last entry of that shape, the number of filters, so we index it with [-1]. Finally, we divide this number of filters by 2 to get the desired result.

def transition_layer(x):
    x = bn_rl_conv(x, K.int_shape(x)[-1] // 2)
    x = AvgPool2D(2, strides=2, padding='same')(x)
    return x

So we are done with defining the dense blocks and transition layers. Now we need to stack them together, so we write a for loop over the repetition counts 6, 12, 24, and 16. The loop runs 4 times, each time using one of those values. This completes the 4 dense blocks and their transition layers.

for repetition in [6, 12, 24, 16]:
    d = dense_block(x, repetition)
    x = transition_layer(d)
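To see how these pieces interact, it helps to track the channel counts through DenseNet-121. The numbers below assume the 64-channel stem above and a growth rate of 32 (the default value of filters used in the full code later):

# Channel arithmetic for DenseNet-121 (growth rate k = 32):
# stem:                                 64 channels
# dense block 1:   64 + 6*32  = 256  -> transition 1 halves it to 128
# dense block 2:  128 + 12*32 = 512  -> transition 2 halves it to 256
# dense block 3:  256 + 24*32 = 1024 -> transition 3 halves it to 512
# dense block 4:  512 + 16*32 = 1024 -> no transition; fed to global average pooling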

In the end, there is GlobalAveragePooling, followed by the final output layer. As we can see in the code block above, the output of each dense block is stored in 'd'. After dense block 4 there is no fourth transition layer; the network goes directly into the classification layer. So GlobalAveragePooling is applied to 'd', not to 'x'. An alternative is to remove the for loop and stack the layers one after another, leaving out the final transition layer.

x = GlobalAveragePooling2D()(d)
output = Dense(n_classes, activation = 'softmax')(x)

Now that we have all the blocks together, let’s merge them to see the entire DenseNet architecture.

Complete DenseNet 121 architecture:

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Dense
from tensorflow.keras.layers import AvgPool2D, GlobalAveragePooling2D, MaxPool2D
from tensorflow.keras.models import Model
from tensorflow.keras.layers import ReLU, concatenate
import tensorflow.keras.backend as K

# Creating DenseNet121
def densenet(input_shape, n_classes, filters=32):

    # batch norm + relu + conv
    def bn_rl_conv(x, filters, kernel=1, strides=1):
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(filters, kernel, strides=strides, padding='same')(x)
        return x

    def dense_block(x, repetition):
        for _ in range(repetition):
            y = bn_rl_conv(x, 4 * filters)
            y = bn_rl_conv(y, filters, 3)
            x = concatenate([y, x])
        return x

    def transition_layer(x):
        x = bn_rl_conv(x, K.int_shape(x)[-1] // 2)
        x = AvgPool2D(2, strides=2, padding='same')(x)
        return x

    input = Input(input_shape)
    x = Conv2D(64, 7, strides=2, padding='same')(input)
    x = MaxPool2D(3, strides=2, padding='same')(x)

    for repetition in [6, 12, 24, 16]:
        d = dense_block(x, repetition)
        x = transition_layer(d)

    x = GlobalAveragePooling2D()(d)
    output = Dense(n_classes, activation='softmax')(x)

    model = Model(input, output)
    return model

input_shape = 224, 224, 3
n_classes = 3
model = densenet(input_shape, n_classes)
model.summary()

Output: (Assuming 3 final classes — last few lines of the model summary)
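As a quick sanity check, the model can be compiled and run on random dummy data to confirm that it produces one probability per class (a minimal sketch; the optimizer, loss, and batch size here are arbitrary choices, not part of the original walkthrough):

import numpy as np

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

dummy_images = np.random.rand(2, 224, 224, 3).astype('float32')  # a batch of 2 random "images"
preds = model.predict(dummy_images)
print(preds.shape)  # (2, 3): one softmax probability per class for each image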

To view the architecture diagram, the following code can be used.

from tensorflow.keras.utils import model_to_dot  # public API path
from IPython.display import SVG
import pydot      # required by model_to_dot
import graphviz   # the Graphviz binaries must also be installed on the system

SVG(model_to_dot(
    model, show_shapes=True, show_layer_names=True, rankdir='TB',
    expand_nested=False, dpi=60, subgraph=False
).create(prog='dot', format='svg'))
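Alternatively (assuming pydot and the Graphviz binaries are installed), tf.keras.utils.plot_model can write the same diagram straight to an image file:

from tensorflow.keras.utils import plot_model

# Saves the architecture diagram to densenet121.png
plot_model(model, to_file='densenet121.png', show_shapes=True, show_layer_names=True)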

Output — first few blocks of the diagram

And that’s how we can implement the DenseNet 121 architecture.

References:

  1. Gao Huang, Zhuang Liu, Laurens van der Maaten and Kilian Q. Weinberger, Densely Connected Convolutional Networks, arXiv:1608.06993 (2016)
