
CNN Models: ResNet, MobileNet, DenseNet, ShuffleNet, EfficientNet

source link: http://www.banbeichadexiaojiubei.com/index.php/2020/11/29/cnn模型-resnet、mobilenet、densenet、shufflenet、efficientnet/

Article source:

https://medium.com/@CinnamonAITaiwan/cnn%E6%A8%A1%E5%9E%8B-resnet-mobilenet-densenet-shufflenet-efficientnet-5eba5c8df7e4

The Evolution of CNNs

The figure below compares the size and accuracy of CNN models commonly used before 2018. There is no shortage of articles online introducing the evolution of CNNs [LeNet/AlexNet/VGG/Inception/ResNet], and many of them are well written. Today we introduce several of the newer CNN models, how to build them, and where their advantages lie.

[Figure: Comparison of CNN models]

Classic CNN Architectures

To appreciate the advantages of the newest models, a few basic architectural ideas are worth understanding first. Let's look at three of them: Inception, residual networks, and depthwise separable convolution.

Inception

The Inception architecture was first proposed by Google in 2014. Its goal is to combine kernels with different receptive fields. How do we achieve that? Take a look at the figure below:

[Figure: The Inception architecture]

The figure shows the classic Inception architecture. Four branches follow the input feature maps. Three of them are first compressed by a 1*1 kernel, which controls the depth of the output channels and also adds nonlinearity to the model; the remaining branch goes through a 3*3 operation first. To make sure the output feature maps keep the same spatial size, we rely on padding: a 1*1 kernel already produces an output the same size as its input, while the 3*3 and 5*5 kernels need padding of 1 and 2 respectively. In TensorFlow and Keras the quickest way is to set padding='same', which keeps the output size unchanged as long as the stride is 1.
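As a quick check of those padding values, the spatial output size of a convolution follows the standard relation

$$ o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1, \qquad s = 1,\; p = \frac{k - 1}{2} \;\Rightarrow\; o = i, $$

which gives p = 1 for a 3*3 kernel and p = 2 for a 5*5 kernel, exactly what padding='same' does for odd kernel sizes. The concrete implementation is as follows: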

import tensorflow as tf
def Inception(input_data, input_depth = 192):
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 64, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)

    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 96, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

        X_2 = tf.layers.conv2d(X_2, 128, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 16, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3, 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3, 32, (5, 5), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

    with tf.name_scope('Branch_4'):
        X_4 = tf.layers.max_pooling2d(input_data, 2, 1, padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)

        X_4 = tf.layers.conv2d(X_4, 32, (1, 1), padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)

    out = tf.concat((X_1, X_2, X_3, X_4), axis = 3)

    return out

Residual Networks

[Figure: The residual structure]

The figure above shows the classic residual structure: the input is skip-connected over two or three layers of F(x) and added back, so the output becomes y = F(x) + x. The benefit is that during back-propagation a term of 1 is always preserved, which lowers the chance of vanishing gradients.

What does this mean? For example, when the output y above is differentiated with respect to x, one of the terms is x differentiated with respect to itself, which gives 1. Because every link of the chain rule then carries this 1, the gradient is far less likely to vanish, so a deeper network can be built.
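Written out, for y = F(x) + x the chain rule gives

$$ \frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(\frac{\partial F(x)}{\partial x} + 1\right), $$

so even when the term from F(x) is tiny, the gradient flowing back through the skip connection is preserved.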


A TensorFlow implementation of the residual block looks like this:

def Residual_Block(input_data, in_channel, out_channel, s = 1):
    X_shortcut = input_data  ## remember the input for the skip connection
    X = tf.layers.conv2d(input_data, out_channel, (1, 1), strides = (s, s))
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)

    X = tf.layers.conv2d(X, out_channel, (3, 3), padding = 'same')  ## the stride is already applied above
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)

    X = tf.layers.conv2d(X, out_channel, (1, 1))
    X = tf.layers.batch_normalization(X)

    if in_channel != out_channel or s != 1:  ## project the shortcut when the depth or spatial size changes
        X_shortcut = tf.layers.conv2d(X_shortcut, out_channel, (1, 1), strides = (s, s))
        X_shortcut = tf.layers.batch_normalization(X_shortcut)

    X = X + X_shortcut
    X = tf.nn.relu(X)

    return X

Depthwise Separable Convolution

[Figure: Depthwise + pointwise convolution]

The figure above shows the structure of a depthwise separable convolution. Unlike an ordinary convolution, it consists of two steps:

In the first step, the input feature maps are convolved with a k*k kernel whose depth equals the input depth (depthwise), and each feature map is convolved with its kernel independently.

In the second step, a 1*1 kernel whose depth equals the desired output depth is applied (pointwise).
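Counting parameters (and ignoring biases), a standard convolution with a k*k kernel, C_in input channels, and C_out output channels is replaced by two much smaller factors:

$$ \underbrace{k^2 C_{in} C_{out}}_{\text{standard}} \quad \longrightarrow \quad \underbrace{k^2 C_{in}}_{\text{depthwise}} + \underbrace{C_{in} C_{out}}_{\text{pointwise}} $$

For k = 3, C_in = 3, C_out = 64 this is 1728 versus 219, which matches the counts computed in the code below once the bias terms are added.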

The benefit is a large saving in parameters. Below we compute the difference in parameter count:

import tensorflow as tf
# count the total number of trainable parameters
def get_num_params():
  total_parameters = 0
  for variable in tf.trainable_variables():
    shape = variable.get_shape()
    # print(shape)
    # print(len(shape))
    variable_parameters = 1
    for dim in shape:
      # print(dim)
      variable_parameters *= dim.value
    # print(variable_parameters)
    total_parameters += variable_parameters
  return total_parameters

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.conv2d(inputs, 64, (3, 3), strides = (1, 1), activation = tf.nn.leaky_relu)
print (get_num_params()) ## (3*3*3+1)*64=1792

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.separable_conv2d(inputs, 64, (3, 3), padding = 'SAME')
print(get_num_params()) ## 3*3*3+(1*1*3+1)*64=283

As the numbers above show, for the same 300*300*64 output a separable convolution uses roughly 1/6 of the parameters of an ordinary convolution, which achieves the goal of a lightweight model.

Reference code for depthwise separable convolution:

import tensorflow as tf
import tensorflow.contrib as tc

slim = tc.slim

tf.reset_default_graph()
## define a standalone depthwise_conv layer
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = normalizer_fn(x) if normalizer_fn is not None else x  # batch normalization
        x = activation_fn(x) if activation_fn is not None else x  # nonlinearity
        return x
      
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out=depthwise_conv(
        inputs, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv')
print(get_num_params()) ## 3*3*3=27  


## using slim is even simpler
def depthwise_conv_bn(x, kernel_size, stride=1, dilation=1):
    with tf.variable_scope(None, 'depthwise_conv_bn'):
        x = slim.separable_conv2d(x, None, kernel_size, depth_multiplier=1, stride=stride,
                                  rate=dilation,)
        #x = slim.batch_norm(x, activation_fn=None, fused=False)
    return x
  
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out = depthwise_conv_bn(inputs, (3,3), stride=1, dilation=1)
print(get_num_params()) ## 3*3*3=27

CNN Models

ResNetV2

[Figure: (a) ResNetV1, (e) ResNetV2, and other variants]

ResNetV2 was likewise proposed by Kaiming He's team. It keeps the residual idea of ResNetV1 but makes some changes to the identity branch (left path) and the residual branch (right path).

Removing the ReLU after the residual block

The authors argue that attaching a ReLU after every residual block forces the forward propagation to be monotonically increasing, which reduces the model's expressive power.

Removing the BN on the identity branch

If the design in figure (b) is used instead, the BN layer changes the distribution of the information on the identity branch and slows down convergence. The paper also uses a small trick: a 1*1 kernel first compresses the depth and a final 1*1 kernel restores it, which reduces computation.

def ResNetV2_block(input_data, input_depth, compress_depth, output_depth, strides = (1, 1)):
    X_shortcut = input_data
    X = tf.layers.conv2d(input_data, compress_depth, (1, 1)) ## compress first
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, compress_depth, (3, 3), padding = 'same', strides = strides)
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, output_depth, (1, 1)) ## then expand back
    if input_depth != output_depth:
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1), strides = strides, padding = 'same') ## depth differs
    if input_depth == output_depth and strides != (1, 1):
        X_shortcut = tf.image.resize_images(X_shortcut, (X.shape[1], X.shape[2]), method = 0) ## spatial size differs
    out = X_shortcut + X
    return out

With this residual block in place, you can rebuild the ResNetV2 model by following the parameters given in the paper. The paper also contains further variations, such as several modified residual blocks, that interested readers can dig into.
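As a minimal sketch of how the block might be stacked (the depths, strides, and input size here are illustrative placeholders, not the exact configuration from the paper):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 224, 224, 64])
X = ResNetV2_block(inputs, input_depth = 64, compress_depth = 64, output_depth = 256)                  ## keeps the spatial size
X = ResNetV2_block(X, input_depth = 256, compress_depth = 128, output_depth = 512, strides = (2, 2))   ## downsamples to 112*112
print(X.shape)           ## (?, 112, 112, 512)
print(get_num_params())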

Inception-ResNet

[Figure: The Inception-ResNet-A block]

Inception-ResNet models such as Inception-ResNetV2 and InceptionV4 are also frequently used today. With the Inception and residual block concepts above, Inception-ResNet is easy to understand: its core is to replace the residual branch of a residual block with an Inception structure. The paper proposes three different combinations; here we implement the Inception-ResNet-A block.

def InceptionResnetA_block(input_data, input_depth = 3, output_depth = 384):
    X_shortcut = input_data
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)

    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

        X_2 = tf.layers.conv2d(X_2 , 32, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3 , 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3 , 64, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
    out = tf.concat((X_1, X_2, X_3), axis = 3)

    out = tf.layers.conv2d(out, output_depth, (1, 1))

    if input_depth != output_depth:
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1))

    out = X_shortcut + out
    return out

DenseNet

[Figure: The DenseNet architecture]

DenseNet is one of the representative lightweight models. The code below implements a Dense stage block (it also brings in depthwise separable convolution to further reduce parameters and speed the model up; the original paper uses ordinary convolutions):

def Dense_Stage(inputs_, depth=64, repeat=8):
    for _ in range(repeat):
        X_input = inputs_
        X = tf.layers.conv2d(inputs_,depth, (1,1), strides=(1,1), activation=tf.nn.leaky_relu)
        X = tf.layers.batch_normalization(X)
        X = tf.layers.separable_conv2d(X, depth, (3,3), padding='SAME')
        X = tf.nn.leaky_relu(X)
        X = tf.layers.batch_normalization(X)
        X = tf.concat([X_input, X], 3)  # dense connection: concatenate the block input with the new features
        inputs_ = X
    return X
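A quick shape check (the input size is an illustrative placeholder): every repetition concatenates depth new channels onto its input, so with depth=64 and repeat=8 a 32-channel input grows to 32 + 8*64 = 544 channels.

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 64, 64, 32])
out = Dense_Stage(inputs, depth = 64, repeat = 8)
print(out.shape)   ## (?, 64, 64, 544)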

ShuffleNetV2

When it comes to lightweight models, ShuffleNet is arguably the standout among those in common use. Lightweight models mainly come from two lines of work: SqueezeNet, from UC Berkeley and Stanford University, and MobileNet, from Google. Depthwise separable convolution originated in MobileNet, while SqueezeNet's idea is very similar to Inception, so we will not elaborate on it here.

ShuffleNet builds on SqueezeNet with a few changes, and its principle resembles depthwise separable convolution. A depthwise separable convolution consists of a depthwise plus a pointwise convolution; the pointwise convolution is there because channels do not mix in the depthwise step: each kernel convolves only a single feature map and never sees global information. Group convolution in ShuffleNet has the same channel-isolation problem (see the figure below; it is very similar to depthwise convolution). But instead of solving it with a pointwise convolution as MobileNet does, ShuffleNet simply shuffles: feature maps from different groups are reshuffled and sent to the next layer. This saves the parameters of the pointwise convolution as well and reaches the 'ultra-lightweight' level.
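To make the shuffle concrete, here is a small NumPy sketch of the reshape-transpose-reshape trick that the shuffle_unit function below applies to the channel axis (the channel indices are only illustrative):

import numpy as np

channels = np.arange(6)                            ## channels 0..5; group 1 = [0,1,2], group 2 = [3,4,5]
groups = 2
shuffled = channels.reshape(groups, -1).T.reshape(-1)
print(shuffled)                                    ## [0 3 1 4 2 5]: the two groups end up interleaved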

[Figure: Group convolution]

With these basics in place, let's look at the important changes ShuffleNetV2 makes relative to V1:

1*1 convolutions

First, V1 uses a large number of 1*1 convolutions, which drives up the MAC (memory access cost). In a depthwise separable convolution the pointwise convolution accounts for most of the computation and parameters, so in V2 the feature maps entering a block are first split along the channel axis.

Concat at the output

The authors found that element-wise operations such as addition and ReLU are also a major cause of high MAC, so V2 replaces V1's add with concat.

[Figure: ShuffleNetV1 and ShuffleNetV2: (a) basic V1 block, (b) V1 block with downsampling, (c) basic V2 block, (d) V2 block with downsampling]

The code below shows how to build a ShuffleNetV2 block. One thing to watch is that the channel depth of the input feature maps must be divisible by shuffle_group.

## reference: https://github.com/timctho/shufflenet-v2-tensorflow/blob/master/module.py
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py

def shuffle_unit(x, groups):  ## shuffle the feature maps output by the depthwise conv across groups
    with tf.variable_scope('shuffle_unit'):
        n, h, w, c = x.get_shape().as_list()
        x = tf.reshape(x, shape=([tf.shape(x)[0], h, w, groups, c // groups]))
        x = tf.transpose(x, tf.convert_to_tensor([0, 1, 2, 4, 3]))
        x = tf.reshape(x, shape=[tf.shape(x)[0], h, w, c])
    return x
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x
    
def conv_bn_relu(x, out_channel, kernel_size, stride=1):  ## plain convolution + BN + ReLU
    with tf.variable_scope(None, 'conv_bn_relu'):
        x = tf.layers.conv2d(x, out_channel, kernel_size, stride,)
        x = tf.nn.leaky_relu(tf.layers.batch_normalization(x))
    return x

def shufflenet_v2_block(x, out_channel, kernel_size, stride=1, shuffle_group=2): ##shufflenet_v2_block
    with tf.variable_scope(None, 'shuffle_v2_block'):
        if stride == 1:
            top, bottom = tf.split(x, num_or_size_splits=2, axis=3)

            half_channel = out_channel // 2

            top = conv_bn_relu(top, half_channel, 1)
            top = depthwise_conv_bn(top, kernel_size, stride)
            top = conv_bn_relu(top, half_channel, 1)

            out = tf.concat([top, bottom], axis=3)
            out = shuffle_unit(out, shuffle_group)

        else:   ## the block with downsampling
            half_channel = out_channel // 2
            b0 = conv_bn_relu(x, half_channel, 1)
            b0 = depthwise_conv_bn(b0, kernel_size, stride)
            b0 = conv_bn_relu(b0, half_channel, 1)

            b1 = depthwise_conv_bn(x, kernel_size, stride)
            b1 = conv_bn_relu(b1, half_channel, 1)

            out = tf.concat([b0, b1], axis=3)
            out = shuffle_unit(out, shuffle_group)
        return out

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 4])
out = shufflenet_v2_block(inputs, 4, (3,3), stride=1, shuffle_group=2)  ## for the stride-1 block, out_channel should match the input channel count
print(get_num_params())

EfficientNet

EfficientNet was proposed by Google in 2019. Using Google's AutoML technology, they built eight efficient models, B0 through B7. Looking at the details, the bottleneck is the inverted residual block introduced by MobileNetV2 combined with Squeeze-and-Excitation networks, so once we can build the MBConv block we can reproduce the EfficientNet architecture. Below, let's first look at the important changes MobileNetV2 makes relative to MobileNetV1 and ResNet.

[Figure: The EfficientNet-B0 architecture]

Expand first, then compress

The authors argue that when a feature map with few channels passes through a ReLU, every value becomes non-negative and a lot of information is lost. So unlike ResNet, which compresses first, and MobileNetV1, which applies the depthwise separable convolution directly, MobileNetV2 first expands the feature-map depth with a pointwise convolution.

Skip connections

Compared with V1, V2 adopts the ResNet idea and skip-connects the feature maps.

A linear activation at the output

As mentioned above, the authors believe a feature map with few channels is not suited to a ReLU activation, so the output layer uses a linear activation instead; if you want to use ReLU there, make sure the output channel depth is large enough.

[Figure: MobileNetV1, MobileNetV2, and ResNet compared]

Comparing with the rival ShuffleNet architecture: ShuffleNetV2 had not yet been released when MobileNetV2 came out, so the figure compares against ShuffleNetV1.

[Figure: ShuffleNet and MobileNetV2 compared]

The code below shows how to build the residual block used in MobileNetV2.

def depthwise_conv(x, kernel = 3, stride = 1, padding = 'SAME',
        activation_fn = None, normalizer_fn = None,
        weights_initializer = tf.contrib.layers.xavier_initializer(),
        data_format = 'NHWC', scope = 'depthwise_conv'):
        ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable('depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype = tf.float32,
            initializer = weights_initializer)
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format = 'NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x   # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x   # nonlinearity
        return x


def res_block(input, expansion_ratio, output_dim, stride, name, bias = False, shortcut = True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio * input.get_shape().as_list()[-1])
        net = tf.layers.conv2d(input, bottleneck_dim, (1, 1), name = 'pw',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias) ## expand first
        net = tf.layers.batch_normalization(net, name = 'pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride = stride)  ## apply the block stride in the depthwise conv
        net = tf.layers.batch_normalization(net, name = 'dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1, 1), name = 'pw_linear',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias) ## project back to the output depth
        net = tf.layers.batch_normalization(net, name = 'pw_linear_bn')

        # element wise add, only for stride==1
        if shortcut and stride == 1:
            in_dim = int(input.get_shape().as_list()[-1])
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim, (1, 1), name = 'ex_dim',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias)
                net = ins + net
            else:
                net = input + net

        return net

SENet(Squeeze-and-Excitation Networks)

[Figure: The SENet block]

With the inverted residual block in hand, we still need Squeeze-and-Excitation networks. The core idea of SENet is to let the network learn feature weights, so that useful feature maps receive large weights and ineffective ones small weights, training the model toward better results; I find it quite similar to attention.

The structure combined with a residual block is shown above: global average pooling gathers global information (squeeze), fully connected layers capture the semantics by first compressing and then expanding (excitation), and finally the coefficient obtained for each feature map is multiplied back onto the original input (using ReLU in the middle compression layer feels like an odd fit?). Combining this with the inverted residual block, the MBConv block is built as follows.

import tensorflow as tf
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x


def MBConvBlock(input, expansion_ratio, output_dim, stride, name, squeeze ,bias=False, shortcut=True, 
                use_Squeeze_Excitation=True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio*input.get_shape().as_list()[-1]) 
        net = tf.layers.conv2d(input, bottleneck_dim, (1,1), name='pw', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) ## expand first
        net = tf.layers.batch_normalization(net, name='pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride=stride)
        net = tf.layers.batch_normalization(net, name='dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1,1), name='pw_linear', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) ## project back to the output depth
        net = tf.layers.batch_normalization(net, name='pw_linear_bn')
        
        # SENET-Squeeze-Excitation
        if use_Squeeze_Excitation:
            in_dim=int(net.get_shape().as_list()[-1])
            Squeeze=tf.layers.average_pooling2d(net, net.get_shape()[1:-1], 1)
            Squeeze=tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze))
            Excitation=tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=output_dim))
            Excitation=tf.nn.sigmoid(Excitation)
            net = tf.reshape(Excitation, [-1,1,1,output_dim])*net

        
        in_dim=int(input.get_shape().as_list()[-1])
        if shortcut and stride == 1:
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim,(1,1), name='ex_dim', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) 
                net = ins+net
            else:
                net = input+net

        return net
    
    
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 43])
out = MBConvBlock(inputs, 4, 64, 1, 'first', 4,bias=False, shortcut=True, use_Squeeze_Excitation=True)

MobileNetV3

One can only say that the masters publish papers faster than we can read them. MobileNetV3 inherits the depthwise separable convolution of MobileNetV1 and the skip connections and expand-then-compress idea of MobileNetV2, and adds Squeeze-and-Excitation networks, so the overall block is very similar to EfficientNet's MBConvBlock. On top of that, MobileNetV3 changes the activation functions:

In some blocks, ReLU is replaced by h-swish and sigmoid by h-sigmoid. H-swish is designed after the swish function, mainly because swish is slow to compute; the authors' experiments show that using h-swish improves accuracy.
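For reference, these are the definitions used in the code further below:

$$ \text{h-sigmoid}(x) = \frac{\mathrm{ReLU6}(x + 3)}{6}, \qquad \text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6} $$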

[Figure: The h-swish activation function]
[Figure: Differences between the activation functions]

The figure below compares MobileNetV3 with MobileNetV2: at the same latency, MobileNetV3 achieves higher top-1 accuracy.

[Figure: MobileNetV3 vs. MobileNetV2]

The code below implements the MobileNetV3 bottleneck.

import tensorflow as tf
def Hswish(input_):
    return input_* tf.nn.relu6(input_ + 3.) / 6.

def Hsigmoid(input_):
    return tf.nn.relu6(input_ + 3.) / 6.

def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC',)
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x
    
def SEBlock(input_, squeeze=4):
    in_dim=int(input_.get_shape().as_list()[-1])
    Squeeze = tf.layers.average_pooling2d(input_, input_.get_shape()[1:-1], 1)
    Squeeze = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze)) 
    Excitation = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim))
    Excitation = Hsigmoid(Excitation) ##Hsigmoid replace Sigmoid
    Excitation = tf.reshape(Excitation, [-1,1,1,in_dim])
    return input_*Excitation
    
    
def MobileV3Bottleneck(input_,expand_size, squeeze,out_size, kernel_size,stride=1, relu=True, se=True):
    Shortcut = input_
    in_dim = int(input_.get_shape().as_list()[-1])
    out = tf.layers.batch_normalization(tf.layers.conv2d(input_,expand_size, (1,1), (1,1), use_bias=False))
    if relu:
        out = tf.nn.relu(out) #or relu6
    else:
        out = Hswish(out)

    out = depthwise_conv(out, kernel=kernel_size, stride=stride, padding='SAME')
    out = tf.layers.batch_normalization(out)
    if relu:
        out = tf.nn.relu(out) #or relu6
    else:
        out = Hswish(out)
        
    out = tf.layers.batch_normalization(tf.layers.conv2d(out, out_size, (1,1), (1,1), use_bias=False))
    
    if (in_dim != out_size) and (stride == 1):
        Shortcut = tf.layers.conv2d(Shortcut,out_size, (1,1), strides = (stride, stride), use_bias=False)
        Shortcut = tf.layers.batch_normalization(Shortcut)
    if se:
        assert squeeze <= out_size
        out = SEBlock(out,squeeze=squeeze)

    out = out + Shortcut if stride == 1 else out
    return out

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 80])
out = MobileV3Bottleneck(inputs, 480, 4, 112, 3, stride=1, relu=False, se=True)

Conclusion

Today we introduced the core techniques behind several of the newest CNN architectures. Hopefully you have taken something away from this, and when building models in the future you will not be limited to pretrained models but will be able to design the model that best fits your own needs and ideas.

