
CNN Models: ResNet, MobileNet, DenseNet, ShuffleNet, EfficientNet

source link: http://www.banbeichadexiaojiubei.com/index.php/2020/11/29/cnn模型-resnet、mobilenet、densenet、shufflenet、efficientnet/

Article source:

https://medium.com/@CinnamonAITaiwan/cnn%E6%A8%A1%E5%9E%8B-resnet-mobilenet-densenet-shufflenet-efficientnet-5eba5c8df7e4

The Evolution of CNNs

The figure below compares the size and accuracy of CNN models commonly used before 2018. There is no shortage of articles online introducing the evolution of CNNs [LeNet/AlexNet/VGG/Inception/ResNet], and many of them are well written. Today we introduce several of the newer CNN models, how to build them, and where their advantages lie.

[Figure: Comparison of CNN models]

Classic CNN Architectures

To appreciate the advantages of the newest models, a few basic architectural ideas are worth understanding first. Let's look at three of them: Inception, residual networks, and depthwise separable convolution.

Inception

The Inception architecture was first proposed by Google in 2014. Its goal is to combine kernels with different receptive fields. How do we achieve that? Take a look at the figure below:

[Figure: The Inception architecture]

The figure shows the classic Inception architecture. Four branches follow the input feature maps. Three of them are first compressed by a 1*1 kernel, which controls the depth of the output channels and also adds nonlinearity to the model; the remaining branch goes through a 3*3 operation first. To make sure the output feature maps keep the same spatial size, we rely on padding: a 1*1 kernel already produces an output the same size as its input, while the 3*3 and 5*5 kernels need padding of 1 and 2 respectively. In TensorFlow and Keras the quickest way is to set padding='same', which keeps the output size unchanged as long as the stride is 1.
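As a quick check of those padding values, the spatial output size of a convolution follows the standard relation

$$ o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1, \qquad s = 1,\; p = \frac{k - 1}{2} \;\Rightarrow\; o = i, $$

which gives p = 1 for a 3*3 kernel and p = 2 for a 5*5 kernel, exactly what padding='same' does for odd kernel sizes. The concrete implementation is as follows: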

import tensorflow as tf
def Inception(input_data, input_depth = 192):
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 64, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)

    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 96, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

        X_2 = tf.layers.conv2d(X_2, 128, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 16, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3, 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3, 32, (5, 5), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

    with tf.name_scope('Branch_4'):
        X_4 = tf.layers.max_pooling2d(input_data, 2, 1, padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)

        X_4 = tf.layers.conv2d(X_4, 32, (1, 1), padding = 'same')
        X_4 = tf.layers.batch_normalization(X_4)
        X_4 = tf.nn.leaky_relu(X_4)

    out = tf.concat((X_1, X_2, X_3, X_4), axis = 3)

    return out

Residual Networks

[Figure: The residual structure]

The figure above shows the classic residual structure: the input is skip-connected over two or three layers of F(x) and added back, so the output becomes y = F(x) + x. The benefit is that during back-propagation a term of 1 is always preserved, which lowers the chance of vanishing gradients.

What does this mean? For example, when the output y above is differentiated with respect to x, one of the terms is x differentiated with respect to itself, which gives 1. Because every link of the chain rule then carries this 1, the gradient is far less likely to vanish, so a deeper network can be built.
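Written out, for y = F(x) + x the chain rule gives

$$ \frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(\frac{\partial F(x)}{\partial x} + 1\right), $$

so even when the term from F(x) is tiny, the gradient flowing back through the skip connection is preserved.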


A TensorFlow implementation of the residual block looks like this:

def Residual_Block(input_data, in_channel, out_channel, s = 1):
    X_shortcut = input_data  ## remember the input for the skip connection
    X = tf.layers.conv2d(input_data, out_channel, (1, 1), strides = (s, s))
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)

    X = tf.layers.conv2d(X, out_channel, (3, 3), padding = 'same')  ## the stride is already applied above
    X = tf.layers.batch_normalization(X)
    X = tf.nn.relu(X)

    X = tf.layers.conv2d(X, out_channel, (1, 1))
    X = tf.layers.batch_normalization(X)

    if in_channel != out_channel or s != 1:  ## project the shortcut when the depth or spatial size changes
        X_shortcut = tf.layers.conv2d(X_shortcut, out_channel, (1, 1), strides = (s, s))
        X_shortcut = tf.layers.batch_normalization(X_shortcut)

    X = X + X_shortcut
    X = tf.nn.relu(X)

    return X

Depthwise Separable Convolution

[Figure: Depthwise + pointwise convolution]

The figure above shows the structure of a depthwise separable convolution. Unlike an ordinary convolution, it consists of two steps:

In the first step, the input feature maps are convolved with a k*k kernel whose depth equals the input depth (depthwise), and each feature map is convolved with its kernel independently.

In the second step, a 1*1 kernel whose depth equals the desired output depth is applied (pointwise).
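Counting parameters (and ignoring biases), a standard convolution with a k*k kernel, C_in input channels, and C_out output channels is replaced by two much smaller factors:

$$ \underbrace{k^2 C_{in} C_{out}}_{\text{standard}} \quad \longrightarrow \quad \underbrace{k^2 C_{in}}_{\text{depthwise}} + \underbrace{C_{in} C_{out}}_{\text{pointwise}} $$

For k = 3, C_in = 3, C_out = 64 this is 1728 versus 219, which matches the counts computed in the code below once the bias terms are added.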

The benefit is a large saving in parameters. Below we compute the difference in parameter count:

import tensorflow as tf
# count the total number of trainable parameters
def get_num_params():
  total_parameters = 0
  for variable in tf.trainable_variables():
    shape = variable.get_shape()
    # print(shape)
    # print(len(shape))
    variable_parameters = 1
    for dim in shape:
      # print(dim)
      variable_parameters *= dim.value
    # print(variable_parameters)
    total_parameters += variable_parameters
  return total_parameters

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.conv2d(inputs, 64, (3, 3), strides = (1, 1), activation = tf.nn.leaky_relu)
print (get_num_params()) ## (3*3*3+1)*64=1792

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
X = tf.layers.separable_conv2d(inputs, 64, (3, 3), padding = 'SAME')
print(get_num_params()) ## 3*3*3+(1*1*3+1)*64=283

As the numbers above show, for the same 300*300*64 output a separable convolution uses roughly 1/6 of the parameters of an ordinary convolution, which achieves the goal of a lightweight model.

Reference code for depthwise separable convolution:

import tensorflow as tf
import tensorflow.contrib as tc

slim = tc.slim

tf.reset_default_graph()
## define a standalone depthwise_conv layer
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = normalizer_fn(x) if normalizer_fn is not None else x  # batch normalization
        x = activation_fn(x) if activation_fn is not None else x  # nonlinearity
        return x
      
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out=depthwise_conv(
        inputs, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv')
print(get_num_params()) ## 3*3*3=27  


## using slim is even simpler
def depthwise_conv_bn(x, kernel_size, stride=1, dilation=1):
    with tf.variable_scope(None, 'depthwise_conv_bn'):
        x = slim.separable_conv2d(x, None, kernel_size, depth_multiplier=1, stride=stride,
                                  rate=dilation,)
        #x = slim.batch_norm(x, activation_fn=None, fused=False)
    return x
  
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 3])
out = depthwise_conv_bn(inputs, (3,3), stride=1, dilation=1)
print(get_num_params()) ## 3*3*3=27

CNN Models

ResNetV2

[Figure: (a) ResNetV1, (e) ResNetV2, and other variants]

ResNetV2 was likewise proposed by Kaiming He's team. It keeps the residual idea of ResNetV1 but makes some changes to the identity branch (left path) and the residual branch (right path).

Removing the ReLU after the residual block

The authors argue that attaching a ReLU after every residual block forces the forward propagation to be monotonically increasing, which reduces the model's expressive power.

Removing the BN on the identity branch

If the design in figure (b) is used instead, the BN layer changes the distribution of the information on the identity branch and slows down convergence. The paper also uses a small trick: a 1*1 kernel first compresses the depth and a final 1*1 kernel restores it, which reduces computation.

def ResNetV2_block(input_data, input_depth, compress_depth, output_depth, strides = (1, 1)):
    X_shortcut = input_data
    X = tf.layers.conv2d(input_data, compress_depth, (1, 1)) ## compress first
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, compress_depth, (3, 3), padding = 'same', strides = strides)
    X = tf.layers.batch_normalization(X)
    X = tf.nn.leaky_relu(X)
    X = tf.layers.conv2d(X, output_depth, (1, 1)) ## then expand back
    if input_depth != output_depth:
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1), strides = strides, padding = 'same') ## depth differs
    if input_depth == output_depth and strides != (1, 1):
        X_shortcut = tf.image.resize_images(X_shortcut, (X.shape[1], X.shape[2]), method = 0) ## spatial size differs
    out = X_shortcut + X
    return out

With this residual block in place, you can rebuild the ResNetV2 model by following the parameters given in the paper. The paper also contains further variations, such as several modified residual blocks, that interested readers can dig into.
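As a minimal sketch of how the block might be stacked (the depths, strides, and input size here are illustrative placeholders, not the exact configuration from the paper):

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 224, 224, 64])
X = ResNetV2_block(inputs, input_depth = 64, compress_depth = 64, output_depth = 256)                  ## keeps the spatial size
X = ResNetV2_block(X, input_depth = 256, compress_depth = 128, output_depth = 512, strides = (2, 2))   ## downsamples to 112*112
print(X.shape)           ## (?, 112, 112, 512)
print(get_num_params())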

Inception-ResNet

[Figure: The Inception-ResNet-A block]

Inception-ResNet models such as Inception-ResNetV2 and InceptionV4 are also frequently used today. With the Inception and residual block concepts above, Inception-ResNet is easy to understand: its core is to replace the residual branch of a residual block with an Inception structure. The paper proposes three different combinations; here we implement the Inception-ResNet-A block.

def InceptionResnetA_block(input_data, input_depth = 3, output_depth = 384):
    X_shortcut = input_data
    with tf.name_scope('Branch_1'):
        X_1 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_1 = tf.layers.batch_normalization(X_1)
        X_1 = tf.nn.leaky_relu(X_1)

    with tf.name_scope('Branch_2'):
        X_2 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

        X_2 = tf.layers.conv2d(X_2 , 32, (3, 3), padding = 'same')
        X_2 = tf.layers.batch_normalization(X_2)
        X_2 = tf.nn.leaky_relu(X_2)

    with tf.name_scope('Branch_3'):
        X_3 = tf.layers.conv2d(input_data, 32, (1, 1))
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3 , 48, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)

        X_3 = tf.layers.conv2d(X_3 , 64, (3, 3), padding = 'same')
        X_3 = tf.layers.batch_normalization(X_3)
        X_3 = tf.nn.leaky_relu(X_3)
    out = tf.concat((X_1, X_2, X_3), axis = 3)

    out = tf.layers.conv2d(out, output_depth, (1, 1))

    if input_depth != output_depth:
        X_shortcut = tf.layers.conv2d(X_shortcut, output_depth, (1, 1))

    out = X_shortcut + out
    return out

DenseNet

[Figure: The DenseNet architecture]

DenseNet is one of the representative lightweight models. The code below implements a Dense stage block (it also brings in depthwise separable convolution to further reduce parameters and speed the model up; the original paper uses ordinary convolutions):

def Dense_Stage(inputs_, depth=64, repeat=8):
    for _ in range(repeat):
        X_input = inputs_
        X = tf.layers.conv2d(inputs_,depth, (1,1), strides=(1,1), activation=tf.nn.leaky_relu)
        X = tf.layers.batch_normalization(X)
        X = tf.layers.separable_conv2d(X, depth, (3,3), padding='SAME')
        X = tf.nn.leaky_relu(X)
        X = tf.layers.batch_normalization(X)
        X = tf.concat([X_input, X], 3)  # dense connection: concatenate the block input with the new features
        inputs_ = X
    return X
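A quick shape check (the input size is an illustrative placeholder): every repetition concatenates depth new channels onto its input, so with depth=64 and repeat=8 a 32-channel input grows to 32 + 8*64 = 544 channels.

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 64, 64, 32])
out = Dense_Stage(inputs, depth = 64, repeat = 8)
print(out.shape)   ## (?, 64, 64, 544)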

ShuffleNetV2

When it comes to lightweight models, ShuffleNet is arguably the standout among those in common use. Lightweight models mainly come from two lines of work: SqueezeNet, from UC Berkeley and Stanford University, and MobileNet, from Google. Depthwise separable convolution originated in MobileNet, while SqueezeNet's idea is very similar to Inception, so we will not elaborate on it here.

ShuffleNet builds on SqueezeNet with a few changes, and its principle resembles depthwise separable convolution. A depthwise separable convolution consists of a depthwise plus a pointwise convolution; the pointwise convolution is there because channels do not mix in the depthwise step: each kernel convolves only a single feature map and never sees global information. Group convolution in ShuffleNet has the same channel-isolation problem (see the figure below; it is very similar to depthwise convolution). But instead of solving it with a pointwise convolution as MobileNet does, ShuffleNet simply shuffles: feature maps from different groups are reshuffled and sent to the next layer. This saves the parameters of the pointwise convolution as well and reaches the 'ultra-lightweight' level.
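To make the shuffle concrete, here is a small NumPy sketch of the reshape-transpose-reshape trick that the shuffle_unit function below applies to the channel axis (the channel indices are only illustrative):

import numpy as np

channels = np.arange(6)                            ## channels 0..5; group 1 = [0,1,2], group 2 = [3,4,5]
groups = 2
shuffled = channels.reshape(groups, -1).T.reshape(-1)
print(shuffled)                                    ## [0 3 1 4 2 5]: the two groups end up interleaved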

[Figure: Group convolution]

With these basics in place, let's look at the important changes ShuffleNetV2 makes relative to V1:

1*1 convolutions

First, V1 uses a large number of 1*1 convolutions, which drives up the MAC (memory access cost). In a depthwise separable convolution the pointwise convolution accounts for most of the computation and parameters, so in V2 the feature maps entering a block are first split along the channel axis.

Concat at the output

The authors found that element-wise operations such as addition and ReLU are also a major cause of high MAC, so V2 replaces V1's add with concat.

[Figure: ShuffleNetV1 and ShuffleNetV2: (a) basic V1 block, (b) V1 block with downsampling, (c) basic V2 block, (d) V2 block with downsampling]

The code below shows how to build a ShuffleNetV2 block. One thing to watch is that the channel depth of the input feature maps must be divisible by shuffle_group.

## reference: https://github.com/timctho/shufflenet-v2-tensorflow/blob/master/module.py
## reference: https://github.com/TropComplique/shufflenet-v2-tensorflow/blob/master/architecture.py

def shuffle_unit(x, groups):  ## shuffle the feature maps output by the depthwise conv across groups
    with tf.variable_scope('shuffle_unit'):
        n, h, w, c = x.get_shape().as_list()
        x = tf.reshape(x, shape=([tf.shape(x)[0], h, w, groups, c // groups]))
        x = tf.transpose(x, tf.convert_to_tensor([0, 1, 2, 4, 3]))
        x = tf.reshape(x, shape=[tf.shape(x)[0], h, w, c])
    return x
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x
    
def conv_bn_relu(x, out_channel, kernel_size, stride=1):  ## plain convolution + BN + ReLU
    with tf.variable_scope(None, 'conv_bn_relu'):
        x = tf.layers.conv2d(x, out_channel, kernel_size, stride,)
        x = tf.nn.leaky_relu(tf.layers.batch_normalization(x))
    return x

def shufflenet_v2_block(x, out_channel, kernel_size, stride=1, shuffle_group=2): ##shufflenet_v2_block
    with tf.variable_scope(None, 'shuffle_v2_block'):
        if stride == 1:
            top, bottom = tf.split(x, num_or_size_splits=2, axis=3)

            half_channel = out_channel // 2

            top = conv_bn_relu(top, half_channel, 1)
            top = depthwise_conv_bn(top, kernel_size, stride)
            top = conv_bn_relu(top, half_channel, 1)

            out = tf.concat([top, bottom], axis=3)
            out = shuffle_unit(out, shuffle_group)

        else:   ## the block with downsampling
            half_channel = out_channel // 2
            b0 = conv_bn_relu(x, half_channel, 1)
            b0 = depthwise_conv_bn(b0, kernel_size, stride)
            b0 = conv_bn_relu(b0, half_channel, 1)

            b1 = depthwise_conv_bn(x, kernel_size, stride)
            b1 = conv_bn_relu(b1, half_channel, 1)

            out = tf.concat([b0, b1], axis=3)
            out = shuffle_unit(out, shuffle_group)
        return out

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 4])
out = shufflenet_v2_block(inputs, 4, (3,3), stride=1, shuffle_group=2)  ## for the stride-1 block, out_channel should match the input channel count
print(get_num_params())

EfficientNet

EfficientNet was proposed by Google in 2019. Using Google's AutoML technology, they built eight efficient models, B0 through B7. Looking at the details, the bottleneck is the inverted residual block introduced by MobileNetV2 combined with Squeeze-and-Excitation networks, so once we can build the MBConv block we can reproduce the EfficientNet architecture. Below, let's first look at the important changes MobileNetV2 makes relative to MobileNetV1 and ResNet.

[Figure: The EfficientNet-B0 architecture]

Expand first, then compress

The authors argue that when a feature map with few channels passes through a ReLU, every value becomes non-negative and a lot of information is lost. So unlike ResNet, which compresses first, and MobileNetV1, which applies the depthwise separable convolution directly, MobileNetV2 first expands the feature-map depth with a pointwise convolution.

Skip connections

Compared with V1, V2 adopts the ResNet idea and skip-connects the feature maps.

A linear activation at the output

As mentioned above, the authors believe a feature map with few channels is not suited to a ReLU activation, so the output layer uses a linear activation instead; if you want to use ReLU there, make sure the output channel depth is large enough.

[Figure: MobileNetV1, MobileNetV2, and ResNet compared]

Comparing with the rival ShuffleNet architecture: ShuffleNetV2 had not yet been released when MobileNetV2 came out, so the figure compares against ShuffleNetV1.

[Figure: ShuffleNet and MobileNetV2 compared]

The code below shows how to build the residual block used in MobileNetV2.

def depthwise_conv(x, kernel = 3, stride = 1, padding = 'SAME',
        activation_fn = None, normalizer_fn = None,
        weights_initializer = tf.contrib.layers.xavier_initializer(),
        data_format = 'NHWC', scope = 'depthwise_conv'):
        ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable('depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype = tf.float32,
            initializer = weights_initializer)
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format = 'NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x   # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x   # nonlinearity
        return x


def res_block(input, expansion_ratio, output_dim, stride, name, bias = False, shortcut = True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio * input.get_shape().as_list()[-1])
        net = tf.layers.conv2d(input, bottleneck_dim, (1, 1), name = 'pw',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias) ## expand first
        net = tf.layers.batch_normalization(net, name = 'pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride = stride)  ## apply the block stride in the depthwise conv
        net = tf.layers.batch_normalization(net, name = 'dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1, 1), name = 'pw_linear',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias) ## project back to the output depth
        net = tf.layers.batch_normalization(net, name = 'pw_linear_bn')

        # element wise add, only for stride==1
        if shortcut and stride == 1:
            in_dim = int(input.get_shape().as_list()[-1])
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim, (1, 1), name = 'ex_dim',
                        kernel_regularizer = tf.contrib.layers.l2_regularizer(0.003), use_bias = bias)
                net = ins + net
            else:
                net = input + net

        return net

SENet(Squeeze-and-Excitation Networks)

[Figure: The SENet block]

With the inverted residual block in hand, we still need Squeeze-and-Excitation networks. The core idea of SENet is to let the network learn feature weights, so that useful feature maps receive large weights and ineffective ones small weights, training the model toward better results; I find it quite similar to attention.

The structure combined with a residual block is shown above: global average pooling gathers global information (squeeze), fully connected layers capture the semantics by first compressing and then expanding (excitation), and finally the coefficient obtained for each feature map is multiplied back onto the original input (using ReLU in the middle compression layer feels like an odd fit?). Combining this with the inverted residual block, the MBConv block is built as follows.

import tensorflow as tf
def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC')
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x


def MBConvBlock(input, expansion_ratio, output_dim, stride, name, squeeze ,bias=False, shortcut=True, 
                use_Squeeze_Excitation=True):
    with tf.name_scope(name), tf.variable_scope(name):
        # pw
        bottleneck_dim = round(expansion_ratio*input.get_shape().as_list()[-1]) 
        net = tf.layers.conv2d(input, bottleneck_dim, (1,1), name='pw', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) ## expand first
        net = tf.layers.batch_normalization(net, name='pw_bn')
        net = tf.nn.relu6(net)
        # dw
        net = depthwise_conv(net, stride=stride)
        net = tf.layers.batch_normalization(net, name='dw_bn')
        net = tf.nn.relu6(net)
        # pw & linear
        net = tf.layers.conv2d(net, output_dim, (1,1), name='pw_linear', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) ## project back to the output depth
        net = tf.layers.batch_normalization(net, name='pw_linear_bn')
        
        # SENET-Squeeze-Excitation
        if use_Squeeze_Excitation:
            in_dim=int(net.get_shape().as_list()[-1])
            Squeeze=tf.layers.average_pooling2d(net, net.get_shape()[1:-1], 1)
            Squeeze=tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze))
            Excitation=tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=output_dim))
            Excitation=tf.nn.sigmoid(Excitation)
            net = tf.reshape(Excitation, [-1,1,1,output_dim])*net

        
        in_dim=int(input.get_shape().as_list()[-1])
        if shortcut and stride == 1:
            if in_dim != output_dim:
                ins = tf.layers.conv2d(input, output_dim,(1,1), name='ex_dim', 
                               kernel_regularizer=tf.contrib.layers.l2_regularizer(0.003), use_bias=bias) 
                net = ins+net
            else:
                net = input+net

        return net
    
    
tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 43])
out = MBConvBlock(inputs, 4, 64, 1, 'first', 4,bias=False, shortcut=True, use_Squeeze_Excitation=True)

MobileNetV3

One can only say that the masters publish papers faster than we can read them. MobileNetV3 inherits the depthwise separable convolution of MobileNetV1 and the skip connections and expand-then-compress idea of MobileNetV2, and adds Squeeze-and-Excitation networks, so the overall block is very similar to EfficientNet's MBConvBlock. On top of that, MobileNetV3 changes the activation functions:

In some blocks, ReLU is replaced by h-swish and sigmoid by h-sigmoid. H-swish is designed after the swish function, mainly because swish is slow to compute; the authors' experiments show that using h-swish improves accuracy.
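For reference, these are the definitions used in the code further below:

$$ \text{h-sigmoid}(x) = \frac{\mathrm{ReLU6}(x + 3)}{6}, \qquad \text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6} $$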

[Figure: The h-swish activation function]
[Figure: Differences between the activation functions]

The figure below compares MobileNetV3 with MobileNetV2: at the same latency, MobileNetV3 achieves higher top-1 accuracy.

[Figure: MobileNetV3 vs. MobileNetV2]

The code below implements the MobileNetV3 bottleneck.

import tensorflow as tf
def Hswish(input_):
    return input_* tf.nn.relu6(input_ + 3.) / 6.

def Hsigmoid(input_):
    return tf.nn.relu6(input_ + 3.) / 6.

def depthwise_conv(
        x, kernel=3, stride=1, padding='SAME',
        activation_fn=None, normalizer_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        data_format='NHWC', scope='depthwise_conv'):      ## plain depthwise conv

    with tf.variable_scope(scope):
        assert data_format == 'NHWC'
        in_channels = x.shape[3].value
        W = tf.get_variable(
            'depthwise_weights',
            [kernel, kernel, in_channels, 1], dtype=tf.float32,
            initializer=weights_initializer
        )
        x = tf.nn.depthwise_conv2d(x, W, [1, stride, stride, 1], padding, data_format='NHWC',)
        x = tf.layers.batch_normalization(x) if normalizer_fn is not None else x  # batch normalization
        x = tf.nn.leaky_relu(x) if activation_fn is not None else x  # nonlinearity
        return x
    
def SEBlock(input_, squeeze=4):
    in_dim=int(input_.get_shape().as_list()[-1])
    Squeeze = tf.layers.average_pooling2d(input_, input_.get_shape()[1:-1], 1)
    Squeeze = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim//squeeze)) 
    Excitation = tf.nn.relu(tf.layers.dense(Squeeze, use_bias=False, units=in_dim))
    Excitation = Hsigmoid(Excitation) ##Hsigmoid replace Sigmoid
    Excitation = tf.reshape(Excitation, [-1,1,1,in_dim])
    return input_*Excitation
    
    
def MobileV3Bottleneck(input_,expand_size, squeeze,out_size, kernel_size,stride=1, relu=True, se=True):
    Shortcut = input_
    in_dim = int(input_.get_shape().as_list()[-1])
    out = tf.layers.batch_normalization(tf.layers.conv2d(input_,expand_size, (1,1), (1,1), use_bias=False))
    if relu:
        out = tf.nn.relu(out) #or relu6
    else:
        out = Hswish(out)

    out = depthwise_conv(out, kernel=kernel_size, stride=stride, padding='SAME')
    out = tf.layers.batch_normalization(out)
    if relu:
        out = tf.nn.relu(out) #or relu6
    else:
        out = Hswish(out)
        
    out = tf.layers.batch_normalization(tf.layers.conv2d(out, out_size, (1,1), (1,1), use_bias=False))
    
    if (in_dim != out_size) and (stride == 1):
        Shortcut = tf.layers.conv2d(Shortcut,out_size, (1,1), strides = (stride, stride), use_bias=False)
        Shortcut = tf.layers.batch_normalization(Shortcut)
    if se:
        assert squeeze <= out_size
        out = SEBlock(out,squeeze=squeeze)

    out = out + Shortcut if stride == 1 else out
    return out

tf.reset_default_graph()
inputs = tf.placeholder(tf.float32, [None, 300, 300, 80])
out = MobileV3Bottleneck(inputs, 480, 4, 112, 3, stride=1, relu=False, se=True)

Conclusion

Today we introduced the core techniques behind several of the newest CNN architectures. Hopefully you have taken something away from this, and when building models in the future you will not be limited to pretrained models but will be able to design the model that best fits your own needs and ideas.

