
Batch Normalization In Neural Networks (Code)

source link: https://towardsdatascience.com/batch-normalization-in-neural-networks-code-d7c9b88da9f5?gi=5313cfa59d3e

Code

The first step is to import the tools and libraries that will be utilized to implement or support the implementation of the neural network. The tools utilized are as follows:

TensorFlow: An open-source platform for the implementation, training, and deployment of machine learning models.

Keras: An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
import tensorflow as tf
from tensorflow import keras

The dataset we’ll be utilizing is the trivial fashion-MNIST dataset.

The fashion-MNIST dataset contains 70,000 images of clothing. More specifically, it includes 60,000 training examples and 10,000 testing examples, all of which are grayscale images with dimensions 28 x 28, categorized into ten classes.

Preparation of the dataset includes the normalization of the training and test images by dividing each pixel value by 255.0. This places the pixel values within the range 0 to 1.

A validation portion of the dataset is also created at this stage. This group of the dataset is utilized during training to assess the performance of the network at various iterations.

(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
# Hold out the first 5,000 training examples as the validation set
validation_images, validation_labels = train_images[:5000], train_labels[:5000]
train_images, train_labels = train_images[5000:], train_labels[5000:]

Keras provides the tools required to implement the classification model. Its Sequential API stacks the layers of the neural network in consecutive order.

Below is some information on the layers that will be implemented to make up our neural network.

Flatten: Takes an input shape and flattens the input image data into a one-dimensional array.

Dense: A dense layer contains an arbitrary number of embedded units/neurons. Each neuron is a perceptron. A perceptron is a fundamental component of an artificial neural network, invented by Frank Rosenblatt in 1958; it utilizes operations based on the threshold logic unit.

Batch Normalization: A Batch Normalization layer works by performing a series of operations on the incoming input data. The set of operations involves the standardization and normalization of the input values, followed by rescaling and offsetting them.

Activation Layer: Performs a specified operation on the inputs within the neural network and introduces non-linearity into the network. The model implemented in this article utilizes two activation functions: Rectified Linear Unit (ReLU) and softmax. The transformation imposed by ReLU on values from a neuron is represented by the formula y = max(0, x). The ReLU activation function clamps any negative values from the neuron to 0, while positive values remain unchanged. The result of this mathematical transformation is utilized as the activation of the current layer, and as input to the next.
# Placing batch normalization layer before the activation layers
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28,28]),
    keras.layers.Dense(300, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.Dense(200, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.Dense(100, use_bias=False),
    keras.layers.BatchNormalization(),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])

Let’s take a look at the internal components of a BN layer.

Accessing the layer at index two provides information on the variables, and their contents, within the first BN layer:

model.layers[2].variables

I won’t go into too many details here, but take note of the variable names ‘gamma’ and ‘beta’; the values held within these variables are responsible for the rescaling and offsetting of activations within the layer.

for variable in model.layers[2].variables:
    print(variable.name)

>> batch_normalization/gamma:0
>> batch_normalization/beta:0
>> batch_normalization/moving_mean:0
>> batch_normalization/moving_variance:0

This article goes into more detail regarding the operations within BN layers.
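
To make these operations concrete, below is a minimal NumPy sketch of what a BN layer computes on a training batch: standardize each feature using the batch statistics, then rescale with gamma and offset with beta. The function name and shapes are illustrative only; the actual Keras layer additionally maintains the moving_mean and moving_variance used at inference time, with a default epsilon of 0.001.

import numpy as np

def batch_norm_forward(x, gamma, beta, epsilon=1e-3):
    # x has shape (batch_size, num_features); statistics are computed per feature
    batch_mean = x.mean(axis=0)
    batch_variance = x.var(axis=0)
    x_standardized = (x - batch_mean) / np.sqrt(batch_variance + epsilon)
    # gamma rescales and beta offsets the standardized activations
    return gamma * x_standardized + beta

x = np.random.randn(32, 300).astype("float32")    # a fake batch of activations
gamma = np.ones(300, dtype="float32")             # Keras initializes gamma to ones
beta = np.zeros(300, dtype="float32")             # and beta to zeros
out = batch_norm_forward(x, gamma, beta)
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])  # roughly 0 and 1 per feature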

Within the dense layers, the bias component is set to false. The bias can be omitted because the mean subtraction performed during the normalization of activations cancels out any constant offset.
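
A quick way to convince yourself of this cancellation: any constant offset added before the BN layer is removed again by the mean subtraction, so the standardized output is unchanged. The small check below is purely illustrative.

import numpy as np

def standardize(a, epsilon=1e-3):
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + epsilon)

x = np.random.randn(32, 10).astype("float32")
bias = 5.0  # any constant offset a Dense bias term could have added
# The offset disappears after mean subtraction, so both results match
print(np.allclose(standardize(x), standardize(x + bias), atol=1e-5))  # True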

Andrej Karpathy, Director of AI at Tesla at the time of writing, posted a tweet on commonly made neural network mistakes; not setting bias to false when using BN was on the list.

In the next snippet of code, we specify the optimization algorithm used to train the implemented neural network, along with the loss function and hyperparameters such as the learning rate and momentum.

sgd = keras.optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="sparse_categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])

Now we train the network. The Sequential model’s ‘fit’ method provides the tools to train the implemented network. We will skip some details regarding how the neural network model is trained; for a more detailed explanation of the training and implementation of neural networks, refer to the linked article.

model.fit(train_images, train_labels, epochs=60, validation_data=(validation_images, validation_labels))

The evaluation of the model performance is conducted using the test data set aside earlier.

Based on the accuracy observed over the test dataset, you can decide to fine-tune the network hyperparameters or move the model forward to production.

model.evaluate(test_images, test_labels)
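
Since the model was compiled with the accuracy metric, evaluate returns the loss followed by the accuracy, so the result can be unpacked directly, for example:

test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print(f"Test loss: {test_loss:.4f} - Test accuracy: {test_accuracy:.4f}")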

During the training phase, you might notice that each epoch takes longer to train in comparison to training a network without batch normalization layers. This is because batch normalization adds a layer of complexity to the neural network, along with extra parameters the model is required to learn during training.

The increase in per-epoch time, however, is balanced by the fact that Batch Normalization reduces the time taken for the model to converge to an optimal solution.

The model implemented in this article is too shallow for us to notice the full benefits of utilizing batch normalization within a neural network architecture. Typically, batch normalization is found in deeper convolutional neural networks such as Xception , ResNet50 and Inception V3 .
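
To get a sense of how pervasive the technique is in deeper architectures, you can instantiate one of the keras.applications models and count its BatchNormalization layers. The snippet below is just a quick illustration; weights=None avoids downloading the pretrained ImageNet weights.

resnet = keras.applications.ResNet50(weights=None)
bn_layers = [layer for layer in resnet.layers
             if isinstance(layer, keras.layers.BatchNormalization)]
print(f"{len(bn_layers)} BatchNormalization layers out of {len(resnet.layers)} layers in total")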

Extra

The neural network implemented above has the Batch Normalization layer just before the activation layers. But it is entirely possible to add BN layers after activation layers.
# Placing batch normalization layer after the activation layers
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28,28]),
    keras.layers.Dense(300, use_bias=False),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(200, use_bias=False),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, use_bias=False),
    keras.layers.Activation(keras.activations.relu),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation=keras.activations.softmax)
])
Researchers have done some extensive work on the Batch Normalization technique; for example, Batch Renormalization and Self-Normalizing Neural Networks.
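
Both ideas are straightforward to experiment with in Keras. At the time of writing, tf.keras’s BatchNormalization layer exposes a renorm argument that enables Batch Renormalization, and a self-normalizing network is typically built from SELU activations with LeCun-normal initialization (optionally adding AlphaDropout). The sketch below illustrates those options and is not a tuned configuration.

# Batch Renormalization: the same layer with renormalization enabled
renorm_bn = keras.layers.BatchNormalization(renorm=True)

# Self-normalizing network: SELU activations with LeCun-normal initialization
snn_model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="selu", kernel_initializer="lecun_normal"),
    keras.layers.Dense(100, activation="selu", kernel_initializer="lecun_normal"),
    keras.layers.Dense(10, activation="softmax")
])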
