
Forward propagation in neural networks — Simplified math and code version


As we all know, over the last decade deep learning has become one of the most widely adopted emerging technologies. This is largely due to the representational power of the functions it can learn.

According to the universal approximation theorem, a well-designed and sufficiently large deep neural network can approximate any arbitrarily complex, continuous relationship among the variables. There are several other reasons behind the success of deep learning, but I am not going to discuss them here.

The goal of this post is to explain forward propagation (one of the core processes during the learning phase) in a simple way.

A learning algorithm/model finds the parameters (weights and biases) with the help of forward propagation and backpropagation.

Forward propagation

As the name suggests, the input data is fed through the network in the forward direction. Each hidden layer accepts the incoming data, processes it according to its activation function, and passes the result to the successive layer.

Why Feed-forward network?

In order to generate some output, the input data should be fed in the forward direction only. The data should not flow in the reverse direction during output generation; otherwise it would form a cycle and the output could never be generated. Such network configurations are known as feed-forward networks. A feed-forward network is what makes forward propagation possible.

At each neuron in a hidden or output layer, the processing happens in two steps:

  1. Preactivation: a weighted sum of the inputs, i.e. a linear transformation of the inputs using the weights. Based on this aggregated sum and the activation function, the neuron decides whether to pass this information further or not.
  2. Activation: the calculated weighted sum of inputs is passed to the activation function. An activation function is a mathematical function which adds non-linearity to the network. Four commonly used and popular activation functions are sigmoid, hyperbolic tangent (tanh), ReLU and softmax. (A minimal sketch of these two steps follows the list.)
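To make these two steps concrete, here is a minimal NumPy sketch of a single neuron; the input, weights and bias are made-up values used only for illustration.

import numpy as np

def sigmoid(a):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

# hypothetical input, weights and bias for one neuron (values made up for illustration)
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
b = 0.1

a = np.dot(w, x) + b   # step 1: preactivation (weighted sum of the inputs plus bias)
h = sigmoid(a)         # step 2: activation (non-linear transformation of the preactivation)

# other common choices for the activation step
h_tanh = np.tanh(a)
h_relu = np.maximum(0.0, a)

print(a, h, h_tanh, h_relu)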

Now let us understand forward propagation with the help of an example. Consider non-linearly separable data in the form of two moons of data points following a swirl pattern. This generated data has two different classes.

The data can be generated using the make_moons() function of the sklearn.datasets module. The total number of samples to generate and the noise in the moons' shape can be adjusted through the function's parameters.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.datasets import make_moons
np.random.seed(0)
data, labels = make_moons(n_samples=200,noise = 0.04,random_state=0)
print(data.shape, labels.shape)
my_cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red","yellow"])
plt.scatter(data[:,0], data[:,1], c=labels, cmap=my_cmap)
plt.show()
[Figure: dataset visualization]

Here, 200 samples are used to generate the data, and they belong to two classes, shown in red and yellow as per the colormap defined above.

Now, let us see the neural network structure used to predict the class for this binary classification problem. Here, I am going to use one hidden layer with two neurons, an output layer with a single neuron, and the sigmoid activation function.

[Figure: network architecture with two inputs, one hidden layer of two neurons, and a single output neuron]

During forward propagation, preactivation and activation take place at each node of the hidden and output layers. For example, at the first node of the hidden layer, a1 (preactivation) is calculated first and then h1 (activation) is calculated.

a1 is a weighted sum of the inputs. Here, the weights are randomly generated.

a1 = w1*x1 + w2*x2 + b1 = 1.76*0.88 + 0.40*(-0.49) + 0 ≈ 1.37, and h1 is the value of the activation function applied to a1.

h1 = sigmoid(a1) = 1/(1 + e^(-1.37)) ≈ 0.80

Similarly

a2 = w3*x1 + w4*x2 + b2 = 0.97*0.88 + 2.24*(-0.49) + 0 ≈ -0.24, and

h2 = sigmoid(a2) = 1/(1 + e^(0.24)) ≈ 0.44

For any layer after the first hidden layer, the input is the output of the previous layer.

a3 = w5*h1 + w6*h2 + b3 = 1.86*0.8 + (-0.97)*0.44 + 0 ≈ 1.06

and

h3 = sigmoid(a3) = 1/(1 + e^(-1.06)) ≈ 0.74

So there is roughly a 74% chance that the first observation belongs to class 1. In the same way, the predicted output can be calculated for all the other observations.

The below image represents the transformation of data from the input layer to the output layer for the first observation.

[Figure: data transformation from the input layer to the output layer]
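As a quick sanity check, the hand-worked chain of computations can be reproduced in a few lines of NumPy. The weights and inputs below are the rounded values quoted in the text, so the intermediate results match the figures above only approximately.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# rounded weights and inputs quoted in the worked example above
x1, x2 = 0.88, -0.49
w1, w2, w3, w4, w5, w6 = 1.76, 0.40, 0.97, 2.24, 1.86, -0.97
b1 = b2 = b3 = 0

a1 = w1*x1 + w2*x2 + b1   # ~1.35 with these rounded values
h1 = sigmoid(a1)          # ~0.79
a2 = w3*x1 + w4*x2 + b2   # ~-0.24
h2 = sigmoid(a2)          # ~0.44
a3 = w5*h1 + w6*h2 + b3   # ~1.05
h3 = sigmoid(a3)          # ~0.74 -> predicted probability of class 1
print(h3)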

Now let us see the implementation of the above neural network in a Jupyter notebook. In practice, frameworks such as TensorFlow, Keras or PyTorch are used to build deep neural networks, but here the forward pass is written from scratch to make each step explicit.

from sklearn.model_selection import train_test_split
#Splitting the data into training and testing data
X_train, X_val, Y_train, Y_val = train_test_split(data, labels, stratify=labels, random_state=0)
print(X_train.shape, X_val.shape)

Here, 150 observations are used for training and 50 for testing, as per the default split ratio of 75:25.

Now let us define a class for forward propagation where weights are randomly initialized.

class FeedForwardNetwork:

    def __init__(self):
        # weights are drawn randomly; biases start at zero
        np.random.seed(0)
        self.w1 = np.random.randn()
        self.w2 = np.random.randn()
        self.w3 = np.random.randn()
        self.w4 = np.random.randn()
        self.w5 = np.random.randn()
        self.w6 = np.random.randn()
        self.b1 = 0
        self.b2 = 0
        self.b3 = 0

    def sigmoid(self, x):
        return 1.0/(1.0 + np.exp(-x))

    def forward_pass(self, x):
        # x is a single observation with two features
        self.x1, self.x2 = x
        self.a1 = self.w1*self.x1 + self.w2*self.x2 + self.b1   # preactivation, hidden neuron 1
        self.h1 = self.sigmoid(self.a1)                          # activation, hidden neuron 1
        self.a2 = self.w3*self.x1 + self.w4*self.x2 + self.b2   # preactivation, hidden neuron 2
        self.h2 = self.sigmoid(self.a2)                          # activation, hidden neuron 2
        self.a3 = self.w5*self.h1 + self.w6*self.h2 + self.b3   # preactivation, output neuron
        self.h3 = self.sigmoid(self.a3)                          # activation, output neuron
        # store the intermediate values for visualization; forward_matrices is a
        # global list defined before forward_pass is called
        forward_matrix = np.array([[0, 0, 0, 0, self.h3, 0, 0, 0],
                                   [0, 0, self.w5*self.h1, self.w6*self.h2, self.b3, self.a3, 0, 0],
                                   [0, 0, 0, self.h1, 0, 0, 0, self.h2],
                                   [self.w1*self.x1, self.w2*self.x2, self.b1, self.a1,
                                    self.w3*self.x1, self.w4*self.x2, self.b2, self.a2]])
        forward_matrices.append(forward_matrix)
        return self.h3

Here, the forward_pass() function calculates the output value for a given input observation. forward_matrix is a 2D array that stores the values of a1, h1, a2, h2, a3, h3, etc., for each observation. It is used only to visualize how these values evolve, via a GIF image. The entries of forward_matrix are laid out as shown below.

[Figure: layout of forward_matrix]
forward_matrices = []
ffn = FeedForwardNetwork()
for x in X_train:
    ffn.forward_pass(x)

forward_matrices is a list containing one forward_matrix per training observation.
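A quick check on the collected matrices (the expected values assume the 150 training observations and the 4-row-by-8-column layout described above):

print(len(forward_matrices))      # 150: one matrix per training observation
print(forward_matrices[0].shape)  # (4, 8): the layout shown in the figure above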

import seaborn as sns
import imageio
from IPython.display import HTML

def plot_heat_map(observation):
    # render the forward_matrix of one observation as a heat map and return it as an RGB image
    fig = plt.figure(figsize=(10, 1))
    sns.heatmap(forward_matrices[observation], annot=True, cmap=my_cmap, vmin=-3, vmax=3)
    plt.title("Observation " + str(observation))
    fig.canvas.draw()
    image = np.frombuffer(fig.canvas.tostring_rgb(), dtype='uint8')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    return image

# sample 15 observations evenly from the training set and save their heat maps as an animated GIF
imageio.mimsave('./forwardpropagation_viz.gif',
                [plot_heat_map(i) for i in range(0, len(forward_matrices), len(forward_matrices)//15)],
                fps=1)

The plot_heat_map() function creates a heat map visualizing the values of forward_matrix for a given observation. These heat maps are stored as frames of the forwardpropagation_viz.gif image; in total, 15 heat maps are created for 15 different observations sampled evenly from the training set.

[Animation: forward propagation for 15 different observations]

Code Optimization

Instead of using separate variables such as w1, w2, ..., w6, a1, a2, h1, h2, etc., the weights, preactivations (a) and activations (h) can each be stored in a matrix. Vectorization makes the code execute much more efficiently, and the resulting code is also easier to read and follow.

class FeedForwardNetwork_Vectorised:

    def __init__(self):
        np.random.seed(0)
        self.W1 = np.random.randn(2, 2)   # weights: input layer -> hidden layer
        self.W2 = np.random.randn(2, 1)   # weights: hidden layer -> output layer
        self.B1 = np.zeros((1, 2))        # biases of the hidden layer
        self.B2 = np.zeros((1, 1))        # bias of the output neuron

    def sigmoid(self, X):
        return 1.0/(1.0 + np.exp(-X))

    def forward_pass(self, X):
        # X has shape (n_samples, 2); the whole batch is processed at once
        self.A1 = np.matmul(X, self.W1) + self.B1        # preactivations of the hidden layer
        self.H1 = self.sigmoid(self.A1)                  # activations of the hidden layer
        self.A2 = np.matmul(self.H1, self.W2) + self.B2  # preactivation of the output neuron
        self.H2 = self.sigmoid(self.A2)                  # predicted probabilities, shape (n_samples, 1)
        return self.H2
ffn_v = FeedForwardNetwork_Vectorised()
ffn_v.forward_pass(X_train)
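As a small usage sketch building on the ffn_v object above: the vectorised forward pass returns one probability per observation, and a 0.5 decision threshold (an assumption for illustration, not something defined in this article) converts those probabilities into class predictions.

probs = ffn_v.forward_pass(X_train)          # shape (150, 1): one probability per training observation
preds = (probs >= 0.5).astype(int).ravel()   # hypothetical 0.5 threshold to obtain class labels
print(probs.shape, preds[:10])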

Conclusion

This concludes my walkthrough of forward propagation, and I hope I was able to explain the intuition and the steps involved. If you are interested in learning or exploring more about neural networks, refer to my other blog posts on the topic. The links are below.

Why better weight initialization is important in neural networks?

Analyzing different types of activation functions in neural networks — which one to prefer?

Why Gradient descent isn’t enough: A comprehensive introduction to optimization algorithms in neural networks

