
Getting to know Activation Functions in Neural Networks.

Source: https://towardsdatascience.com/getting-to-know-activation-functions-in-neural-networks-125405b67428?gi=3d79d3f733d1

What are activation functions in Neural Networks and why should you know about them?

Photo by Marius Masalar on Unsplash

If you are someone who has experience implementing neural networks, you might have encountered the term ‘activation functions’. Does the name ring any bells? No? How about ‘relu’, ‘softmax’ or ‘sigmoid’? Those are a few of the most widely used activation functions today. When I started working with neural networks, I had no idea what an activation function really did. But there came a point where I could not go ahead with the implementation of my neural network without a sound knowledge of activation functions. I did a little bit of digging, and here’s what I found…

What are activation functions?

To put it simply, activation functions are mathematical functions that determine the output of the neurons in a neural network. They essentially decide whether to activate or deactivate each neuron in order to produce the desired output, hence the name activation functions. Now, let’s get into the math…

Figure 1

In a neural network, input data points (x), which are numerical values, are fed into neurons. Each neuron has weights (w) that are multiplied by its inputs to produce a value, which in turn is fed into the neurons in the next layer. Activation functions come into play as mathematical gates in this process, as depicted in Figure 1, and decide whether the output of a certain neuron is on or off.
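To make this concrete, here is a minimal sketch (my own, not from the original article) of a single neuron in plain NumPy. The bias term b is an extra detail not mentioned above, added only for completeness:

```python
import numpy as np

def neuron(x, w, b, activation):
    """A single neuron: weighted sum of inputs, then an activation 'gate'."""
    z = np.dot(w, x) + b       # weighted sum (pre-activation value)
    return activation(z)       # the activation decides the neuron's output

# Using a simple step 'gate' for illustration
step = lambda z: 1.0 if z >= 0 else 0.0
print(neuron(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1, step))  # 1.0
```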

Activation functions can be divided into three main categories: the Binary Step Function, the Linear Activation Function, and Non-Linear Activation Functions. Non-linear activation functions, in turn, come in several varieties. Let’s take a deeper look…

1. Binary Step Function

Binary Step Activation Function

The binary step function is a threshold-based activation function: above a certain threshold the neuron is activated, and below that threshold the neuron is deactivated. In the graph above, the threshold is zero. As the name suggests, this activation function can be used for binary classification; however, it cannot be used in a situation where you have multiple classes to deal with.
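As a rough sketch (my own, with the threshold defaulting to zero as in the graph above), the binary step function could be written as:

```python
import numpy as np

def binary_step(z, threshold=0.0):
    # 1 when the input reaches the threshold, 0 otherwise
    return np.where(z >= threshold, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0. 1. 1.]
```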

2. Linear Activation Function

Linear Activation Function

Here, the output is directly proportional to the weighted sum of the neuron’s inputs. Unlike the binary step function, a linear activation function can deal with multiple classes. However, it has its own drawbacks. With a linear activation function, the gradient used in back-propagation is constant, which is not good for learning. Another huge drawback is that no matter how deep the neural network is (how many layers it consists of), the last layer will always be a linear function of the first layer. This limits the neural network’s ability to deal with complex problems.
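A quick illustrative check of that last point (my own sketch, with made-up weight matrices): stacking two layers whose activation is linear collapses into a single linear transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                       # some input vector

# Two stacked layers whose activation is linear (the identity)...
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
two_layers = W2 @ (W1 @ x)

# ...behave exactly like a single layer with weights W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))    # True
```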

3. Non-Linear Activation Functions

Deep learning practitioners today work with high-dimensional data such as images, audio, video, etc. Given the drawbacks mentioned above, it is not practical to use linear activation functions in the complex applications we build neural networks for. That is why non-linear functions are the ones widely used at present. We’ll take a look at a few of the popular non-linear activation functions.

  • Sigmoid function.

Sigmoid function

The sigmoid function (also known as the logistic function) takes a probabilistic approach, and its output ranges between 0 and 1. It normalizes the output of each neuron. However, for very high or very low inputs the sigmoid function makes almost no change in its prediction, which ultimately results in the neural network refusing to learn further; this problem is known as the vanishing gradient.
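A small sketch of the definition and of the vanishing gradient issue (the formula is the standard one; the input values are just illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # output is always between 0 and 1

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                     # derivative of the sigmoid

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))        # ~[0.0000454, 0.5, 0.9999546]
print(sigmoid_grad(z))   # ~[0.0000454, 0.25, 0.0000454] -> nearly zero at the extremes
```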

  • tanh function

tanh function

The tanh function (also known as the hyperbolic tangent) is much like the sigmoid function, but slightly better, since its output ranges between -1 and 1, allowing negative outputs. However, tanh also comes with the vanishing gradient problem, just like the sigmoid function.
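A minimal sketch of tanh and its wider output range (again, the input values are only illustrative):

```python
import numpy as np

def tanh(z):
    return np.tanh(z)                        # equivalent to (e^z - e^-z) / (e^z + e^-z)

z = np.array([-10.0, 0.0, 10.0])
print(tanh(z))                               # ~[-1., 0., 1.] -> output lies in (-1, 1)
print(1.0 - np.tanh(z) ** 2)                 # derivative: again nearly zero at the extremes
```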

  • ReLU (Rectified Linear Unit) function

ReLU function

With this function, outputs for positive inputs can range from 0 to infinity, but when the input is zero or negative the function outputs zero, and its gradient is zero as well, which hinders back-propagation: a neuron stuck in that region stops updating. This is known as the dying ReLU problem.
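A sketch of ReLU and of the gradient behaviour behind the dying ReLU problem (my own illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)                # 0 for negative inputs, identity otherwise

def relu_grad(z):
    # Gradient is 0 whenever z <= 0, so a neuron stuck there stops updating
    return np.where(z > 0, 1.0, 0.0)

z = np.array([-3.0, 0.0, 2.0])
print(relu(z))        # [0. 0. 2.]
print(relu_grad(z))   # [0. 0. 1.]
```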

  • Leaky ReLU

Leaky ReLU function

Leaky ReLU prevents the dying ReLU problem and enables back-propagation for negative inputs by giving them a small non-zero slope. One flaw of Leaky ReLU is that this slope is predetermined rather than something the neural network figures out for itself.
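A sketch of Leaky ReLU, where the negative-side slope alpha is fixed in advance (0.01 is a commonly used value, assumed here for illustration):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # alpha is the predetermined slope for negative inputs (not learned)
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-3.0, 0.0, 2.0])))  # [-0.03  0.    2.  ]
```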

There are quite a few other non-linear activation functions, such as softmax and Parametric ReLU, that are not discussed in this article. Now comes the million-dollar question: which activation function is the best? Well, my answer would be, it depends… It depends on the problem you are applying the neural network to. For instance, if you are applying a neural network to a binary classification problem, sigmoid will work well, but for some other problem it might not, and that is why it is important to learn the pros and cons of activation functions, so that you can choose the best one for the project you are working on.

How to include activation functions in your code?

Years ago, implementing the math behind all these functions might have been quite difficult, but now, with the advancement of open-source libraries such as TensorFlow and PyTorch, it has become much easier! Let’s see a code snippet where activation functions are included in code using TensorFlow.

Activation functions in TensorFlow
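The original snippet is not reproduced here, so the following is a representative sketch of how activation functions are typically specified in TensorFlow/Keras; the layer sizes and input shape are illustrative assumptions, not part of the original article.

```python
import tensorflow as tf

# Each Dense layer names its activation function as a string
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),  # illustrative sizes
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),   # class probabilities
])

# Activations are also available as standalone functions
x = tf.constant([-2.0, 0.0, 3.0])
print(tf.keras.activations.sigmoid(x))
print(tf.nn.relu(x))
```

Swapping one activation for another is then just a matter of changing the string passed to each layer.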

Seems quite simple, right? As easy as it is with TensorFlow, it is important to have an actual understanding of these activation functions, because the learning process of your neural network highly depends on them.

Thank you for reading, and I hope this article was of help.
