
Convolutional Neural Networks


Introduction & Convolutions


(Image source: https://galusaustralis.com/2020/01/393272/astonishing-growth-in-deep-learning-market-advance-study-focusing-on-market-analysis-focusing-on-top-leading-vendors-like-advanced-micro-devices-arm-ltd-clarifai-entilic-google/)

The goal of this article is to explore the following concepts:

  • Introduction to convolutional neural networks, with use cases and examples
  • Convolutions, with examples in Python
  • CNNs and the types of layers they are built from
  • Locally connected layers

Introduction to Convolutional Neural Networks

As you can find here, a neural network is a universal function approximator. This means that, in essence, neural networks solve problems by trying to find the best possible approximation to a function that allows us to solve our problem.

To do this, we have a series of parameters (the weights and the biases) that we update using the backpropagation algorithm, which is based on gradient descent.

Thanks to our labels, we can calculate the error in each iteration and modify the weights to reduce it progressively.

And what’s a convolutional neural network? Or more importantly, what problems does it solve?

In short, convolutional neural networks are suited to almost any problem whose input can be expressed in image form.

For example, think of when you try to label someone in your Facebook pictures. Have you noticed that it suggests the person’s profile? That’s a convnet!

(Image source: https://nakedsecurity.sophos.com/2019/09/06/facebook-expands-use-of-face-recognition/)

Or perhaps you have heard of autonomous cars, which can “read” traffic signs, recognize other cars, and even detect when a person is crossing the street. Those functionalities are based on convnets too!

(Image source: https://www.aitrends.com/ai-insider/sensor-fusion-self-driving-cars/)

CNNs are also the state of the art for solving medical imaging problems. And these are just a few examples; there are many more.

The reason why they have become so popular in recent years is that they can find the right features on their own to later classify images correctly. And they do it in a very efficient way.

But what exactly is a CNN?

A CNN is a neural network in which new types of layers are introduced, the most important of which is the convolutional layer.

And what is convolution?

Convolution

Strictly speaking, convolution is a mathematical operation that allows two signals to be combined; it is mainly used in signal processing.

In digital signal processing, convolution is used to know what will happen to a signal after “passing” through a certain device.

For example, to know how our voice changes after passing through the microphone of our mobile phone, we could calculate the convolution of our voice with the microphone’s impulse response.
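
As a small illustration, this is how that combination looks in code (a minimal sketch; the numeric values below are made up for the example):

# a minimal 1D sketch: "passing" a signal through a device via convolution
# (both arrays hold hypothetical values, just for illustration)
import numpy as np

voice = np.array([0.0, 1.0, 0.5, -0.5, -1.0, 0.0])   # input signal
impulse_response = np.array([0.6, 0.3, 0.1])          # device's impulse response

output = np.convolve(voice, impulse_response)
print(output)  # the signal as it would come out of the device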

Convolutional neural networks have become famous for their ability to detect patterns that they then classify. Those pattern detectors are convolutions.

Let’s see how a computer understands an image:

(Image source: http://cs231n.github.io/classification/)

(Image source: https://datascience-enthusiast.com/DL/Convolution_model_Step_by_Stepv2.html)

As you can see, a color image is represented as a 3-dimensional matrix: Width x Height x Channels.

There are several ways to represent images, but the most common is the RGB color space. This means that a computer eventually sees 3 matrices of Width x Height, where the first one tells you the amount of red the image has, the second one the amount of green, and the third one the amount of blue.

If the image were in grayscale, the computer would see it as a single two-dimensional Width x Height matrix.

Finally, the values that the elements of the matrix can take depend on the data type used. The most common options are:

  • If we use 8-bit integers: they can go from 0 to 255
  • If we use floats: 0 to 1
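
As a quick sketch of this in Python (the image size here is hypothetical; note that numpy orders the axes as Height x Width x Channels):

# an RGB image as the computer sees it
import numpy as np

img_uint8 = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
print(img_uint8.shape)                    # (224, 224, 3): one 2D matrix per channel (R, G, B)
print(img_uint8.min(), img_uint8.max())   # values between 0 and 255

img_float = img_uint8.astype(np.float32) / 255.0   # same image, values from 0 to 1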

Knowing that an image is a matrix, what the convolution does is define a filter, or kernel, that is multiplied against the image matrix. Take a look at the next image:

(Image source: https://developer.apple.com/documentation/accelerate/blurring_an_image)

You define a kernel of 3x3 pixels and multiply it with the input image. What happens? The kernel is much smaller than the image, so to cover the whole image we first place the kernel on the first 3x3 pixels, then move it one position to the right, then another, and so on, each time computing the sum of the element-wise multiplication of the kernel with the corresponding pixels of the image. The result of each of these operations is stored in the output image, as you can see.

Here you can see it more clearly:

(Image source: https://www.researchgate.net/publication/334974839_Graduation_Thesis_Implementing_and_Optimizing_Neural_Networks_using_Tiramisu/figures?lo=1)
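
In plain numpy, that sliding window looks roughly like this (a minimal sketch; strictly speaking it computes cross-correlation, which is what deep learning libraries implement under the name “convolution”):

import numpy as np

def naive_conv2d(image, kernel):
  kh, kw = kernel.shape
  oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1  # "valid" output size
  out = np.zeros((oh, ow))
  for i in range(oh):
    for j in range(ow):
      # place the kernel at (i, j), multiply element-wise, and sum
      out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
  return out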

Examples in Python

Let’s go through some examples of what happens when we do these multiplications and additions, and how they help to detect patterns and make predictions.

import numpy as np
from scipy import signal
from scipy import misc
import matplotlib.pyplot as plt

# load scipy's built-in example grayscale picture
ascent = misc.ascent()

# columns go from -1 to +1: this kernel responds to horizontal changes
# in intensity, i.e. it highlights vertical edges
kernel = np.array([[-1, 0, +1],
                   [-1, 0, +1],
                   [-1, 0, +1]])

grad = signal.convolve2d(ascent, kernel, boundary='symm', mode='same')

# helper function to show two pictures side by side
def plot_two(img_orig, img_conv):
  fig, (ax_orig, ax_mag) = plt.subplots(1, 2, figsize=(20, 50))
  ax_orig.imshow(img_orig, cmap='gray')
  ax_orig.set_title('Original')
  ax_orig.set_axis_off()
  ax_mag.imshow(img_conv, cmap='gray')
  ax_mag.set_title('Gradient')
  ax_mag.set_axis_off()

plot_two(ascent, grad)


This is a vertical edge detector. Let’s define and use a horizontal one.

# rows go from -1 to +1: this kernel highlights horizontal edges
kernel = np.array([[-1, -1, -1],
                   [ 0,  0,  0],
                   [+1, +1, +1]])
grad_h = signal.convolve2d(ascent, kernel, boundary='symm', mode='same')
plot_two(ascent, grad_h)


Let’s look at some of the kernels most commonly used in traditional convolution.

First, let’s load and show an unmodified picture:

# load and show the original picture
url_img = 'https://upload.wikimedia.org/wikipedia/commons/5/50/Vd-Orig.png'
from urllib.request import urlopen
from io import BytesIO
from PIL import Image
file = BytesIO(urlopen(url_img).read())
img = np.asarray(Image.open(file), dtype='uint8')
plt.imshow(img)
plt.axis('off')


def convolve3d(img, kernel):
  # convolve each channel separately, then stack the results back together
  img_out = np.zeros(img.shape)
  for i in range(img.shape[-1]):
    img_out[:,:,i] = signal.convolve2d(img[:,:,i], kernel, boundary='symm', mode='same')
  # clip to the valid 8-bit range before casting, to avoid wrap-around
  return np.clip(img_out, 0, 255).astype('uint8')

Identity Kernel

# Let's try with the identity kernel
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
img_ki = convolve3d(img, kernel)
plot_two(img, img_ki)


Other kernels that are often used include blurring, sharpening, and edge-detection filters; a sketch follows below.

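These values are standard image-processing filters (the exact set shown in the original figure may differ), and they can be tried with the convolve3d helper from above:

# three classic kernels (standard image-processing values)
box_blur = np.ones((3, 3)) / 9.0          # averages each pixel's neighborhood
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

for k in (box_blur, sharpen, edge_detect):
  plot_two(img, convolve3d(img, k))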

All of this is very informative… But how is convolution able to detect patterns?

Pattern Detection Example

Let’s say that we have this filter:

(Image source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)

And the following image:

(Image source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)

What happens if the filter lands on the rat’s back?

(Image source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)

The result will be:

30·0 + 30·50 + 30·20 + 30·50 + 30·50 + 30·50 = 6600

Which is a very high number and indicates that we have found a curve.

What happens if the filter lands on the rat’s head?

(Image source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)

The result will be:

30·0 + 30·0 + 30·0 + 30·0 + 30·0 + 30·0 = 0

Which is a very low number and indicates that the curve pattern is not present in that region.
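
In code, this “response” is just the element-wise multiplication and sum from before. A tiny made-up example (the matrices below are hypothetical, not the ones from the figures):

import numpy as np

# hypothetical 3x3 "diagonal line" detector and two image patches
kernel = np.array([[30,  0,  0],
                   [ 0, 30,  0],
                   [ 0,  0, 30]])
diagonal_patch = np.array([[50,  0,  0],
                           [ 0, 50,  0],
                           [ 0,  0, 50]])   # contains the pattern
flat_patch = np.zeros((3, 3))               # contains nothing

print(np.sum(kernel * diagonal_patch))   # 4500 -> strong response, pattern found
print(np.sum(kernel * flat_patch))       # 0    -> no response, pattern absent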

CNNs

Now that we have introduced the concept of convolution, let’s study what convolutional neural networks are and how they work.

(Image source: https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/)

In this image, we can see the typical architecture of a convolutional neural network. The input is nothing more than a W x H x 3 matrix (because it is RGB). Then the “convolutional blocks” start.

These blocks are usually composed of:

  • Convolutional layers
  • Pooling layers, which downsample (decimate) the output of the convolutional layers

We already know how convolution works: we define a kernel, or filter, that serves to highlight certain structures in the image.

But how do I define a filter that allows me to find out that the input image has a black cat in it?

That’s the magic of CNNs! We don’t have to define any filter; the network learns them automatically thanks to backpropagation!

Our CNN has two stages: a feature extractor and a classifier.

The feature extraction stage goes from the general patterns, or structures, to the specifics:

  • The first convolutional layers detect lines in different orientations
  • The next ones detect shapes and colors
  • The deeper ones detect more and more complex patterns

So in the end, what we have is a network that learns on its own, and we don’t have to worry about which features to choose for classification, since the network chooses them itself.

And how is it learning? The same way as a traditional neural network.

(Image source: https://www.sciencedirect.com/science/article/abs/pii/S0893608015001896)

The second stage, the classifier, is made up of dense layers, which are the layers used in traditional neural networks.

So finally, a CNN can be understood as a set of convolutional stages coupled to a traditional neural network, which classifies the patterns extracted by the convolutions and returns a probability for each class.
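
As a minimal sketch of this two-stage structure (written with Keras, which this post does not use elsewhere; the layer sizes are arbitrary):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
  keras.Input(shape=(32, 32, 3)),   # hypothetical RGB input size
  # stage 1, feature extractor: convolutional blocks
  layers.Conv2D(32, (3, 3), activation='relu'),
  layers.MaxPooling2D((2, 2)),
  layers.Conv2D(64, (3, 3), activation='relu'),
  layers.MaxPooling2D((2, 2)),
  # stage 2, classifier: traditional dense layers
  layers.Flatten(),
  layers.Dense(64, activation='relu'),
  layers.Dense(10, activation='softmax'),   # one probability per class
])
model.summary()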

Types of layers in a CNN

Convolutional

These layers are in charge of applying convolutions to our input images to find the patterns that will later allow us to classify them. Their main hyperparameters are:

  • The number of filters/kernels to apply to the image: the number of matrices with which the input images will be convolved
  • The size of these filters: 99% of the time they are square, 3x3, 5x5, etc.

Here you can see the general scheme: a given input image is convolved with each filter, and the output is a set of 2D activation maps. If the input image is RGB, it has 3 channels, so we convolve each filter with each channel and then add the results, reducing the 3 channels to just 1:


As the input has 3 channels (R, G, and B), our input image is defined as 3 two-dimensional arrays, one for each channel.

So what the convolutional layer does is apply the convolution separately to each channel, take the result of each channel, and then add them up to get a single 2D matrix called an activation map.
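
A minimal numpy sketch of this per-channel convolution and summation (the shapes and values here are hypothetical):

import numpy as np
from scipy import signal

img = np.random.rand(32, 32, 3)    # hypothetical RGB input
filt = np.random.rand(3, 3, 3)     # one filter: 3x3 spatial, one slice per channel

activation_map = np.zeros((32, 32))
for c in range(3):
  # convolve each channel with its slice of the filter, then sum the results
  activation_map += signal.convolve2d(img[:, :, c], filt[:, :, c], mode='same')

print(activation_map.shape)   # (32, 32): a single 2D activation map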

In this link, you can see it in more detail:

http://cs231n.github.io/assets/conv-demo/index.html


Besides the number of filters and the size, convolutional layers have another important parameter that we should take into account: the stride.

This would be an example of a 1-unit stride:

(Animation source: https://arxiv.org/abs/1603.07285)

And this would be an example of a 2-unit stride convolution:

(Animation source: https://arxiv.org/abs/1603.07285)

You can tell that the difference is the length of the step the kernel takes in each iteration.
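
A standard formula (not stated in the original article, but worth keeping in mind) relates these parameters: for an input of width W, a kernel of size K, padding P, and stride S, the output width is (W − K + 2P) / S + 1. For example, a 5x5 input convolved with a 3x3 kernel, no padding and stride 1 gives (5 − 3 + 0) / 1 + 1 = 3, i.e. a 3x3 output; with stride 2 it gives (5 − 3) / 2 + 1 = 2, i.e. a 2x2 output.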

Receptive Field

(Image source: https://www.slideshare.net/ssuser06e0c5/convolutional-neural-networks-135496264)

In the case of convolutional layers, each output neuron is connected to only a local region of the input image.

This region can be understood as “what the neuron sees”. With dense layers, the opposite happens: every neuron is connected to every element of the previous layer. The neurons still work the same way; the only difference is that at their input they “see” the whole image instead of a region of it.

As you can find in this great article:

The receptive field determines what area of the original input to the entire network the output gets to see.
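
For example (a standard calculation, not from the article): a neuron after two stacked 3x3, stride-1 convolutional layers has a 5x5 receptive field, because it sees a 3x3 region of the previous layer, and each element of that region itself depends on a 3x3 region of the input.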

Pooling

Pooling layers are used to reduce the size of our activation maps; otherwise, many models would not fit on a GPU. The two most common types of pooling are:

  • max pooling: calculates the maximum of the elements
  • average pooling: calculates the average of the elements

(Image source: https://www.quora.com/What-is-max-pooling-in-convolutional-neural-networks)

Keep in mind that this is done separately for each activation map of our volume; that is, the depth dimension does not intervene in the calculations at all.

Let’s see an example of max pooling in code:
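
This is a minimal numpy sketch, assuming a 2x2 window with stride 2, the most common configuration (other strides work analogously):

import numpy as np

def max_pool_2x2(a):
  # group the array into 2x2 blocks and take the maximum of each block
  h, w = a.shape
  return a[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 9],
              [5, 6, 1, 7],
              [4, 2, 8, 0],
              [3, 1, 2, 6]])
print(max_pool_2x2(x))
# [[6 9]
#  [4 8]]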

Locally-connected Layers

Imagine we have an input image of 32x32 and our network has 5 convolutional layers, each with 5 filters of size 3x3. The number of parameters in each layer does not depend on the image size; this is because each filter runs through the whole image, reusing the same weights at every position.

This is based on the assumption that if a certain filter is good at detecting something at position (x1, y1) of the image, it should also be good at any other position (x2, y2).

This assumption is almost always valid, because normally we do not know where our features will be located in the image. But if, for example, we have a dataset in which faces always appear centered in the image, we might want the filters for the eye areas to be different from those for the nose or the mouth, right?

In this case, if we know where our features are going to be located, it makes more sense to have a filter for each area.

Where before we had to learn 5 filters of 3x3 per layer, which gives us a total of 5·3·3 = 45 parameters, now we would have to learn a separate filter for each of the 32·32 positions: 32·32·5·3·3 = 46,080 parameters.

That’s a huge difference. So unless we know that the patterns we are looking for will differ by region and always appear in the same position, it is worth using convolutional rather than locally connected layers.
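
A quick sanity check of the arithmetic above:

# parameter counts from the 32x32 example (biases ignored)
filters, k = 5, 3
conv_params = filters * k * k                 # 45: one shared set of filters
local_params = 32 * 32 * filters * k * k      # 46080: a separate filter per position
print(conv_params, local_params)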

By the way, look at the image below: the layers with the most parameters are the dense ones! It makes sense: in them, every neuron is connected to every neuron in the next layer.

(Image source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/convolutional_neural_networks.html)

Final Words

As always, I hope you enjoyed the post, and that you gained an intuition about convolutional neural networks!

If you liked this post, you can take a look at my other posts on Data Science and Machine Learning here.

If you want to learn more about Machine Learning, Data Science, and Artificial Intelligence, follow me on Medium and stay tuned for my next posts!

