
Demystifying Convolutional Neural Networks Using Class Activation Maps.

source link: https://www.tuicool.com/articles/zmeQJvE

Machine Learning is gaining momentum every day, and its applications are spreading across every domain, from stock price prediction in finance to complex tasks such as object detection and segmentation in Computer Vision. No domain is left untouched by the AI revolution, and in some, Machine Learning algorithms are even surpassing human-level performance. For example, the ImageNet challenge was organised every year for Computer Vision tasks such as image classification, object detection and image localization, and each year the error rates of the best-performing algorithms kept decreasing; in 2017, 29 of the 38 competing teams achieved greater than 95% accuracy. The human top-5 classification error rate on the large-scale ImageNet dataset has been reported to be 5.1%, whereas state-of-the-art CNNs achieve a top-5 error rate of about 3.57%.

As the performance of Machine Learning systems increases, their interpretability gradually decreases. This trend is most pronounced in Deep Learning algorithms, which comprise millions of parameters and hundreds of layers, making them far harder to interpret than basic machine learning algorithms such as Linear Regression, K-Nearest Neighbours or Decision Trees. These algorithms have become black boxes: they take input from the user and produce outputs beyond expectation, but offer no intuition about the cause-and-effect relationships behind those outputs. That may be acceptable for tasks where accuracy is the main requirement, such as Kaggle competitions, where the data scientist does not have to interpret and explain the results to stakeholders. But in applications where interpretability of the results is crucial, the black-box nature creates serious obstacles. Suppose, for example, an image recognition system is trained to detect tumours and performs very well in terms of accuracy on both the validation and test sets. When you present the results to the stakeholders, they will ask which parts of the image the model is learning from, or what the main cause of a given output is, and your most probable answer will be "I don't know". No matter how accurate the model is, the stakeholders won't accept it, because human lives are at stake.

With the increased research in Machine Learning, and especially Deep Learning, various efforts are being made to solve the problem of interpretability and reach the stage of interpretable AI.

In the case of CNNs, various visualization techniques have been developed, and one of them is Class Activation Maps (CAM).

Class Activation Maps were introduced in the paper Learning Deep Features for Discriminative Localization, which exploits Global Average Pooling in CNNs. A class activation map for a particular category indicates the discriminative image regions used by the CNN to identify that category.

ARCHITECTURE:

The authors of the paper use a network architecture similar to GoogLeNet and Network in Network. The network consists mainly of a large number of convolutional layers, and just before the final output layer we perform Global Average Pooling. The features thus obtained are fed to a fully connected layer with softmax activation, which produces the desired output. We can identify the importance of image regions by projecting the weights of the output layer back onto the convolutional feature maps obtained from the last convolutional layer. This technique is known as Class Activation Mapping.
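The head of such a network can be sketched in a few lines of NumPy. All shapes below (K feature maps, C classes) are illustrative assumptions, not the paper's actual network:

```python
import numpy as np

# Illustrative shapes: K feature maps of size H x W from the last conv
# layer, and C output classes. Values are random stand-ins.
K, H, W, C = 4, 7, 7, 3
rng = np.random.default_rng(0)
feature_maps = rng.random((K, H, W))   # output of the last conv layer
weights = rng.random((C, K))           # weights of the final dense layer

# Global Average Pooling: one scalar per feature map
gap = feature_maps.mean(axis=(1, 2))   # shape (K,)

# Fully connected layer followed by softmax over the classes
logits = weights @ gap                 # shape (C,)
probs = np.exp(logits) / np.exp(logits).sum()
```

The key point is that each class score is a weighted sum of per-map spatial averages, so the dense-layer weights directly tell us how much each feature map contributes to each class.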

[Figure: Architecture and Working]

The Global Average Pooling (GAP) layer is preferred over the Global Max Pooling (GMP) layer used in Oquab et al. because GAP helps identify the complete extent of the object, whereas GMP identifies just one discriminative part. This is because GAP takes an average across all the activations, which helps find all the discriminative regions, while GMP considers only the most discriminative one.

The GAP layer produces the spatial average of the feature map of each unit in the last convolutional layer; a weighted sum of these averages is then computed to generate the final output. Similarly, we compute a weighted sum of the feature maps of the last convolutional layer to obtain the class activation map.
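By linearity, the two weighted sums are equivalent: averaging first and then weighting gives the same class score as building the class activation map first and then averaging it. A small NumPy check (with assumed shapes) makes this concrete:

```python
import numpy as np

# Assumed shapes for illustration: K feature maps of size H x W.
K, H, W = 4, 7, 7
rng = np.random.default_rng(1)
f = rng.random((K, H, W))        # last conv layer feature maps
w_c = rng.random(K)              # final-layer weights for one class c

# Route 1: GAP each map, then take the weighted sum -> class score
score = w_c @ f.mean(axis=(1, 2))

# Route 2: weighted sum of the maps themselves -> CAM, then average it
cam = np.tensordot(w_c, f, axes=1)   # shape (H, W)

# Both routes agree, which is why the CAM highlights exactly the
# regions that drive the class score.
assert np.allclose(cam.mean(), score)
```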

SIMPLE WORKING

After defining the architecture, let us see how things work to produce the class activation maps. Consider an image containing an object on which we have trained our network. The softmax layer outputs probabilities for the various classes the model was trained on. By taking the argmax over these probabilities, we find the class most likely to correspond to the object in the image. The weights of the final layer corresponding to that class are extracted, along with the feature maps from the last convolutional layer.

Finally, the dot product of the extracted final-layer weights and the feature maps is computed to produce the class activation map. The map is then upsampled using bilinear interpolation and superimposed on the input image to show the regions the CNN model is looking at.
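The steps above can be sketched end to end in NumPy. The shapes and values are stand-ins (a real run would take the probabilities, weights and feature maps from the trained model), and the bilinear upsampling is a minimal hand-rolled version of what an image library would provide:

```python
import numpy as np

def bilinear_upsample(x, out_h, out_w):
    """Minimal bilinear interpolation for a 2-D array (align-corners style)."""
    h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

# Stand-ins for the trained model's outputs (shapes are assumptions):
rng = np.random.default_rng(2)
probs = rng.random(10); probs /= probs.sum()   # softmax output
weights = rng.random((10, 4))                  # (classes, feature maps)
feature_maps = rng.random((4, 7, 7))           # last conv layer output

cls = int(np.argmax(probs))                    # most likely class
w_c = weights[cls]                             # its final-layer weights
cam = np.tensordot(w_c, feature_maps, axes=1)  # (7, 7) activation map

# Upsample to the input resolution (e.g. 28x28 for MNIST) before
# superimposing it on the input image.
cam_up = bilinear_upsample(cam, 28, 28)
```

In practice the overlay is done by normalising `cam_up` to [0, 1], mapping it through a colour map, and alpha-blending it with the input image.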

Implementation and Results:

I implemented class activation maps as described in the paper, using Keras.

Initially, the following model architecture was used: three convolutional layers, each followed by a max-pooling layer, then a final convolutional layer followed by the GAP layer and an output layer with softmax activation.
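A Keras sketch of that architecture might look as follows. The filter counts and kernel sizes here are assumptions for illustration; the article's exact configuration is shown in the model summary figure:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the described architecture for 28x28 grayscale MNIST images.
# Filter counts and kernel sizes are illustrative assumptions.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),  # last conv layer
    layers.GlobalAveragePooling2D(),                           # GAP layer
    layers.Dense(10, activation="softmax"),                    # output layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

To compute CAMs after training, one would read out the kernel of the final `Dense` layer and the activations of the last `Conv2D` layer, exactly as in the working described above.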

[Figure: Model Architecture]

[Figure: Model 1 Summary]

The above model was trained on the MNIST dataset and produced an accuracy of approximately 99% on the training, validation and final test sets.

The following outputs were obtained:

