

Overcome Overfitting During Instance Segmentation with Mask-RCNN
source link: https://towardsdatascience.com/overcome-overfitting-during-instance-segmentation-with-mask-rcnn-32db91f400bc?gi=88b2968bf1e1

Introduction
Advances in computer vision have many promising applications, such as self-driving cars and medical diagnosis. These tasks rely on the machine’s ability to recognize objects.
There are four tasks related to object recognition we often see: classification and localization, object detection, semantic segmentation, and instance segmentation.
In classification and localization, we are interested in assigning a class label to the object in the image and drawing a bounding box around it. In this task, the number of objects to be detected is fixed.
Object detection differs from classification and localization because we make no assumption about the number of objects in the image beforehand. We start with a fixed set of object categories, and we aim to assign the class label and draw a bounding box each time an object from these categories appears in the image.
In semantic segmentation, we assign a class label to each image pixel: all pixels belonging to the grass are labeled “grass”, and those belonging to sheep are labeled “sheep”. Notably, this task does not distinguish between two different sheep.
Our task in this assignment is instance segmentation, which builds on both object detection and semantic segmentation. As in object detection, we aim to label and localize all instances of objects in predefined categories. However, instead of generating bounding boxes for detected objects, we go further and identify which pixels belong to each object, as in semantic segmentation. The difference from semantic segmentation is that instance segmentation draws a separate mask for each object instance, while semantic segmentation uses the same mask for all instances of the same class.
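To make the distinction concrete, here is a minimal sketch in plain Python. The tiny 2x6 grid, the class ids, and the helper name `instances_to_semantic` are illustrative assumptions, not part of the dataset or the training pipeline. The helper collapses per-instance masks into one semantic label map, which shows what information semantic segmentation discards.

```python
# Hypothetical class ids, for illustration only.
BACKGROUND, SHEEP = 0, 1


def instances_to_semantic(masks, class_ids, height, width):
    """Collapse per-instance binary masks into a single semantic label map."""
    semantic = [[BACKGROUND] * width for _ in range(height)]
    for mask, class_id in zip(masks, class_ids):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = class_id
    return semantic


# Instance segmentation output: one binary mask per object.
sheep_a = [[1, 1, 0, 0, 0, 0],
           [1, 1, 0, 0, 0, 0]]
sheep_b = [[0, 0, 0, 0, 1, 1],
           [0, 0, 0, 0, 1, 1]]
instance_masks = [sheep_a, sheep_b]
instance_classes = [SHEEP, SHEEP]

# Semantic segmentation output: one label per pixel, instances merged.
semantic_map = instances_to_semantic(instance_masks, instance_classes, 2, 6)
```

Both sheep receive the same label in `semantic_map`, whereas `instance_masks` still keeps them apart as two separate masks.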
In this article, we will train an instance segmentation model on a tiny Pascal VOC dataset with only 1,349 images for training, and 100 images for testing. The main challenge here will be to prevent the model from overfitting without using external data.
You can find the datasets used and the full training and inference pipeline on GitHub.
Data processing
The annotations are in the COCO format so we can use functions from pycocotools to retrieve class labels and masks. In this dataset, there are 20 categories in total.
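As a rough sketch of that step, the helper below collects the class name and binary mask of every annotated instance in one image. `getAnnIds`, `loadAnns`, `loadCats` and `annToMask` are methods of the pycocotools `COCO` class; the helper names and the annotation path passed to `open_annotations` are assumptions, and the loading code in the repository may differ.

```python
def open_annotations(annotation_path):
    """Load a COCO-format annotation file (requires pycocotools)."""
    # Imported lazily so that load_instances below has no hard dependency.
    from pycocotools.coco import COCO
    return COCO(annotation_path)


def load_instances(coco, image_id):
    """Return (class_names, masks) for all instances annotated in one image."""
    ann_ids = coco.getAnnIds(imgIds=[image_id], iscrowd=None)
    annotations = coco.loadAnns(ann_ids)
    class_names, masks = [], []
    for ann in annotations:
        category = coco.loadCats([ann["category_id"]])[0]
        class_names.append(category["name"])
        # annToMask decodes a polygon or RLE annotation into a binary H x W array.
        masks.append(coco.annToMask(ann))
    return class_names, masks
```

Each returned mask corresponds to exactly one object instance, so two objects of the same category in an image yield two entries with the same class name.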
Below are some visualizations of the training images and the associated masks. Different shades of the masks represent separate masks for several instances of the same object category.
The images vary in size and aspect ratio, so before feeding them into the model, we resize each image to 500x500. When an image is smaller than this, we upscale it so that its longest side has length 500, then add zero padding as necessary to obtain a square image.
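The arithmetic behind this resizing can be sketched as follows. The function name `square_resize_plan` is hypothetical, and splitting the padding evenly between the two sides is an assumption: the text only says that zero padding is added, not where.

```python
def square_resize_plan(height, width, target=500):
    """Compute the scale and zero padding that map an image onto a target x target square."""
    # Scale so that the longest side becomes exactly `target` pixels.
    scale = target / max(height, width)
    new_h = min(target, round(height * scale))
    new_w = min(target, round(width * scale))
    pad_h = target - new_h
    pad_w = target - new_w
    # Assumption: padding is split evenly between opposite sides.
    padding = {
        "top": pad_h // 2,
        "bottom": pad_h - pad_h // 2,
        "left": pad_w // 2,
        "right": pad_w - pad_w // 2,
    }
    return scale, (new_h, new_w), padding


# A 200x400 image is upscaled by 1.25 to 250x500, then padded vertically.
plan = square_resize_plan(200, 400)
```

The same scale and padding need to be applied to every instance mask so that the masks stay aligned with the resized image.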
For the model to generalize well, especially on a dataset as small as this one, data augmentation is key to overcoming overfitting. Each training image goes through the following random transformations: a horizontal flip with probability 0.5; a random crop to between 0.9 and 1 times the original dimensions; a Gaussian blur with random standard deviation, applied with probability 0.5; a contrast adjustment by a factor between 0.75 and 1.5; a brightness adjustment by a factor between 0.8 and 1.2; and a series of random affine transformations such as scaling, translation, rotation, and shearing.
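The sketch below shows one way to draw these random parameters for a single image, using only the Python standard library. The names `AugmentParams`, `sample_augment_params` and `hflip` are hypothetical, and `max_blur_sigma` is an assumed value because the range of the blur standard deviation is not stated. The affine transformations are left out for the same reason. This is not the augmentation code from the pipeline itself.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    flip: bool          # apply a horizontal flip
    crop_scale: float   # fraction of the original dimension kept by the crop
    blur_sigma: float   # Gaussian blur standard deviation; 0.0 means no blur
    contrast: float     # multiplicative contrast factor
    brightness: float   # multiplicative brightness factor


def sample_augment_params(rng, max_blur_sigma=0.5):
    """Draw one set of augmentation parameters; max_blur_sigma is an assumed bound."""
    return AugmentParams(
        flip=rng.random() < 0.5,
        crop_scale=rng.uniform(0.9, 1.0),
        blur_sigma=rng.uniform(0.0, max_blur_sigma) if rng.random() < 0.5 else 0.0,
        contrast=rng.uniform(0.75, 1.5),
        brightness=rng.uniform(0.8, 1.2),
    )


def hflip(grid):
    """Horizontally flip an image or mask stored as a list of rows."""
    return [row[::-1] for row in grid]


rng = random.Random(0)
params = sample_augment_params(rng)
```

Geometric transformations such as the flip and the crop must be applied identically to the image and to all of its masks, whereas blur, contrast and brightness affect only the image.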
Mask-RCNN
We will use matterport’s implementation of Mask-RCNN for training. Tempting as it is, we will not use their weights pre-trained on MS COCO, in order to show that good results can be obtained using only the 1,349 training images.
Mask-RCNN was proposed in the Mask-RCNN paper in 2017. It extends Faster-RCNN, an earlier model from an overlapping group of authors. Faster-RCNN is widely used for object detection, where the model generates bounding boxes around detected objects. Mask-RCNN takes this a step further by generating the object masks as well.
I will provide a quick overview of the model architecture below, and matterport published a great article that details their model implementation.