
Inpainting with AI — get back your images! [PyTorch]

source link: https://towardsdatascience.com/inpainting-with-ai-get-back-your-images-pytorch-a68f689128e5?gi=97b53571bbef



Solving the problem of Image Inpainting with PyTorch and Python


Photo by James Pond on Unsplash

Did you know the old childhood photo you have in that dusty album can be restored? Yeah, that one in which everyone is holding hands and having the time of their lives! Don’t believe me? Check this out here —

Inpainting is a conservation process where damaged, deteriorating, or missing parts of an artwork are filled in to present a complete image. [1] This process can be applied to both physical and digital art mediums such as oil or acrylic paintings, chemical photographic prints, 3-dimensional sculptures, or digital images and video. — https://en.wikipedia.org/wiki/Inpainting

Image inpainting is an active area of AI research where AI has been able to come up with better inpainting results than most artists. In this article, we are going to discuss image inpainting using neural networks — specifically context encoders. This article explains and implements the research work on context encoders that was presented in CVPR 2016.

Context Encoders

To get started with context encoders, we have to learn what autoencoders are. An autoencoder structurally consists of an encoder, a decoder and a bottleneck. It learns to compress its input into a compact representation at the bottleneck and then reconstruct the input from it, discarding noise and redundancy along the way. Autoencoders are, however, not specific to images and can be extended to other data as well. There are specific variants of autoencoders to fulfill specific tasks.


Autoencoder Architecture
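To make the idea concrete, here is a tiny convolutional autoencoder sketched in PyTorch. It is purely illustrative (the 64×64 input size and channel widths are arbitrary choices, not from the original article): the encoder squeezes the image into a small bottleneck, and the decoder reconstructs the image from it.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Toy convolutional autoencoder: encoder -> bottleneck -> decoder."""
    def __init__(self):
        super().__init__()
        # Encoder: compress a 3x64x64 image into a small latent volume
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 16 -> 8 (bottleneck)
        )
        # Decoder: reconstruct the image from the bottleneck
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Tanh(),  # assumes inputs normalized to [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```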

Now that we know about autoencoders, we can describe context encoders as an analogy to them. A context encoder is a convolutional neural network trained to generate the contents of an arbitrary image region on the basis of its surroundings, i.e. a context encoder takes in the data surrounding an image region and tries to generate something that would fit into that region. Just like we fitted jigsaw puzzles when we were small — only there we didn’t have to generate the puzzle pieces ;)

Our context encoder here consists of an encoder capturing the context of an image into a compact latent feature representation and a decoder which uses that representation to produce the missing image content. Missing image content? — Since we need an enormous dataset to train a neural network, we cannot afford to work with just the inpainting problem images. So we block out portions of images from normal image datasets to create an inpainting problem and feed the images to the neural network, thus creating missing image content at the region we block.

[It is important to note here that the images fed to the neural network have too many missing portions for classical inpainting methods to work at all.]

Use of GAN

GANs, or Generative Adversarial Networks, have been shown to be extremely useful for image generation. They run on a basic principle: a generator tries to ‘fool’ a discriminator, while the discriminator tries to catch the generator out. In other words, the two networks respectively try to minimize and maximize the same loss function.
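Written out, this is the standard GAN minimax objective (the general formulation, with generator G, discriminator D, real data x and noise input z):

\min_G \max_D \; \mathbb{E}_{x}\big[\log D(x)\big] + \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big]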

More about GANs here — https://medium.com/@hmrishavbandyopadhyay/generative-adversarial-networks-hard-not-eea78c1d3c95

Region Masks

Region masks are the portions of the image we block out so that we can feed the generated inpainting problems to the model. To block a region out, we simply set its pixel values to zero. Now, there are three ways we can do this —

  1. Central Region: The simplest way of blocking out image data is to set a central square patch to zero. Although the network learns inpainting this way, it fails to generalize well and only low-level features are learned.
  2. Random Block: To counter the problem of the network ‘latching’ onto the masked region boundary, as it does with the central region mask, the masking process is randomized. Instead of choosing a single square patch as the mask, a number of overlapping square masks are set up, covering up to 1/4 of the image.
  3. Random Region: Random block masking, however, still has sharp boundaries for the network to latch onto. To deal with this, arbitrary shapes have to be removed from the images. Such shapes can be obtained from the PASCAL VOC 2012 dataset, deformed, and placed as masks at random image locations.


From left — a) central region mask, b) random block mask, c) random region mask [source: https://arxiv.org/abs/1604.07379 ]

Here, I have implemented only the central region masking method, as this is just a guide to get you started on inpainting with AI. Feel free to try the other masking methods and let me know about the results in the comments!
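For reference, here is one way the central region method could be implemented in PyTorch. It is a sketch under assumed sizes (128-pixel images with a 64-pixel central mask, which are not fixed by this article), and the helper name mask_center is mine:

```python
import torch

def mask_center(imgs, mask_size=64):
    """Zero out a central square patch; return (masked images, ground-truth patches).

    imgs: tensor of shape (batch, channels, H, W), e.g. 128x128 images.
    """
    _, _, h, w = imgs.shape
    top = (h - mask_size) // 2
    left = (w - mask_size) // 2

    # Keep the ground truth for the region we are about to block out
    target = imgs[:, :, top:top + mask_size, left:left + mask_size].clone()

    masked = imgs.clone()
    masked[:, :, top:top + mask_size, left:left + mask_size] = 0  # block out the center
    return masked, target
```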

Structure

By now, you should have some idea about the model. Let’s see if you’re correct ;)

The model consists of an encoder and a decoder section, building up the context-encoder part of the model. This part also acts as the generator, which generates data and tries to fool the discriminator. The discriminator consists of convolutional layers followed by a Sigmoid function that finally gives a single scalar as output.

Loss

The loss function of the model is divided into two parts:

  1. Reconstruction Loss — The reconstruction loss is an L2 loss. It helps to capture the overall structure of the missing region and keep it coherent with its context. Mathematically, it is expressed as —

\mathcal{L}_{rec}(x) = \left\| \hat{M} \odot \big(x - F((1 - \hat{M}) \odot x)\big) \right\|_2^2

where \hat{M} is the binary mask (1 for the blocked-out region, 0 elsewhere), \odot denotes element-wise multiplication and F is the context encoder.

It is important to note here that using only the L2 loss would give us a blurry image, because a blurry prediction reduces the mean pixel-wise error and thus minimizes the L2 loss — but not in the way we want it to.

2. Adversarial Loss — This tries to make the prediction ‘look’ real (remember, the generator has to fool the discriminator!) and helps us avoid the blurry output that the L2 loss alone would lead us into. Mathematically, we can express it as —

\mathcal{L}_{adv} = \max_D \; \mathbb{E}_{x \in \mathcal{X}} \Big[ \log\big(D(x)\big) + \log\big(1 - D\big(F((1 - \hat{M}) \odot x)\big)\big) \Big]

Here an interesting observation is that the adversarial loss encourages the entire output to look real and not just the missing part. The adversarial network, in other words, gives the whole image a realistic look.

The total loss is a weighted sum of the two:

\mathcal{L} = \lambda_{rec} \, \mathcal{L}_{rec} + \lambda_{adv} \, \mathcal{L}_{adv}

Let’s build it!

Now that we have covered the main points of the network, let's get down to building the model. I will first build the model structure and then get to the training and loss function parts. The model will be built with the PyTorch library in Python.

Let’s start with the generator network:

The generator model for the network — implemented as a Python module
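Below is a minimal sketch of such an encoder-decoder generator. It assumes 128×128 masked inputs and predicts a 64×64 patch for the blocked-out central region; the layer sizes, including the 4000-channel bottleneck, follow common context-encoder implementations rather than being prescribed by this article.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Context-encoder generator: encodes a 128x128 masked image,
    decodes a 64x64 prediction for the missing central region."""
    def __init__(self, channels=3):
        super().__init__()

        def down(in_ch, out_ch, normalize=True):
            layers = [nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1)]
            if normalize:
                layers.append(nn.BatchNorm2d(out_ch))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        def up(in_ch, out_ch):
            return [
                nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ]

        self.model = nn.Sequential(
            # Encoder: 128 -> 64 -> 32 -> 16 -> 8 -> 4
            *down(channels, 64, normalize=False),
            *down(64, 128),
            *down(128, 256),
            *down(256, 512),
            *down(512, 512),
            # Bottleneck: compact latent representation of the context
            nn.Conv2d(512, 4000, 1),
            # Decoder: 4 -> 8 -> 16 -> 32 -> 64
            *up(4000, 512),
            *up(512, 256),
            *up(256, 128),
            *up(128, 64),
            nn.Conv2d(64, channels, 3, stride=1, padding=1),
            nn.Tanh(),  # output in [-1, 1], matching normalized images
        )

    def forward(self, x):
        return self.model(x)
```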

Now, the discriminator network:

The discriminator network — implemented as a module
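And a matching sketch of the discriminator, assuming it scores 64×64 patches (the size of the masked region) and ends in a Sigmoid that outputs a single real/fake scalar, as described above:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Maps a 64x64 patch to a single probability of being real."""
    def __init__(self, channels=3):
        super().__init__()

        def block(in_ch, out_ch, normalize=True):
            layers = [nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1)]
            if normalize:
                layers.append(nn.BatchNorm2d(out_ch))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(channels, 64, normalize=False),  # 64 -> 32
            *block(64, 128),                        # 32 -> 16
            *block(128, 256),                       # 16 -> 8
            *block(256, 512),                       # 8  -> 4
            nn.Conv2d(512, 1, 4),                   # 4  -> 1
            nn.Sigmoid(),                           # single scalar per patch
        )

    def forward(self, patch):
        return self.model(patch).view(patch.size(0), -1)
```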

Let’s start training the network now. We will set the batch size to 64 and the number of epochs to 100. The learning rate is set to 0.0002.

Training module for training the generator and the discriminator
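The training step below is a hedged sketch that ties the pieces together. It assumes the Generator, Discriminator and mask_center sketches above, a dataloader yielding batches of 128×128 image tensors (you would build this for your own dataset), BCE for the adversarial term, MSE for the reconstruction term, and a heavy weighting toward reconstruction (0.999 vs 0.001), as in common implementations:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

generator = Generator().to(device)
discriminator = Discriminator().to(device)

adversarial_loss = nn.BCELoss()     # real/fake game
reconstruction_loss = nn.MSELoss()  # L2 loss on the missing region

opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# dataloader: assumed torch.utils.data.DataLoader yielding batches of
# shape (64, 3, 128, 128), normalized to [-1, 1]; build it for your dataset.
n_epochs = 100
for epoch in range(n_epochs):
    for imgs in dataloader:
        imgs = imgs.to(device)
        masked, target = mask_center(imgs)  # from the masking sketch above
        valid = torch.ones(imgs.size(0), 1, device=device)
        fake = torch.zeros(imgs.size(0), 1, device=device)

        # --- Train generator: fool D and reconstruct the missing patch ---
        opt_g.zero_grad()
        gen_patch = generator(masked)
        g_adv = adversarial_loss(discriminator(gen_patch), valid)
        g_rec = reconstruction_loss(gen_patch, target)
        g_loss = 0.001 * g_adv + 0.999 * g_rec  # weighted total loss from above
        g_loss.backward()
        opt_g.step()

        # --- Train discriminator: real patches vs generated patches ---
        opt_d.zero_grad()
        d_real = adversarial_loss(discriminator(target), valid)
        d_fake = adversarial_loss(discriminator(gen_patch.detach()), fake)
        d_loss = 0.5 * (d_real + d_fake)
        d_loss.backward()
        opt_d.step()

    print(f"epoch {epoch}: g_loss={g_loss.item():.4f}, d_loss={d_loss.item():.4f}")
```

Note that the discriminator is trained on real patches cut from the image versus patches produced by the generator, which is why gen_patch is detached before the discriminator update.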

Results

Let’s take a glance at what our model has been able to build!

Images at the zeroth epoch (noise) —


Image at zeroth epoch

Images at the 100th epoch —


Images at the 100th epoch

Let’s see what went into the model —


Central Region Masked image

That, from this? Yeah! Pretty cool, huh?

Implement your version of the model. Watch it recreate your childhood photos — and if you are good enough, you might just shape the future of inpainting with AI. So, what are you waiting for?

Let me know in the comments if anything goes wrong with your implementation. Here to help :)

