
GANs or ❓

source link: https://towardsdatascience.com/gans-or-d0fb38ff8ddb?gi=ac36318c8a88

In this blog post, I will discuss an alternative to using GANs. I would suggest that readers first read my blog on GANs to understand and compare the strategy explained below. Let us get started.


Nov 22 · 5 min read

Image by Gerd Altmann from Pixabay

I hope by now you are all well acquainted with the concept of GANs. Still, we cannot achieve very convincing results with GANs alone. If you have read my blog on GANs, the image below was the output at the last stage.

[Figure: output from my GANs blog; the middle image is the GAN's prediction, shown alongside the target]

The middle image is what the GAN predicted. Though it is somewhat similar to the target, it is still not acceptable: the eyes of the cat in the predicted image are not bright, and the paws are not clear. These small details matter a lot if we start differentiating cats based on their features.

We may improve the results to some extent, but the GAN still cannot reproduce the features clearly enough, because we have not trained the model to do so. The GAN loss does not care about the eyes of the cat, the paws, or any other fine details.

Can we get rid of the GANs?

Fastai has come up with something that restores images even better. Now, when we talk of a better model, the first thing that comes to mind is improving the loss function. If we could enhance the loss function, our model would train better and hence give more accurate results. So our primary purpose is to create a better loss function. We could also come up with a more complex architecture design, but that would undoubtedly be the last option to implement.
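For reference, the naive choice, and the baseline we want to beat, is a plain pixel-wise loss between the generated image and the target. The sketch below is my own illustration in PyTorch, not code from the article:

import torch
import torch.nn.functional as F

# Baseline: compare the generated and target images pixel by pixel.
# pred and target stand in for batches of images, shape (batch, channels, height, width).
pred = torch.rand(4, 3, 224, 224)      # hypothetical generator output
target = torch.rand(4, 3, 224, 224)    # hypothetical ground-truth images

pixel_loss = F.l1_loss(pred, target)   # treats every pixel equally; knows nothing about eyes or paws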

The approach I am concerned with here is explained in Perceptual Losses for Real-Time Style Transfer and Super-Resolution (Johnson et al.).

[Figure: the image transformation network followed by a fixed loss network]
Image source — research paper mentioned above
  • Image Transformation Net — this is basically the U-Net, i.e., the generator part of the GAN. It produces the output, which is the predicted image.
  • Models of this type are known as generative models. They have a downsampling part, also known as the encoder, and an upsampling part, known as the decoder.
  • Ŷ is the predicted image. Yc is the target image that we want to come up with.
  • Now, we pass both the predicted image and the target image through a pre-trained image classification model such as ResNet-34 or VGG-16. These models are pre-trained on a large number of image categories, and they classify the input images.
  • Typically, the output of that model would tell you, “Hey, is this generated thing a dog, a cat, an airplane, or whatever?” But in the process of getting to that final classification, the image goes through lots of different layers. In this case, the paper color-codes the layers by grid size, giving feature maps of the same size the same color. So every time we switch colors, we are switching grid size: there is a stride-two convolution, or in VGG’s case, a max-pooling layer.
  • Now, rather than comparing only the final outputs, we could compare the activations of the middle layers and compute the losses between the predicted image’s activations and the target image’s activations. If we could do so, we could direct the model to reproduce those mid-level features more accurately (see the sketch after this list).
  • Let us understand this concept of comparing the middle layers with an example.

The shape of the output just before a max-pooling layer is 256*28*28, which means we have 256 channels, each a 28*28 feature map. Each channel picks up a different kind of feature in the image. Maybe the 100th channel responds to the eyeballs in a cat image. If we could compare the 100th channel of the predicted image’s activations with that of the target image’s activations, our model would be better prepared to produce the desired results.

  • That ought to go a long way towards fixing our eyeball problem because, in this case, the feature map is going to say, “there are eyeballs here (in the target image), but there aren’t any in the generated version, so please pay more attention and do a better job. Make better eyeballs.” So that’s the idea. That’s what fastai calls feature losses and Johnson et al. call perceptual losses.

This is how we could improve image restoration without using GANs.
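To make the idea concrete, here is a minimal sketch of comparing middle-layer activations instead of raw pixels. It is my own illustration using a plain torchvision VGG-16; the layer index and the “channel 100” choice are just assumptions for the example:

import torch
import torch.nn.functional as F
from torchvision.models import vgg16_bn

# A pre-trained VGG-16, used only as a fixed "loss network" (weights frozen).
vgg = vgg16_bn(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def middle_activations(x, layer_idx=23):
    # Run x through VGG up to (and including) the chosen middle layer.
    # Index 23 is the max-pool at the end of the third block: a 256 x 28 x 28
    # feature map for a 224 x 224 input.
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_idx:
            break
    return x

pred = torch.rand(1, 3, 224, 224)      # stand-in for the predicted image
target = torch.rand(1, 3, 224, 224)    # stand-in for the target image

feat_pred = middle_activations(pred)
feat_target = middle_activations(target)

# Compare whole feature maps ...
feature_loss = F.l1_loss(feat_pred, feat_target)
# ... or a single channel (say channel 100, the hypothetical "eyeball" channel).
channel_loss = F.l1_loss(feat_pred[:, 100], feat_target[:, 100])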

Let us now see how this loss function looks in practice. I am using the code from the fastai library.

class FeatureLoss(nn.Module):
    # Loss that combines a pixel loss with losses on the activations (and Gram matrices)
    # of chosen layers of a pre-trained network m_feat.
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat                                   # the frozen, pre-trained loss network
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)  # capture middle-layer activations
        self.wgts = layer_wgts
        self.metric_names = (['pixel'] + [f'feat_{i}' for i in range(len(layer_ids))]
                             + [f'gram_{i}' for i in range(len(layer_ids))])

    def make_features(self, x, clone=False):
        self.m_feat(x)                                         # forward pass just to fill the hooks
        return [(o.clone() if clone else o) for o in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input, target)]          # plain pixel loss
        self.feat_losses += [base_loss(f_in, f_out) * w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out)) * w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()
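The class above relies on a few names that are defined elsewhere in the fastai (v1) super-resolution notebook: base_loss, gram_matrix, and the pre-trained network passed in as m_feat. Roughly, and as a sketch rather than a verbatim copy, the surrounding setup looks like this (the data object is assumed to be an existing fastai DataBunch):

from fastai.vision import *            # fastai v1: nn, children, requires_grad, models, unet_learner, ...
from fastai.callbacks import *         # hook_outputs, LossMetrics
from torchvision.models import vgg16_bn
import torch.nn.functional as F

base_loss = F.l1_loss                   # both pixel and feature comparisons use a simple L1 loss

def gram_matrix(x):
    # Channel-by-channel correlations of a feature map, used for the "style"-like terms.
    n, c, h, w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1, 2)) / (c * h * w)

# Pre-trained VGG-16 used purely as the loss network; its weights stay frozen.
vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)

# The layers just before each max-pool: their activations are the "middle layers" we compare.
blocks = [i - 1 for i, o in enumerate(children(vgg_m)) if isinstance(o, nn.MaxPool2d)]

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5, 15, 2])

# The loss then plugs straight into a U-Net learner, e.g.:
# learn = unet_learner(data, models.resnet34, loss_func=feat_loss, callback_fns=LossMetrics)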

So, this is how we improve the loss function. After training the model for a good amount of time, we come up with the output below.

[Figure: the image restored with the feature loss, alongside the target image]

Now the predicted image is quite similar to the target image, and our predicted cat has much better-defined features than before.

Notably, this is how we could use something other than GANs. I would suggest that readers explore the fastai library further to learn more about the approach described above.

