
GANs or ❓

source link: https://towardsdatascience.com/gans-or-d0fb38ff8ddb?gi=ac36318c8a88

In this blog post, I will discuss an alternative to using GANs. I would suggest that readers first read my blog on GANs to understand and compare the strategy explained below. Let us get started.


Nov 22 · 5 min read

Image by Gerd Altmann from Pixabay

I hope by now you are all well acquainted with the concept of GANs. Still, we cannot achieve very convincing results with GANs alone. If you have read my blog on GANs, the image below was the output at the last stage.

[Figure: output from my GANs blog; the middle image is the GAN's prediction, shown alongside the target]

The middle image is what the GAN predicted. Though it is somewhat similar to the target, it is still not acceptable: the eyes of the cat in the predicted image are not bright, and the paws are not clear. These small details matter a lot if we start differentiating cats based on their features.

We may improve the results to some extent, but the GAN still cannot reproduce the features clearly enough, because we have not trained the model to do so. The GAN loss does not care about the eyes of the cat, the paws, or any other fine details.

Can we get rid of the GANs?

Fastai has come up with something that restores images even better. Now, when we talk of a better model, the first thing that comes to mind is improving the loss function. If we could enhance the loss function, our model would train better and hence give more accurate results. So our primary purpose is to create a better loss function. We could also come up with a more complex architecture design, but that would undoubtedly be the last option to implement.
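For reference, the naive choice, and the baseline we want to beat, is a plain pixel-wise loss between the generated image and the target. The sketch below is my own illustration in PyTorch, not code from the article:

import torch
import torch.nn.functional as F

# Baseline: compare the generated and target images pixel by pixel.
# pred and target stand in for batches of images, shape (batch, channels, height, width).
pred = torch.rand(4, 3, 224, 224)      # hypothetical generator output
target = torch.rand(4, 3, 224, 224)    # hypothetical ground-truth images

pixel_loss = F.l1_loss(pred, target)   # treats every pixel equally; knows nothing about eyes or paws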

The approach I am concerned with here is explained in Perceptual Losses for Real-Time Style Transfer and Super-Resolution (Johnson et al.).

[Figure: the image transformation network followed by a fixed loss network]
Image source — research paper mentioned above
  • Image Transformation Net — this is basically the U-Net, i.e., the generator part of the GAN. It produces the output, which is the predicted image.
  • Models of this type are known as generative models. They have a downsampling part, also known as the encoder, and an upsampling part, known as the decoder.
  • Ŷ is the predicted image. Yc is the target image that we want to come up with.
  • Now, we pass both the predicted image and the target image through a pre-trained image classification model such as ResNet-34 or VGG-16. These models are pre-trained on a large number of image categories, and they classify the input images.
  • Typically, the output of that model would tell you, “Hey, is this generated thing a dog, a cat, an airplane, or whatever?” But in the process of getting to that final classification, the image goes through lots of different layers. In this case, the paper color-codes the layers by grid size, giving feature maps of the same size the same color. So every time we switch colors, we are switching grid size: there is a stride-two convolution, or in VGG’s case, a max-pooling layer.
  • Now, rather than comparing only the final outputs, we could compare the activations of the middle layers and compute the losses between the predicted image’s activations and the target image’s activations. If we could do so, we could direct the model to reproduce those mid-level features more accurately (see the sketch after this list).
  • Let us understand this concept of comparing the middle layers with an example.

The shape of the output just before a max-pooling layer is 256*28*28, which means we have 256 channels, each a 28*28 feature map. Each channel picks up a different kind of feature in the image. Maybe the 100th channel responds to the eyeballs in a cat image. If we could compare the 100th channel of the predicted image’s activations with that of the target image’s activations, our model would be better prepared to produce the desired results.

  • That ought to go a long way towards fixing our eyeball problem because, in this case, the feature map is going to say, “there are eyeballs here (in the target image), but there aren’t any in the generated version, so please pay more attention and do a better job. Make better eyeballs.” So that’s the idea. That’s what fastai calls feature losses and Johnson et al. call perceptual losses.

This is how we could improve image restoration without using GANs.
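To make the idea concrete, here is a minimal sketch of comparing middle-layer activations instead of raw pixels. It is my own illustration using a plain torchvision VGG-16; the layer index and the “channel 100” choice are just assumptions for the example:

import torch
import torch.nn.functional as F
from torchvision.models import vgg16_bn

# A pre-trained VGG-16, used only as a fixed "loss network" (weights frozen).
vgg = vgg16_bn(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def middle_activations(x, layer_idx=23):
    # Run x through VGG up to (and including) the chosen middle layer.
    # Index 23 is the max-pool at the end of the third block: a 256 x 28 x 28
    # feature map for a 224 x 224 input.
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_idx:
            break
    return x

pred = torch.rand(1, 3, 224, 224)      # stand-in for the predicted image
target = torch.rand(1, 3, 224, 224)    # stand-in for the target image

feat_pred = middle_activations(pred)
feat_target = middle_activations(target)

# Compare whole feature maps ...
feature_loss = F.l1_loss(feat_pred, feat_target)
# ... or a single channel (say channel 100, the hypothetical "eyeball" channel).
channel_loss = F.l1_loss(feat_pred[:, 100], feat_target[:, 100])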

Let us now see how this loss function looks in practice. I am using the code from the fastai library.

class FeatureLoss(nn.Module):
    # Loss that combines a pixel loss with losses on the activations (and Gram matrices)
    # of chosen layers of a pre-trained network m_feat.
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat                                   # the frozen, pre-trained loss network
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)  # capture middle-layer activations
        self.wgts = layer_wgts
        self.metric_names = (['pixel'] + [f'feat_{i}' for i in range(len(layer_ids))]
                             + [f'gram_{i}' for i in range(len(layer_ids))])

    def make_features(self, x, clone=False):
        self.m_feat(x)                                         # forward pass just to fill the hooks
        return [(o.clone() if clone else o) for o in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input, target)]          # plain pixel loss
        self.feat_losses += [base_loss(f_in, f_out) * w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out)) * w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()
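The class above relies on a few names that are defined elsewhere in the fastai (v1) super-resolution notebook: base_loss, gram_matrix, and the pre-trained network passed in as m_feat. Roughly, and as a sketch rather than a verbatim copy, the surrounding setup looks like this (the data object is assumed to be an existing fastai DataBunch):

from fastai.vision import *            # fastai v1: nn, children, requires_grad, models, unet_learner, ...
from fastai.callbacks import *         # hook_outputs, LossMetrics
from torchvision.models import vgg16_bn
import torch.nn.functional as F

base_loss = F.l1_loss                   # both pixel and feature comparisons use a simple L1 loss

def gram_matrix(x):
    # Channel-by-channel correlations of a feature map, used for the "style"-like terms.
    n, c, h, w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1, 2)) / (c * h * w)

# Pre-trained VGG-16 used purely as the loss network; its weights stay frozen.
vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)

# The layers just before each max-pool: their activations are the "middle layers" we compare.
blocks = [i - 1 for i, o in enumerate(children(vgg_m)) if isinstance(o, nn.MaxPool2d)]

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5, 15, 2])

# The loss then plugs straight into a U-Net learner, e.g.:
# learn = unet_learner(data, models.resnet34, loss_func=feat_loss, callback_fns=LossMetrics)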

So, this is how we improve the loss function. After training the model for a good amount of time, we come up with the output below.

[Figure: the image restored with the feature loss, alongside the target image]

Now the predicted image is quite similar to the target image, and our predicted cat has much better-defined features than before.

Notably, this is how we could use something other than GANs. I would suggest that readers explore the fastai library further to learn more about the approach described above.

