A Comprehensive Guide to Generative Adversarial Networks (GANs)

The generator is the main component of the GAN. It tries to learn the underlying distribution of the training data and to generate new samples that look as if they were drawn from that distribution.

So how do we do this? Well, it depends. In our example of generating dog images, the noise passes through several transposed-convolution (often called ‘de-convolution’) layers, and the final output is an image that will hopefully look like a dog. The discriminator then classifies the generated image as real or fake. The generator is penalized with a larger loss when the generated image is far from ‘being real’, and the loss decreases as it gets better at fooling the discriminator.
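To make the upsampling step concrete, here is a minimal pure-Python sketch of a one-dimensional transposed convolution, the operation that ‘De-Convolution’ layers perform. The function name `conv_transpose_1d`, the kernel values, and the stride are illustrative assumptions, not part of any library; a real image generator would use learned 2D kernels from a deep learning framework.

```python
def conv_transpose_1d(x, kernel, stride=2):
    """Upsample a 1D signal: each input value stamps a scaled copy of the kernel."""
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            # Overlapping stamps are summed, which is what makes this the
            # transpose of an ordinary strided convolution.
            out[i * stride + j] += value * weight
    return out


# Hypothetical 3-element "noise" vector and a fixed (not learned) kernel.
noise = [1.0, 2.0, 3.0]
upsampled = conv_transpose_1d(noise, kernel=[1.0, 2.0, 1.0], stride=2)
print(len(noise), "->", len(upsampled))
```

Stacking several such layers is how a short noise vector can grow into a full-sized image.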

This is the basic idea behind a generator network. The architecture may differ with the use case. Some GAN architectures are conditional, i.e. the generator receives a condition (such as a class label) alongside the noise. Such additions may alter the architecture a bit, but the fundamental idea of the two adversaries stands.
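One common way to supply a condition is to append a one-hot encoded label to the noise vector before it enters the generator. The sketch below assumes that approach; `conditional_input` is a hypothetical helper, and other conditional architectures inject the condition differently.

```python
def conditional_input(noise, label, num_classes):
    """Concatenate a noise vector with a one-hot encoding of the condition."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    return list(noise) + one_hot


# Hypothetical example: 2-dimensional noise, class 1 out of 3 classes.
generator_input = conditional_input([0.1, 0.2], label=1, num_classes=3)
print(generator_input)
```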

We now move on to the collective training of the generator and the discriminator.

Combined Training of the Generator and Discriminator as ‘The GAN’

So, GANs are great! But combined training of the generator and discriminator poses a problem for convergence. Let’s see how the GAN is trained.

While one network is being trained, the other should be held fixed. In short, the discriminator is first trained for several epochs with the generator frozen, then the generator is trained with the discriminator frozen, and this sequence continues back and forth. Alternating in this way gives each network a stationary opponent to learn against. Otherwise, training both at the same time would be like hitting a moving target and would be more likely to fail.
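The alternating schedule can be sketched with a deliberately tiny GAN in plain Python. Everything here is an assumption made for illustration: the generator is a linear map G(z) = a*z + b, the discriminator is a logistic unit D(x) = sigmoid(w*x + c), the real data is drawn from N(4, 1), and the gradients are derived by hand for the standard GAN cross-entropy terms. It demonstrates the training structure only and makes no claim about convergence.

```python
import math
import random


def sigmoid(s):
    s = max(-60.0, min(60.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))


def discriminator_step(p, rng, lr=0.05):
    """Update only D's parameters (w, c); the generator (a, b) stays frozen."""
    x = rng.gauss(4.0, 1.0)                    # real sample (assumed distribution)
    g = p["a"] * rng.gauss(0.0, 1.0) + p["b"]  # fake sample G(z)
    d_real = sigmoid(p["w"] * x + p["c"])
    d_fake = sigmoid(p["w"] * g + p["c"])
    # Gradients of -[log D(x) + log(1 - D(G(z)))] with respect to w and c.
    p["w"] -= lr * ((d_real - 1.0) * x + d_fake * g)
    p["c"] -= lr * ((d_real - 1.0) + d_fake)


def generator_step(p, rng, lr=0.05):
    """Update only G's parameters (a, b); the discriminator (w, c) stays frozen."""
    z = rng.gauss(0.0, 1.0)
    d_fake = sigmoid(p["w"] * (p["a"] * z + p["b"]) + p["c"])
    # Gradient of log(1 - D(G(z))) with respect to the fake sample G(z).
    grad_g = -d_fake * p["w"]
    p["a"] -= lr * grad_g * z
    p["b"] -= lr * grad_g


def train(rounds=200, d_steps=5, g_steps=5, seed=0):
    rng = random.Random(seed)
    p = {"a": 1.0, "b": 0.0, "w": 0.0, "c": 0.0}
    for _ in range(rounds):
        for _ in range(d_steps):   # discriminator phase
            discriminator_step(p, rng)
        for _ in range(g_steps):   # generator phase
            generator_step(p, rng)
    return p


params = train()
print(params)
```

Note that with the discriminator initialized to w = 0, the generator receives a zero gradient until the discriminator has taken at least one step, which is why the discriminator phase comes first in this sketch.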

If you can’t train a classifier to tell the difference between real and generated data even for the initial random generator output, you can’t get the GAN training started.

— Google Developer Blog

Moreover, as the generator gets better at fooling the discriminator, the discriminator’s accuracy decreases. Eventually the discriminator reaches an accuracy of 0.5, which means it is effectively predicting at random, i.e. on a coin toss (which it is obviously not supposed to do). At that point its feedback carries little useful information for the generator, and continuing to train on it may degrade the generator’s performance instead.
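A simple way to watch for this during training is to track the discriminator’s accuracy on a balanced batch of real and generated samples. The helper below is a hypothetical sketch; the function name, the 0.5 decision threshold, and the example scores are assumptions chosen for illustration.

```python
def discriminator_accuracy(real_scores, fake_scores, threshold=0.5):
    """Fraction of samples the discriminator labels correctly.

    Scores are D's estimated probabilities that a sample is real.
    """
    correct = sum(1 for s in real_scores if s >= threshold)
    correct += sum(1 for s in fake_scores if s < threshold)
    return correct / (len(real_scores) + len(fake_scores))


# Early in training: fakes are easy to spot.
early = discriminator_accuracy([0.9, 0.8], [0.1, 0.2])
# Later: the generator fools D, so every sample is labelled "real".
late = discriminator_accuracy([0.9, 0.8], [0.7, 0.6])
print(early, late)
```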

Hence, convergence in GANs is not stable and is a major issue.

Training Loss

The GANs paper defines the training loss as:

Minimax Loss via Google Developer Blog
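Written out, the minimax objective from the original GAN paper is shown below. Here D(x) is the discriminator’s estimated probability that x is real, G(z) is the generator’s output for noise z, p_data is the data distribution, and p_z is the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```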

This is the minimax loss. The discriminator tries to maximize it, while the generator tries to minimize it. Since the generator cannot affect the log(D(x)) term, this amounts to minimizing log(1 - D(G(z))).

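However, the paper also suggests modifying this loss so that the generator instead maximizes log(D(G(z))). Early in training the discriminator rejects generated samples with high confidence, so log(1 - D(G(z))) saturates and gives the generator very small gradients; the modified objective provides much stronger gradients in that regime.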
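The difference between the two generator objectives can be checked numerically. The sketch below assumes the discriminator ends in a sigmoid, D = sigmoid(s), and compares the gradient of each loss with respect to the logit s on a generated sample. The function name `generator_gradients` and the chosen logits are illustrative.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def generator_gradients(logit):
    """Gradient of each generator loss w.r.t. D's logit on a fake sample."""
    d = sigmoid(logit)                  # D(G(z))
    minimax_grad = -d                   # d/ds of log(1 - sigmoid(s))
    non_saturating_grad = -(1.0 - d)    # d/ds of -log(sigmoid(s))
    return minimax_grad, non_saturating_grad


# A very negative logit means D confidently rejects the fake sample.
for logit in (-6.0, 0.0, 6.0):
    mm, ns = generator_gradients(logit)
    print(f"D(G(z))={sigmoid(logit):.4f}  minimax={mm:+.4f}  non-saturating={ns:+.4f}")
```

When the discriminator confidently rejects a sample (D(G(z)) close to 0), the minimax gradient is close to zero while the modified loss still yields a gradient of magnitude close to 1.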

A Few Use Cases (to get you thinking)

Vanilla GANs (the ones described in the GAN paper) can be used to augment training data when the dataset is imbalanced or small.
Deep Convolutional Generative Adversarial Networks, or DCGANs, are vanilla GANs with convolutional layers for image generation.
The pix2pix model can take a wireframe of a structure as input and generate the complete structure as output. Other applications include colorizing black-and-white images.
CycleGAN is used to transform images from one domain to another without any paired training samples.
DeepFake generation and detection is one of the more recent research topics in GANs. Generation produces edited or tampered images that look realistic to the naked eye. Similar tampering can also be done manually with photo or video editing software, so the technique is open to misuse. DeepFake detection follows as a corollary of generation.

I will try to cover the implementation of some of these GANs in future articles. So stay tuned!

Conclusion

The main takeaway from this article is the concept of adversarial training, in which a generative model is evaluated by a second, competing network. This opens up many directions for researchers to design similar evaluators for better model performance.

We have also seen the idea of generative and discriminative modeling, and how they differ from each other.

We studied how GANs work and the idea behind them. We took a pragmatic approach overall: we addressed how GANs are trained and the issues that arise while actually training the network.

Finally, we had a word on some well-known flavors of GANs.

References:

Original paper: https://arxiv.org/abs/1406.2661

For more on the mathematical transformational aspect of GANs, refer:
