
A review of Generative Adversarial Networks

source link: https://towardsdatascience.com/a-review-of-generative-adversarial-networks-9af21e94bda4?gi=1f55d69d22f4


How did GANs change the way machine learning works?

Jun 15 · 5 min read

The history of deep learning has been somewhat unusual. Many techniques, such as convolutional neural networks, were invented in the 1980s and only made their comeback some twenty years later. While most methods were rediscoveries, Generative Adversarial Networks stand out as one of the most innovative techniques to arrive in deep learning in the past ten years. Discriminative networks trained with backpropagation and dropout, using units with well-behaved gradients, had proven very successful, but the same could not be said of generative networks. Deep generative models struggled to approximate the intractable probabilistic computations that arise in maximum likelihood estimation, and they could not easily leverage the benefits of piecewise linear units in a generative context. GANs addressed these two issues by bringing a generative and a discriminative network together.

GANs were first proposed by Goodfellow et al. [1] at the University of Montreal. The basic framework pits a generator against an adversary: a discriminator learns to tell whether a sample comes from the data distribution or from the generative network. The idea is that the two networks get better by competing against each other.

The most straightforward formulation models both the discriminator and the generator as multilayer perceptrons. The generator learns a mapping from a latent space to the data space, so that its output distribution gradually approaches the true data distribution. The discriminator, on the other hand, tries to distinguish samples from the real data distribution from those produced by the generator. The goal of the generative network is to trick the discriminator into believing that the novel data it produces comes from the true data distribution, thereby increasing the discriminator's error rate.
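
Formally, [1] phrases this competition as a two-player minimax game over a value function V(D, G):

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here p_z is the prior over the latent variables, D(x) is the probability the discriminator assigns to x being real, and G(z) is a generated sample.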

Figure 1: Building blocks of a GAN ( https://mc.ai/deep-convolutional-generative-adversarial-networksdcgans/ )

We should emphasise that the role of a GAN is not to reproduce the data used during training, but to produce new data. We can describe it as a two-player game in which the two networks oppose each other; the end goal is an equilibrium in which the trained networks are each a best response to the other. At that point they cannot improve any further, and training stops. However, such an equilibrium is difficult to reach and even harder to maintain, and this is the first issue with GANs. Another problem is that, unlike other deep learning techniques, there is no straightforward way to check on a held-out dataset whether the generator has learned to produce a distribution close to the real data distribution.
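
To make the alternating game concrete, below is a minimal single-step training sketch in PyTorch. It is an illustration rather than the exact procedure of any particular paper; the generator G, the discriminator D (assumed to output a probability in (0, 1)), their optimisers, and a batch of real images are assumed to be given.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real_images, latent_dim=100):
    """One alternating update: first the discriminator, then the generator."""
    batch_size = real_images.size(0)
    device = real_images.device
    real_labels = torch.ones(batch_size, 1, device=device)
    fake_labels = torch.zeros(batch_size, 1, device=device)

    # --- Discriminator step: tell real samples apart from generated ones ---
    z = torch.randn(batch_size, latent_dim, device=device)
    fake_images = G(z).detach()          # do not backpropagate into G here
    d_loss = (F.binary_cross_entropy(D(real_images), real_labels) +
              F.binary_cross_entropy(D(fake_images), fake_labels))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # --- Generator step: try to make D label the fakes as real ---
    z = torch.randn(batch_size, latent_dim, device=device)
    g_loss = F.binary_cross_entropy(D(G(z)), real_labels)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```

As noted in [1], the generator step here maximises log D(G(z)) rather than minimising log(1 − D(G(z))), which gives stronger gradients early in training.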

In the original paper it is shown experimentally that the amount of data and the depth of the network play a large role in performance. When each data point is an image, this suggests that the amount of data needed grows exponentially with the number of pixels. Given that images contain hundreds or thousands of pixels, the best results would require networks that cannot yet be trained with the computational power and data available.

GANs have found extensive application, ranging from art, fashion, advertising, and science to video games. However, these networks have also been adopted for malicious purposes, such as creating fake social media profiles with synthesised images. Overall, their applications are most extensive in the field of computer vision.

Figure 2: A road map of GANs since the original paper, inspired by [9]

Figure 2 gives a road map of GANs starting from the original paper. For reasons of space, I will briefly list the methods mentioned and the problems they tackle; many papers following the original work focus on modifications to the training process.

Deep convolutional GANs (DCGANs) [2] achieve better performance on images by defining the generator (G) and discriminator (D) as CNNs rather than multilayer perceptrons. They have no pooling layers; to increase spatial dimensionality, the generator uses fractionally-strided (transposed) convolutions. Batch normalisation is applied to all layers of G and D except the output layer of G and the input layer of D, so that information about the correct mean of the data distribution is not lost.
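
As a rough illustration (not the exact architecture of [2]; the channel counts and the 64×64 output size are assumptions for the sketch), a DCGAN-style generator in PyTorch might look like this:

```python
import torch.nn as nn

# Sketch of a DCGAN-style generator: transposed convolutions upsample a
# latent vector to a 64x64 RGB image; batch norm everywhere except the output.
latent_dim = 100
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 512, kernel_size=4, stride=1, padding=0),
    nn.BatchNorm2d(512), nn.ReLU(inplace=True),   # 4x4
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(256), nn.ReLU(inplace=True),   # 8x8
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),   # 16x16
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),    # 32x32
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),                                    # 64x64 output, no batch norm
)
# Input: a latent vector reshaped to (batch, latent_dim, 1, 1).
```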

Improved GAN training [3] proposes changes to the training settings: minibatch discrimination, virtual batch normalisation, and feature matching (see the sketch below). Given that the original GANs suffer from low resolution, LAPGAN [4] uses CNNs within a Laplacian pyramid to generate higher-resolution images. Progressive GANs (PGGAN) [5] also modify training: inspired by progressive neural networks, they grow both the generator and the discriminator from low to high resolution by progressively adding new layers.
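
As an example of one of these techniques, feature matching replaces the generator's usual objective with one that matches the statistics of an intermediate discriminator layer on real and generated batches. A minimal sketch, assuming a hypothetical D.features(x) that returns such intermediate activations:

```python
import torch

def feature_matching_loss(D, real_images, fake_images):
    """Match the mean activation of an intermediate discriminator layer
    on real and generated batches (feature matching, in the spirit of [3]).
    D.features is an assumed accessor, not a standard API."""
    real_feats = D.features(real_images).mean(dim=0)
    fake_feats = D.features(fake_images).mean(dim=0)
    return torch.sum((real_feats - fake_feats) ** 2)
```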

Image-to-image translation traditionally learns a mapping between an input and an output image from a training set of aligned pairs. CycleGANs [6] use an adversarial loss to map an image from a source domain X to a target domain Y in the absence of such pairs. Furthermore, they couple this loss with an inverse mapping, enforcing cycle consistency.
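
A minimal sketch of the cycle-consistency term, with G mapping X to Y and a second generator F_inv mapping Y back to X (the weighting lam is illustrative, not the exact setting of [6]):

```python
import torch

def cycle_consistency_loss(G, F_inv, x, y, lam=10.0):
    """L1 penalty forcing x -> G(x) -> F_inv(G(x)) back near x,
    and y -> F_inv(y) -> G(F_inv(y)) back near y."""
    forward_cycle = torch.mean(torch.abs(F_inv(G(x)) - x))
    backward_cycle = torch.mean(torch.abs(G(F_inv(y)) - y))
    return lam * (forward_cycle + backward_cycle)
```

This term is added to the two adversarial losses, with lam controlling how strongly round-trip reconstruction is enforced.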

Another issue with the original GANs is mode collapse: they tend to produce similar samples even when trained on diverse datasets. PacGAN [7] handles this issue with what the authors call packing. The main change is in the discriminator, which makes its decision based on multiple samples drawn jointly from the same source, either the real or the generated distribution.
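
A sketch of the packing idea, assuming image tensors of shape (batch, channels, height, width); the discriminator is then built to accept m times as many input channels:

```python
import torch

def pack(samples, m=2):
    """Group a batch of images into packs of m samples by concatenating them
    along the channel dimension, so the discriminator judges m samples from
    the same source (all real or all generated) at once."""
    b, c, h, w = samples.shape
    assert b % m == 0, "batch size must be divisible by the pack size"
    return samples.view(b // m, m * c, h, w)
```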

The Self-Attention Generative Adversarial Network (SAGAN) [8] proposes long-range dependency modelling with attention for image generation. It applies spectral normalisation to both G and D, which is shown to improve the training process.
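
Spectral normalisation is available in PyTorch as a per-layer wrapper; a minimal illustration on a discriminator block (layer sizes are illustrative, not SAGAN's exact architecture):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalisation rescales each layer's weight by its largest singular
# value, constraining the layer's Lipschitz constant and stabilising training.
disc_block = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
    spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
)
```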

A rather different way of using GANs is to train the generator on a single natural image, using a pyramid of fully convolutional GANs, each of which learns the distribution at a different scale of the image.

A problem yet to be tackled is that GANs require the generated samples to be differentiable with respect to the generator's parameters, which means they cannot produce discrete data directly. Another open question is how to measure the uncertainty of a well-trained generative network.

  1. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
  2. A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
  3. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training GANs,” in Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.
  4. E. L. Denton, S. Chintala, R. Fergus, et al., “Deep generative image models using a Laplacian pyramid of adversarial networks,” in Advances in Neural Information Processing Systems, pp. 1486–1494, 2015.
  5. T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.
  6. J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232, 2017.
  7. Z. Lin, A. Khetan, G. Fanti, and S. Oh, “PacGAN: The power of two samples in generative adversarial networks,” in Advances in Neural Information Processing Systems, pp. 1498–1507, 2018.
  8. H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018.
  9. J. Gui, Z. Sun, Y. Wen, D. Tao, and J. Ye, “A review on generative adversarial networks: Algorithms, theory, and applications,” arXiv preprint arXiv:2001.06937, 2020.
