
Why Do GANs Need So Much Noise?

source link: https://towardsdatascience.com/why-do-gans-need-so-much-noise-1eae6c0fb177?gi=ad62665d01d9


Image Source: Pexels

Generative Adversarial Networks (GANs) are a tool for generating new, “fake” samples given a set of old, “real” samples. These samples can be practically anything: hand-drawn digits, photographs of faces, expressionist paintings, you name it. To do this, GANs learn the underlying distribution behind the original dataset. Throughout training, the generator approximates this distribution while the discriminator tells it what it got wrong, and the two alternatingly improve through an arms race. In order to draw random samples from the distribution, the generator is given random noise as input. But, have you ever wondered why GANs need random input? The common answer is “so they don’t generate the same thing every time”, and that’s true, but the answer is a bit more nuanced than that.

Random Sampling

Before we continue with GANs, let's take a detour and consider sampling from the normal distribution. Suppose you want to do this in Python, but you never read the numpy docs and don't know that np.random.normal() exists. Instead, all you've got to work with is random.random(), which produces values uniformly in the interval [0, 1).


Figure 1: A histogram of 100k samples drawn from our input uniform distribution (blue) and our target normal distribution (orange).

In short, we want to transform the blue distribution into the orange distribution in figure 1. Fortunately, there is a function to do this: the inverse cumulative distribution function, also called the quantile function. The (non-inverted) cumulative distribution function, or CDF, illustrated in figure 2, describes the probability that any random value drawn from the distribution in question will be equal to or less than x, for some specified x.


Figure 2: The CDF of the standard normal distribution.

For instance, at the point x=0 in figure 2, y=0.5; this means that 50% of the distribution lies below zero. A handy quality of the CDF is that the output ranges from 0 to 1, which is exactly the input we have available to us from the random.random() function! If we invert the CDF (flip it on its side), we get the quantile function:


Figure 3: The quantile function of the standard normal distribution.

This function gives us the exact relationship between the quantile (our x, ranging from 0 to 1) and the corresponding value in the normal distribution, allowing us to sample directly from the normal distribution. That is, f(random.random()) ~ N(0, 1), where each point in the input space corresponds to a unique point in the output space.
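To make this concrete, here is a minimal sketch of inverse-CDF sampling in Python, using scipy.stats.norm.ppf as the quantile function (this snippet is my illustration, not code from the original article):

```python
import random
from scipy.stats import norm

# norm.ppf is the quantile function (inverse CDF) of the standard normal,
# so feeding it uniform samples turns them into normal samples.
uniform_samples = [random.random() for _ in range(100_000)]
normal_samples = [norm.ppf(u) for u in uniform_samples]
```

A histogram of normal_samples should closely match the orange curve in figure 1.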


Figure 4: An animation illustrating the uniform distribution (blue) being mapped to the normal distribution (orange) using the quantile function.

What does this have to do with GANs?

In the above scenario, we had the quantile function at our disposal, but what if we didn't, and had to learn a mapping from the input space to the output space? That is exactly the problem that GANs aim to solve. In a previous article, I illustrated how GANs can be used to sample from the normal distribution if you're in a data emergency and don't have the quantile function available to you. In this light, I find it much more helpful to think of GANs not as tools for random sampling, but as functions that map some k-dimensional latent (input) space to some p-dimensional sample (output) space, transforming samples from the latent space into samples from the sample space. In this view, much like the quantile function, there's no randomness involved.
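To make that "GAN as a function" framing concrete, the generator in a framework like PyTorch is literally just a deterministic map from a k-dimensional latent vector to a p-dimensional sample. The sketch below is my own illustration (not the article's code), with k=1 and p=2 to match the experiments that follow:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """A deterministic map from a k-dim latent space to a p-dim sample space."""
    def __init__(self, latent_dim=1, sample_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, sample_dim),
        )

    def forward(self, z):
        return self.net(z)

# All of the "randomness" lives in the input z; the generator itself is an
# ordinary deterministic function, just like the quantile function above.
g = Generator()
z = torch.rand(5, 1)    # uniform latent samples, shape (5, 1)
fake_samples = g(z)     # generated samples, shape (5, 2)
```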

With maps on the mind, let’s consider how we might draw random samples from a 2D normal distribution with only 1D random samples between 0 and 1 as input.


Figure 5: A 2D normal distribution (orange) and a 1D uniform distribution (blue), each with 100k samples.

How would we map the 100k samples in that blue line to the 100k samples in the orange blob? There's no good way to do it. Sure, we could use Peano curves, but then we lose the useful property that points close together in the input space map to points close together in the output space, and vice versa. It's for this reason that the dimensionality of the latent space of a GAN must equal or exceed the dimensionality of its sample space. That way, the function has enough degrees of freedom to map the input to the output.

But just for fun, let’s visualize what happens when a GAN with only one-dimensional input is tasked with learning multi-dimensional distributions. The results hopefully won’t surprise you, but they are fun to watch.

2D Gaussian

Let’s start out with the issue illustrated in figure 5: mapping the 1D range between 0 and 1 to the 2D normal (or “Gaussian”) distribution. We will be using a typical vanilla GAN architecture (code available at the end of the article).
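The full code is linked at the end of the article. To give a rough idea of what "vanilla" means here, a single training step in this kind of setup typically looks something like the sketch below; the architecture and hyperparameters are my own assumptions, not necessarily the author's exact choices:

```python
import torch
import torch.nn as nn

latent_dim = 1
gen = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 2))
disc = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

def train_step(real_batch):
    n = real_batch.shape[0]
    fake_batch = gen(torch.rand(n, latent_dim))  # uniform noise -> samples

    # Discriminator: push real samples toward 1, generated samples toward 0.
    opt_d.zero_grad()
    d_loss = (loss_fn(disc(real_batch), torch.ones(n, 1)) +
              loss_fn(disc(fake_batch.detach()), torch.zeros(n, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator: try to make the discriminator label its samples as real.
    opt_g.zero_grad()
    g_loss = loss_fn(disc(fake_batch), torch.ones(n, 1))
    g_loss.backward()
    opt_g.step()
```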


Figure 6: A GAN with a latent dimension of 1 trying to learn the 2D Gaussian distribution. Grey points are samples drawn from the true distribution, red points are generated samples. Each frame is one training step.

As you can see, the poor thing is at a loss for what to do. Having only one degree of freedom, it is hardly able to explore the sample space. What’s worse, because the generated samples are so densely-packed in that 1D manifold (there are as many grey dots in this gif as red dots!), the discriminator is able to slack off, never having to try hard to discern the real points from the fakes, and as such the generator doesn’t get very useful information (and certainly not enough to learn a space-filling curve, even if it had the capacity!).

Figure 6 shows the first 600 training steps. After 30k, this was the result:


Figure 7: The distribution learned by the GAN from figure 6 after 30k training steps.

It’s a cute little squiggle, but hardly a Gaussian distribution. The GAN completely failed to learn the mapping after 30k steps. For context, let’s consider how a GAN with the same architecture and training routine fares when given 2D, 3D, 10D, and 100D latent spaces to map to the above distribution:


Figure 8: Output from GANs with latent spaces of 2D, 3D, 10D, and 100D after 30k training steps

The 2D latent space GAN is much better than the 1D GAN above, but is still nowhere near the target distribution and has several obvious kinks in it. The 3D and 10D latent spaces produced GANs with visually convincing results, and the 100D GAN produced what appears to be a Gaussian distribution with the right variance but the wrong mean. But, we should keep in mind that the high-dimensional GANs are cheating in this particular problem, since the mean of many uniform distributions is approximately normally distributed.
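That last caveat is the central limit theorem at work; a quick numpy check (my addition, not the article's) makes it easy to see:

```python
import numpy as np

# The mean of 100 independent uniform(0, 1) draws is approximately normal
# with mean 0.5 and standard deviation sqrt(1/12) / sqrt(100), about 0.029.
means = np.random.random((100_000, 100)).mean(axis=1)
print(means.mean())  # ~0.5
print(means.std())   # ~0.029
```

So a 100-dimensional uniform latent vector already hands the generator something nearly Gaussian before any learning takes place.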

Eight Gaussians


Figure 9: The eight Gaussians distribution.

The eight Gaussians distribution (figure 9) is exactly as it sounds: a mixture of eight 2D Gaussians arranged in a circle about the origin, each with small enough variance that they hardly overlap, and with zero covariance. Although the sample space is 2D, a reasonable encoding of this distribution has three dimensions: the first dimension being discrete and describing the mode (numbered one through eight), and the other two describing the x and y displacement from that mode, respectively.
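For reference, samples from such a mixture might be generated along the lines below; the radius and standard deviation are my own assumptions, not the article's exact parameters:

```python
import numpy as np

def sample_eight_gaussians(n, radius=2.0, std=0.05):
    """Sample n points from eight small Gaussians arranged in a circle."""
    angles = 2 * np.pi * np.random.randint(0, 8, size=n) / 8          # pick a mode
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return centers + std * np.random.randn(n, 2)                      # spread around it

real_samples = sample_eight_gaussians(100_000)
```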

I trained a GAN with latent_dim=1 on the eight Gaussians distribution for 600 steps, and these were the results:


Figure 10: A GAN with a latent dimension of 1 trying to learn the eight Gaussians distribution. Grey points are samples drawn from the true distribution, red points are generated samples. Each frame is one training step.

As expected, the GAN struggles to learn an effective mapping. After 30k steps, this is the learned distribution:


Figure 11: The distribution learned by the GAN from figure 10 after 30k training steps.

The GAN is clearly struggling to map the 1D latent space to this 3D distribution: the right-most mode is ignored, a considerable number of samples are generated between modes, and the samples aren't normally distributed. For comparison, let's consider four more GANs after 30k steps, with latent dimensions of 2, 3, 10, and 100:


Figure 12: Output from GANs with latent spaces of 2D, 3D, 10D, and 100D after 30k training steps

It’s hard to tell which is best without actually measuring the KL divergence between the true distribution and the learned distribution (coming soon™️ in a follow-up article!), but the low-dimensional GANs seem to produce fewer samples in the negative space between modes. Even more interesting, the 2D GAN does not show mode collapse, the 3D and 10D GANs show only slight mode collapse, and the 100D GAN failed to generate samples in two of the modes.
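Measuring that properly is deferred to the follow-up article. In the meantime, one crude way to put a number on it (my own suggestion, not the author's method) is a histogram-based KL estimate:

```python
import numpy as np
from scipy.stats import entropy

def histogram_kl(real, fake, bins=50, eps=1e-10):
    """Rough estimate of KL(real || fake) from 2D samples via shared histograms."""
    lo = np.minimum(real.min(axis=0), fake.min(axis=0))
    hi = np.maximum(real.max(axis=0), fake.max(axis=0))
    edges = [np.linspace(lo[d], hi[d], bins + 1) for d in range(2)]
    p, _, _ = np.histogram2d(real[:, 0], real[:, 1], bins=edges)
    q, _, _ = np.histogram2d(fake[:, 0], fake[:, 1], bins=edges)
    # entropy(p, q) normalizes both histograms and computes sum(p * log(p / q)).
    return entropy(p.ravel() + eps, q.ravel() + eps)
```

This estimate is biased by the choice of binning, but it is enough to rank runs like the ones in figure 12.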

Spiral


Figure 13: The spiral distribution. The distribution decreases in density as the spiral extends outward from the center, and is uniform in density laterally across the arm.

The spiral distribution, illustrated in figure 13, is in some ways simpler than the eight Gaussians distribution. Having only one mode (albeit elongated and twisty), the GAN isn’t forced to discretize its continuous input. It can be described efficiently with two dimensions: one describing position along the spiral, the other describing position laterally within the spiral.
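For reference, a spiral roughly like this could be generated as below; the exact parameterization (an Archimedean spiral with Gaussian jitter) is my own approximation, not the article's code, and it only loosely matches the density properties described in the figure 13 caption:

```python
import numpy as np

def sample_spiral(n, turns=2.0, noise=0.1):
    """Sample n points along a 2D spiral, denser near the center."""
    # With t drawn uniformly, density per unit arc length falls off as the
    # spiral stretches outward, since arc length grows faster at larger t.
    t = np.random.random(n) * turns * 2 * np.pi
    x = t * np.cos(t) + noise * np.random.randn(n)
    y = t * np.sin(t) + noise * np.random.randn(n)
    return np.stack([x, y], axis=1)

real_samples = sample_spiral(100_000)
```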

I trained a GAN with latent_dim=1 for 600 steps, and these were the results:


Figure 14: A GAN with a latent dimension of 1 trying to fit the spiral distribution. Grey points are samples drawn from the true distribution, red points are generated samples. Each frame is one training step.

Again, the GAN struggles to learn an effective mapping. After 30k steps, this is the learned distribution:


Figure 15: The distribution learned by the GAN from figure 14 after 30k training steps.

Similar to the case of the eight Gaussians distribution, the GAN does a poor job of mapping the spiral distribution. Two regions of the spiral are omitted and many samples are generated in the negative space. I address this inefficient mapping problem in detail in another article, so I won't belabour the point here; instead, let's consider four more GANs tasked with learning this distribution after 30k steps, again with latent dimensions of 2, 3, 10, and 100:


Figure 16: Output from GANs with latent spaces of 2D, 3D, 10D, and 100D after 30k training steps

Again, it’s hard to tell which is best without actually measuring the KL divergence, but the differences in coverage, uniformity, and amount of sampling in negative space are interesting to consider.

Closing Thoughts

It’s easy to get caught up in the GAN fervor and treat them like magic machines that use random numbers as fuel to pop out new samples. Understanding the fundamentals of how a tool works is essential to using it effectively and troubleshooting it when it breaks. With GANs, that means understanding that the generator is learning a mapping from some latent space to some sample space, and understanding how that learning unfolds. The extreme case of mapping a 1D distribution to a higher-dimensional distribution clearly illustrates how complicated this task is.

All code used in this project is available in the following GitHub repo:

