Introduction to Generative Adversarial Networks (GANs)

Generative AI has been at the center of most conversations in the field of AI, whether it is generating art, synthesizing images, or transferring styles. Using a concept known as adversarial training, a neural network architecture called the Generative Adversarial Network, commonly known as the GAN, has introduced a new paradigm for training generative models across a huge number of tasks.

Before we dig into adversarial training and the inner workings of GANs, let me first give a high-level overview of what GANs are and how they work. A GAN consists of two main components: one is called the generator and the other is called the discriminator. To understand how these two components work, consider a scenario. A person, inspired by the fortunes that paintings by famous artists earn, decides to create fakes, not copies of existing paintings, but new works he will pass off as lesser-known or early pieces by a famous painter. The only problem with this approach to earning quick money is that there are experts who will be able to tell that these are not actual works of the painter and expose the scam. So our scammer hires one of these experts to help refine his work. The job is simple: the scammer paints a piece and asks the expert to judge whether it is a fake or a real work. As time passes, the scammer refines his skill, and once he can convince the expert that his fake paintings are actual works of famous painters, he no longer needs the expert and can carry on without him.

This is how a GAN works: the scammer is the generator and the art expert he hires is the discriminator. Both are neural networks, often convolutional ones when working with image data.

The generator network starts from random noise, similar to a few random brush strokes on a canvas, and learns to generate synthetic data, such as images or text, that closely resembles real data samples.

The discriminator is a network that acts as a binary classifier and is trained to tell real samples from fake ones.

Once the generator can consistently fool the discriminator into classifying generated data as real, the discriminator's work is done.

The interplay between the generator and discriminator is what makes GANs unique. The adversarial relationship between the two networks leads to a dynamic equilibrium, where eventually the generator learns to generate increasingly realistic samples, while the discriminator becomes more proficient at distinguishing real from fake.

GANs also have an important application in data augmentation, which increases the size of a training set by modifying existing data, for example by rotating images to different angles in the case of image data. GANs are particularly useful here because they learn the data distribution and can then generate new data that captures the underlying patterns and structures of the training set. Thanks to their iterative training process, GANs can capture the intricate details, textures, and complex relationships present in the data, which lets them generate high-quality synthetic samples.
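For contrast with GAN-based augmentation, the classical kind just mentioned, such as rotating images, is typically a one-liner. A minimal sketch with torchvision, where the rotation range and flip probability are arbitrary choices of mine:

```python
from torchvision import transforms

# Hypothetical classical-augmentation pipeline for illustration.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),    # rotate by up to +/-15 degrees
    transforms.RandomHorizontalFlip(p=0.5),   # mirror half the images
])

augmented = augment(image)  # 'image': a PIL image (placeholder variable)
```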

GANs owe their popularity mostly to the field of image synthesis. They have been highly successful at generating realistic images from random noise, synthesizing images conditioned on attributes specified in the input, such as style or texture, and performing image-to-image translation to restyle existing images. GANs have been used to create stunning artwork, transform images into different styles, fill in missing portions of images, and even extend images into full-fledged scenes.

The power of GANs clearly extends beyond images. These models have also found applications in areas such as text-to-image synthesis, where textual descriptions are translated into corresponding visual representations. They have likewise been used for video generation, voice synthesis, and various other creative domains.

What are GANs and How Do They Work?

As I briefly described in the previous part, GANs are a class of machine learning models that have gained significant attention in recent years owing to their ability to generate unseen synthetic data that closely resembles real data.

The two components of the architecture, the generator and the discriminator, are what make GANs so powerful and unique.

The generator network starts with random noise as input and transforms it into a sample that resembles real data. Initially the generator produces crude or random outputs, but through adversarial training it adjusts its parameters and learns to generate samples that are indistinguishable from real data, ultimately fooling the discriminator into classifying them as authentic.

The discriminator network works as a critic and is trained to classify samples as real or generated. It sees both real and fake inputs, and its goal is to learn to discriminate between them accurately. As training moves forward, the discriminator becomes more adept at spotting subtle differences, pushing the generator to improve its output quality.

What makes GANs so unique and powerful is the iterative training process these networks go through. In each iteration, the generator and discriminator networks are updated using backpropagation and gradient descent: the generator tries to generate samples that deceive the discriminator, while the discriminator strives to correctly classify the real and fake samples. This competition drives the architecture towards a dynamic equilibrium in which both networks continually improve.

In terms of loss functions, the process optimizes two distinct objectives. The generator's loss encourages it to generate samples that the discriminator classifies as real, effectively minimizing how detectable its fakes are. The discriminator's loss, on the other hand, rewards accurately distinguishing between real and fake samples, maximizing its discriminative capability.

Over iterations, the generator gradually learns to produce samples that are more and more realistic and that align with the underlying distribution of the training data. In an ideal system, the generator network eventually becomes capable of generating diverse samples that capture the intricacies and variations present in the real data.

The Architecture of Generative Adversarial Networks (GANs)

The generator network in GANs takes a random noise vector \mathbf{z} as input and transforms it into a synthetic sample \mathbf{x}_{\text{fake}} that resembles real data samples. The generator can be represented as a function G(\mathbf{z}; \theta_{\text{G}}), where \theta_{\text{G}} represents the parameters of the generator. It can be implemented using various neural network architectures, such as fully connected layers or convolutional layers. The generator’s objective is to generate samples that are indistinguishable from real data, fooling the discriminator.
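As a concrete illustration, here is what a minimal fully connected generator might look like in PyTorch. This is a sketch under my own assumptions (a 100-dimensional noise vector and flattened 28×28 outputs, think MNIST-sized images), not an architecture fixed by the theory:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector z to a synthetic sample x_fake."""
    def __init__(self, z_dim=100, out_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
            nn.Tanh(),  # squash outputs to [-1, 1] to match normalized data
        )

    def forward(self, z):
        return self.net(z)
```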

The discriminator network, represented as D(\mathbf{x}; \theta_{\text{D}}), takes both real data samples \mathbf{x}_{\text{real}} and generated samples \mathbf{x}_{\text{fake}} as input. The discriminator outputs a probability D(\mathbf{x}) that indicates the likelihood of the input being a real sample. The parameters of the discriminator are denoted as \theta_{\text{D}}.
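A matching discriminator, continuing the same hypothetical sketch, is just a small binary classifier whose sigmoid output can be read as the probability D(x):

```python
class Discriminator(nn.Module):
    """Binary classifier: outputs D(x), the probability that x is real."""
    def __init__(self, in_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability that the input is a real sample
        )

    def forward(self, x):
        return self.net(x)
```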
 
1. Generator Loss: The generator aims to generate samples that the discriminator classifies as real. The generator loss, denoted as J_{\text{G}}, can be defined as the cross-entropy loss between the discriminator’s predictions on generated samples and the target labels indicating they are real:
 
J_{\text{G}} = -\frac{1}{m}\sum_{i=1}^{m}\log D(\mathbf{x}_{\text{fake}}^{(i)})
 
where m is the batch size and \mathbf{x}_{\text{fake}}^{(i)} represents the i-th generated sample.
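In code, J_{\text{G}} is just the mean negative log of the discriminator's output on a generated batch. A sketch reusing the hypothetical G and D modules from above, with a small epsilon added for numerical stability:

```python
# G, D are instances of the Generator/Discriminator sketched earlier;
# m is the batch size and z_dim the noise dimension.
z = torch.randn(m, z_dim)                 # m random noise vectors
x_fake = G(z)                             # m generated samples
eps = 1e-8                                # keeps log() away from log(0)
J_G = -torch.log(D(x_fake) + eps).mean()
```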
 
2. Discriminator Loss: The discriminator aims to correctly classify real and generated samples. The discriminator loss, denoted as J_{\text{D}}, can be defined as the sum of the cross-entropy losses for real and generated samples:
 
J_{\text{D}} = -\frac{1}{m}\sum_{i=1}^{m}\left(\log D(\mathbf{x}_{\text{real}}^{(i)}) + \log(1 - D(\mathbf{x}_{\text{fake}}^{(i)}))\right)
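The corresponding code for J_{\text{D}} sums the two log terms; same assumptions as above, with detach() making sure no gradients flow back into the generator during the discriminator's update:

```python
# x_real is a batch of m real samples.
J_D = -(torch.log(D(x_real) + eps)
        + torch.log(1 - D(x_fake.detach()) + eps)).mean()
```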
 
The overall objective is to find the optimal parameters for both networks by solving a minimax problem: the discriminator tries to maximize its ability to tell real from fake, while the generator tries to minimize it. In its empirical form this is written as:

\min_{\theta_{\text{G}}}\max_{\theta_{\text{D}}} \frac{1}{m}\sum_{i=1}^{m}\left(\log D(\mathbf{x}_{\text{real}}^{(i)}) + \log(1 - D(\mathbf{x}_{\text{fake}}^{(i)}))\right)

In practice each network takes gradient steps on its own loss, J_{\text{D}} for the discriminator and J_{\text{G}} for the generator. Note that the J_{\text{G}} defined above is the widely used non-saturating variant, which gives the generator stronger gradients early in training than directly minimizing \log(1 - D(\mathbf{x}_{\text{fake}})).
 
This adversarial training process continues iteratively until the generator generates samples that are highly realistic and the discriminator becomes unable to distinguish between real and generated samples.

If all those symbols went over your head, let's understand it step by step.

Generator Loss: 

  • The goal of the generator is simple: generate realistic samples that eventually fool the discriminator into classifying them as real. 
  • Simply put, we want to encourage the generator to produce samples that the discriminator assigns a high probability of being real.
  • To maximize that probability with gradient-based training, we minimize its negative logarithm, which is where the '-' sign comes from.
  • Putting this together, the loss function of the generator looks like:
J_G = -\frac{1}{m}\sum_{i=1}^{m}\log D(\mathbf{x}_{\text{fake}}^{(i)})

Discriminator Loss:

  • The discriminator's goal is to correctly classify real and generated samples. 
  • Our goal is to make the discriminator assign a high probability to real samples and low probabilities to generated samples.
  • The discriminator loss (J_D) captures how accurately the discriminator differentiates between real and fake samples. 
  • So, we calculate it as the sum of two cross-entropy terms: one for real samples and one for generated samples.
J_D = -\frac{1}{m}\sum_{i=1}^{m}\left(\log D(\mathbf{x}_{\text{real}}^{(i)}) + \log(1 - D(\mathbf{x}_{\text{fake}}^{(i)}))\right)

Here, \mathbf{x}_{\text{real}}^{(i)} represents the i-th real sample and \mathbf{x}_{\text{fake}}^{(i)} represents the i-th generated sample.

The training process in GANs involves iteratively updating the generator and discriminator based on their respective losses. The generator aims to minimize its loss (J_G) by generating samples that fool the discriminator, while the discriminator aims to minimize its own loss (J_D) by correctly classifying real and fake samples. This adversarial interplay between the two networks improves both the generator and the discriminator over time.
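Putting the pieces together, a minimal training loop might look like the sketch below, alternating one discriminator step and one generator step per batch. The Adam optimizer, learning rate, and data pipeline here are illustrative assumptions, not prescriptions:

```python
import torch

# 'dataloader' is assumed to yield flattened, normalized real batches.
G, D = Generator(), Discriminator()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
eps = 1e-8

for x_real in dataloader:
    m = x_real.size(0)
    x_fake = G(torch.randn(m, 100))  # 100 = z_dim from the generator sketch

    # Discriminator step: minimize J_D (detach so G is untouched here).
    J_D = -(torch.log(D(x_real) + eps)
            + torch.log(1 - D(x_fake.detach()) + eps)).mean()
    opt_D.zero_grad()
    J_D.backward()
    opt_D.step()

    # Generator step: minimize J_G, i.e. push D(x_fake) towards 1.
    J_G = -torch.log(D(x_fake) + eps).mean()
    opt_G.zero_grad()
    J_G.backward()
    opt_G.step()
```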

The ultimate goal is to reach an equilibrium where the generator produces highly realistic samples and the discriminator can no longer differentiate between real and fake. This equilibrium represents successful training: the generator has learned to capture the underlying patterns and structures of the real data distribution.

Challenges with Generative Adversarial Networks (GANs)

For all the advantages and benefits they offer, GANs are not immune to challenges. Some of the common issues we face while working with GANs are:

 

Mode Collapse

Mode collapse occurs when the generator's learning saturates and it produces only a limited variety of samples, failing to capture the entire data distribution.

Mode collapse generally happens due to an imbalance between the generator and discriminator during training, typically when the discriminator becomes too strong relative to the generator.
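Mode collapse is at least easy to monitor. Here is a crude, homemade check (my own heuristic, not a standard diagnostic): if the average pairwise distance within a generated batch shrinks towards zero, the generator is likely producing near-identical samples:

```python
# Uses the G sketched earlier; any threshold is a data-dependent guess.
with torch.no_grad():
    samples = G(torch.randn(64, 100))     # a batch of generated samples
diversity = torch.pdist(samples).mean()   # mean pairwise L2 distance
print(f"mean pairwise distance: {diversity:.4f}")  # near 0 => suspect collapse
```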

Training Instability

Training a GAN is not always a stable process and can be challenging, especially in the initial phases: as the generator and the discriminator continuously update their parameters in an adversarial manner, training can oscillate and struggle to find a stable equilibrium.

This early-stage instability can manifest as the generator failing to improve or the discriminator overpowering the generator.

Vanishing/Exploding Gradients

Just as in other deep neural networks, vanishing and exploding gradients are a problem in GAN training too, since gradients are backpropagated through multiple layers of both the generator and the discriminator.

Vanishing gradients lead to slow convergence and ineffective parameter updates, while exploding gradients introduce instability during training.
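A common mitigation for exploding gradients, in GANs as elsewhere, is to clip the gradient norm right before each optimizer step; a sketch using PyTorch's built-in utility, where max_norm=1.0 is an arbitrary but common starting point:

```python
J_D.backward()
torch.nn.utils.clip_grad_norm_(D.parameters(), max_norm=1.0)  # cap the norm
opt_D.step()
```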

Evaluation and Metrics

A challenge specific to GANs, unlike most other neural networks, is assessing the performance and quality of the generated samples. Traditional evaluation metrics such as accuracy or loss do not provide a comprehensive measure of generated-sample quality.

Alternative evaluation methods, such as human evaluation or specialized metrics like the Inception Score and the Fréchet Inception Distance (FID), are commonly used, but they carry their own limitations.
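For instance, FID compares feature statistics of real and generated images. A sketch of how it might be computed with the torchmetrics library (assumed installed), where real_images and fake_images are placeholder uint8 tensors of shape (N, 3, H, W):

```python
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)    # accumulate real-image statistics
fid.update(fake_images, real=False)   # accumulate generated-image statistics
print(fid.compute())                  # lower FID = distributions are closer
```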

Dataset Bias and Generalization

GANs rely heavily on the training dataset to identify the underlying patterns in the data, which makes them highly sensitive to it; the quality of generated samples depends directly on the training distribution. If the available data is limited in diversity, i.e., biased, the generated samples may exhibit similar biases or fail to generalize to unseen data.

Conclusion

I hope that through this blog I was able to give you a good understanding of GANs. I have quite a bit of experience working with them and would be delighted to clarify any doubts or have a discussion, so feel free to reach out to me on any of my socials. Cheers!