Autoencoders and GANs: A comparative study

In this article, I would like to shed some light on two machine learning models that have become very popular lately: the autoencoder and the GAN. This is also a very high-level comparative study of their features.

What is an Autoencoder?

Let’s start with the autoencoder. What is it, actually? An autoencoder is a type of artificial neural network that is quite adept at learning a dense representation of its input data in an unsupervised manner. Put simply, the input may be gargantuan, and an autoencoder can represent it in a much more compact form, called the latent representation or coding.

The representation is called latent because it is not readily apparent from the raw data; it captures hidden patterns that a powerful model like an autoencoder can uncover. This dense representation also comes at no labelling cost: it’s a purely unsupervised technique.

Where can an autoencoder become effective?

  1. One of the most important uses of autoencoders is to reduce the dimensionality of the input data, which makes the training process faster. For smaller datasets, we normally use either PCA (Principal Component Analysis) or LLE (Locally Linear Embedding) instead.
  2. Autoencoders are good for feature detection.
  3. Autoencoders can be used for unsupervised pretraining of deep neural networks. We can trace this technique back to 2006, when Geoffrey Hinton and his team showed that deep networks could be pretrained layer by layer using RBMs (Restricted Boltzmann Machines), which led to significant improvements in training deep neural networks. RBMs have since fallen out of fashion; we use autoencoders and GANs instead.
  4. Autoencoders can also be used as generative models. One can train an autoencoder on a set of pictures of faces, and it then becomes capable of generating fresh random data that looks similar to the original input data. However, the faces generated this way are not very realistic: they can sometimes look alien.

What is a GAN?

Our second model of interest is the GAN, an acronym for generative adversarial network. Unlike those produced by an autoencoder, the faces generated by a GAN look far more realistic. If you are interested in practical demonstrations of GAN-generated faces, there are many websites on the internet that you can visit. Once you see those demonstrations, I wager you won’t be able to tell the results apart from reality.

GANs are now widely used for:

  1. super-resolution (increasing the resolution of an image),
  2. colorization,
  3. powerful image editing (e.g., replacing photo bombers with a realistic background),
  4. turning a simple sketch into a photorealistic image,
  5. predicting the next frames in a video,
  6. augmenting a dataset (to train other models),
  7. generating other types of data (such as text, audio, and time series),
  8. identifying the weaknesses in other models and strengthening them, and more.

So what have we learned about autoencoders and GANs? They seem to have striking similarities: both achieve their goals in an unsupervised manner, both learn dense representations (codings) of the data, and both can serve as generative models. No wonder they have similar applications as well. Despite these similarities, they work very differently under the hood.

Difference between Autoencoder and GAN

In short, an autoencoder simply learns to copy its input to its output. One might think this task is trivial, but it’s not, because we can constrain the network in various ways: by limiting the size of the latent representation, or by adding noise to the input data and then training the network to recover the original input. Under such constraints, the autoencoder cannot trivially copy the input data straight to the output.

This is why the autoencoder is forced to learn an efficient representation of the data. In summary, the codings are byproducts of the autoencoder learning the identity function under some constraints.
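To make the bottleneck idea concrete, here is a minimal sketch in plain Python with no libraries. It is an illustration under stated assumptions, not a production implementation: the toy 2-D points lying near the line y = 2x, the purely linear encoder and decoder, the single-number coding, and the names enc, dec, encode, decode and learning_rate are all choices made up for this example.

```python
import random

random.seed(0)

# Toy 2-D data lying close to the line y = 2x (an assumption for this demo),
# so a single number per point is almost enough to describe it.
data = []
for _ in range(200):
    t = random.uniform(-1.0, 1.0)
    data.append((t + random.gauss(0, 0.02), 2 * t + random.gauss(0, 0.02)))

# Encoder: 2 inputs -> 1 coding. Decoder: 1 coding -> 2 outputs.
enc = [random.uniform(-0.5, 0.5) for _ in range(2)]
dec = [random.uniform(-0.5, 0.5) for _ in range(2)]

def encode(x):
    return enc[0] * x[0] + enc[1] * x[1]

def decode(h):
    return (dec[0] * h, dec[1] * h)

def reconstruction_error(points):
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for x in points:
        r = decode(encode(x))
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total / len(points)

initial_error = reconstruction_error(data)

learning_rate = 0.05
for epoch in range(500):
    grad_enc = [0.0, 0.0]
    grad_dec = [0.0, 0.0]
    for x in data:
        h = encode(x)
        r = decode(h)
        err = (r[0] - x[0], r[1] - x[1])
        # Chain rule: the loss depends on enc only through the coding h.
        d_h = 2 * (err[0] * dec[0] + err[1] * dec[1])
        for i in range(2):
            grad_dec[i] += 2 * err[i] * h
            grad_enc[i] += d_h * x[i]
    # Plain batch gradient descent on the mean reconstruction error.
    for i in range(2):
        enc[i] -= learning_rate * grad_enc[i] / len(data)
        dec[i] -= learning_rate * grad_dec[i] / len(data)

final_error = reconstruction_error(data)
print(f"reconstruction error: {initial_error:.4f} -> {final_error:.4f}")
```

Because the coding is a single number, the network cannot pass both coordinates through unchanged. Driving the reconstruction error down forces it to discover the one direction along which the data actually varies, which is much like what PCA would find for this linear case.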

A GAN, on the other hand, is composed of two neural networks: a generator that attempts to generate new data that looks very similar to the original training data, and a discriminator that tries to distinguish the real data from the counterfeit data. You can imagine a fierce battle going on between the generator and the discriminator.
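The opposing goals of the two networks can be written down as two loss functions. The sketch below is one common formulation, not the only one: it assumes the discriminator outputs a probability that a sample is real, and it uses the widely used "non-saturating" generator loss. The function names and the example scores are hypothetical and exist only for illustration.

```python
import math

def binary_cross_entropy(probabilities, target):
    """Mean binary cross-entropy of probabilities against one constant target (0 or 1)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p in probabilities:
        p = min(max(p, eps), 1.0 - eps)
        total -= target * math.log(p) + (1 - target) * math.log(1.0 - p)
    return total / len(probabilities)

def discriminator_loss(scores_on_real, scores_on_fake):
    # The discriminator wants real samples scored near 1 and fakes near 0.
    return (binary_cross_entropy(scores_on_real, 1)
            + binary_cross_entropy(scores_on_fake, 0))

def generator_loss(scores_on_fake):
    # The generator wants its fakes to be scored as real (near 1).
    return binary_cross_entropy(scores_on_fake, 1)

# Hypothetical discriminator outputs: probability that each sample is real.
sharp = {"real": [0.9, 0.95, 0.85], "fake": [0.1, 0.05, 0.2]}
fooled = {"real": [0.5, 0.5, 0.5], "fake": [0.5, 0.5, 0.5]}

d_loss_sharp = discriminator_loss(sharp["real"], sharp["fake"])
d_loss_fooled = discriminator_loss(fooled["real"], fooled["fake"])
g_loss_sharp = generator_loss(sharp["fake"])
g_loss_fooled = generator_loss(fooled["fake"])
```

When the discriminator is sharp, its own loss is low and the generator's loss is high. When the generator fools it into answering 0.5 everywhere, the situation flips. One network's gain is the other's loss, which is exactly what makes the training adversarial.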

Think of the movie ‘Catch Me If You Can’: the generator, played by Leonardo DiCaprio, is constantly producing counterfeit money, while the discriminator, played by Tom Hanks, is trying to tell the real from the fake. Adversarial training (training competing neural networks) is widely considered one of the most important ideas in recent years. In 2016, Yann LeCun even said that it was “the most interesting idea in the last 10 years in Machine Learning.”
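To show what one round of this cat-and-mouse game might look like, here is a deliberately tiny sketch in plain Python. Real GANs use deep networks and many thousands of rounds; this example assumes a one-dimensional "banknote" (a single number drawn around 4.0), a generator that is just a*z + b, and a logistic discriminator sigmoid(w*x + c). All names, starting values and the learning rate are hypothetical.

```python
import math
import random

random.seed(1)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

gen = {"a": 1.0, "b": 0.0}    # generator: noise z -> fake sample a*z + b
disc = {"w": 0.1, "c": 0.0}   # discriminator: P(x is real) = sigmoid(w*x + c)

def generate(z):
    return gen["a"] * z + gen["b"]

def discriminate(x):
    return sigmoid(disc["w"] * x + disc["c"])

real = [random.gauss(4.0, 0.5) for _ in range(64)]   # the genuine data
noise = [random.gauss(0.0, 1.0) for _ in range(64)]  # input to the generator

def d_loss():
    fake = [generate(z) for z in noise]
    return (-sum(math.log(discriminate(x)) for x in real) / len(real)
            - sum(math.log(1.0 - discriminate(x)) for x in fake) / len(fake))

def g_loss():
    return -sum(math.log(discriminate(generate(z))) for z in noise) / len(noise)

lr = 0.05

# Phase 1: the discriminator learns to tell real from fake (generator frozen).
d_before = d_loss()
grad_w = grad_c = 0.0
for x in real:
    p = discriminate(x)
    grad_w += (p - 1.0) * x / len(real)   # gradient of -log(p)
    grad_c += (p - 1.0) / len(real)
for x in [generate(z) for z in noise]:
    p = discriminate(x)
    grad_w += p * x / len(noise)          # gradient of -log(1 - p)
    grad_c += p / len(noise)
disc["w"] -= lr * grad_w
disc["c"] -= lr * grad_c
d_after = d_loss()

# Phase 2: the generator learns to fool the discriminator (discriminator frozen).
g_before = g_loss()
grad_a = grad_b = 0.0
for z in noise:
    p = discriminate(generate(z))
    d_x = -(1.0 - p) * disc["w"]          # gradient of -log(p) w.r.t. the fake sample
    grad_a += d_x * z / len(noise)
    grad_b += d_x / len(noise)
gen["a"] -= lr * grad_a
gen["b"] -= lr * grad_b
g_after = g_loss()
```

After this single round, the discriminator is slightly better at its job, and the generator has nudged its offset b away from 0 in the direction of the real data. Repeating the two phases over and over is the basic rhythm of adversarial training, although in practice keeping the two players balanced is considerably harder than this toy suggests.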

Wrapping It Up

I hope you learnt something new from this blog. Please share it with your friends and stay tuned for more updates. If you are an IT enthusiast and want to gather more knowledge, you can go through some of our other blogs.


Dhruba Ray is an expert trainer with years of experience in training aspirants. Possessing in-depth knowledge of Data Analysis, Machine Learning and Deep Learning, he is a great enthusiast of modern and advanced technologies. He has sound knowledge of scikit-learn and TensorFlow, and has developed several state-of-the-art CNN models from scratch and customized them for better results in image processing. His first-hand experience always works wonders for our students and their knowledge.