PyTorch Tutorial: Understanding and Implementing AutoEncoders

In the last article, we saw what ResNet is and how to implement it. In this article, we will look at AutoEncoders and how to implement them in PyTorch.

What is an AutoEncoder?

According to Wikipedia, “It is an artificial neural network used to learn efficient data encoding”. Basically, an autoencoder compresses data: it transforms data from a higher dimension to a lower dimension by learning to ignore noise. The encoder part of an autoencoder learns how to compress the data into a lower-dimensional representation, while the decoder part learns how to reconstruct the original data from that encoded representation.

Autoencoders are heavily used in deepfakes. The idea is to train two autoencoders, each on a different dataset (for example, faces of two different people). We then use the first autoencoder’s encoder to encode an image and the second autoencoder’s decoder to decode it. Here is an example of a deepfake.

Deep Fake

Autoencoders are also used in GANs for image generation, as well as for image compression, image denoising, and more.

AutoEncoder Components

  1. Encoder: Here the model learns how to compress the input data into a lower-dimensional encoded representation.

  2. Decoder: Here the model learns how to reconstruct the encoded representation back to its original form, or as close to it as possible.

  3. Bottleneck: The compressed representation of the input data. This is the lowest-dimensional form the data takes inside the network.

  4. Reconstruction Loss: The measure of how well the decoder reconstructed the data, i.e. how close the output is to the original input (a minimal sketch follows this list).
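As a minimal sketch of how a reconstruction loss is computed (using random tensors in place of real images), the mean squared error between input and output looks like this:

import torch
import torch.nn as nn

original = torch.rand(4, 3, 32, 32)        # a batch of four 32x32 RGB images
reconstruction = torch.rand(4, 3, 32, 32)  # decoder output of the same shape

criterion = nn.MSELoss()                    # mean squared error per element
loss = criterion(reconstruction, original)
print(loss.item())                          # scalar reconstruction loss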

Architecture

The network architecture of an autoencoder can vary from a simple feed-forward network to an LSTM or a convolutional neural network, depending on the use case.

AutoEncoder architecture
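For intuition, here is a minimal sketch of a fully connected autoencoder operating on flattened 32x32 RGB images (an illustration of the idea, not the model we build below):

import torch.nn as nn

# A minimal fully connected autoencoder: flattened image -> 256 dims -> image
simple_autoencoder = nn.Sequential(
    nn.Linear(32 * 32 * 3, 256),  # encoder: compress 3072 values to 256
    nn.ReLU(),
    nn.Linear(256, 32 * 32 * 3),  # decoder: expand back to 3072 values
)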

Implementation

Let’s now implement a basic autoencoder. For the dataset, we will be using STL10. First, let’s import the necessary modules. Create a new file named main.py and write the following code:

#main.py

#! /usr/bin/env python
import torch
import numpy as np
import torchvision
import torch.nn as nn
from tqdm import tqdm
from AutoEncoder import AutoEncoder ## Our AutoEncoder Model
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from torchvision.utils import save_image

__DEBUG__ = True
LOAD = False  # set to True to resume training from a saved checkpoint
PATH = "./autoencoder.pth"

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

TRANSFORM = transforms.Compose([transforms.Resize((32,32)),transforms.ToTensor(),
                    transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])

EPOCHS = 50
BATCH_SIZE = 4

Downloading and transforming dataset

#main.py

def get_dataset(train = True):
    split = 'train' if train else 'test'
    dataset = torchvision.datasets.STL10(root = '../dataset', split = split,
                            download = True, transform = TRANSFORM)
    loader = torch.utils.data.DataLoader(dataset, batch_size = BATCH_SIZE,
                                        shuffle = True, num_workers = 8)
    return loader

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

def showRandomImages(train):
    # get some random training images
    dataiter = iter(train)
    images, labels = next(dataiter)

    # show images
    print(images.shape)
    imshow(torchvision.utils.make_grid(images))

The get_dataset method downloads and transforms the data for our model. It takes one argument, train: if it is True the method returns the training dataset, and if it is False it returns the test dataset. This method returns a DataLoader object which is used during training.
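For example, to fetch both splits:

train_loader = get_dataset(train = True)    # training split
test_loader = get_dataset(train = False)    # test split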

Now let’s write our AutoEncoder. Create a new file named AutoEncoder.py and write the following code:

AutoEncoder

#AutoEncoder.py

import torch.nn as nn

# Encoder
class Encoder(nn.Module):

    def __init__(self):
        super(Encoder,self).__init__()

        self.layer1 = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=5,stride = 1, padding = 2),
        nn.ReLU(),
        nn.BatchNorm2d(16),
        nn.Conv2d(16,32,kernel_size = 5, stride = 1, padding = 2),
        nn.ReLU(),
        nn.BatchNorm2d(32))

        self.layer2 = nn.Sequential(
        nn.Conv2d(32, 64, kernel_size=5,stride = 1, padding = 2),
        nn.ReLU(),
        nn.BatchNorm2d(64),
        nn.Conv2d(64,128,kernel_size = 5, stride = 1, padding = 2),
        nn.ReLU())

        self.fc1 = nn.Linear(32*32*128,1000)
        self.fc2 = nn.Linear(1000,100)


    def forward(self,x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(x.size(0),-1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x

In my previous article, I explained why we subclass nn.Module and call the super method. Now let’s jump to layer1, which consists of two Conv2d layers, each followed by a ReLU activation and batch normalization. self.layer1 takes 3 channels as input and gives out 32 channels as output.

Similarly, self.layer2 takes 32 channels as input and gives out 128 channels as output.

Note: the spatial dimensions of the image are not changed by these layers. With a 5x5 kernel, stride 1, and padding 2, the output size is (32 + 2·2 − 5)/1 + 1 = 32, the same as the input.

Next, we create two fully connected layers, self.fc1 and self.fc2.

In the forward method we define how data flows through the network: first we pass it through layer1, then layer2. After that, we flatten the 2D feature maps into a 1D vector using x.view. Now the data is ready to pass through the fully connected layers fc1 and fc2.
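As a quick sanity check, here is a sketch (assuming AutoEncoder.py is on the import path) that passes a random dummy image through the encoder and prints the shape of the resulting code:

import torch
from AutoEncoder import Encoder

encoder = Encoder()
dummy = torch.rand(1, 3, 32, 32)   # random stand-in for a 32x32 RGB image
code = encoder(dummy)
print(code.shape)                  # torch.Size([1, 100])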

Now let’s write our Decoder:


class Decoder(nn.Module):
    def __init__(self):
        super(Decoder,self).__init__()
        self.fc1 = nn.Linear(100,1000)
        self.fc2 = nn.Linear(1000,32*32*128)

        self.layer1 = nn.Sequential(
        nn.ConvTranspose2d(128, 64, kernel_size=5,stride = 1, padding = 2),
        nn.ReLU(),
        nn.BatchNorm2d(64),
        nn.ConvTranspose2d(64,32,kernel_size = 5, stride = 1, padding = 2),
        nn.ReLU(),
        nn.BatchNorm2d(32))

        self.layer2 = nn.Sequential(
        nn.ConvTranspose2d(32,16, kernel_size=5,stride = 1, padding = 2),
        nn.ReLU(),
        nn.BatchNorm2d(16),
        nn.ConvTranspose2d(16,3,kernel_size = 5, stride = 1, padding = 2),
        nn.Tanh())  # Tanh matches the [-1, 1] range of the normalized inputs


    def forward(self,x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = x.view(x.size(0), 128, 32, 32)
        x = self.layer1(x)
        x = self.layer2(x)

        return x

As you can see, the Decoder mirrors the Encoder. Here we first have two fully connected layers, fc1 and fc2. The output of fc2 is reshaped and fed to layer1 followed by layer2, which reconstructs our original 32x32x3 image.
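The decoder can be checked the same way with a random stand-in for the bottleneck vector (again a sketch, not part of the tutorial files):

import torch
from AutoEncoder import Decoder

decoder = Decoder()
code = torch.rand(1, 100)   # random 100-dimensional "encoded" vector
img = decoder(code)
print(img.shape)            # torch.Size([1, 3, 32, 32])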

Let’s now combine the two into a single model:

#AutoEncoder.py

#AutoEncoder

class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder,self).__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self,x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
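As a final sanity sketch, the full model should map a batch of images back to a batch of the same shape:

import torch
from AutoEncoder import AutoEncoder

model = AutoEncoder()
x = torch.rand(4, 3, 32, 32)   # random batch of four images
print(model(x).shape)          # torch.Size([4, 3, 32, 32]), same as the input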

Training

#main.py

if __name__ == '__main__':
    if __DEBUG__:
        print(DEVICE)
    train = get_dataset()
    if __DEBUG__:
        print("Showing random images from the dataset")
        showRandomImages(train)

    model = AutoEncoder().to(DEVICE)
    if __DEBUG__:
        print(model)

    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), weight_decay = 1e-5)

    if LOAD:
        model.load_state_dict(torch.load(PATH, map_location = DEVICE))

    for epoch in range(EPOCHS):
        for i, (images, _) in enumerate(train):
            images = images.to(DEVICE)
            out = model(images)
            loss = criterion(out, images)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            ## LOG
            print(f"epoch {epoch}/{EPOCHS}\nLoss : {loss.item()}")

            if __DEBUG__:
                if i % 10 == 0:
                    out = out.detach() / 2 + 0.5     # unnormalize (detach before saving)
                    img_path = "debug_img" + str(i) + ".png"
                    save_image(out, img_path)

        #SAVING
        torch.save(model.state_dict(), PATH)

For training, we use MSELoss() and the Adam optimizer, and we train the model for 50 epochs. We iterate over each training batch and pass it to the model, then calculate the MSE loss between the output and the input. Before backpropagation, we zero the gradients using the optimizer.zero_grad() method. Then we call the backward method on our loss variable to perform backpropagation. After the gradients have been computed, we update the model with the optimizer.step() method.
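Once training has finished, a sketch like the following (written for the same main.py context; the output file name reconstructions.png is just an example) compares test images with their reconstructions:

# Reconstruct one batch from the test split and save originals and
# reconstructions together in a single image grid.
model.eval()
test = get_dataset(train = False)
with torch.no_grad():
    images, _ = next(iter(test))
    images = images.to(DEVICE)
    out = model(images)
save_image(torch.cat([images, out]) / 2 + 0.5, "reconstructions.png")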


GITHUB
