In the previous section, we saw what ResNet is and how to implement it. In this article, we will look at autoencoders and how to implement them in PyTorch.
What is an AutoEncoder?
According to Wikipedia, “It is an artificial neural network used to learn efficient data encoding”. In other words, an autoencoder compresses data: it maps high-dimensional input to a lower-dimensional representation by learning to ignore noise. The encoder part of an autoencoder learns how to compress the data into fewer dimensions, while the decoder part learns how to reconstruct the original data from the encoded data.
Autoencoders are heavily used in deepfakes. The idea is to train two autoencoders, each on a different dataset. We use the first autoencoder’s encoder to encode an image and the second autoencoder’s decoder to decode the encoded image. Here is an example of a deepfake.

Autoencoders are also used in GANs for generating images, and in image compression, image diagnosis, etc.
AutoEncoder Components
- Encoder:
Here the model learns how to compress the input data into a lower-dimensional encoded representation.
- Decoder:
Here the model learns how to reconstruct the encoded representation back to its original form, or as close to it as possible.
- Bottleneck:
It is the compressed representation of the input data, produced by the lowest-dimensional layer in the network.
- Reconstruction Loss:
This is the measure that tells us how well the decoder reconstructed the data, that is, how close the output is to the original input.
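To make the reconstruction loss concrete, here is a minimal sketch in plain Python of the mean squared error, the same quantity this article later computes with MSELoss(). The helper name mse_loss and the sample values are made up for illustration.

```python
def mse_loss(original, reconstructed):
    """Mean of the squared element-wise differences between two sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical pixel values, already scaled to [-1, 1]
original = [0.0, 0.5, 1.0, -1.0]
perfect = list(original)          # a perfect reconstruction
rough = [0.5, 0.5, 0.5, -0.5]     # an imperfect reconstruction

perfect_loss = mse_loss(original, perfect)
rough_loss = mse_loss(original, rough)
print(perfect_loss, rough_loss)
```

The closer the decoder's output is to the input, the closer this value gets to zero.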
Architecture
The network architecture for autoencoders can vary between a simple FeedForward network, LSTM network, or Convolutional Neural Network depending on the use case.

Implementation
Let’s now implement a basic autoencoder. For the dataset, we will use STL10. First, let’s import the necessary modules. Create a new file named main.py and write the following code:
#!/usr/bin/env python
# main.py
import torch
import numpy as np
import torchvision
import torch.nn as nn
from tqdm import tqdm
from AutoEncoder import AutoEncoder  # our AutoEncoder model
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from torchvision.utils import save_image

__DEBUG__ = True
LOAD = False  # set to True to resume from the checkpoint at PATH
PATH = "./autoencoder.pth"
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Resize to 32x32, convert to a tensor in [0, 1], then scale each channel to [-1, 1]
TRANSFORM = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
EPOCHS = 50
BATCH_SIZE = 4
Downloading and transforming dataset
# main.py
def get_dataset(train=True):
    # STL10 selects its subset through `split` ('train' or 'test')
    split = 'train' if train else 'test'
    trainset = torchvision.datasets.STL10(root='../dataset', split=split,
                                          download=True, transform=TRANSFORM)
    trainLoader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                              shuffle=True, num_workers=8)
    return trainLoader

def imshow(img):
    img = img / 2 + 0.5  # unnormalize from [-1, 1] back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # CHW -> HWC for matplotlib
    plt.show()

def showRandomImaged(train):
    # get one batch of random training images
    dataiter = iter(train)
    images, labels = next(dataiter)  # iterators have no .next() method in Python 3
    # show images
    print(images.shape)
    imshow(torchvision.utils.make_grid(images))
The get_dataset method downloads and transforms the data for our model. It takes one argument, train: if it is set to True it gives us the training dataset, and if it is False it gives us the testing dataset. The method returns a DataLoader object, which is used in training.
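The Normalize step in TRANSFORM and the "unnormalize" line in imshow are inverses of each other. The sketch below shows the arithmetic on single pixel values in plain Python. The function names normalize and unnormalize are illustrative and not part of torchvision, and the per-channel mean and standard deviation of 0.5 are taken from the code above.

```python
def normalize(pixel, mean=0.5, std=0.5):
    # What Normalize does to each channel value: (x - mean) / std
    return (pixel - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    # Inverse mapping; with mean = std = 0.5 this equals value / 2 + 0.5
    return value * std + mean


darkest = normalize(0.0)     # ToTensor yields values in [0, 1]
brightest = normalize(1.0)
round_trip = unnormalize(normalize(0.25))
print(darkest, brightest, round_trip)
```

With these settings, pixel values in [0, 1] are mapped to [-1, 1] before they reach the model, and mapped back to [0, 1] before they are displayed.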
Now let’s write our AutoEncoder. Open a new file named AutoEncoder.py and write the following code:
AutoEncoder
# AutoEncoder.py
import torch.nn as nn

# Encoder
class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        # 3 -> 16 -> 32 channels; padding = 2 keeps the 32x32 spatial size
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        # 32 -> 64 -> 128 channels, still 32x32
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=5, stride=1, padding=2),
            nn.ReLU())
        self.fc1 = nn.Linear(32 * 32 * 128, 1000)
        self.fc2 = nn.Linear(1000, 100)  # 100-dimensional bottleneck

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 32*32*128)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
In my previous article, I explained why we subclass nn.Module and call the super method. Now let’s jump to layer1, which consists of two Conv2d layers, each followed by a ReLU activation function and batch normalization. self.layer1 takes 3 channels as input and gives 32 channels as output.
Similarly, self.layer2 takes 32 channels as input and gives 128 channels as output.
Note: the spatial dimensions of the image are not changed here. Every convolution uses kernel size 5, stride 1, and padding 2, so the feature maps stay 32x32.
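The note about unchanged dimensions can be checked with the output-size formula given in the PyTorch Conv2d documentation. The helper below is a small sketch of that formula for one spatial dimension; the name conv2d_output_size is made up for this example.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Settings used by every convolution in the Encoder
same_size = conv2d_output_size(32, kernel_size=5, stride=1, padding=2)

# For comparison: the same kernel without padding shrinks the image
no_padding = conv2d_output_size(32, kernel_size=5, stride=1, padding=0)
print(same_size, no_padding)
```

With padding 2 the 5x5 kernel keeps a 32-pixel side at 32, while without padding it would shrink to 28.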
Next, we create two fully connected layers, self.fc1 and self.fc2.
In the forward method we define how data flows through the network: first we pass it to layer1, followed by layer2. After that, we flatten each 128x32x32 output into a 1D vector using the x.view method. Now the data is ready to pass through the fully connected layers fc1 and fc2, which reduce it to a 100-dimensional encoding.
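It can help to work out how large the flattened vector and the fully connected layers are. The following plain-Python sketch counts the weights and biases of a linear layer; linear_param_count is a hypothetical helper, and the layer sizes are the ones from the Encoder above.

```python
def linear_param_count(in_features, out_features):
    # One weight per (input, output) pair plus one bias per output
    return in_features * out_features + out_features


flattened = 128 * 32 * 32                     # channels * height * width
fc1_params = linear_param_count(flattened, 1000)
fc2_params = linear_param_count(1000, 100)
print(flattened, fc1_params, fc2_params)
```

Because no layer reduces the spatial size before flattening, fc1 alone holds about 131 million parameters, roughly half a gigabyte in 32-bit floats. That is worth keeping in mind when running this model on a small GPU.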
Now let’s write our Decoder:
# Decoder
class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(100, 1000)
        self.fc2 = nn.Linear(1000, 32 * 32 * 128)
        # 128 -> 64 -> 32 channels; padding = 2 keeps the 32x32 spatial size
        self.layer1 = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        # 32 -> 16 -> 3 channels
        self.layer2 = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.ConvTranspose2d(16, 3, kernel_size=5, stride=1, padding=2),
            # Tanh maps outputs to [-1, 1], the range of the normalized inputs;
            # a ReLU here could never produce the negative pixel values
            nn.Tanh())

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = x.view(x.size(0), 128, 32, 32)  # back to (batch, channels, H, W)
        x = self.layer1(x)
        x = self.layer2(x)
        return x
As you can see, our Decoder mirrors the Encoder. Here we first have two fully connected layers, fc1 and fc2. The output of fc2 is reshaped to 128x32x32 and fed to layer1 followed by layer2, which reconstructs an image with the original dimensions of 32x32x3.
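The transposed convolutions in the Decoder also preserve the 32x32 size, which can be checked with the output-size formula from the PyTorch ConvTranspose2d documentation. The helper name conv_transpose2d_output_size is made up for this sketch, and the stride-2 case is included only for comparison; it is not used in this article's model.

```python
def conv_transpose2d_output_size(size, kernel_size, stride=1, padding=0,
                                 output_padding=0, dilation=1):
    # (size - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Settings used by every transposed convolution in the Decoder
decoder_size = conv_transpose2d_output_size(32, kernel_size=5, stride=1, padding=2)

# For comparison only: stride 2 with output_padding 1 would double the size
upsampled = conv_transpose2d_output_size(32, kernel_size=5, stride=2,
                                         padding=2, output_padding=1)
print(decoder_size, upsampled)
```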
Let’s now combine these into a single model:
# AutoEncoder.py
# AutoEncoder
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        x = self.encoder(x)  # image -> 100-dimensional encoding
        x = self.decoder(x)  # encoding -> reconstructed image
        return x
Training
# main.py
if __name__ == '__main__':
    if __DEBUG__:
        print(DEVICE)
    train = get_dataset()
    if __DEBUG__:
        print("Showing Random images from dataset")
        showRandomImaged(train)
    model = AutoEncoder().to(DEVICE)
    if __DEBUG__:
        print(model)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
    if LOAD:
        # resume from a saved checkpoint; the file at PATH must already exist
        model.load_state_dict(torch.load(PATH, map_location=DEVICE))
    for epoch in range(EPOCHS):
        for i, (images, _) in enumerate(train):
            images = images.to(DEVICE)
            out = model(images)            # reconstruction
            loss = criterion(out, images)  # the target is the input itself
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # LOG
            print(f"epoch {epoch}/{EPOCHS}\nLoss : {loss.item()}")
            if __DEBUG__:
                if i % 10 == 0:
                    out = out.detach() / 2 + 0.5  # unnormalize
                    img_path = "debug_img" + str(i) + ".png"
                    save_image(out, img_path)
        # SAVING (once per epoch)
        torch.save(model.state_dict(), PATH)
For training, we use MSELoss() as the loss function and the Adam optimizer. We train the model for 50 epochs. In each epoch we iterate over the training batches and pass each batch to the model. Then we calculate the MSELoss() between the model’s output and the input images. Before backpropagation, we reset the gradients to zero using the optimizer.zero_grad() method. Then we call the backward method on the loss to perform backpropagation. After the gradients have been calculated, we update the model’s weights with the optimizer.step() method.
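The same zero_grad, backward, step cycle can be tried in isolation without downloading STL10. The sketch below is not this article's model: it assumes a tiny, hypothetical linear autoencoder (8 -> 3 -> 8) and a fixed random batch as a stand-in for images, purely so that it runs in a moment on a CPU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny autoencoder: 8 inputs -> 3-dimensional bottleneck -> 8 outputs
model = nn.Sequential(nn.Linear(8, 3), nn.Linear(3, 8))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

batch = torch.randn(16, 8)  # random stand-in for a batch of flattened images

losses = []
for step in range(100):
    out = model(batch)
    loss = criterion(out, batch)  # the target is the input itself
    optimizer.zero_grad()         # clear gradients left over from the last step
    loss.backward()               # compute fresh gradients
    optimizer.step()              # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Repeating the step on the same batch should drive the reconstruction loss down, which is a quick sanity check before starting a long training run on real data.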