Pytorch Tutorials – Understanding and Implementing ResNet

Deep Convolutional Neural Networks

In our last article, we saw how a simple convolutional neural network works. A deep convolutional neural network is one that consists of many hidden layers; for example, AlexNet consists of 8 layers, where the first 5 are convolutional layers and the last 3 are fully connected layers, and VGGNet consists of 16 convolutional layers.

The problem with these deep neural networks is that as you increase the number of layers, you start seeing a degradation problem. To put it another way: as we increase the depth of the network, the accuracy gets saturated and then starts degrading rapidly. In a deep neural network, as we perform back-propagation, the repeated multiplication of gradients through many layers makes the gradient very small, which causes this degradation. This problem is often called the vanishing gradient / exploding gradient problem.

ResNet (or Residual Network)

ResNet solves this degradation problem with skip connections. A skip connection means: consider an input x, which is passed through a stack of neural network layers to produce f(x); this f(x) is then added to the original input x. So our output will be:

H(x) = f(x) + x


So, instead of directly mapping x -> y with a function H(x), here we define a residual function f(x) = H(x) – x, which can be reframed as H(x) = f(x) + x, where f(x) represents the stack of non-linear layers and x represents the identity function. From this, if the identity mapping is optimal, we can easily make f(x) = 0 simply by setting the weights to 0. So f(x) is what the authors call the residual function.

This formulation ensures that the deeper layers will perform at least as well as the shallower ones, and not worse.

Implementing ResNet

Now let’s implement the ResNet model. Here I will be using the ResNet18 architecture, which consists of 18 layers. The dataset I will be using is dogs-vs-cats, which I downloaded from the Kaggle website. Our model will classify images of dogs and cats.



In the above diagram, we first take an input image with 3 channels (RGB), pass it to a convolutional layer with kernel_size = 3, and get a 64-channel output. The convolutional blocks between the curved arrows represent a Residual Block, which consists of:

convolution layer -> Batch Normalization -> ReLU activation -> convolution layer -> Batch Normalization.

The output of this stack is then added to the initial input of the residual block (i.e. x). After the addition, the result is passed through a ReLU activation function on its way to the next layer.

The dotted arrow indicates that the output dimensions of the residual block have changed, so we also have to change the dimensions of the input that is passed to that residual block (i.e. x) before adding it, because addition is only possible when the dimensions are equal.
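A small sketch (assuming PyTorch is available) of why the shortcut needs its own 1x1 convolution at these dotted arrows: a strided 3x3 convolution changes both the channel count and the spatial size, and a strided 1x1 convolution on the shortcut produces a tensor of exactly the same shape, so the two can be added element-wise.

```python
import torch
import torch.nn as nn

# Residual branch: 3x3 conv with stride 2 doubles channels and halves H, W.
x = torch.randn(1, 64, 56, 56)
branch = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

# 1x1 conv on the shortcut matches both channel count and spatial size,
# so the two tensors can be added element-wise.
shortcut = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

out = branch(x) + shortcut(x)
print(out.shape)  # torch.Size([1, 128, 28, 28])
```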

The last layer of this architecture is a Linear layer, which takes the pooled features and gives us the output, i.e. whether the image is a dog or a cat.


Let’s first import the necessary libraries:

from PIL import Image
import torch.optim as optim
from tqdm import tqdm
from torchvision import transforms
import torch.nn.functional as F
import torch.nn as nn
import torchvision.datasets as dt
import torch
import os 

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

PREPROCESS = transforms.Compose([transforms.Resize((256, 256)),
                        transforms.ToTensor(),
                        transforms.Normalize(mean = [0.485,0.456,0.406],std = [0.229,0.224,0.225])])

PyTorch provides a very useful transforms module, which is used for modifying and transforming images. transforms.Compose is used to combine, or chain, different transformations: it builds a transformation pipeline.

Now let’s get our dataset:

def get_dataset(train = True):
    if train:
        trainset = dt.ImageFolder(root = "./train/", transform = PREPROCESS)
        train_loader =, batch_size = 8, shuffle = True)
        return train_loader
    else:
        testset = dt.ImageFolder(root = "./test/", transform = PREPROCESS)
        test_loader =, batch_size = 8, shuffle = True)
        return test_loader

Next let’s write our Residual Block:

class ResidualBlock(nn.Module):
    expansion = 1
    def __init__(self, inchannel, outchannel, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Sequential(
                        nn.Conv2d(inchannel, outchannel, kernel_size=3, stride=stride, padding=1, bias=False),
                        nn.BatchNorm2d(outchannel)
        )
        self.conv2 = nn.Sequential(
                        nn.Conv2d(outchannel, outchannel, kernel_size=3, stride=1, padding=1, bias=False),
                        nn.BatchNorm2d(outchannel)
        )
        self.skip = nn.Sequential()
        if stride != 1 or inchannel != self.expansion * outchannel:
            self.skip = nn.Sequential(
                nn.Conv2d(inchannel, self.expansion * outchannel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * outchannel)
            )

    def forward(self, X):
        out = F.relu(self.conv1(X))   # convolution -> BatchNorm -> ReLU
        out = self.conv2(out)         # convolution -> BatchNorm
        out += self.skip(X)           # add the (possibly projected) input x
        out = F.relu(out)
        return out

In the last article, I explained why we use nn.Module in our class, so I am going to skip that part.

We have created two convolutional layers, self.conv1 and self.conv2, just like in the diagram. self.skip is our shortcut layer, whose output will be added to the output of self.conv2.

The “if” part in the __init__() method checks whether the output dimensions of self.conv2 will differ from those of the input. If they differ, we have to change the dimensions of the input by passing it through an nn.Conv2d layer (a 1x1 convolution). The forward() method then shows in a straightforward way how our data flows.
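The condition in that “if” can be isolated as a small helper (the name needs_projection is hypothetical, purely for illustration): a projection on the shortcut is needed whenever the block downsamples or changes the channel count.

```python
# Hypothetical helper mirroring the if condition in ResidualBlock.__init__:
# a 1x1 conv on the shortcut is needed whenever the spatial size changes
# (stride != 1) or the channel count changes.
def needs_projection(inchannel, outchannel, stride, expansion=1):
    return stride != 1 or inchannel != expansion * outchannel

print(needs_projection(64, 64, 1))   # False: identity shortcut is enough
print(needs_projection(64, 128, 2))  # True: channels and spatial size change
print(needs_projection(64, 128, 1))  # True: channel count changes
```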

Now let’s write our Model class or ResNet class:

class Model(nn.Module):
    def __init__(self, ResidualBlock, num_classes):
        super(Model, self).__init__()
        self.inchannel = 64
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64)
        )
        self.layer1 = self.make_layer(ResidualBlock, 64,  2, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride=2)
        self.fc = nn.Linear(512*ResidualBlock.expansion, num_classes)

    def make_layer(self, block, channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)   
        layers = []
        for stride in strides:
            layers.append(block(self.inchannel, channels, stride))
            self.inchannel = channels * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, out.size()[3])
        out = torch.flatten(out, 1)
        out = self.fc(out)
        return out

In the __init__() method, self.conv1 is the layer that takes our input image with 3 channels (RGB) and produces 64 output channels. Then we create 4 layers using the make_layer method, each consisting of 2 ResidualBlocks. The last layer (self.fc) is our Linear layer, which gives us the output: whether the image is a dog or a cat.
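The stride bookkeeping inside make_layer can be sketched in plain Python (layer_strides is a hypothetical name for illustration): only the first block in each layer may downsample, the rest use stride 1, and the channel count grows from 64 up to the 512 expected by self.fc.

```python
# Hypothetical sketch of how make_layer assigns strides: only the first
# block in each layer may downsample; the remaining blocks use stride 1.
def layer_strides(stride, num_blocks):
    return [stride] + [1] * (num_blocks - 1)

print(layer_strides(2, 2))  # [2, 1]

# Channel progression across the four layers, starting from inchannel = 64:
inchannel, expansion = 64, 1
for channels, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
    for s in layer_strides(stride, 2):
        # each iteration corresponds to one ResidualBlock(inchannel, channels, s)
        inchannel = channels * expansion
print(inchannel)  # 512, matching the input size of self.fc
```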

In the forward method, before passing the activations to the self.fc layer, we first flatten (reshape) the feature maps so that each sample becomes a 1D vector.
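Concretely (assuming PyTorch is available), after average pooling each sample is a (512, 1, 1) feature map, and torch.flatten with start_dim = 1 keeps the batch dimension while collapsing the rest:

```python
import torch

# After layer4 and average pooling, each sample is a (512, 1, 1) feature map.
out = torch.randn(8, 512, 1, 1)

# torch.flatten(out, 1) keeps the batch dimension and flattens everything else,
# giving the (batch_size, 512) matrix that the Linear layer expects.
flat = torch.flatten(out, 1)
print(flat.shape)  # torch.Size([8, 512])
```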

Now let’s define our loss function and optimizer:

if __name__ == '__main__':
    resnet = Model(ResidualBlock, num_classes = 2)
    if torch.cuda.is_available():
        resnet =

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(resnet.parameters(), lr = 0.01)

I have used CrossEntropyLoss() as the loss function and the SGD() optimizer.
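To see what CrossEntropyLoss actually consumes (assuming PyTorch is available): it takes raw logits, not probabilities, and class indices as targets. The class assignment cat = 0, dog = 1 below is an illustrative assumption (ImageFolder assigns indices alphabetically by folder name).

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Two samples, two classes (assumed here: cat = 0, dog = 1); raw logits.
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.5]])
target = torch.tensor([0, 1])  # both logits already favour the correct class

loss = criterion(logits, target)
print(loss.item())  # ≈ 0.21: small, since both predictions agree with the targets
```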


Let’s train our model:

    train = get_dataset(train = True)

    for epoch in tqdm(range(10)):
        for i, (images, target) in enumerate(train):
            images =
            target =

            out = resnet(images)
            loss = criterion(out, target)

            # Back-propagation
            optimizer.zero_grad()

            _, pred = torch.max(, 1)
            correct = (pred == target).sum().item()

            if i % 100 == 0:
                print(f" epoch: {epoch}\tloss: {}\tAccuracy: {(correct/target.size(0)) * 100}%")

I have used 10 epochs to train the model. The optimizer.zero_grad() method resets the gradients to 0. Next, we call backward() on our loss variable to perform back-propagation. After the gradients have been calculated, we update our model’s parameters with the optimizer.step() method.


    test = get_dataset(train = False)

    with torch.no_grad():
        correct = 0
        total = 0
        for i,(images,target) in tqdm(enumerate(test)):
            images =
            target =

            out = resnet(images)
            _,pred = torch.max(,1)
            total += target.size(0)
            correct += (pred == target).sum().item()
        print(f"Accuracy: {(correct/total) * 100}")

Since we don’t need to compute gradients while testing the model, we use the torch.no_grad() context manager. The rest is the same as in training.
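A tiny illustration of what torch.no_grad() changes (assuming PyTorch is available): operations inside the context build no autograd graph, which saves memory and time during evaluation.

```python
import torch

x = torch.ones(3, requires_grad=True)

y = x * 2
print(y.requires_grad)  # True: this op is tracked for back-propagation

with torch.no_grad():
    z = x * 2
print(z.requires_grad)  # False: no graph is built inside the context
```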

After 10 epochs I got an accuracy of 93.23%.
