## Deep Convolutional Neural Network

In our last article, we saw how a simple convolutional neural network works. A **Deep Convolutional Neural Network** is a network with many hidden layers: for example, *AlexNet*, which consists of 8 layers (the first 5 convolutional and the last 3 fully connected), or *VGGNet*, which consists of 16 weight layers.

The problem with these deep neural networks is that as you add more layers, you start seeing a degradation problem. To put it another way: as the depth of the network increases, accuracy gets saturated and then degrades rapidly. During back-propagation in a deep network, the repeated multiplication of gradients through many layers can make the gradient vanishingly small (or explosively large), which stalls learning. This is the well-known vanishing/exploding gradient problem.

## ResNet (or Residual Network)

ResNet solves this degradation problem with skip connections. A skip connection works like this: consider an input x that is passed through a stack of neural network layers to produce f(x); this f(x) is then added back to the original input x. So our output will be:

H(x) = f(x) + x

So, instead of directly learning a mapping x -> y with a function H(x), we define a residual function f(x) = H(x) - x, which can be reframed as H(x) = f(x) + x, where f(x) represents the stack of non-linear layers and x represents the identity mapping. If the identity mapping is optimal, the network can easily drive f(x) to 0 simply by pushing the weights towards 0. This f(x) is what the authors call the residual function.

This mapping ensures that a deeper layer will perform at least as well as a shallower one, and not worse.
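
To make the idea concrete, here is a minimal sketch of a block computing H(x) = f(x) + x (the name `TinyResidual` and the linear layers are just illustrative, not part of the model we build below):

```
import torch
import torch.nn as nn

class TinyResidual(nn.Module):
    def __init__(self, dim):
        super(TinyResidual, self).__init__()
        # f(x): a small stack of non-linear layers
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.f(x) + x   # if f(x) -> 0, the block reduces to the identity

x = torch.randn(4, 16)
print(TinyResidual(16)(x).shape)   # torch.Size([4, 16])
```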

## Implementing ResNet

Now let's implement the ResNet model. Here I will use the ResNet18 model, which consists of 18 layers. The dataset I will be using is *dog-vs-cat*, which I downloaded from the Kaggle website. Our model will classify images of dogs and cats.

### Architecture

In the above diagram, we first take an input image consisting of 3 channels (RGB), pass it to a convolution layer with *kernel_size* = 3, and get a 64-channel output. The convolution blocks between the curved arrows represent a *Residual Block*, which consists of:

convolution layer -> Batch Normalization -> ReLU activation -> convolution layer -> Batch Normalization.

The output of the residual block is then added to the original input of the block (i.e. x). After the addition, the result is passed through a ReLU activation before going to the next layer.

A dotted arrow indicates that the output dimensions of the residual block have changed, so we also have to change the dimensions of the input x that is added to it, because element-wise addition is only possible when the dimensions are equal.

The last layer of this architecture is a *Linear Layer*, which takes the pooled features and gives us the output, i.e. whether the image is a dog or a cat.

### Code

Let's first import the necessary libraries:

```
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dt
from torchvision import transforms
from tqdm import tqdm
import os

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Standard ImageNet-style preprocessing: resize, center-crop, tensorize, normalize
PREPROCESS = transforms.Compose([transforms.Resize(256),
                                 transforms.CenterCrop(224),
                                 transforms.ToTensor(),
                                 transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                      std=[0.229, 0.224, 0.225])])
```

PyTorch provides a very handy `transforms` module for modifying and transforming images, and `transforms.Compose` is used to chain different transformations together into a single preprocessing pipeline.
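
For example, applying the pipeline to a single image (the file name here is just a placeholder) produces a normalized 3x224x224 tensor:

```
img = Image.open("some_image.jpg").convert("RGB")   # hypothetical path; any RGB image works
tensor = PREPROCESS(img)
print(tensor.shape)              # torch.Size([3, 224, 224])
batch = tensor.unsqueeze(0)      # add a batch dimension: [1, 3, 224, 224]
```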

Now let's get our dataset:

```
def get_dataset(train = True):
    if train:
        trainset = dt.ImageFolder(root = "./train/", transform = PREPROCESS)
        train_loader = torch.utils.data.DataLoader(trainset, batch_size = 8, shuffle = True)
        return train_loader
    else:
        testset = dt.ImageFolder(root = "./test/", transform = PREPROCESS)
        test_loader = torch.utils.data.DataLoader(testset, batch_size = 8, shuffle = False)
        return test_loader
```
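
`ImageFolder` expects each class to live in its own subdirectory (e.g. `./train/cat/` and `./train/dog/`) and infers the labels from the folder names. A quick way to sanity-check the loader is:

```
loader = get_dataset(train = True)
images, targets = next(iter(loader))
print(images.shape, targets)   # torch.Size([8, 3, 224, 224]) and a tensor of 0/1 labels
```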

Next, let's write our *Residual Block*:

```
class ResidualBlock(nn.Module):
    expansion = 1

    def __init__(self, inchannel, outchannel, stride=1):
        super(ResidualBlock, self).__init__()
        # First conv may downsample (stride > 1) and/or change the channel count
        self.conv1 = nn.Sequential(
            nn.Conv2d(inchannel, outchannel, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(outchannel),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(outchannel, outchannel, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(outchannel)
        )
        # Identity shortcut by default; a 1x1 conv when the dimensions change
        self.skip = nn.Sequential()
        if stride != 1 or inchannel != self.expansion * outchannel:
            self.skip = nn.Sequential(
                nn.Conv2d(inchannel, self.expansion * outchannel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * outchannel)
            )

    def forward(self, X):
        out = F.relu(self.conv1(X))
        out = self.conv2(out)
        out += self.skip(X)      # H(x) = f(x) + x
        out = F.relu(out)
        return out
```

In the last article, I explained why we subclass `nn.Module`, so I am going to skip that part here.

We have created two convolution layers, `self.conv1` and `self.conv2`, just like in the diagram. `self.skip` is our shortcut layer, whose output is added to the output of `self.conv2`.

The `if` branch in the `__init__()` method checks whether the output dimensions of `self.conv2` will differ from those of the input. If they do, we have to adjust the input's dimensions by passing it through a 1x1 `nn.Conv2d` layer before the addition. The `forward()` method then shows plainly how the data flows through the block.
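
As a quick, purely illustrative sanity check, we can push a dummy tensor through a block and confirm that the shortcut makes the shapes line up:

```
block = ResidualBlock(64, 128, stride=2)   # channels and stride change, so self.skip kicks in
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)                      # torch.Size([1, 128, 28, 28])
```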

Now let's write our Model (ResNet) class:

```
class Model(nn.Module):
    def __init__(self, ResidualBlock, num_classes):
        super(Model, self).__init__()
        self.inchannel = 64
        # Stem: 3 input channels (RGB) -> 64 output channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
        )
        # Four stages of two residual blocks each
        self.layer1 = self.make_layer(ResidualBlock, 64, 2, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride=2)
        self.fc = nn.Linear(512 * ResidualBlock.expansion, num_classes)

    def make_layer(self, block, channels, num_blocks, stride):
        # Only the first block of a stage downsamples; the rest keep stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.inchannel, channels, stride))
            self.inchannel = channels * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, out.size()[3])   # global average pooling
        out = torch.flatten(out, 1)              # flatten to [batch, 512]
        out = self.fc(out)
        return out
```

In the `__init__()` method, `self.conv1` is the layer that takes our 3-channel (RGB) input image and produces 64 output channels. Then we create 4 layers using the `make_layer` method, each consisting of 2 `ResidualBlock`s. The last layer (`self.fc`) is our `Linear` layer, which gives us the output: whether the image is a *dog or cat*.

In the `forward` method, before passing the features to the `self.fc` layer, we first `flatten` (reshape) the feature map into a 1D vector per image.
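
Again as an illustrative check, a batch of preprocessed 224x224 images should come out as one logit per class:

```
model = Model(ResidualBlock, num_classes=2)
x = torch.randn(2, 3, 224, 224)   # dummy batch of 2 RGB images
print(model(x).shape)             # torch.Size([2, 2]) -- one logit per class
```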

Now let’s define our loss function and optimizer:

```
if __name__ == '__main__':
    resnet = Model(ResidualBlock, num_classes = 2)
    if torch.cuda.is_available():
        resnet.cuda()
    print(resnet)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(resnet.parameters(), lr = 0.01)
```

I have used `CrossEntropyLoss()` as the loss function and `SGD()` as the optimizer. Note that `CrossEntropyLoss` expects raw logits and applies the softmax internally, which is why our model has no softmax at the end.
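
A tiny standalone example of what the criterion computes (the numbers here are made up; raw logits go in, a scalar loss comes out):

```
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]])   # raw model outputs for a batch of 2
labels = torch.tensor([0, 1])                     # target class indices
print(nn.CrossEntropyLoss()(logits, labels))      # ~tensor(0.2109): mean negative log-likelihood
```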

### Training

Let’s train our model:

```
train = get_dataset(train = True)
for epoch in tqdm(range(10)):
    for i, (images, target) in enumerate(train):
        images = images.to(device)
        target = target.to(device)
        out = resnet(images)
        loss = criterion(out, target)
        print(loss)
        # Back-propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        _, pred = torch.max(out.data, 1)
        correct = (pred == target).sum().item()
        if i % 100 == 0:
            torch.save(resnet.state_dict(), "model")
            print(f" epoch: {epoch}\tloss: {loss.data}\tAccuracy: {(correct/target.size(0)) * 100}%")
```

I train the model for 10 epochs. The `optimizer.zero_grad()` call resets the gradients to 0; without it, PyTorch accumulates gradients across iterations. Next, we call `backward()` on our loss variable to perform back-propagation, and once the gradients have been computed we update the model parameters with `optimizer.step()`.
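
To see why `zero_grad()` matters, here is a tiny illustration of PyTorch's gradient accumulation (the variable `w` is just a throwaway example):

```
w = torch.tensor(1.0, requires_grad=True)
(w * 3).backward()
print(w.grad)        # tensor(3.)
(w * 3).backward()
print(w.grad)        # tensor(6.) -- gradients accumulated, not replaced
w.grad.zero_()       # this reset is what optimizer.zero_grad() does for every parameter
print(w.grad)        # tensor(0.)
```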

### Testing

```
test = get_dataset(train = False)
with torch.no_grad():
    correct = 0
    total = 0
    for i, (images, target) in tqdm(enumerate(test)):
        images = images.to(device)
        target = target.to(device)
        out = resnet(images)
        _, pred = torch.max(out.data, 1)
        total += target.size(0)
        correct += (pred == target).sum().item()
    print(f"Accuracy: {(correct/total) * 100}")
```

Since we don't need to compute gradients while testing the model, we wrap the loop in the `torch.no_grad()` context manager, which saves memory and computation. The rest is the same as in training, minus the back-propagation steps.
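
One detail worth adding (standard PyTorch practice, not shown in the original code): since the model contains BatchNorm layers, it should be switched to evaluation mode before testing so that the running statistics are used instead of per-batch statistics:

```
resnet.eval()    # use BatchNorm running statistics during inference
# ... run the test loop above ...
resnet.train()   # switch back if you want to continue training afterwards
```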

After 10 epochs I got an accuracy of 93.23%.
