We’ll talk about the other networks and how they worked in a later post. For this post, let’s restrict our discussion to the architecture the researchers designed to answer the question:
“Is learning better networks as easy as stacking more layers?”
The Problem
Vanishing Gradient Problem
The previous architectures, which had been quite successful relative to their competitors, were built by stacking multiple layers to form the final network. A problem arises with this approach when using activation functions like the sigmoid, which squishes a large input space into a small output space between 0 and 1. Because of this, even a large change in the input of the sigmoid function produces only a small change in its output. This is easier to understand through the graph, which shows the behaviour of the sigmoid function and of its derivative over a range of inputs.


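For reference, the sigmoid and its derivative can be written as follows (a standard result, not specific to the paper):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \frac{1}{4}
```

The derivative never exceeds 0.25, which is what makes repeated multiplication of these factors shrink so quickly.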
This problem is not very significant in shallow networks that use this activation, but as we keep stacking more layers it becomes increasingly severe.
Gradients of neural networks are calculated using backpropagation. In backpropagation, we find the derivatives of the network by moving layer by layer from the final layer to the initial layer, and by the chain rule of differentiation, the derivatives of each layer are multiplied down the network to compute the derivatives for the initial layers.
When the hidden layers in these large networks use activation functions like the sigmoid, a large number of these small derivatives are multiplied together, so the gradient decreases exponentially as we move down to the initial layers of the network.
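To make the chain-rule argument concrete (with L denoting the loss and a_k the activation of layer k, notation introduced here just for illustration):

```latex
\frac{\partial L}{\partial W_1}
  = \frac{\partial L}{\partial a_n}\,
    \frac{\partial a_n}{\partial a_{n-1}}\cdots
    \frac{\partial a_2}{\partial a_1}\,
    \frac{\partial a_1}{\partial W_1}
```

Each factor involving a sigmoid contributes a derivative of at most 0.25, so the product can shrink roughly geometrically with depth.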
The initial layers are crucial for identifying the core, low-level elements of an input image (such as edges), so it is important that their weights and biases are tuned well during training. But because the gradients reaching them are so small, these layers are not trained effectively.
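As a quick illustration (not from the paper), here is a minimal PyTorch sketch that stacks sigmoid layers and prints gradient norms; the depth and layer sizes are arbitrary choices for the demo:

```python
import torch
import torch.nn as nn

# Toy network: a deep stack of small fully connected layers with sigmoid
# activations, used only to inspect how gradient magnitudes shrink
# towards the earliest layers.
depth = 30
layers = []
for _ in range(depth):
    layers += [nn.Linear(64, 64), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(8, 64)   # a dummy input batch
loss = net(x).sum()      # any scalar objective will do for the demo
loss.backward()

# Print the gradient norm of every 5th Linear layer's weight matrix;
# the values usually fall off sharply as we approach layer 0.
for i, module in enumerate(net):
    if isinstance(module, nn.Linear) and (i // 2) % 5 == 0:
        print(f"layer {i // 2:2d} grad norm: {module.weight.grad.norm().item():.2e}")
```

With default initialization, the printed norms usually drop by several orders of magnitude from the last layers to the first, which is exactly the effect described above.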
Accuracy Degradation
The authors also observed that as the depth of the network increased, accuracy would saturate after a point. This was expected to happen; let me explain why.
When modelling the architecture, we expect a sufficiently deep neural network to model all the intricacies of our data well. Eventually this leads to a stage where adding more layers lets the network learn the data completely. Beyond this point, increasing the depth further does not increase accuracy, and accuracy may even start to decrease. This might simply be dismissed as overfitting, but it is not: the additional layers also result in a higher training error.
Let us consider an architecture with n layers that achieves an accuracy of x. Now consider another architecture with more layers, say m layers, where m > n.
We could start by saying that this bigger network should at least be as good as the shallower network. How?
Consider replacing the first n layers of the deeper network with the trained n layers of the shallower network, and the remaining m − n layers with an identity mapping. With this construction, the deeper network can easily reproduce the shallower model's representation. Furthermore, we expect the deeper network to learn a more complex representation of the data if one is present.
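Stated formally (with f_shallow for the function computed by the trained n layers and g for the extra m − n layers, notation introduced here just for this argument):

```latex
g(x) = x \;\Longrightarrow\; f_{\text{deep}}(x) = g\bigl(f_{\text{shallow}}(x)\bigr) = f_{\text{shallow}}(x)
```

So, by construction, the deeper network's training error should be no higher than that of the shallower one.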
But this is not what we see in practice; in fact, we see the accuracy decaying as we keep adding more and more layers.
This degradation is what led the researchers to propose a new repeating building block in the architecture, called the Residual Block.
Residual Blocks
Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping.
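In the paper's notation, if H(x) is the desired underlying mapping, the stacked layers are instead asked to fit the residual:

```latex
\mathcal{F}(x) := \mathcal{H}(x) - x
\quad\Longrightarrow\quad
\mathcal{H}(x) = \mathcal{F}(x) + x
```

The hypothesis is that it is easier to optimize the residual mapping F than the original, unreferenced mapping H; in the extreme case where an identity mapping is optimal, pushing the residual to zero is easier than fitting an identity with a stack of nonlinear layers.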
The problem with traditional deep convolutional architectures was that beyond a certain point, adding more layers saturated the accuracy and then eventually degraded it, and this was not being caused by overfitting. This led to the intuition that shallower networks were learning better than their deeper counterparts; although this sounds quite counter-intuitive, it is what was seen in practice.
A clear observation was that shallower networks performed better than the same networks with a few more layers added to them. So why not skip these extra layers and at least match the accuracy of the shallow sub-network? This skipping of layers during training can be done using skip connections, also called residual connections.
Traditional neural networks are designed so that the output of each layer feeds into the next layer. In a residual block, a layer feeds into the next layer and also directly into a layer about 2-3 hops away from it.
This also helps with the vanishing gradient problem: during backpropagation, the skip connections carry larger gradients back to the initial layers, because the identity branch contributes a derivative of 1 and lets part of the gradient pass through unattenuated. These layers can then learn as fast as the final layers, giving a better way to train deeper networks. A minimal sketch of such a block is shown below.
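Here is a minimal PyTorch sketch of a residual block, assuming equal input and output channel counts; the class name and layer sizes are my own choices, and batch normalization (used in the paper) is omitted for brevity:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """A simplified residual block: two 3x3 convolutions plus a skip connection.
    Batch normalization and the projection shortcut used in the paper when the
    number of channels changes are omitted here for brevity."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)        # F(x): the residual mapping
        return self.relu(out + x)    # F(x) + x: add the identity skip connection

# Example: pass a dummy feature map through the block.
block = BasicResidualBlock(channels=64)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```

When the channel count or spatial size changes between the input and output of a block, the paper uses a projection (a 1x1 convolution) on the shortcut so that the addition still lines up.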
As a result of using these residual blocks in their CNN-based architecture and working extensively on the ImageNet classification dataset, the authors show that:
- Extremely deep residual networks are easy to optimize, whereas plain (simply stacked) architectures exhibit higher training error when the depth is increased.
- Deep residual networks easily enjoy accuracy gains from greatly increased depth, producing results substantially better than previous networks.
An ensemble of these residual networks achieved a 3.57% top-5 error on the ImageNet classification test set and won 1st place in the ILSVRC 2015 classification competition.
The deep representations learned by these residual networks also generalize excellently to other recognition tasks, which led to 1st places on the ImageNet Detection, ImageNet Localisation, COCO Detection and COCO Segmentation tasks in the ILSVRC and COCO 2015 competitions.
Further, I have implemented the DenseNet architecture with a detailed explanation of the code in an interactive Jupyter notebook; you can find it here.