Residual Network

A deep convolutional neural network architecture introduced by Microsoft Research in 2015 that uses skip connections to enable training of very deep networks, winning the ImageNet challenge with a top-5 error rate of 3.57%.

7 min read · Last updated June 2026 · Foundations

A Residual Network, commonly abbreviated as ResNet, is a deep convolutional neural network architecture introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun of Microsoft Research in 2015. ResNet addressed one of the central obstacles to training very deep neural networks — the degradation problem — through a simple but highly effective architectural innovation: skip connections, also called shortcut connections or residual connections.

By enabling gradients to flow directly through shortcut paths during backpropagation, ResNet made it practical to train networks with hundreds or even thousands of layers, far deeper than had been feasible with plain stacked architectures. An ensemble of ResNets won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015 with a top-5 classification error of 3.57%, below the roughly 5% error commonly cited as human-level performance on the benchmark.

The Degradation Problem

Prior to ResNet, empirical evidence had shown that adding more layers to a neural network did not reliably improve performance and often made it worse — not because of overfitting, but due to the degradation problem in training. As networks grew deeper, training accuracy would saturate and then decrease, even when the network theoretically had sufficient capacity to represent any shallower solution.

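Vanishing gradients were a known obstacle in deep networks: as gradients are multiplied across many sequential layers they can become exponentially small, making earlier layers extremely slow to learn. Normalised initialisation and batch normalisation had largely addressed this, allowing networks with tens of layers to converge. He et al. observed, however, that degradation persisted even in batch-normalised plain networks, and argued that it reflects a broader optimisation difficulty rather than vanishing gradients alone: solvers struggle to fit even identity mappings through stacks of non-linear layers.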

Residual Connections

ResNet's solution was the residual block: a building block whose output y equals a learned transformation F(x) added directly to the block's input x:

$$y = F(x) + x$$

The additive shortcut, which passes x through unchanged, is an identity mapping. If H(x) denotes the mapping the block should ideally compute, the stacked layers learn the residual F(x) = H(x) - x, the difference between the desired output and the input, rather than the full mapping from input to output.
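The residual formulation can be sketched in a few lines of plain Python. This is a simplified illustration, not the convolutional block from the paper: it assumes F(x) is two small fully connected layers with a ReLU in between, and it omits batch normalisation and the ReLU that ResNet applies after the addition. The names `residual_block`, `matvec`, and `relu` are hypothetical helpers.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """Return y = F(x) + x, with F(x) = W2 * relu(W1 * x) (toy assumption)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    return [f + a for f, a in zip(fx, x)]


x = [1.0, -2.0, 0.5]

# With all-zero weights F(x) is zero, so the block reduces to the identity.
zeros = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(x, zeros, zeros)

# With non-zero weights the block adds a learned correction to its input.
W1 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = residual_block(x, W1, W2)
```

Here `identity_out` equals `x`, which shows why a block that should change its input very little only needs to drive its weights towards zero.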

This formulation has two key consequences. First, if the optimal mapping for a particular block is close to the identity (i.e., the block should change the input very little), it is much easier to learn a near-zero F(x) than to learn an exact identity mapping through many non-linear layers. Second, and more importantly for training, the identity shortcut provides a direct path for gradients to flow from later layers back to earlier layers, effectively bypassing the multiplicative vanishing gradient problem across blocks.
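The gradient argument can be illustrated with a scalar toy model. In a plain chain x_{l+1} = f_l(x_l), the end-to-end derivative is the product of the local derivatives f_l'. In a residual chain x_{l+1} = x_l + f_l(x_l), each factor becomes 1 + f_l'. The sketch below assumes every layer has the same small local derivative (0.01), which is an illustrative value rather than a measurement, and the function names are hypothetical.

```python
def plain_chain_grad(local_grads):
    """End-to-end derivative of a plain chain: product of f_l'."""
    g = 1.0
    for d in local_grads:
        g *= d
    return g


def residual_chain_grad(local_grads):
    """End-to-end derivative of a residual chain: product of (1 + f_l')."""
    g = 1.0
    for d in local_grads:
        g *= 1.0 + d
    return g


depth = 50
local = [0.01] * depth  # assumed per-layer derivative of f_l

plain = plain_chain_grad(local)        # shrinks exponentially with depth
residual = residual_chain_grad(local)  # stays on the order of 1
```

Under these assumptions the plain gradient is vanishingly small while the residual one stays close to 1, because the identity path contributes the constant 1 to every factor.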

When the input and output of a block have different dimensions (due to stride-based downsampling or increased channel depth), a linear projection is applied to the shortcut connection to match dimensions.
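A projection shortcut can be written as y = F(x) + W_s x, where W_s is a learned matrix (in convolutional ResNets, typically a 1x1 convolution). The following is a minimal fully connected sketch under the same toy assumptions as a two-layer F(x); `residual_block_with_projection` and its `Ws` argument are hypothetical names.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def residual_block_with_projection(x, W1, W2, Ws=None):
    """Return F(x) + Ws * x; Ws=None means a plain identity shortcut."""
    fx = matvec(W2, relu(matvec(W1, x)))
    shortcut = x if Ws is None else matvec(Ws, x)
    if len(shortcut) != len(fx):
        raise ValueError("shortcut and F(x) must have the same dimension")
    return [f + s for f, s in zip(fx, shortcut)]


x = [1.0, 2.0]                                   # 2 input features
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # maps 2 -> 3 features
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]        # projects the shortcut 2 -> 3

y = residual_block_with_projection(x, W1, W2, Ws)
```

Without `Ws`, the same call raises a `ValueError`, because a 2-element input cannot be added to a 3-element F(x).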

Architecture Variants

ResNet is available in several standard configurations differentiated by the number of layers:

| Variant | Layers | Block Type | Parameters (approx.) |
|---|---|---|---|
| ResNet-18 | 18 | Basic block | 11M |
| ResNet-34 | 34 | Basic block | 21M |
| ResNet-50 | 50 | Bottleneck block | 25M |
| ResNet-101 | 101 | Bottleneck block | 44M |
| ResNet-152 | 152 | Bottleneck block | 60M |

For ResNet-50 and deeper, the basic two-layer residual block is replaced with a three-layer bottleneck block: a 1x1 convolution that reduces channel dimensions, a 3x3 convolution, and a 1x1 convolution that restores dimensions. This reduces computational cost while allowing the network to go deeper.
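The saving can be checked with a rough weight count. The sketch below counts convolution weights only (no biases or batch-normalisation parameters) and compares a bottleneck that reduces 256 channels to 64, as in the first stage of ResNet-50, with a hypothetical basic block operating directly on 256 channels. The function names are illustrative.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution, ignoring bias."""
    return kernel * kernel * c_in * c_out


def basic_block_params(channels):
    """Two 3x3 convolutions at a constant channel width."""
    return 2 * conv_params(3, channels, channels)


def bottleneck_block_params(channels, reduced):
    """1x1 reduce, 3x3 at the reduced width, 1x1 restore."""
    return (
        conv_params(1, channels, reduced)
        + conv_params(3, reduced, reduced)
        + conv_params(1, reduced, channels)
    )


basic = basic_block_params(256)                # 1,179,648 weights
bottleneck = bottleneck_block_params(256, 64)  # 69,632 weights
ratio = basic / bottleneck                     # roughly 17x fewer weights
```

Most of the saving comes from running the expensive 3x3 convolution at 64 channels instead of 256.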

Influence on Subsequent Architectures

ResNet's skip connection proved to be one of the most influential ideas in deep learning. Most major architectures developed since 2015 incorporate some form of residual or skip connection:

  • DenseNet (2016): Extends the skip connection concept by connecting each layer to every subsequent layer within a dense block, reusing features through concatenation rather than addition.
  • U-Net and Feature Pyramid Networks: Use skip connections between encoder and decoder paths (lateral connections between the bottom-up and top-down pathways in Feature Pyramid Networks) for semantic segmentation and object detection.
  • Transformer architectures: Every transformer block includes a residual connection around the attention sublayer and the feed-forward sublayer, making transformers a descendant of the residual network philosophy even in NLP.
  • EfficientNet, ConvNeXt, ResNeXt: Build on the residual block paradigm with architectural improvements to scaling, grouped convolutions, and modern training techniques.
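The architectures above combine the skipped input with the transformed features in two main ways: ResNet blocks and transformer sublayers add them, while DenseNet and U-Net concatenate them. A minimal sketch of the difference, using plain lists as stand-ins for feature vectors (the function names are hypothetical):

```python
def additive_skip(x, fx):
    """ResNet/transformer style: element-wise sum, width unchanged."""
    if len(x) != len(fx):
        raise ValueError("additive skips need matching dimensions")
    return [a + b for a, b in zip(x, fx)]


def concat_skip(x, fx):
    """DenseNet/U-Net style: concatenate, so the feature width grows."""
    return list(x) + list(fx)


x = [1.0, 2.0]     # features entering the block
fx = [0.5, -0.5]   # features produced by the block

added = additive_skip(x, fx)        # [1.5, 1.5]
concatenated = concat_skip(x, fx)   # [1.0, 2.0, 0.5, -0.5]
```

Addition keeps the feature width fixed, which is why mismatched dimensions need a projection. Concatenation preserves earlier features verbatim at the cost of a growing channel count.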

Training and Transfer Learning

Pre-trained ResNet models — particularly ResNet-50 and ResNet-101 — became foundational tools for transfer learning in computer vision. A ResNet pre-trained on ImageNet provides rich visual feature representations that can be fine-tuned for downstream tasks including medical image analysis, satellite imagery classification, industrial defect detection, and facial recognition. The Hugging Face model hub hosts hundreds of task-specific fine-tunes of ResNet variants.

ResNet in Practice Today

While newer architectures such as Vision Transformers (ViT) have surpassed ResNets on several benchmarks when trained on very large datasets, ResNets remain widely used in production systems due to their efficiency, interpretability, and well-understood behaviour. For edge deployment, smaller ResNet variants are frequently chosen because convolutions are well supported by GPU and NPU inference stacks. ResNet models can be converted to TensorFlow Lite, Core ML, and ONNX formats for optimised on-device inference.

References

  1. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.
  2. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity mappings in deep residual networks. European Conference on Computer Vision (ECCV). Springer, Cham.
  3. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  4. Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of ICML 2019.
  5. Analytics Vidhya. (2023). Deep Residual Learning for Image Recognition: ResNet Explained. Analytics Vidhya Blog.