What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Convolutional Neural Network

A convolutional neural network (CNN) is a type of deep neural network that uses convolutional layers to automatically learn spatial hierarchies of features from grid-structured data, most commonly images.

7 min read | Last updated May 2026 | Foundations

A convolutional neural network (CNN) is a class of deep neural network designed to process data that has a known grid-like topology, most notably two-dimensional images. Unlike fully connected networks, which connect every input to every unit and so ignore how pixels are arranged, CNNs exploit the spatial structure of images through a mathematical operation called convolution, allowing them to detect local patterns (edges, textures, shapes) at multiple scales and positions. The architecture was first developed in practical form by Yann LeCun and colleagues in the late 1980s and 1990s and became the dominant approach in computer vision following the landmark AlexNet result at the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Core Architecture

A CNN typically consists of a sequence of three layer types: convolutional layers, pooling layers, and fully connected layers, arranged to progressively transform raw pixel values into a compact, task-relevant representation.
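To make the progressive transformation concrete, the sketch below traces how the spatial size of a feature map shrinks through a small stack of layers. It uses the standard output-size formula, floor((size + 2·padding − kernel) / stride) + 1. The function names and the LeNet-style layer stack are illustrative assumptions, not taken from any particular library.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window is applied."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size, layers):
    """Return (layer name, spatial size) after each (name, kernel, stride) layer."""
    sizes = []
    size = input_size
    for name, kernel, stride in layers:
        size = conv_output_size(size, kernel, stride)
        sizes.append((name, size))
    return sizes


# Hypothetical LeNet-style stack on a 32x32 input (illustrative values).
layers = [
    ("conv 5x5", 5, 1),
    ("pool 2x2", 2, 2),
    ("conv 5x5", 5, 1),
    ("pool 2x2", 2, 2),
]
shapes = trace_shapes(32, layers)
for name, size in shapes:
    print(f"{name}: {size}x{size}")
```

Under these assumptions the spatial size goes 32 → 28 → 14 → 10 → 5, after which the small 5×5 maps would be flattened and passed to the fully connected layers.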

Convolutional Layers

The convolutional layer is the defining component of a CNN. It applies a set of learnable filters (also called kernels) to the input by sliding each filter across the width and height of the input volume, computing the dot product between the filter weights and the local region of the input at each position. Each filter produces a two-dimensional feature map that encodes where and how strongly a given pattern (such as a horizontal edge or a colour gradient) appears across the input; stacking the maps from all filters gives the layer's output volume.
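The sliding dot product can be sketched in a few lines of plain Python. This is a minimal single-channel illustration assuming no padding and a stride of 1; the function name `conv2d`, the toy image, and the hand-picked filter are all assumptions made for the example (in a real CNN the filter weights are learned).

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window dot product.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Dot product between the kernel and the local image region.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge filter (a trained CNN would learn its weights).
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks the position where the dark-to-bright transition sits, which is what "encoding where a pattern appears" means in practice.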

Key properties of convolution make CNNs well-suited to image data. Parameter sharing means the same filter weights are reused at every spatial position, dramatically reducing the number of parameters compared to a fully connected layer over the same input. Local connectivity means each neuron only responds to a small region of the input, mirroring how biological visual neurons have localised receptive fields.[^1]
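The effect of parameter sharing is easiest to see as arithmetic. The sketch below compares the parameter count of a convolutional layer with that of a fully connected layer producing an output of the same size. The layer dimensions (a 32×32 RGB input, 16 filters of size 3×3) are hypothetical values chosen for illustration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per filter; independent of the image size."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def dense_params(in_features, out_features):
    """Every input connects to every output, plus one bias per output."""
    return out_features * (in_features + 1)


# Hypothetical layer: 32x32 RGB input mapped to 16 feature maps.
height, width, channels = 32, 32, 3
conv_count = conv_params(channels, 16, kernel_size=3)
# A dense layer producing an output of the same size (16 maps of 32x32).
dense_count = dense_params(height * width * channels, 16 * height * width)

print(conv_count)   # 448
print(dense_count)  # 50348032
```

Under these assumptions the convolutional layer needs 448 parameters where the fully connected equivalent needs about 50 million, and the convolutional count would not change if the image were larger.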

Pooling Layers

Pooling layers reduce the spatial dimensions of feature maps, providing a degree of translation invariance and reducing computational cost. Max pooling, the most common form, typically partitions the feature map into non-overlapping rectangular regions (2×2 is a common choice) and outputs the maximum value within each region. Average pooling computes the mean instead. By progressively downsampling spatial dimensions, pooling layers allow deeper layers to respond to increasingly large regions of the input, building hierarchical representations that progress from edges and textures to object parts and whole objects.
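A minimal sketch of max and average pooling over non-overlapping windows follows. The function name `pool2d`, its `mode` argument, and the sample values are assumptions made for this illustration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Pool over non-overlapping size x size windows (stride equals size)."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:  # average pooling
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
max_pooled = pool2d(feature_map, size=2, mode="max")
avg_pooled = pool2d(feature_map, size=2, mode="avg")
print(max_pooled)  # [[4, 2], [2, 8]]
print(avg_pooled)  # [[2.5, 1.0], [1.25, 6.5]]
```

A 4×4 map becomes 2×2 in both cases; max pooling keeps the strongest response in each window, while average pooling smooths over it.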

Fully Connected Layers

After the sequence of convolutional and pooling operations has produced a compact spatial representation, one or more fully connected layers aggregate this information for the final classification or regression task. The output of the last fully connected layer is passed through a softmax function for multi-class classification, yielding a probability distribution over target categories.
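The softmax step can be sketched as follows. The function subtracts the largest score before exponentiating, a common trick for numerical stability that does not change the result. The example scores are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes from the last layer.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(sum(probs))
```

The outputs are positive, sum to one, and preserve the ordering of the raw scores, so the class with the largest score receives the highest probability.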

Landmark Architectures

The field has produced a succession of increasingly powerful CNN architectures, each introducing innovations that improved accuracy or efficiency.

LeNet-5 (1998) was the first practically successful CNN, applied by LeCun to handwritten digit recognition for postal services and banking.[^2]

AlexNet (2012) demonstrated that deep CNNs trained on GPUs could dramatically outperform handcrafted feature pipelines on the ImageNet benchmark, igniting the deep learning revolution.[^3]

VGGNet (2014) showed that network depth — using very small 3×3 convolutional filters throughout — was a critical factor in achieving high accuracy.

ResNet (2015) introduced residual connections (skip connections) that allow gradients to flow through very deep networks (up to 152 layers) without vanishing, enabling a new performance frontier.

EfficientNet (2019) introduced a principled compound scaling method to uniformly scale network depth, width, and input resolution, achieving strong accuracy-efficiency trade-offs.

ConvNeXt (2022) revisited the CNN design space in light of Vision Transformer (ViT) advances, modernising the ResNet architecture with techniques such as depthwise convolution and larger kernel sizes to match transformer performance while retaining the efficiency of convolution.[^4]
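The residual (skip) connection introduced by ResNet, mentioned above, amounts to adding a block's input back to its output: y = ReLU(F(x) + x). The toy sketch below uses small matrix-vector products as a stand-in for the convolutions inside F; the helper names and weights are hypothetical, and real residual blocks also include components such as batch normalisation that are omitted here.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Matrix-vector product; stands in for a convolution in this toy example."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two small transformations."""
    fx = linear(relu(linear(x, w1)), w2)
    # The skip connection: add the unmodified input back in.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
y = residual_block(x, zero, zero)
print(y)  # [1.0, 2.0]
```

Even when the weights contribute nothing, the input passes through unchanged. This identity path is what lets the signal, and during training the gradient, travel through many stacked blocks.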

Applications

CNNs are foundational to modern computer vision and are deployed across a wide range of industries.

In medical imaging, CNNs classify radiographs, detect tumours in CT and MRI scans, grade diabetic retinopathy from fundus images, and segment anatomical structures; on some narrowly defined tasks, published studies report performance comparable to that of specialist clinicians.

In autonomous vehicles, CNNs power perception systems that detect pedestrians, vehicles, traffic signs, and lane markings from camera feeds in real time.

In manufacturing quality control, CNNs inspect products on production lines, identifying surface defects, dimensional anomalies, and assembly errors far faster and more consistently than human inspectors.

In agriculture, CNN-based systems analyse drone imagery to detect crop disease, estimate yield, and monitor irrigation needs.

In natural language processing, one-dimensional convolutions have been applied to text classification and sentiment analysis, though they have largely been superseded by transformer-based models.

Relationship to Vision Transformers

By the mid-2020s, Vision Transformers (ViTs) — which apply the self-attention mechanism from transformer models to image patches — had emerged as competitive or superior alternatives to CNNs on large-scale benchmarks. However, CNNs retain significant advantages in data efficiency (performing well with smaller datasets), inference speed on hardware optimised for convolution, and interpretability. Hybrid architectures combining convolutional and attention components have become increasingly prevalent, indicating that CNNs and transformers are complementary rather than mutually exclusive.[^4]
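For contrast with the sliding filters of a CNN, the sketch below shows the patch-splitting step that a ViT applies before self-attention: the image is cut into non-overlapping tiles, and each tile is flattened into a vector. The function name `to_patches` and the toy 4×4 single-channel image are assumptions for illustration; the attention computation itself is not shown.

```python
def to_patches(image, patch):
    """Split an HxW grid into non-overlapping patch x patch tiles, each flattened."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for i in range(0, height, patch):
        for j in range(0, width, patch):
            patches.append(
                [image[i + di][j + dj] for di in range(patch) for dj in range(patch)]
            )
    return patches


# Toy 4x4 image whose pixel values are simply 0..15 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = to_patches(image, 2)
print(patches)
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

Each flattened patch becomes one element of the sequence a transformer attends over, whereas a convolution would instead slide a shared filter across overlapping regions of the same image.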

Malaysian Context — CNNs in Industry 4.0 and Healthcare

Malaysia's transition to Industry 4.0 has placed CNN-based computer vision at the centre of smart manufacturing adoption, particularly in the semiconductor and electronics sectors that form the backbone of Malaysia's export economy. In Penang — Malaysia's Silicon Valley — multinational manufacturers including Intel Malaysia, Infineon Technologies Malaysia, and Globetronics have deployed CNN-powered automated optical inspection (AOI) systems to detect micro-defects on printed circuit boards and semiconductor wafers. These systems operate at speeds and resolutions unattainable by human inspection, directly improving yield rates and reducing production waste.

The Standards and Industrial Research Institute of Malaysia (SIRIM) has established a Smart Manufacturing Experience Centre under its Centre of Excellence for Smart Manufacturing, providing a testbed where small and medium enterprises (SMEs) can evaluate CNN-based machine vision solutions alongside collaborative robots (cobots) before committing to deployment. The initiative is part of Malaysia's Industry4WRD National Policy, which targets at least 1,000 SMEs adopting Industry 4.0 practices.

In healthcare, CNN-based diagnostic tools have been adopted or piloted at institutions including Hospital Kuala Lumpur and private health networks such as Sunway Medical Centre. Particular interest has centred on diabetic retinopathy screening: Malaysia has one of the highest diabetes prevalence rates in Southeast Asia, and CNN models that automatically grade retinal fundus photographs can extend specialist ophthalmology access to rural clinics and community health centres.

Universiti Malaya, Universiti Teknologi Malaysia (UTM), and Universiti Sains Malaysia (USM) maintain active CNN research programmes spanning agricultural disease detection, flood monitoring from satellite imagery, and palm oil fresh fruit bunch grading — an application with direct commercial value given Malaysia's role as one of the world's largest palm oil producers.

Malaysia's AI talent pipeline, supported by programmes from MDEC, HRD Corp, and institutions like Asia Pacific University (APU), has placed CNN fundamentals at the core of AI and data science curricula. Demand for engineers with computer vision expertise is growing across automotive, electronics, and logistics sectors as the country pursues its National Investment Aspirations to attract high-technology foreign direct investment.

References

[^1]: LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
[^2]: LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
[^3]: Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
[^4]: Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Tags: deep learning, computer vision, image classification, neural networks

Type	Feedforward deep neural network
Pioneered by	Yann LeCun, Yoshua Bengio
Key application	Image classification, object detection, medical imaging
Core layers	Convolutional, pooling, fully connected
Notable architectures	LeNet, AlexNet, VGG, ResNet, EfficientNet, ConvNeXt
Related	Deep learning, Computer vision, Transformer architecture
