What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Deep Learning

Deep learning is a subfield of machine learning that uses multi-layered artificial neural networks to learn hierarchical representations from data, enabling state-of-the-art performance across vision, language, and speech tasks.

7 min read | Last updated May 2026 | Foundations

Deep learning is a class of machine learning algorithms that trains artificial neural networks containing many successive processing layers to learn progressively abstract representations of data. The word "deep" refers to the depth of these layered networks, which may range from three layers in modest architectures to hundreds or thousands of layers in the most sophisticated modern systems. Deep learning has become the dominant paradigm in artificial intelligence research and industry deployment, underpinning breakthroughs in image recognition, natural language processing, speech synthesis, protein structure prediction, and autonomous systems.

Historical Development

The conceptual origins of deep learning trace back to the 1940s, when Warren McCulloch and Walter Pitts proposed the first mathematical model of a neuron. Frank Rosenblatt's perceptron in 1957 introduced the notion of a learnable linear classifier, while the backpropagation algorithm — independently derived by several researchers and popularised by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986 — provided the mathematical foundation for training multi-layer networks by efficiently computing gradients.

Despite these theoretical advances, deep networks remained largely impractical throughout the 1990s due to the vanishing gradient problem, insufficient computational resources, and limited availability of labelled training data. A resurgence occurred in 2006 when Geoffrey Hinton and colleagues demonstrated that deep belief networks could be pre-trained effectively using unsupervised methods. The decisive turning point came in 2012, when AlexNet, a convolutional network developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, achieved a top-5 error rate of 15.3% on the ImageNet challenge, more than ten percentage points ahead of the runner-up at 26.2%. This victory catalysed a global wave of investment and research that has continued to the present day.

How Deep Learning Works

A deep neural network consists of an input layer that receives raw data, a series of hidden layers that progressively transform that data, and an output layer that produces the model's prediction or generation. Each layer is composed of artificial neurons — mathematical functions that receive weighted inputs, sum them, apply a nonlinear activation function, and pass the result forward to the next layer. Common activation functions include the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent.
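The layer structure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names `neuron` and `dense_layer` and all weight values are hypothetical, chosen by hand rather than learned.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through a nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def dense_layer(inputs, weight_rows, biases, activation):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# Hand-picked (hypothetical) weights for a 2-input, 2-hidden, 1-output network.
x = [1.0, 2.0]
hidden = dense_layer(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu)
output = dense_layer(hidden, [[1.0, 0.5]], [0.0], sigmoid)
print(hidden, output)
```

Here the first hidden neuron computes a negative sum and is zeroed by ReLU, while the second passes its value forward; the output layer then squashes the result with a sigmoid. Real networks follow the same pattern with far more neurons and with weights found by training.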

During supervised training, the network is exposed to labelled examples. A forward pass computes the network's prediction from the input; a loss function measures the discrepancy between the prediction and the correct answer. The backpropagation algorithm then computes the gradient of the loss with respect to each weight in the network by applying the chain rule of calculus backwards from the output layer to the input layer. An optimiser, most commonly stochastic gradient descent or one of its adaptive variants such as Adam, uses these gradients to update the weights, nudging the network toward lower loss. This cycle repeats over many iterations and training examples until the loss stops improving, ideally as measured on held-out validation data.
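The training cycle can be illustrated with the smallest possible case: a single sigmoid neuron learning the logical AND function. The sketch below is an assumption-laden toy (hypothetical names `train` and `predict`, a hand-chosen learning rate of 0.5, cross-entropy loss, plain per-example gradient descent); a real deep network applies the same chain rule through many layers, usually via an automatic differentiation library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Forward pass of one sigmoid neuron with two inputs."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def train(data, epochs=2000, lr=0.5):
    w, b = [0.0, 0.0], 0.0
    loss_history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in data:
            # Forward pass and loss (binary cross-entropy).
            p = predict(x, w, b)
            epoch_loss += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            # Backward pass: chain rule, dL/dz = dL/dp * dp/dz.
            dloss_dp = -(y / p) + (1.0 - y) / (1.0 - p)
            dp_dz = p * (1.0 - p)
            dloss_dz = dloss_dp * dp_dz
            # Gradient descent update; dz/dw_i = x_i and dz/db = 1.
            w[0] -= lr * dloss_dz * x[0]
            w[1] -= lr * dloss_dz * x[1]
            b -= lr * dloss_dz
        loss_history.append(epoch_loss / len(data))
    return w, b, loss_history

# Toy labelled dataset: logical AND.
AND_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
w, b, loss_history = train(AND_DATA)
print(loss_history[0], loss_history[-1])
print([round(predict(x, w, b)) for x, _ in AND_DATA])
```

The average loss recorded in `loss_history` falls as the weights are updated, which is the "nudging toward lower loss" that the paragraph describes.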

Major Architectures

Convolutional Neural Networks

Convolutional neural networks (CNNs) apply learnable filters across spatial or temporal dimensions of the input, making them inherently suited to data with local structure, such as images or audio spectrograms. The shared weights in convolutional layers drastically reduce the parameter count compared to fully connected networks of equivalent depth, enabling the training of very deep models.
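Weight sharing is easiest to see in one dimension. The following sketch slides a single two-element filter across a short signal; the function name `conv1d` and the hand-picked edge-detecting filter are illustrative assumptions, and the operation shown is the unpadded, stride-1 cross-correlation that deep learning libraries conventionally call convolution.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel across the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
edge_kernel = [-1.0, 1.0]  # hand-picked filter that responds to changes in level
response = conv1d(signal, edge_kernel)

# Parameter comparison for the same input and output sizes (biases ignored).
conv_params = len(edge_kernel)
dense_params = len(signal) * len(response)
print(response, conv_params, dense_params)
```

The filter fires positively where the signal steps up and negatively where it steps down, using only two weights. A fully connected layer mapping the same six inputs to five outputs would need thirty weights, and the gap widens rapidly for images.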

Recurrent Neural Networks and LSTMs

Recurrent neural networks (RNNs) process sequential data by maintaining a hidden state that is updated at each time step. Long short-term memory (LSTM) networks introduced gating mechanisms in 1997 that allow the model to selectively remember or forget information over long sequences, overcoming the vanilla RNN's difficulty with long-range dependencies. Gated recurrent units (GRUs) offer a simplified gating scheme with comparable performance.
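The hidden-state update of a vanilla RNN can be sketched as follows. This is a simplified illustration with scalar inputs, a two-element hidden state, and arbitrary hand-picked weights (`W_x`, `W_h`, and `b` are hypothetical); it omits the gates that LSTMs and GRUs add.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """h_t = tanh(W_x * x_t + W_h . h_prev + b) for a scalar input x_t."""
    size = len(h_prev)
    return [
        math.tanh(
            W_x[i] * x_t
            + sum(W_h[i][j] * h_prev[j] for j in range(size))
            + b[i]
        )
        for i in range(size)
    ]

def run_rnn(sequence, W_x, W_h, b):
    """Process a sequence one time step at a time, carrying the hidden state."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Arbitrary illustrative weights, not learned.
W_x = [0.5, -0.5]
W_h = [[0.1, 0.2], [0.3, -0.1]]
b = [0.0, 0.0]

early = run_rnn([1.0, 0.0, 0.0], W_x, W_h, b)
late = run_rnn([0.0, 0.0, 1.0], W_x, W_h, b)
print(early[-1], late[-1])
```

Both sequences contain the same values, but the final hidden states differ because the state depends on when each input arrived. The signal from the early input has also shrunk after passing through two more steps, a small-scale hint of why long-range dependencies are hard for vanilla RNNs.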

Transformer Architecture

The Transformer, introduced by Vaswani et al. in 2017, replaced recurrence with self-attention mechanisms that allow every position in a sequence to attend directly to every other position. Because all positions are processed in parallel rather than one time step at a time, Transformers train more efficiently than recurrent networks on modern parallel hardware. They have become the foundation of virtually all state-of-the-art large language models, as well as vision transformers (ViTs) for image tasks.
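A stripped-down version of scaled dot-product self-attention is sketched below. To keep it short, the queries, keys, and values are all taken to be the raw token vectors; this is a simplifying assumption, since a real Transformer first applies separate learned projections and uses multiple attention heads. The function name `self_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Each position attends to every position, itself included."""
    d = len(X[0])
    all_weights, outputs = [], []
    for q in X:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        all_weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * X[j][c] for j in range(len(X))) for c in range(d)])
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token vectors
outputs, attn = self_attention(tokens)
for row in attn:
    print([round(a, 3) for a in row])
```

Every row of the attention matrix sums to 1, and no row depends on a previous time step, which is why the computation for all positions can run in parallel.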

Autoencoders and Generative Models

Autoencoders learn compressed representations of data by training an encoder network to map inputs to a lower-dimensional latent space and a decoder to reconstruct the original input. Variational autoencoders (VAEs) impose a probabilistic structure on the latent space, enabling generation of new samples. Generative adversarial networks (GANs) and diffusion models extend deep learning to high-fidelity image, audio, and video synthesis.
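The encoder, bottleneck, and decoder can be illustrated with a tiny linear autoencoder that compresses two numbers into one. The weights here are fixed by hand purely for illustration (a trained autoencoder would learn them by minimising reconstruction error), and the names `encode`, `decode`, and `reconstruction_error` are hypothetical.

```python
def encode(x, W_enc):
    """Map the input to a lower-dimensional latent vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_enc]

def decode(z, W_dec):
    """Map the latent vector back to the input space."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W_dec]

def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hand-picked weights: the latent value is the average of the two inputs.
W_enc = [[0.5, 0.5]]    # 2 inputs -> 1 latent dimension
W_dec = [[1.0], [1.0]]  # 1 latent dimension -> 2 outputs

on_pattern = [3.0, 3.0]   # fits the structure the bottleneck can represent
off_pattern = [1.0, 3.0]  # does not fit, so information is lost

err_on = reconstruction_error(on_pattern, decode(encode(on_pattern, W_enc), W_dec))
err_off = reconstruction_error(off_pattern, decode(encode(off_pattern, W_enc), W_dec))
print(err_on, err_off)
```

Inputs that match the structure captured by the one-dimensional latent space are reconstructed exactly, while others incur an error. Training an autoencoder amounts to finding the latent space that keeps this error low across the dataset.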

Training Considerations

Training deep networks effectively requires careful attention to data volume, hardware, and regularisation. Modern large models are trained on datasets containing billions of examples using GPU or TPU clusters with thousands of accelerators. Techniques such as batch normalisation stabilise training by normalising layer activations, while dropout randomly deactivates neurons during training to prevent overfitting. Learning rate scheduling, gradient clipping, and mixed-precision training (using 16-bit floats) are standard practice for large-scale training runs.
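Three of the techniques mentioned above are simple enough to sketch directly. The versions below are illustrative assumptions rather than any library's API: inverted dropout (the common variant that rescales surviving activations), gradient clipping by global L2 norm, and a cosine learning rate schedule, which is one popular choice among many.

```python
import math
import random

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def clip_by_global_norm(grads, max_norm):
    """Scale the whole gradient vector down if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def cosine_lr(step, total_steps, base_lr):
    """Decay the learning rate from base_lr to 0 along a half cosine wave."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

rng = random.Random(0)  # seeded so the run is repeatable
dropped = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng)
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
schedule = [cosine_lr(s, 100, 0.1) for s in (0, 50, 100)]
print(dropped, clipped, schedule)
```

Dropout is applied only during training; at inference time the function returns the activations unchanged, and the rescaling during training keeps the expected activation the same in both modes. Clipping preserves the direction of the gradient and changes only its length.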

Applications

Deep learning powers a broad range of real-world systems. In computer vision, it enables object detection, semantic segmentation, medical image analysis, and autonomous vehicle perception. In natural language processing, it underlies machine translation, sentiment analysis, question answering, and large language models. In healthcare, deep learning models have matched or exceeded specialist-level performance in published evaluations of diabetic retinopathy, skin cancer, and certain radiology findings. In scientific computing, DeepMind's AlphaFold 2 predicted protein structures at near-experimental accuracy in the CASP14 assessment in 2020, a major advance on a problem that had challenged biologists for decades, and its predictions are now used to support drug discovery research.

Malaysian Context — Deep Learning Adoption Across Key Sectors

Malaysia has emerged as one of Southeast Asia's most active adopters of deep learning technologies, supported by substantial public and private investment in AI infrastructure. According to the e-Conomy SEA 2025 report, Malaysia captured 32% of Southeast Asia's total AI funding between H2 2024 and H1 2025, equivalent to approximately US$759 million, reflecting investor confidence in the country's AI ecosystem. Data centre capacity grew from 120 MW in 2024 to 690 MW in the first half of 2025, providing the compute capacity that deep learning training and inference demand.

Petronas, Malaysia's national petroleum corporation, has established an AI Centre of Excellence (AICoE) that applies deep learning to seismic interpretation, predictive equipment maintenance, and energy optimisation across upstream and downstream operations. The company treats deep learning as central to its strategy for simultaneously improving operational efficiency and advancing energy transition goals — a direct application of the technology's capacity to extract patterns from large, complex sensor datasets.

Malaysia's banking sector has been an early adopter of deep learning for fraud detection, credit risk modelling, and customer behaviour analytics. Maybank and CIMB, two of the country's largest financial institutions, have invested in AI teams that deploy convolutional and recurrent neural network models for transaction monitoring and personalised digital banking services. Bank Negara Malaysia's (BNM) published guidelines on responsible use of AI in financial services establish governance expectations that shape how these models are deployed, validated, and audited.

The Malaysia Digital Economy Corporation (MDEC) has incorporated deep learning capability into its National AI Roadmap and the Malaysia AI Governance Framework, recognising it as a foundational technology. MDEC has onboarded over 140 AI solution providers into the Malaysia Digital AI ecosystem. HRD Corp-accredited training programmes increasingly include deep learning modules, reflecting growing demand for practitioners who can design and deploy neural network-based solutions. Local universities — Universiti Malaya, Universiti Teknologi Malaysia, and Multimedia University — have established AI and data science research centres where deep learning is a primary focus.

References

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Malaysia Digital Economy Corporation. (2024). Malaysia AI Governance Framework. MDEC.
e-Conomy SEA. (2025). Malaysia takes 32% of regional AI funding. Google, Temasek, Bain & Company.

Tags: deep learning, neural networks, machine learning, AI foundations

Type	Machine learning subfield
Key concept	Hierarchical feature learning via layered neural networks
Introduced	Popularised from mid-2000s; foundations in 1940s–1980s
Key architectures	CNN, RNN, Transformer, Autoencoder
Related	Neural network, Backpropagation, Transfer learning