Deep Learning
Deep learning is a subfield of machine learning that uses multi-layered artificial neural networks to learn hierarchical representations from data, enabling state-of-the-art performance across vision, language, and speech tasks.
Deep learning is a class of machine learning algorithms that trains artificial neural networks containing many successive processing layers to learn progressively abstract representations of data. The word "deep" refers to the depth of these layered networks, which may range from three layers in modest architectures to hundreds or thousands of layers in the most sophisticated modern systems. Deep learning has become the dominant paradigm in artificial intelligence research and industry deployment, underpinning breakthroughs in image recognition, natural language processing, speech synthesis, protein structure prediction, and autonomous systems.
Historical Development
The conceptual origins of deep learning trace back to 1943, when Warren McCulloch and Walter Pitts proposed the first mathematical model of a neuron. Frank Rosenblatt's perceptron of 1957 introduced the notion of a learnable linear classifier, while the backpropagation algorithm, derived independently by several researchers and popularised by David Rumelhart, Geoffrey Hinton, and Ronald Williams in 1986, provided the mathematical foundation for training multi-layer networks by efficiently computing gradients.
Despite these theoretical advances, deep networks remained largely impractical throughout the 1990s because of the vanishing gradient problem, insufficient computational resources, and the limited availability of labelled training data. A resurgence began in 2006, when Geoffrey Hinton and colleagues showed that deep belief networks could be trained effectively by first pre-training them one layer at a time with unsupervised methods. The decisive turning point came in 2012, when AlexNet, the architecture developed by Alex Krizhevsky, Ilya Sutskever, and Hinton, achieved a top-5 error rate of 15.3% in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), far ahead of the 26.2% achieved by the best entry built on classical computer vision methods. This victory catalysed a global wave of investment and research that continues to the present day.
How Deep Learning Works
A deep neural network consists of an input layer that receives raw data, a series of hidden layers that progressively transform that data, and an output layer that produces the model's prediction or generated output. Each layer is composed of artificial neurons: mathematical functions that compute a weighted sum of their inputs plus a bias term, apply a nonlinear activation function, and pass the result forward to the next layer. Common activation functions include the rectified linear unit (ReLU), the sigmoid, and the hyperbolic tangent (tanh). Without these nonlinearities, any stack of layers would collapse into a single linear transformation, so adding depth would add no expressive power.
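As a minimal sketch of the forward pass described above, the following pure-Python code (no deep learning library assumed) implements a fully connected layer whose neurons compute a weighted sum plus bias and then apply an activation. The 2-3-1 network shape and every weight value are hypothetical, chosen only for illustration; real frameworks perform the same arithmetic with vectorised matrix operations rather than Python loops.

```python
import math


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    return math.tanh(z)


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: weights[j][i] connects input i to neuron j."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum plus bias
        outputs.append(activation(z))                       # nonlinearity
    return outputs


def forward(x, layers):
    """Feed x through a list of (weights, biases, activation) layers, in order."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Hypothetical 2-3-1 network: two inputs, three ReLU hidden units, one sigmoid output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], relu),
    ([[1.0, -1.0, 0.5]], [0.2], sigmoid),
]
print(forward([1.0, 2.0], layers))
```

Swapping `relu` for `tanh` in the hidden layer changes only the activation; the weighted-sum structure of each neuron stays the same.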
During training, the network is shown labelled examples. A forward pass computes the network's prediction from the input, and a loss function measures the discrepancy between that prediction and the correct answer. The backpropagation algorithm then computes the gradient of the loss with respect to every weight in the network by applying the chain rule of calculus layer by layer, starting at the output and working back toward the input. An optimiser, most commonly stochastic gradient descent (SGD) or one of its adaptive variants such as Adam, uses these gradients to update the weights, moving each one a small step in the direction that reduces the loss. This cycle repeats over many iterations and mini-batches of training examples, typically until the loss converges or performance on held-out validation data stops improving.
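The training cycle above can be sketched end to end in plain Python. This is an illustrative example under stated assumptions, not a reference implementation: the toy task (fitting y = sin(x) on [-2, 2]), the 8-unit tanh hidden layer, the squared-error loss, the learning rate of 0.05, and the use of plain per-example SGD rather than Adam are all choices made for the sketch. The backward pass applies the chain rule by hand, which is the work that automatic differentiation performs in deep learning frameworks.

```python
import math
import random

rng = random.Random(0)

# Toy supervised dataset (an assumption for illustration): y = sin(x) on [-2, 2].
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]

H = 8  # hidden width, chosen arbitrarily
w1 = [rng.uniform(-1, 1) for _ in range(H)]  # input -> hidden weights
b1 = [0.0] * H
w2 = [rng.uniform(-1, 1) for _ in range(H)]  # hidden -> output weights
b2 = 0.0


def forward(x):
    h = [math.tanh(w1[j] * x + b1[j]) for j in range(H)]
    y_hat = sum(w2[j] * h[j] for j in range(H)) + b2
    return h, y_hat


def mean_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)


lr = 0.05
initial_loss = mean_loss()
for _ in range(200):
    rng.shuffle(data)  # "stochastic": visit examples in a random order each epoch
    for x, y in data:
        # Forward pass.
        h, y_hat = forward(x)
        # Loss L = (y_hat - y)^2, so dL/dy_hat = 2 * (y_hat - y).
        d_out = 2.0 * (y_hat - y)
        # Backward pass: chain rule from the output layer toward the input.
        grad_w2 = [d_out * h[j] for j in range(H)]
        grad_b2 = d_out
        d_pre = [d_out * w2[j] * (1.0 - h[j] ** 2) for j in range(H)]  # tanh' = 1 - tanh^2
        grad_w1 = [d_pre[j] * x for j in range(H)]
        grad_b1 = d_pre
        # SGD update: step each parameter against its gradient.
        for j in range(H):
            w1[j] -= lr * grad_w1[j]
            b1[j] -= lr * grad_b1[j]
            w2[j] -= lr * grad_w2[j]
        b2 -= lr * grad_b2

final_loss = mean_loss()
print(f"mean squared error: {initial_loss:.4f} -> {final_loss:.4f}")
```

All gradients for a step are computed from the pre-update weights before any weight changes, which is what makes this a single, consistent gradient step.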
Major Architectures
Convolutional Neural Networks
Convolutional neural networks (CNNs) apply learnable filters across spatial or temporal dimensions of the input, making them inherently suited to data with local structure, such as images or audio spectrograms. The shared weights in convolutional layers drastically reduce the parameter count compared to fully connected networks of equivalent depth, enabling the training of very deep models.
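A minimal sketch of weight sharing, assuming a one-dimensional input and "valid" (unpadded) convolution: the same three-weight kernel slides across every position of the signal. Like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped). The hypothetical helpers `conv2d_params` and `dense_params` then compare parameter counts for a 3x3 convolution with 64 output channels on a 224x224 RGB image against a fully connected layer producing an output of the same size (assuming stride 1 and "same" padding).

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as most DL libraries implement it).

    The same kernel weights are reused at every position: this is weight sharing.
    """
    k = len(kernel)
    return [
        sum(kernel[i] * signal[t + i] for i in range(k)) + bias
        for t in range(len(signal) - k + 1)
    ]


def conv2d_params(in_channels, out_channels, k):
    # One k x k x in_channels filter plus one bias per output channel.
    return k * k * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output.
    return n_in * n_out + n_out


# A hypothetical edge-detecting kernel responds where the signal rises or falls.
signal = [0, 0, 0, 1, 1, 1, 0, 0]
print(conv1d(signal, [-1, 0, 1]))  # [0.0, 1.0, 1.0, 0.0, -1.0, -1.0]

h, w = 224, 224
print(conv2d_params(3, 64, 3))              # 1792
print(dense_params(h * w * 3, h * w * 64))  # roughly 4.8e11
```

The convolutional layer needs under two thousand parameters where the dense layer would need hundreds of billions, which is why shared weights make very deep image models trainable.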
Recurrent Neural Networks and LSTMs
Recurrent neural networks (RNNs) process sequential data by maintaining a hidden state that is updated at each time step. Long short-term memory (LSTM) networks, introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997, add gating mechanisms that let the model selectively retain or discard information over long sequences, mitigating the difficulty that plain ("vanilla") RNNs have in learning long-range dependencies. Gated recurrent units (GRUs) use a simpler scheme with two gates and no separate memory cell, and often achieve comparable performance with fewer parameters.
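To make the gating concrete, here is a sketch of a single LSTM time step in the widely used formulation with forget, input, and output gates (the forget gate was a later addition to the original 1997 design). For readability it assumes a scalar input and scalar hidden and cell states; the parameter names (`wf`, `uf`, `bf`, and so on) and their values are hypothetical. With a large forget-gate bias and a strongly negative input-gate bias, the cell state is carried almost unchanged across 60 steps, which is the mechanism that lets LSTMs preserve information over long ranges.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar input and state; p holds hypothetical weights."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell value
    c = f * c_prev + i * g   # keep part of the old memory, write part of the new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


# Hypothetical parameters: forget gate almost fully open, input gate almost shut.
params = dict(wf=0.0, uf=0.0, bf=10.0,
              wi=0.0, ui=0.0, bi=-10.0,
              wo=0.0, uo=0.0, bo=0.0,
              wg=1.0, ug=1.0, bg=0.0)

h, c = 0.0, 1.0
for x in [0.5, -0.3, 0.8] * 20:  # a 60-step input sequence
    h, c = lstm_step(x, h, c, params)
print(f"cell state after 60 steps: {c:.4f}")
```

Setting `bf` to a large negative value instead makes the cell discard its memory in a single step, illustrating the "forget" half of the mechanism.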
Transformer Architecture
The Transformer, introduced by Vaswani et al. in 2017, replaced recurrence with self-attention mechanisms that allow every position in a sequence to attend directly to every other position. Because all positions are processed in parallel rather than one time step after another, Transformers make far better use of GPUs and TPUs than recurrent networks, and they have become the foundation of virtually all state-of-the-art large language models, as well as of vision transformers (ViTs) for image tasks.
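Scaled dot-product attention, the core operation of the Transformer, can be sketched as follows for a single attention head. This pure-Python version assumes no masking, no multi-head splitting, and no output projection, and the identity projection matrices in the usage example are placeholders for weights that would normally be learned. Each output row is a weighted average of the value vectors, with weights softmax(QK^T / sqrt(d_k)), so every position can draw directly on every other position.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x (n positions x d_model)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    out = []
    for qi in q:  # every position attends to every position, including itself
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[col] for w, vj in zip(weights, v))
                    for col in range(len(v[0]))])
    return out


# Placeholder projections (identity) standing in for learned W_Q, W_K, W_V.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens, identity, identity, identity))
```

Nothing in the loop over queries depends on the previous query's result, which is why the whole computation can be expressed as batched matrix products and run in parallel on accelerators.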
Autoencoders and Generative Models
Autoencoders learn compressed representations of data by training an encoder network to map inputs to a lower-dimensional latent space and a decoder to reconstruct the original input. Variational autoencoders (VAEs) impose a probabilistic structure on the latent space, enabling generation of new samples. Generative adversarial networks (GANs) and diffusion models extend deep learning to high-fidelity image, audio, and video synthesis.
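The probabilistic latent space of a VAE is commonly implemented with two pieces: the reparameterisation trick for sampling, and a closed-form KL divergence penalty, 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2), that pulls the encoder's Gaussian toward the prior. The sketch below assumes a diagonal Gaussian encoder that outputs a mean and log-variance per latent dimension and a standard normal prior; the `decode` function is a hypothetical stand-in for a trained decoder network. Generating a new sample then amounts to drawing z from the prior and decoding it.

```python
import math
import random


def reparameterise(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps, with eps ~ N(0, 1).

    Writing the sample this way keeps z differentiable with respect to mu and
    log_var, which is what lets the encoder be trained by backpropagation.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, log_var))


def decode(z):
    # Hypothetical stand-in for a trained decoder network (2-D latent -> 3-D output).
    return [math.tanh(z[0]), math.tanh(z[1]), math.tanh(z[0] - z[1])]


rng = random.Random(0)

# Training-time view: the encoder (not shown) outputs mu and log_var for one input.
mu, log_var = [0.8, -0.3], [-1.0, -0.5]
z = reparameterise(mu, log_var, rng)
print("latent sample:", z, "KL penalty:", kl_to_standard_normal(mu, log_var))

# Generation: sample z from the N(0, I) prior and decode it into a new data point.
z_new = [rng.gauss(0.0, 1.0) for _ in range(2)]
print("generated sample:", decode(z_new))
```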
Training Considerations
Training deep networks effectively requires careful attention to data volume, hardware, and regularisation. The largest modern models are trained on datasets containing billions of examples, using GPU or TPU clusters with thousands of accelerators. Batch normalisation stabilises training by normalising each layer's activations with statistics computed over the current mini-batch, while dropout randomly deactivates a fraction of neurons at each training step to reduce overfitting. Learning rate scheduling, gradient clipping, and mixed-precision training (performing most arithmetic in 16-bit floating-point formats such as FP16 or bfloat16 while keeping a 32-bit copy of the weights) are standard practice for large-scale training runs.
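Several of these techniques reduce to a few lines of arithmetic. The sketch below assumes scalar features and flat gradient lists for readability and covers batch normalisation in training mode (the running averages used at inference are omitted), inverted dropout (the common variant that rescales surviving activations during training so no rescaling is needed at inference), gradient clipping by global L2 norm, and a linear warm-up followed by cosine decay. The schedule is one common choice among many rather than anything the text prescribes, and all default values are illustrative.

```python
import math
import random


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch (training mode only)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their combined L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def cosine_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warm-up, then cosine decay from base_lr to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


rng = random.Random(0)
print(batch_norm([1.0, 2.0, 3.0, 4.0]))
print(dropout([1.0] * 8, p=0.5, rng=rng))
print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))  # norm 5 rescaled to norm 1
print([round(cosine_lr(s, 100, 0.1, warmup_steps=10), 4) for s in (0, 9, 10, 55, 100)])
```

Clipping by the global norm, rather than clipping each gradient separately, preserves the direction of the overall update while bounding its size.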
Applications
Deep learning powers a broad range of real-world systems. In computer vision, it enables object detection, semantic segmentation, medical image analysis, and perception for autonomous vehicles. In natural language processing, it underlies machine translation, sentiment analysis, question answering, and large language models. In healthcare, deep learning models have matched or exceeded specialist-level performance in published studies on diagnosing diabetic retinopathy, skin cancer, and certain radiology findings. In the sciences, DeepMind's AlphaFold 2 predicted protein structures with accuracy competitive with experimental methods for many targets at the CASP14 assessment in 2020, largely addressing a problem that had challenged biologists for decades and opening new avenues for structural biology and drug discovery.
References
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
- Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
- Malaysia Digital Economy Corporation. (2024). Malaysia AI Governance Framework. MDEC.
- e-Conomy SEA. (2025). Malaysia takes 32% of regional AI funding. Google, Temasek, Bain & Company.