AIWiki
Malaysia

Neural Network

A neural network is a computational model inspired by biological brains, composed of interconnected layers of nodes that learn patterns from data through weighted connections.

Foundations | Last updated May 2026

A neural network is a computational model loosely inspired by the structure of the human brain, in which large numbers of simple processing units called neurons are connected by weighted links and arranged in layers. By adjusting these weights during training, a neural network learns to map inputs to outputs and to approximate complex functions that traditional rule-based programming struggles to express. Neural networks form the algorithmic core of modern artificial intelligence and are the foundation of [[deep-learning]], computer vision, and natural language processing.

Origins and conceptual background

The earliest formal model of a neuron was introduced in 1943 by Warren McCulloch and Walter Pitts, who proposed a binary threshold unit that could perform logical operations. Frank Rosenblatt extended the idea in 1958 with the perceptron, a single-layer network capable of learning linearly separable classifications. Marvin Minsky and Seymour Papert highlighted the limitations of single-layer perceptrons in their 1969 book Perceptrons, notably their inability to represent the exclusive-or (XOR) function. The field was revitalised in the 1980s when Rumelhart, Hinton, and Williams popularised the backpropagation algorithm (1986), which allowed efficient training of multi-layer networks.
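The perceptron learning rule can be illustrated with a short sketch. It assumes 0/1 labels, a step activation, weights initialised to zero, and a learning rate of 1; these are illustrative choices, and the names `train_perceptron` and `accuracy` are hypothetical. On AND, which is linearly separable, the rule converges. On XOR, no single weight vector and bias classify all four points, so training never reaches full accuracy.

```python
def predict(w, b, x):
    """Step activation: output 1 if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Rosenblatt-style rule: adjust weights only when a sample is misclassified."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(w, b, x)
            if err != 0:
                errors += 1
                # w <- w + lr * (target - prediction) * x
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b


def accuracy(w, b, samples):
    return sum(predict(w, b, x) == t for x, t in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and = train_perceptron(AND)
w_xor, b_xor = train_perceptron(XOR)
print(accuracy(w_and, b_and, AND))  # 1.0
print(accuracy(w_xor, b_xor, XOR))  # stays below 1.0
```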

Architecture

A neural network is typically organised into three kinds of layers: an input layer that receives raw features, one or more hidden layers that progressively transform those features, and an output layer that produces the prediction. Each connection carries a weight, and each neuron computes the weighted sum of its inputs plus a bias, then applies an activation function; hidden layers typically use a non-linear function such as the sigmoid, hyperbolic tangent (tanh), or rectified linear unit (ReLU). The feed-forward computation for one neuron is y = f(w_1 x_1 + w_2 x_2 + ... + w_n x_n + b), where x_1 ... x_n are the inputs, w_1 ... w_n the corresponding weights, f is the activation function, and b is the bias term.
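The single-neuron formula y = f(w_1 x_1 + ... + w_n x_n + b) translates almost directly into code. This is a minimal sketch; the function names are illustrative, and the sigmoid is written in a numerically stable form so that large negative inputs do not overflow `math.exp`.

```python
import math


def sigmoid(z):
    # Stable logistic function: avoids math.exp overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    return max(0.0, z)


def neuron(x, w, b, f):
    """y = f(w_1 x_1 + ... + w_n x_n + b) for a single neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # weighted sum plus bias
    return f(z)


x = [1.0, 2.0]
print(neuron(x, [0.5, -0.25], 0.0, sigmoid))  # z = 0.0, so 0.5
print(neuron(x, [1.0, 1.0], -1.0, relu))      # z = 2.0, so 2.0
print(neuron(x, [0.5, -0.25], 0.0, math.tanh))  # z = 0.0, so 0.0
```

Swapping `f` changes only the activation; the weighted sum is the same for every choice.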

Networks with many hidden layers are called deep neural networks. Depth allows the model to learn hierarchical representations, in which early layers detect low-level features such as edges or character n-grams, and later layers compose them into higher-level concepts such as objects or sentence meaning.
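Stacking layers is what lets later units build on earlier ones. The sketch below runs a forward pass through a list of fully connected layers, applying ReLU after every layer except the last. The weights are hand-picked for illustration, not learned: one hidden unit fires when at least one input is on, the other only when both are, and the output layer combines them into exclusive-or (XOR), a function no single-layer perceptron can represent. The names `dense` and `mlp_forward` are hypothetical.

```python
def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu_vec(v):
    return [max(0.0, z) for z in v]


def mlp_forward(x, layers):
    """Apply each (weights, biases) layer in turn; ReLU on hidden layers only."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases)
        if i < len(layers) - 1:
            h = relu_vec(h)
    return h


# Hidden units: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1); output: y = h1 - 2 * h2
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -2.0]], [0.0]),
]

inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print([mlp_forward(x, XOR_LAYERS)[0] for x in inputs])  # [0.0, 1.0, 1.0, 0.0]
```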

Common variants

| Variant | Typical use | Defining feature |
| --- | --- | --- |
| Multilayer perceptron (MLP) | Tabular data, baseline tasks | Fully connected layers |
| Convolutional neural network | Images, video | Shared weights, local receptive fields |
| Recurrent neural network | Sequences, time series | Internal state across time steps |
| Transformer | Language, vision, audio | Self-attention mechanism |
| Graph neural network | Graphs, molecules, networks | Message passing over edges |
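As an illustration of the transformer's defining feature, the following is a minimal single-head, scaled dot-product self-attention in pure Python, computing softmax(Q K^T / sqrt(d_k)) V. It assumes small hand-written projection matrices `w_q`, `w_k`, and `w_v` (in a real transformer these are learned) and omits multiple heads, masking, and positional information. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Each position attends to every position in the same sequence x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [[sum(qi * kj for qi, kj in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
              for q_row in q]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return matmul(weights, v), weights


# Three tokens with two features each; identity projections for simplicity
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
print(out)
```

Each output row is a weighted average of the value rows, with the weights derived from how strongly that token's query matches every key.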

Training

A neural network learns by minimising a loss function that measures the gap between its predictions and the true labels. The loss is reduced through [[gradient-descent]] and its variants such as stochastic gradient descent, Adam, and RMSProp. Gradients of the loss with respect to each weight are computed by [[backpropagation]], which applies the chain rule of calculus through the layers in reverse order. Training data is typically processed in mini-batches over many epochs, and techniques such as dropout, weight decay, and batch normalisation are used to improve generalisation.
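The training procedure can be sketched for a tiny network. This is a minimal sketch, assuming a single tanh hidden layer, a linear output, a mean squared-error loss, and plain mini-batch stochastic gradient descent; Adam, RMSProp, dropout, weight decay, and batch normalisation are omitted. All function names (`init_params`, `backprop`, `sgd_step`, and so on) are hypothetical. It also includes a finite-difference gradient check, a common way to confirm that hand-written backpropagation agrees with the chain rule.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": [0.0],  # one-element list so every parameter can be updated in place
    }


def forward(p, x):
    # Hidden layer h = tanh(W1 x + b1); output y = w2 . h + b2
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(p["W1"], p["b1"])]
    return h, sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"][0]


def loss(p, batch):
    # Mean of 0.5 * (y - t)^2 over the batch
    return sum(0.5 * (forward(p, x)[1] - t) ** 2 for x, t in batch) / len(batch)


def backprop(p, batch):
    g = {"W1": [[0.0] * len(r) for r in p["W1"]], "b1": [0.0] * len(p["b1"]),
         "w2": [0.0] * len(p["w2"]), "b2": [0.0]}
    for x, t in batch:
        h, y = forward(p, x)
        dy = (y - t) / len(batch)  # dL/dy for this sample
        g["b2"][0] += dy
        for j, hj in enumerate(h):
            g["w2"][j] += dy * hj
            dz = dy * p["w2"][j] * (1.0 - hj * hj)  # chain rule through tanh
            g["b1"][j] += dz
            for i, xi in enumerate(x):
                g["W1"][j][i] += dz * xi
    return g


def slots(p):
    # Yield (container, index) for every scalar parameter, in a fixed order
    for row in p["W1"]:
        for i in range(len(row)):
            yield row, i
    for name in ("b1", "w2", "b2"):
        for i in range(len(p[name])):
            yield p[name], i


def sgd_step(p, g, lr):
    for (pv, i), (gv, k) in zip(slots(p), slots(g)):
        pv[i] -= lr * gv[k]  # move each weight against its gradient


def numerical_grads(p, batch, eps=1e-6):
    out = []
    for vec, i in slots(p):
        old = vec[i]
        vec[i] = old + eps
        up = loss(p, batch)
        vec[i] = old - eps
        down = loss(p, batch)
        vec[i] = old
        out.append((up - down) / (2 * eps))  # central difference
    return out


def train(p, data, lr=0.05, epochs=200, batch_size=2):
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            sgd_step(p, backprop(p, batch), lr)


data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
params = init_params(n_in=2, n_hidden=3)
analytic = [vec[i] for vec, i in slots(backprop(params, data))]
max_gap = max(abs(a - num) for a, num in zip(analytic, numerical_grads(params, data)))
loss_before = loss(params, data)
train(params, data)
loss_after = loss(params, data)
print(f"max gradient gap {max_gap:.2e}, loss {loss_before:.4f} -> {loss_after:.4f}")
```

The gradient check should report a gap close to zero; a large gap usually indicates a mistake in the backward pass.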

The quality of a trained network depends on factors such as dataset size and diversity, network depth and width, optimiser settings, learning rate schedule, and the choice of regularisation. Modern frontier models are trained on clusters of thousands of accelerators for weeks at a time.

Applications

Neural networks underpin a wide range of contemporary systems, including image classification, object detection, speech recognition, machine translation, recommendation engines, fraud scoring, medical imaging, autonomous driving, protein structure prediction, and large language models. Convolutional networks dominated computer vision through the 2010s, while transformer-based networks now power most state-of-the-art systems in text, audio, and increasingly in vision.

Limitations and ongoing research

Although neural networks excel at pattern recognition, they require large datasets, are computationally expensive to train, and often act as opaque "black boxes". Research areas such as [[explainable-ai]], [[federated-learning]], model compression, and biologically plausible learning algorithms aim to address these challenges. Hardware specialisation, from GPUs to TPUs and emerging neuromorphic chips, continues to expand the scale at which networks can be trained and deployed.

References

  1. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
  2. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
  3. LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521, 436–444.
  4. Bank Negara Malaysia. (2024). Risk Management in Technology (RMiT) Policy Document. BNM.