What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Recurrent Neural Network

A recurrent neural network (RNN) is a class of neural network designed for sequential data, where connections between nodes form directed cycles allowing information to persist across time steps.

6 min read | Last updated May 2026 | Foundations

A recurrent neural network (RNN) is a family of neural network architectures specifically designed to process sequential data. Unlike feedforward networks, which treat each input independently, RNNs maintain an internal hidden state that is updated at each time step, enabling the network to incorporate information from earlier in a sequence when processing later elements. This property makes RNNs naturally suited to tasks such as language modelling, speech recognition, machine translation, and time series forecasting.

Architecture

At each time step t, an RNN receives an input vector x_t and a hidden state h_(t-1) from the previous step. It produces a new hidden state h_t through a learned transformation:

h_t = f(W_h · h_(t-1) + W_x · x_t + b)

where W_h and W_x are weight matrices, b is a bias, and f is typically a non-linear activation function such as tanh or ReLU. The hidden state acts as the network's "memory" — a compact summary of the sequence seen so far. An output y_t is produced at each step by a separate learned projection, though many architectures only use the final step's output or pool across all outputs.
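The recurrence above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes f = tanh, and the helper names (matvec, rnn_step, rnn_forward) and the toy weights are chosen for this example only.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def rnn_step(W_h, W_x, b, h_prev, x_t):
    """One recurrence step: h_t = tanh(W_h . h_prev + W_x . x_t + b)."""
    rec = matvec(W_h, h_prev)
    inp = matvec(W_x, x_t)
    return [math.tanh(r + i + b_k) for r, i, b_k in zip(rec, inp, b)]

def rnn_forward(W_h, W_x, b, xs, h0):
    """Run the recurrence over a whole sequence and return every hidden state."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(W_h, W_x, b, h, x_t)
        states.append(h)
    return states

# Toy sizes (illustrative only): hidden size 2, input size 1.
W_h = [[0.5, -0.3], [0.2, 0.1]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.0]
xs = [[1.0], [0.0], [0.0]]  # a single pulse followed by zeros
states = rnn_forward(W_h, W_x, b, xs, h0=[0.0, 0.0])
for t, h in enumerate(states, start=1):
    print(t, h)
```

Although the inputs at steps 2 and 3 are zero, the hidden states at those steps are not, because the pulse from step 1 is still carried forward in h. That is the "memory" role of the hidden state.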

Training RNNs uses backpropagation through time (BPTT), an extension of standard backpropagation that unrolls the network's computation across all time steps and then computes gradients backwards through that unrolled graph.
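As a rough sketch of BPTT, the example below uses a scalar RNN, h_t = tanh(w * h_(t-1) + u * x_t), with a squared-error loss on the final state only. The scalar setting, the loss, and all names (forward, loss, bptt) are assumptions made to keep the example short; real networks use vectors and matrices, but the backward loop has the same shape.

```python
import math

def forward(w, u, xs, h0=0.0):
    """Unroll the recurrence, keeping every state for the backward pass."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, target):
    """Squared error on the final hidden state (an assumption of this sketch)."""
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

def bptt(w, u, xs, target):
    """Walk the unrolled graph backwards, accumulating gradients for w and u."""
    hs = forward(w, u, xs)
    dh = hs[-1] - target              # dL/dh_T
    dw = du = 0.0
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh at step t
        dw += da * hs[t - 1]          # w is shared, so contributions add up
        du += da * xs[t - 1]
        dh = da * w                   # pass the gradient to h_(t-1)
    return dw, du

w, u, xs, target = 0.7, 0.4, [0.5, -0.2, 0.9, 0.1], 0.3
dw, du = bptt(w, u, xs, target)

# Sanity check against central finite differences.
eps = 1e-6
numeric_dw = (loss(w + eps, u, xs, target) - loss(w - eps, u, xs, target)) / (2 * eps)
numeric_du = (loss(w, u + eps, xs, target) - loss(w, u - eps, xs, target)) / (2 * eps)
print(dw, numeric_dw)
print(du, numeric_du)
```

Because the same weights are reused at every time step, each step contributes a term to the same gradient, which is why dw and du are accumulated inside the loop.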

Vanishing Gradient Problem

A fundamental limitation of vanilla RNNs is their difficulty in learning long-range dependencies. During BPTT, the gradient is multiplied at every step by the hidden-state weight matrix (transposed) and by the derivative of the activation function. If the largest singular value of the weight matrix is below one, gradients shrink exponentially as they travel back in time, which is known as the vanishing gradient problem, and the network effectively ignores information from many steps earlier. If the repeated product grows instead, gradients can explode and destabilise training. In practice this limits vanilla RNNs to relatively short sequences.
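The shrinkage can be observed numerically. The sketch below assumes a scalar RNN, h_t = tanh(w * h_(t-1) + x_t), so the weight matrix reduces to a single number w, and tracks the size of d h_t / d h_0 as t grows. The function name and the chosen values (w = 0.5, 50 steps) are illustrative.

```python
import math

def gradient_through_time(w, xs, h0=0.0):
    """Return |d h_t / d h_0| for each step t of h_t = tanh(w*h_(t-1) + x_t)."""
    h, grad, history = h0, 1.0, []
    for x in xs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: one factor per time step
        history.append(abs(grad))
    return history

history = gradient_through_time(w=0.5, xs=[0.1] * 50)
for t in (1, 10, 25, 50):
    print(t, history[t - 1])
```

Each factor is at most |w| in magnitude because the derivative of tanh never exceeds one, so after T steps the gradient is bounded by |w|^T. With w = 0.5 and T = 50 that bound is already below 1e-15.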

Long Short-Term Memory (LSTM)

The Long Short-Term Memory architecture, proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997, is the most widely deployed solution to the vanishing gradient problem. An LSTM unit replaces the simple hidden state with two state vectors: a hidden state h_t and a cell state C_t. The cell state acts as a long-term memory channel that can carry information across many time steps with minimal transformation.

Three gating mechanisms control information flow:

Input gate: Decides what new information to store in the cell state.
Forget gate: Decides what existing information to discard from the cell state.
Output gate: Decides what part of the cell state to expose as the hidden state output.

These gates are themselves learned sigmoid functions, allowing the network to learn when to remember and when to forget. LSTMs proved effective for sequences of hundreds to thousands of steps and dominated sequence modelling tasks throughout the 2010s.
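One common formulation of a single LSTM step is sketched below in plain Python. The parameter layout (a dict mapping each block name to a weight matrix and bias applied to the concatenation of h_(t-1) and x_t), the helper names, and the toy values are assumptions of this example, and library implementations differ in detail.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W . v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(params, h_prev, c_prev, x_t):
    z = h_prev + x_t  # list concatenation [h_prev; x_t]
    i = [sigmoid(a) for a in affine(*params["input"], z)]        # what to store
    f = [sigmoid(a) for a in affine(*params["forget"], z)]       # what to keep
    o = [sigmoid(a) for a in affine(*params["output"], z)]       # what to expose
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]  # proposed content
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

def gate(bias):
    # Hypothetical helper: zero weights, so the gate value depends only on its bias.
    return ([[0.0, 0.0]], [bias])

# Forget gate near 1 and input gate near 0: the cell state should be carried unchanged.
params = {"input": gate(-20.0), "forget": gate(20.0),
          "output": gate(0.0), "candidate": gate(0.0)}
h, c = [0.0], [1.0]
for x in [0.3, -0.7, 0.5] * 30:  # 90 time steps
    h, c = lstm_step(params, h, c, [x])
print(c, h)
```

With the forget gate saturated near one and the input gate near zero, the cell state survives 90 steps almost untouched, which illustrates how the additive cell update lets information persist across many time steps.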

Gated Recurrent Unit (GRU)

The Gated Recurrent Unit, introduced by Cho et al. in 2014, is a simplified variant of the LSTM that merges the cell and hidden states and uses only two gates (reset and update). GRUs have fewer parameters than LSTMs, train faster, and achieve competitive or superior performance on many tasks. They have become a common alternative when computational efficiency is a concern.
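A GRU step can be sketched in the same style. The convention used here, h_t = (1 - z) * h_(t-1) + z * candidate, is one of two in circulation (some sources swap the roles of z and 1 - z), and the parameter layout, helper names, and layer sizes are assumptions of this example. The parameter count assumes one bias vector per block; some libraries use two.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W . v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def gru_step(params, h_prev, x_t):
    z_in = h_prev + x_t  # list concatenation [h_prev; x_t]
    update = [sigmoid(a) for a in affine(*params["update"], z_in)]
    reset = [sigmoid(a) for a in affine(*params["reset"], z_in)]
    gated = [r_k * h_k for r_k, h_k in zip(reset, h_prev)] + x_t
    cand = [math.tanh(a) for a in affine(*params["candidate"], gated)]
    return [(1.0 - z_k) * h_k + z_k * c_k
            for z_k, h_k, c_k in zip(update, h_prev, cand)]

def param_count(n_blocks, hidden, inputs):
    """Parameters in n_blocks affine maps from (hidden + inputs) to hidden."""
    return n_blocks * (hidden * (hidden + inputs) + hidden)

def block(bias):
    # Hypothetical helper: zero weights, so the block depends only on its bias.
    return ([[0.0, 0.0]], [bias])

# Update gate near 0: the previous state should be kept almost unchanged.
params = {"update": block(-20.0), "reset": block(0.0), "candidate": block(0.0)}
h = [0.5]
for x in [0.3, -0.7, 0.5] * 30:  # 90 time steps
    h = gru_step(params, h, [x])
print(h)

lstm_params = param_count(4, hidden=128, inputs=64)  # three gates plus candidate
gru_params = param_count(3, hidden=128, inputs=64)   # two gates plus candidate
print(lstm_params, gru_params)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is where the efficiency advantage comes from.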

Bidirectional RNNs

Standard RNNs process sequences left to right, so the hidden state at time t only incorporates information from positions 1 through t. Bidirectional RNNs run two separate recurrent layers — one forward, one backward — and concatenate their outputs. This allows each position's representation to incorporate context from both before and after it, which is beneficial for tasks like named entity recognition or sentiment analysis where full-sentence context matters.
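The idea can be illustrated with two scalar RNNs, one run over the sequence as given and one over its reverse, with the per-position states paired up. The scalar recurrence, the function names, and the weights are assumptions made for brevity.

```python
import math

def run_rnn(w, u, xs):
    """Scalar RNN h_t = tanh(w*h_(t-1) + u*x_t); returns one state per position."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * h + u * x)
        states.append(h)
    return states

def bidirectional(fwd, bwd, xs):
    """Pair each position's forward state with its backward state."""
    forward_states = run_rnn(*fwd, xs)
    # Run over the reversed sequence, then flip back to the original positions.
    backward_states = run_rnn(*bwd, xs[::-1])[::-1]
    return list(zip(forward_states, backward_states))

fwd, bwd = (0.8, 1.0), (0.8, 1.0)  # (w, u) for each direction; illustrative values
out_a = bidirectional(fwd, bwd, [1.0, 0.0, 0.0])
out_b = bidirectional(fwd, bwd, [1.0, 0.0, 5.0])  # only the last input differs
print(out_a[0])
print(out_b[0])
```

Changing the last input leaves the forward component at position 0 untouched but changes the backward component, so the combined representation at the first position reflects what comes later in the sequence.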

Transition to Transformers

From approximately 2017 onwards, transformer architectures — which use self-attention rather than recurrence — have largely supplanted RNNs for natural language processing tasks where training data is abundant. Transformers process entire sequences in parallel during training (unlike RNNs, which are inherently sequential), and they scale more effectively with compute and data. However, RNNs retain advantages in streaming and online inference scenarios where inputs arrive one at a time and full-sequence parallelism is not possible. Architectures such as Mamba (2023) revisit state-space models as efficient alternatives that combine RNN-like sequential processing with improved long-range dependency handling.
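The streaming advantage can be illustrated with a small sketch: a recurrent model only needs its current state to process the next input, so it can emit an output as each element arrives without storing the sequence. The class name, the scalar recurrence, and the sample readings are assumptions of this example.

```python
import math

class StreamingRNN:
    """Scalar RNN that consumes one input at a time and keeps only its current state."""

    def __init__(self, w, u):
        self.w, self.u, self.h = w, u, 0.0

    def push(self, x):
        """Update the state with one new input and return the new state."""
        self.h = math.tanh(self.w * self.h + self.u * x)
        return self.h

def offline(w, u, xs):
    """Process a complete sequence in one call, for comparison."""
    h = 0.0
    for x in xs:
        h = math.tanh(w * h + u * x)
    return h

stream = StreamingRNN(w=0.9, u=0.5)
readings = [0.2, -0.1, 0.4, 0.0, 0.3]  # e.g. values arriving one by one
online_outputs = [stream.push(x) for x in readings]
print(online_outputs[-1], offline(0.9, 0.5, readings))
```

The per-step cost and the memory footprint are constant in the sequence length, and the final online state matches the result of processing the full sequence at once.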

| Architecture | Long-range dependencies | Training parallelism | Parameters |
|---|---|---|---|
| Vanilla RNN | Poor | Sequential | Low |
| LSTM | Good | Sequential | Moderate |
| GRU | Good | Sequential | Low–Moderate |
| Transformer | Excellent | Parallel | High |

Malaysian Context — Sequence Modelling in Local Applications

Recurrent neural networks and their LSTM variants have been applied across several sectors of the Malaysian economy. In the banking sector, Maybank and CIMB have historically used LSTM-based models for time series forecasting of financial market indicators and customer transaction sequence analysis for fraud detection. While many institutions are transitioning to transformer-based architectures for new projects, RNN-derived models remain in production where interpretability and lower computational cost are priorities.

Telekom Malaysia (TM) and other telecommunications providers such as Maxis and Celcom Axiata have deployed RNN-based anomaly detection models over network traffic time series to identify congestion patterns and potential security incidents in real time. The inherently sequential nature of network packet streams makes recurrent architectures a natural fit for these use cases.

In the academic sphere, Universiti Sains Malaysia (USM), Universiti Malaya (UM), and Universiti Teknologi MARA (UiTM) have published research using LSTMs for Malay language modelling and speech recognition. Sequence models are essential in this domain because Malay's morphological structure and code-switching with English require careful handling of sequential context. Researchers at Universiti Teknologi Malaysia (UTM) have explored RNNs for Bahasa Malaysia sentiment analysis in social media, and MIMOS Berhad (Malaysia's national applied research and technology centre) has investigated deep learning approaches to Malay speech synthesis that build on RNN foundations.

Malaysia's oil and gas sector — primarily through Petronas — has applied LSTM-based predictive maintenance models to equipment sensor data streams from upstream operations, using the ability of LSTMs to model temporal degradation patterns. In the agricultural technology sector, companies working on precision farming in the Felda palm oil estates have used RNNs for yield time series prediction.

Courses in applied machine learning certified by the Human Resource Development Corporation (HRDC) frequently cover LSTM-based time series modelling as a practical skill. As Malaysia's AI talent pipeline grows under the National AI Roadmap, understanding sequence models including RNNs remains a foundational competency alongside more recent transformer-based approaches.

References

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Cho, K., van Merriënboer, B., Gulcehre, C., et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.

Tags: RNN, recurrent neural network, LSTM, sequence modeling, NLP, time series

Type	Sequential neural network architecture
Introduced	1986 (Rumelhart et al.); LSTM 1997 (Hochreiter & Schmidhuber)
Key variants	Vanilla RNN, LSTM, GRU, Bidirectional RNN
Key use	Sequence modelling, speech recognition, time series
Related	LSTM, transformer architecture, attention mechanism