Long Short-Term Memory (LSTM)

Long Short-Term Memory is a recurrent neural network architecture designed to learn long-range dependencies in sequential data by using gating mechanisms to control information flow.

5 min read | Last updated May 2026 | Foundations

Long Short-Term Memory (LSTM) is a specialised recurrent neural network (RNN) architecture introduced by Sepp Hochreiter and Jürgen Schmidhuber in a 1997 paper in the journal Neural Computation. LSTMs were designed to overcome a fundamental limitation of standard RNNs: the vanishing gradient problem, in which error signals shrink roughly exponentially as they are propagated back through many time steps, making it extremely difficult for traditional networks to learn dependencies that span long sequences. By introducing gating mechanisms that regulate information flow, LSTMs became one of the most influential architectures in deep learning for sequential data throughout the 2000s and 2010s.

Architecture

The defining innovation of an LSTM cell is its explicit memory cell state — a pathway that allows information to persist over many time steps with minimal modification. Unlike standard RNN hidden states, which are overwritten at every step, the LSTM cell state is designed to carry relevant information forward while discarding what is no longer needed.
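As a rough numerical illustration of why a lightly modified cell state helps, the sketch below compares how much of a signal (or gradient) survives after being scaled once per time step. The per-step factors 0.9 and 0.999 are assumed values chosen for illustration, not measurements from a trained network.

```python
def surviving_signal(per_step_factor: float, steps: int) -> float:
    """Fraction of a signal (or gradient) left after being scaled once per step."""
    return per_step_factor ** steps

# Assumed, illustrative factors: a plain-RNN-like path that shrinks the signal
# by 0.9 at every step, versus an LSTM-like cell state whose forget gate stays
# close to 1 (here 0.999).
rnn_like = surviving_signal(0.9, 100)
lstm_like = surviving_signal(0.999, 100)

print(f"plain-RNN-like path after 100 steps: {rnn_like:.2e}")
print(f"LSTM-cell-like path after 100 steps: {lstm_like:.3f}")
```

Under these assumptions the first path retains almost nothing after 100 steps, while the second retains most of the signal, which is the intuition behind letting the cell state pass through time with minimal modification.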

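Each unit of the now-standard LSTM contains three gates that control this process (the original 1997 design had only input and output gates; the forget gate was added by Gers et al. in 2000):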

The forget gate decides what portion of the previous cell state should be discarded. It takes the previous hidden state and the current input, passes them through a sigmoid activation, and produces a value between 0 and 1 for each dimension of the cell state. A value near 0 means "forget", while a value near 1 means "keep".

The input gate determines what new information should be written into the cell state. It combines a sigmoid layer (which selects which values to update) with a tanh layer (which creates candidate values). The two are multiplied element-wise, and the result is added to the forget-gated previous cell state to form the new cell state.

The output gate controls what portion of the cell state is exposed as the hidden state at the current time step. The cell state is passed through a tanh function and multiplied by the output gate to produce the final hidden state, which is passed to the next time step and used for any downstream predictions.
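The gate descriptions above can be written compactly as the commonly used LSTM update equations. The notation is an assumption introduced here rather than taken from this article: x_t is the input, h_t the hidden state, c_t the cell state, [h_{t-1}, x_t] their concatenation, σ the sigmoid function, ⊙ element-wise multiplication, and W and b learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate values)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(new cell state)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
```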

Together, these three gates give LSTMs the ability to selectively remember and forget information across sequences of hundreds or even thousands of time steps — a capability that standard RNNs lack in practice.
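A minimal NumPy sketch of a single LSTM time step may make the gating concrete. It is an illustration under stated assumptions, not a reference implementation: the function name lstm_step, the stacking order of the weight matrix (forget, input, candidate, output), and the random weights are all choices made here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: (input_size,)   h_prev, c_prev: (hidden_size,)
    W: (4 * hidden_size, hidden_size + input_size)   b: (4 * hidden_size,)
    Rows of W are stacked in the assumed order: forget, input, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:n])          # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2 * n])      # input gate: which candidate values to write
    g = np.tanh(z[2 * n:3 * n])  # candidate values
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of the cell to expose
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Run the cell over a short random sequence (untrained, illustrative weights).
rng = np.random.default_rng(0)
input_size, hidden_size, steps = 3, 4, 5
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(steps, input_size)):
    h, c = lstm_step(x, h, c, W, b)

print(h.shape, c.shape)
```

Only h and c are carried from one step to the next, so the amount of state stays fixed however long the sequence is.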

Gated Recurrent Unit

A notable variant of the LSTM is the Gated Recurrent Unit (GRU), introduced by Cho et al. in 2014. The GRU simplifies the LSTM by merging the forget and input gates into a single update gate and eliminating the separate cell state, which leaves two gates (update and reset) acting directly on the hidden state. GRUs therefore have fewer parameters and can train faster, while achieving comparable performance on many tasks. The choice between LSTM and GRU is often empirical and task-dependent.
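The sketch below shows one GRU time step in NumPy, along with a simple parameter count that illustrates why a GRU is smaller than an LSTM of the same size. The function names and weight layout are assumptions made for this example. The direction of the update gate (whether z or 1 - z multiplies the previous state) varies between papers and libraries, and the count assumes one bias vector per block, whereas some framework implementations store more.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU time step; each W has shape (hidden_size, hidden_size + input_size)."""
    hx = np.concatenate([h_prev, x])
    z = sigmoid(Wz @ hx + bz)  # update gate: blend of old state and candidate
    r = sigmoid(Wr @ hx + br)  # reset gate: how much old state feeds the candidate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]) + bh)
    return (1.0 - z) * h_prev + z * h_cand  # no separate cell state

def gated_rnn_params(input_size, hidden_size, n_blocks):
    """Weights plus one bias vector for each gate or candidate block."""
    return n_blocks * (hidden_size * (hidden_size + input_size) + hidden_size)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
shape = (hidden_size, hidden_size + input_size)
Wz, Wr, Wh = (rng.normal(scale=0.1, size=shape) for _ in range(3))
bz = br = bh = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h = gru_step(x, h, Wz, Wr, Wh, bz, br, bh)

# An LSTM has 4 blocks (three gates plus candidate); a GRU has 3.
lstm_params = gated_rnn_params(input_size, hidden_size, n_blocks=4)
gru_params = gated_rnn_params(input_size, hidden_size, n_blocks=3)
print(lstm_params, gru_params)
```

Under this counting convention a GRU has three quarters of the parameters of an LSTM with the same input and hidden sizes.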

Applications

LSTMs achieved state-of-the-art results across a wide range of tasks involving sequential or temporal data.

Natural language processing: LSTMs became the dominant architecture for machine translation, sentiment analysis, named entity recognition, and language modelling during the mid-2010s, before being largely superseded by Transformer-based models after 2017.

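Speech recognition: LSTMs are used in acoustic models that map sequences of audio features to phoneme or character probabilities. Deep bidirectional LSTMs, which process sequences in both directions, improved accuracy in settings where the full utterance is available, and LSTM-based acoustic models were central to the neural speech recognition system Google deployed for voice search in 2015.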

Time series forecasting: In finance, manufacturing, and energy, LSTMs model patterns in historical data to predict future values such as stock prices, electricity demand, or equipment sensor readings.

Anomaly detection: LSTMs learn the expected pattern of a time series and flag deviations, making them useful for fraud detection, network intrusion detection, and industrial predictive maintenance.

Healthcare: LSTMs are applied to electronic health records, analysing sequences of clinical events — diagnoses, medications, lab results — to predict patient outcomes such as hospital readmission or disease progression.
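For the time series forecasting and anomaly detection uses described above, two pieces of plumbing usually surround the model: turning a series into fixed-length training windows, and flagging points whose prediction error is unusually large. The sketch below illustrates both in plain Python. All values are made up, the function names are hypothetical, and the "predicted" lists stand in for the output of a trained LSTM, which is not included here.

```python
from statistics import mean, stdev

def make_windows(series, window, horizon=1):
    """Split a 1-D series into (inputs, targets) pairs for supervised training."""
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets

def fit_threshold(actual, predicted, k=3.0):
    """Error threshold from a period assumed to contain only normal behaviour."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return mean(errors) + k * stdev(errors)

def flag_anomalies(actual, predicted, threshold):
    """Indices where the absolute prediction error exceeds the threshold."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted))
            if abs(a - p) > threshold]

# Forecasting: each window of 3 readings is paired with the reading that follows.
demand = [10, 12, 13, 15, 18, 21, 25]  # made-up illustrative readings
X, y = make_windows(demand, window=3)
print(X[0], "->", y[0])

# Anomaly detection: calibrate on a normal period, then check new readings.
normal_actual = [20.1, 20.4, 19.8, 20.0, 20.3, 19.9]
normal_predicted = [20.0, 20.2, 20.0, 20.1, 20.1, 20.0]  # stand-in forecasts
threshold = fit_threshold(normal_actual, normal_predicted)

new_actual = [20.2, 19.9, 27.5, 20.1]
new_predicted = [20.0, 20.0, 20.1, 20.0]  # stand-in forecasts
anomalies = flag_anomalies(new_actual, new_predicted, threshold)
print(anomalies)
```

The mean plus k standard deviations rule is one simple thresholding choice among many; production systems often tune the threshold against labelled incidents.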

Relationship to Transformers

After the Transformer architecture was introduced in 2017, Transformer-based models such as BERT and GPT (both released in 2018) largely displaced LSTMs in NLP tasks. Transformers process entire sequences in parallel using self-attention rather than step by step, which makes far more efficient use of modern GPU hardware during training. They also scale more effectively with data and model size.
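To show what processing a sequence in parallel looks like, here is a minimal single-head scaled dot-product self-attention sketch in NumPy. It is a simplified illustration with assumed names and random weights, and it omits masking, multiple heads, and positional encodings.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a whole sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (T, T): every step vs every step
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 5, 8  # sequence length and model width (illustrative values)
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)
```

All T positions are handled by a few matrix multiplications with no loop over time steps, but the T by T weight matrix means cost grows quadratically with sequence length.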

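However, LSTMs have not disappeared. They remain competitive or preferred where sequences are extremely long, where hardware is memory-constrained, or where real-time streaming inference is required: an LSTM carries a fixed-size state from one step to the next, so its per-step memory and compute do not grow with sequence length, whereas the cost of standard self-attention grows quadratically. LSTMs also underpin many production systems that predate the Transformer era and have not been replaced, for reasons of operational continuity.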

Malaysian Context — LSTMs in Industry and Research

In Malaysia, LSTM models have found practical application across several industries. Telekom Malaysia (TM) and other telecommunications operators have applied LSTM-based models to network traffic prediction, anticipating demand patterns and detecting anomalies in real time. Financial institutions including Maybank and CIMB have explored LSTM architectures for fraud detection and credit risk modelling, where the sequential nature of transaction histories makes recurrent models a natural fit.

Petronas, Malaysia's national oil company, and other industrial players in the manufacturing and energy sectors have implemented LSTM-based predictive maintenance systems. These systems analyse sensor data streams from machinery to identify patterns that precede equipment failure, reducing unplanned downtime and maintenance costs.

In academia, Malaysian universities including Universiti Malaya (UM), Universiti Teknologi Malaysia (UTM), and Universiti Putra Malaysia (UPM) have published research applying LSTMs to locally relevant problems such as Bahasa Malaysia natural language processing, flood forecasting based on Malaysian river gauge data, and palm oil yield prediction.

MDEC (Malaysia Digital Economy Corporation) has, through its AI and data programmes, supported upskilling in deep learning frameworks, including LSTM implementation in TensorFlow and PyTorch, as part of Malaysia's broader AI talent development agenda under the MyDIGITAL blueprint. As Malaysian organisations modernise their data infrastructure, knowledge of LSTM-based sequential modelling remains a valued skill in the local data science and machine learning workforce.

References

Hochreiter, S., and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.
Cho, K., et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078.
Greff, K., et al. (2017). LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222-2232.
Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10), 2451-2471.

Tags: lstm recurrent-neural-network deep-learning sequential-data

Type	Recurrent neural network architecture
Developed by	Sepp Hochreiter and Jürgen Schmidhuber
Introduced	1997
Key use	Sequential and time-series data modelling
Related	Recurrent neural network, Transformer, GRU