What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Encoder-Decoder Architecture

A neural network design pattern that compresses an input sequence into an internal representation using an encoder, and then generates an output sequence from that representation using a decoder, foundational to machine translation, summarisation, and many other sequence-to-sequence tasks.

6 min read | Last updated May 2026 | Foundations

The encoder-decoder architecture is a neural-network design pattern in which one network (the encoder) compresses an input sequence into an intermediate representation, and a second network (the decoder) generates an output sequence by conditioning on that representation. The pattern was introduced in 2014 by Sutskever, Vinyals, and Le, and independently by Cho and colleagues, in the context of recurrent neural networks for machine translation. It remains one of the most influential abstractions in deep learning and underpins systems for translation, summarisation, speech recognition, optical character recognition, image captioning, and structured-data generation.

The basic idea

In the original sequence-to-sequence formulation, the encoder reads a source sequence one token at a time, updating its hidden state with a recurrent unit such as an LSTM or a GRU. After consuming the full input, the encoder's final hidden state forms a fixed-length vector representation of the entire input — a compressed summary. The decoder is initialised from this representation and generates the target sequence one token at a time, with each generated token conditioning on the encoder representation and on the tokens generated so far.
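The flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses a simple tanh recurrent unit rather than an LSTM or GRU, the weights are random and untrained, and all names (`encode`, `decode`, `HIDDEN`, `VOCAB`, the token ids for start and end) are hypothetical choices made for the example.

```python
import math
import random

random.seed(0)

HIDDEN = 4        # hidden-state size (illustrative)
VOCAB = 6         # toy vocabulary size
BOS, EOS = 0, 1   # assumed ids for start-of-sequence and end-of-sequence


def rand_matrix(rows, cols):
    """Random, untrained weights; a real model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, token, hidden):
    """One plain tanh recurrence: h' = tanh(W_in[:, token] + W_rec h)."""
    rec = matvec(w_rec, hidden)
    return [math.tanh(w_in[i][token] + rec[i]) for i in range(HIDDEN)]


enc_in, enc_rec = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
dec_in, dec_rec = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
dec_out = rand_matrix(VOCAB, HIDDEN)


def encode(source):
    """Read the source one token at a time; return the final hidden state."""
    hidden = [0.0] * HIDDEN
    for token in source:
        hidden = rnn_step(enc_in, enc_rec, token, hidden)
    return hidden  # fixed-length summary, whatever the input length


def decode(context, max_len=8):
    """Greedy generation, initialised from the encoder summary."""
    hidden, token, output = context, BOS, []
    for _ in range(max_len):
        hidden = rnn_step(dec_in, dec_rec, token, hidden)
        logits = matvec(dec_out, hidden)
        token = max(range(VOCAB), key=lambda t: logits[t])
        if token == EOS:
            break
        output.append(token)
    return output


summary = encode([2, 3, 4, 5])
print(len(summary))      # same length for any source sequence
print(decode(summary))   # token ids only; meaningless until the weights are trained
```

Note that `encode` returns a vector of the same size whether the source has two tokens or two hundred, which is exactly the property that makes the design simple and, for long inputs, limiting.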

This abstraction cleanly separates the act of understanding the input from the act of producing the output, and it can handle source and target sequences of different lengths and even different modalities — for example, audio in and text out, or pixels in and text out.

The attention extension

The original recurrent encoder-decoder suffered from a fundamental bottleneck: every input sequence, however long, had to be compressed into a single fixed-length vector. Performance on long sentences degraded as the bottleneck saturated. In 2015, Bahdanau and colleagues introduced the attention mechanism, which allowed the decoder to attend dynamically to any position in the encoder's sequence of hidden states rather than relying on the final state alone. Attention turned the encoder output from a single vector into a set of contextualised vectors, with the decoder choosing where to look at each generation step.
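A single attention step can be sketched as follows. As a simplifying assumption, the example scores each encoder state with a plain dot product; Bahdanau and colleagues used a small learned feed-forward network (additive scoring) instead. The function name `attend` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Weight every encoder state by its relevance to the current decoder state."""
    # Dot-product scoring is an assumption made for brevity.
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    # The context vector is the attention-weighted average of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # one vector per source token
weights, context = attend([2.0, 0.0], encoder_states)
print(weights)   # the first and third positions receive the most weight
print(context)
```

The decoder would recompute `weights` at every generation step, so different output tokens can draw on different parts of the input.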

The combination of encoder, decoder, and attention quickly became the standard architecture for neural machine translation and remained dominant until 2017.

Transformer encoder-decoder

In 2017, Vaswani and colleagues replaced the recurrent units of the encoder and decoder with stacks of self-attention and feed-forward layers, producing the transformer encoder-decoder. The encoder applies bidirectional self-attention across all input tokens at every layer, and the decoder applies causal (masked) self-attention plus cross-attention onto the encoder output. Because self-attention has no recurrence across positions, the transformer can process all tokens of a training sequence in parallel, dramatically improving training throughput; at inference time the decoder still generates one token at a time.
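The three attention patterns named above differ only in which positions each query is allowed to see. The sketch below computes scaled dot-product attention weights with an optional causal mask. It is a deliberate simplification: the learned query, key, and value projections, multiple heads, and the value-weighted output are all omitted, and the function name `attention_weights` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; masked (-inf) scores become weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, causal=False):
    """Scaled dot-product attention weights, one row per query."""
    scale = math.sqrt(len(keys[0]))
    rows = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))   # hide future positions
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / scale)
        rows.append(softmax(scores))
    return rows


encoder_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_tokens = [[0.5, 0.5], [1.0, 0.0]]

# Encoder self-attention: every input position sees every other position.
encoder_self = attention_weights(encoder_tokens, encoder_tokens)
# Decoder self-attention: position i sees only positions 0..i.
decoder_self = attention_weights(decoder_tokens, decoder_tokens, causal=True)
# Cross-attention: decoder positions query the full encoder output.
cross = attention_weights(decoder_tokens, encoder_tokens)

print(decoder_self[0])   # the first decoder position can attend only to itself
```

Nothing in `attention_weights` depends on the result for a previous position, which is why all rows can be computed in parallel during training.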

Transformer encoder-decoder models such as T5 (Text-to-Text Transfer Transformer), BART, mBART, MarianMT, M2M-100, and NLLB-200 are the workhorses of modern machine translation, document summarisation, and text-to-text reformatting. OpenAI's Whisper speech-recognition model uses an encoder-decoder design in which the encoder processes mel-spectrograms of audio and the decoder produces text.

Encoder-only and decoder-only variants

The encoder-decoder pattern is one of three transformer configurations. Encoder-only models such as BERT and RoBERTa drop the decoder and produce contextual embeddings, optimised for understanding tasks. Decoder-only models such as GPT, Llama, Mistral, and Claude drop the encoder and treat every task as next-token prediction; this design now dominates large language modelling because it scales well and trains efficiently on web-scale text. Encoder-decoder models retain a clear advantage in tasks with a strong asymmetry between input and output — translation, summarisation, transcription — and in tasks where the output benefits from rich bidirectional input context.

| Variant | Examples | Best at |
|---------|----------|---------|
| Encoder-only | BERT, RoBERTa | Classification, embedding |
| Decoder-only | GPT, Llama, Claude | Generation, chat, code |
| Encoder-decoder | T5, BART, Whisper, NLLB | Translation, summarisation, speech |

Cross-modal applications

The encoder-decoder pattern generalises beyond text. Image-captioning models pair a convolutional or vision-transformer encoder with a text decoder. Optical character recognition models such as TrOCR and Donut use a vision encoder and a text decoder. Speech-recognition systems such as Whisper and OWSM follow the same shape. Multimodal models that translate between modalities, such as text-to-speech, text-to-image with a diffusion decoder, and protein-sequence-to-structure models such as AlphaFold, follow the same broad encode-then-generate shape, even where their internal components differ from the classic text-to-text design.

Malaysian Context — Multilingual Translation and Speech for Bahasa Malaysia

The encoder-decoder architecture is particularly important for Malaysia because the country's linguistic environment combines Bahasa Malaysia, English, Mandarin and other Chinese varieties including Cantonese and Hokkien, Tamil, Iban, Kadazan-Dusun, and a long tradition of code-mixing. Translation between these languages is a sequence-to-sequence problem in which encoder-decoder models excel.

Meta AI's NLLB-200 (No Language Left Behind), Google's MADLAD-400, and the open-source Aya project from Cohere for AI all include Bahasa Malaysia coverage and use encoder-decoder transformers. Universiti Sains Malaysia, Universiti Malaya, Universiti Kebangsaan Malaysia, and Universiti Teknologi MARA have published research on Malay-English neural machine translation, often using encoder-decoder transformer baselines. MIMOS Berhad and Dewan Bahasa dan Pustaka have collaborated on parallel corpora that feed such models.

Government and enterprise applications benefit directly. Pos Malaysia, Pejabat Pos Sarawak, and the Royal Malaysian Customs Department use sequence-to-sequence models for address normalisation and document translation. Banks including Maybank, CIMB, and Public Bank deploy encoder-decoder summarisation models for compliance document review under Bank Negara Malaysia's expectations. Hospital networks under KKM and IHH Healthcare Malaysia use encoder-decoder transcription models for clinical-notes dictation. The Whisper speech model is widely used for Bahasa Malaysia transcription, although accuracy on Sabahan and Sarawakian regional speech remains a known weakness.

Malaysian AI service providers, including AITG Sdn Bhd's Teragrid Ai Platform, build product features such as multilingual chatbots, document translators, and meeting transcribers on top of encoder-decoder foundations, often fine-tuned with parameter-efficient fine-tuning (PEFT) methods on Malaysian corpora to improve performance on local languages and code-mixed input.

References

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. NeurIPS 2014.
Cho, K., et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. EMNLP 2014.
Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015.
Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS 2017.
NLLB Team, Meta AI. (2022). No Language Left Behind: Scaling Human-Centered Machine Translation. arXiv:2207.04672.

Tags: encoder-decoder, seq2seq, transformer, neural-network, architecture

Type	Neural network design pattern
Introduced	2014 (RNN seq2seq)
Key applications	Translation, summarisation, speech, OCR
Modern variant	Transformer encoder-decoder
Notable models	T5, BART, mBART, Whisper, NLLB
