Machine Translation

Machine translation is the automated conversion of text or speech from one natural language into another using rule-based, statistical, or neural systems.

6 min read | Last updated May 2026 | Applications

Machine translation (MT) is the automated conversion of text or speech from one natural language into another using computer software. It is one of the oldest applications of artificial intelligence, with roots in 1950s rule-based systems, and one of the most economically important applications today. Modern systems are dominated by neural machine translation (NMT), in which deep learning models — usually transformer-based encoder-decoder architectures — learn translation directly from large bilingual corpora.

History

Machine translation began in the 1950s with rule-based systems that combined bilingual dictionaries with hand-written grammar rules. The 1954 Georgetown–IBM experiment translated about 60 Russian sentences into English and generated enthusiasm that the field would soon be solved. Progress stalled, however, and after the 1966 ALPAC report many funding programmes were curtailed.

Through the 1980s and 1990s, statistical machine translation (SMT) emerged, treating translation as a probabilistic problem learned from parallel corpora. IBM's word-based models and later phrase-based and syntax-based SMT systems dominated commercial deployments, including early versions of Google Translate. From around 2014 to 2017, neural networks replaced SMT, first with sequence-to-sequence recurrent networks and attention, and then with the transformer architecture introduced in the 2017 "Attention Is All You Need" paper. Since then, NMT and large multilingual language models have become the default.

Approaches

| Era | Approach | Defining idea |
| --- | --- | --- |
| 1950s–1980s | Rule-based MT (RBMT) | Hand-written rules and dictionaries |
| 1990s–2010s | Statistical MT (SMT) | Word, phrase, and syntax-based statistical models |
| 2014–present | Neural MT (NMT) | End-to-end neural networks, often transformer-based |
| 2020–present | Multilingual LLMs | Single large model translates many languages and performs other tasks |

Hybrid systems combine elements from multiple approaches. Adaptive MT systems learn from user post-editing during live workflows, and document-level MT models consider context beyond a single sentence.

Modern architectures

Most production NMT systems use a transformer encoder-decoder. The encoder reads the source sentence and produces contextual representations, while the decoder generates the target sentence one token at a time, attending to encoder outputs and previously generated tokens. Subword tokenisation (BPE, WordPiece, SentencePiece) splits rare and morphologically complex words into smaller reusable units, so the model can handle words it has not seen as whole tokens. Large pre-trained multilingual models such as Meta's NLLB-200 and Google's mT5 and PaLM 2, translation endpoints from Microsoft and OpenAI, and dedicated MT systems from DeepL extend the approach to dozens of languages or, in the case of NLLB-200, around 200.
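As an illustration of the subword tokenisation step, the following is a minimal sketch of byte-pair encoding (BPE) in Python. It is a toy version written for this article, not the implementation used by any production tokeniser: the function names `learn_bpe`, `merge_pair`, and `segment`, the `</w>` end-of-word marker, and the tie-breaking rule are all assumptions chosen for readability.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so the result is deterministic (an assumption of this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by applying the merges in learned order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["makan", "makanan", "memakan", "dimakan"]
    rules = learn_bpe(corpus, 5)
    print(rules)
    print(segment("makanan", rules))
```

On this small Malay corpus the first learned merge is `("a", "n")`, because that pair occurs five times, more often than any other adjacent pair. Real tokenisers train on far larger corpora and learn tens of thousands of merges, but the loop has the same shape.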

Beyond text, speech translation systems combine automatic speech recognition, MT, and text-to-speech, or use end-to-end speech-to-speech models such as Meta's SeamlessM4T. Multimodal models translate text in images and video subtitles directly.
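The cascaded design described above can be sketched as three stages chained together. This is only a structural illustration: the stage functions `fake_asr`, `fake_mt`, and `fake_tts` are hypothetical stand-ins that pass text around as bytes, not real speech or translation models, and `make_cascade` is a name invented for this sketch.

```python
from typing import Callable

# Hypothetical stage signatures for this sketch.
Recogniser = Callable[[bytes], str]   # audio in, source-language transcript out
Translator = Callable[[str], str]     # source text in, target text out
Synthesiser = Callable[[str], bytes]  # target text in, audio out


def make_cascade(asr: Recogniser, mt: Translator, tts: Synthesiser) -> Callable[[bytes], bytes]:
    """Chain ASR, MT, and TTS into one speech-to-speech function."""
    def translate_speech(audio: bytes) -> bytes:
        transcript = asr(audio)        # any recognition error is passed on
        translation = mt(transcript)   # MT only ever sees the transcript
        return tts(translation)
    return translate_speech


# Stand-in stages: "audio" here is just UTF-8 text, so the example runs anywhere.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    phrasebook = {"selamat pagi": "good morning"}
    return phrasebook.get(text.lower(), text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = make_cascade(fake_asr, fake_mt, fake_tts)
    print(pipeline(b"Selamat pagi"))
```

Because each stage consumes only the previous stage's output, a mistake made by the recogniser cannot be corrected later in the chain. End-to-end speech-to-speech models avoid that hand-off by mapping audio to audio within a single model.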

Evaluation

Translation quality is measured with automatic metrics and human assessment. BLEU, introduced by Papineni and colleagues in 2002, combines clipped n-gram precision against reference translations with a brevity penalty that discourages overly short output. chrF computes character n-gram F-scores. Neural reference-based and reference-free metrics such as COMET, BLEURT, and MetricX generally correlate more strongly with human judgments than BLEU. Human evaluation typically uses Direct Assessment scoring or Multidimensional Quality Metrics (MQM). WMT (the Conference on Machine Translation, which began as a workshop) holds annual shared tasks that benchmark systems across language pairs.
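To make the two string-based metrics concrete, here is a simplified Python sketch of sentence-level BLEU and chrF. It assumes a single reference, whitespace tokenisation, and no smoothing, and its chrF variant strips spaces and averages over whichever n-gram orders are available. Scores will therefore differ from those of standard evaluation tools, and the function names are chosen for this sketch only.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Count the n-grams of a token list or a character string."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipping: an n-gram is credited at most as often as it appears in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: hypotheses shorter than the reference are penalised.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Character n-gram F-score; beta=2 weights recall more heavily than precision."""
    hyp, ref = hypothesis.replace(" ", ""), reference.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        if not hyp_counts or not ref_counts:
            continue  # string too short for this n-gram order
        overlap = sum((hyp_counts & ref_counts).values())
        precisions.append(overlap / sum(hyp_counts.values()))
        recalls.append(overlap / sum(ref_counts.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    ref = "saya suka makan nasi lemak"
    hyp = "saya suka makan nasi goreng"
    print(round(sentence_bleu(hyp, ref), 3), round(chrf(hyp, ref), 3))
```

Both functions compare surface strings only, so a correct paraphrase scores poorly. That weakness is the main reason learned metrics and human assessment are used alongside them.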

Use cases

Machine translation is widely deployed in consumer apps, web browsers, customer support, localisation pipelines for software and content, e-commerce listings, government and diplomatic communications, healthcare interpretation, scientific literature, and legal discovery. Translation memory tools used by professional translators, such as Trados Studio, memoQ, Phrase, and Smartling, integrate MT so that translators post-edit machine output instead of translating every segment from scratch. In education, MT supports learners and improves access to content for speakers of non-dominant languages.
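One common way to combine a translation memory with MT is to reuse a stored human translation when the new segment closely matches an earlier one, and to fall back to MT otherwise. The sketch below illustrates that idea under stated assumptions: it scores similarity with Python's standard `difflib.SequenceMatcher` at character level, uses an arbitrary threshold of 0.85, and takes the MT engine as a plain function argument. It does not describe how any of the named commercial tools work internally.

```python
from difflib import SequenceMatcher


def best_tm_match(source, memory):
    """Return (score, stored_source, stored_target) for the closest memory entry."""
    best = (0.0, None, None)
    for stored_source, stored_target in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score > best[0]:
            best = (score, stored_source, stored_target)
    return best


def translate_segment(source, memory, mt, threshold=0.85):
    """Reuse a stored translation on a close match, otherwise call the MT function.

    Returns (translation, origin), where origin is "tm" or "mt" so that a
    post-editor can see where each draft came from.
    """
    score, _, target = best_tm_match(source, memory)
    if target is not None and score >= threshold:
        return target, "tm"
    return mt(source), "mt"


if __name__ == "__main__":
    memory = {"Terima kasih.": "Thank you."}

    def stub_mt(text):
        # Placeholder for a real MT engine call.
        return "[MT draft] " + text

    print(translate_segment("Terima kasih.", memory, stub_mt))
    print(translate_segment("Selamat datang ke Pulau Pinang", memory, stub_mt))
```

In practice a fuzzy (non-exact) match usually still needs editing, because the stored translation belongs to a slightly different source sentence. Recording the origin of each draft is what lets a reviewer treat the two cases differently.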

Limitations

Despite substantial progress, machine translation still faces persistent challenges. Low-resource languages, including many indigenous and minority languages of Southeast Asia, suffer from limited parallel data. Idioms, humour, cultural references, and code-switched text remain difficult. NMT systems can produce fluent, confident-sounding output that is wrong or entirely invented, a form of hallucination. Gender, dialect, and register bias also surface, particularly when translating from gender-neutral to gender-marked languages.

Bahasa Malaysia and regional languages

Bahasa Malaysia, Bahasa Indonesia, and the closely related languages of the Malay Archipelago are reasonably well supported by commercial MT, but quality varies across formality, dialect, and domain. Research groups in Singapore and Malaysia have published Malay-English and Indonesian-Malay NMT systems based on transformer architectures and curated corpora. Specialist work covers Bengkulu Malay, Kelantanese dialect, and code-switching with English. Open-source models such as Meta's NLLB-200 and various community fine-tunes support a wider range of regional languages, though performance for Iban, Kadazan-Dusun, and Bajau remains limited.

Malaysian Context — MT for a Multilingual Society

Malaysia's multilingual environment (Bahasa Malaysia, English, Mandarin, Tamil, and the indigenous languages of Sabah and Sarawak) makes machine translation a strategic capability. Dewan Bahasa dan Pustaka (DBP), the national language and literary agency, has supported corpus development and standardisation work that benefits MT training. Universiti Sains Malaysia, Universiti Kebangsaan Malaysia, Multimedia University, and Universiti Malaya have research groups working on Malay NLP and MT, and these groups have contributed datasets to shared tasks at ACL and SEALP.

Public-sector use cases include translation of government communications, parliamentary records, and digital citizen-services portals coordinated through MAMPU. The Malaysian Communications and Multimedia Commission (MCMC) supports localisation of broadcast and online content. The Inland Revenue Board (LHDN) and Companies Commission of Malaysia (SSM) deploy multilingual user interfaces underpinned by MT.

In the private sector, Maybank, CIMB, Petronas, Tenaga Nasional, AirAsia, Grab Malaysia, Shopee Malaysia, and Lazada Malaysia use MT for internal communications, customer support, and e-commerce listings. Penang's electronics manufacturing cluster and Cyberjaya's tech tenants rely on MT for documentation translation across English, Bahasa Malaysia, Mandarin, Japanese, and Korean.

The Malaysia Digital Economy Corporation (MDEC), the National AI Office (NAIO), and HRD Corp coordinate training and investment in Malay-language foundation models, which are expected to improve MT performance for under-served domains. The Personal Data Protection Act 2010 (PDPA) and Malaysia's AI Governance Framework constrain how translation training data — particularly user-generated content — can be collected and used.

Outlook

Large multilingual language models continue to narrow gaps between language pairs and to extend MT to under-resourced languages. Document-level context, terminology control, post-editing automation, and tighter integration with retrieval and grounding are active research areas. As MT becomes a built-in capability of operating systems and productivity software, the distinction between dedicated MT systems and general-purpose AI assistants is expected to blur further.

References

Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS.
Papineni, K. et al. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. ACL.
NLLB Team, Meta AI. (2022). No Language Left Behind: Scaling Human-Centered Machine Translation. arXiv:2207.04672.
Koehn, P. (2020). Neural Machine Translation. Cambridge University Press.
Dewan Bahasa dan Pustaka. (2024). Corpus Kebangsaan Annual Report. DBP.

Tags: machine translation, NMT, NLP, language

Type	Natural language processing task
Dominant approach	Neural machine translation (NMT)
Common architecture	Transformer encoder-decoder
First systems	1950s (rule-based)
Evaluation metrics	BLEU, chrF, COMET, METEOR
Related	NLP, transformer, sequence-to-sequence