What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Optical Character Recognition

A computer vision technology that converts images of typed, handwritten, or printed text into machine-readable digital text, increasingly powered by deep learning and transformer-based vision models.

5 min read | Last updated May 2026 | Applications

Optical Character Recognition, abbreviated OCR, is the technology that converts images containing text — whether scanned documents, photographed receipts, screenshots, or PDFs — into machine-readable digital text. OCR has been an active area of research since the 1950s and has progressed through three distinct technological generations: rule-based pattern matching, statistical machine learning, and modern deep learning. Contemporary OCR systems, built on convolutional neural networks and vision transformers, regularly achieve accuracy rates above 96% on a broad mix of document types.
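
Accuracy figures like the one above are commonly reported as character accuracy, that is, one minus the character error rate (CER), where CER is the edit distance between the OCR output and a hand-checked reference divided by the reference length. The sketch below is a minimal illustration under that assumption; the function names and the sample strings are hypothetical and are not taken from any particular benchmark.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, where CER = edit_distance / len(reference).

    The value can fall below zero when the hypothesis contains far
    more errors than the reference has characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "Jalan Sultan Ahmad Shah"   # hypothetical ground truth
    ocr = "Ja1an Sultan Ahmad 5hah"     # hypothetical OCR output
    print(f"character accuracy: {character_accuracy(truth, ocr):.3f}")
```

Word-level accuracy can be computed the same way by passing lists of words instead of strings, which generally gives a lower figure for the same output.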

How modern OCR works

A complete OCR pipeline performs several steps. The system first preprocesses the input image to correct skew, remove noise, and binarise where appropriate. It then performs layout analysis to identify regions of interest such as paragraphs, tables, figures, and form fields. Text detection localises individual lines or words, and text recognition transcribes the localised regions into a sequence of characters. A post-processing stage applies language models, dictionaries, and structured-output decoding to correct errors and produce semantically meaningful output.
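
Two of these stages can be illustrated with a toy example. The sketch below binarises a tiny grayscale image with a fixed threshold and then locates text lines with a horizontal projection profile, a classical technique rather than the learned detectors used in production. The image representation (a list of pixel rows), the threshold of 128, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows: 0 = black, 255 = white


def binarise(image: Image, threshold: int = 128) -> List[List[int]]:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def detect_text_lines(binary: List[List[int]],
                      min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of consecutive rows
    that contain at least min_ink ink pixels."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the row above
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(binary) - 1))
    return lines


if __name__ == "__main__":
    page = [
        [255, 255, 255, 255],
        [255,  20,  30, 255],
        [255,  10, 255, 255],
        [255, 255, 255, 255],
        [ 40,  40,  40, 255],
        [255, 255, 255, 255],
    ]
    print(detect_text_lines(binarise(page)))
```

A projection profile only works on deskewed, single-column input, which is one reason the preprocessing and layout-analysis steps come first.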

Modern systems use deep learning at every stage. The detection stage typically uses architectures derived from DBNet, EAST, or DETR. The recognition stage relies on CRNN, TrOCR (a transformer-based OCR model), or end-to-end multimodal vision-language models that handle detection and recognition jointly. Vision transformers and document-understanding models such as LayoutLM, Donut, Pix2Struct, and Nougat have largely replaced the older two-stage pipelines for many enterprise workloads.
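
CRNN-style recognisers are commonly trained with a connectionist temporal classification (CTC) loss, so their raw output is a score per character class for each horizontal frame of the text line. The sketch below shows greedy CTC decoding under that assumption: take the best class per frame, merge consecutive repeats, and drop the blank symbol. The blank index of 0, the alphabet, and the function name are illustrative choices. Encoder-decoder models such as TrOCR generate text token by token and do not use this step.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str) -> str:
    """Greedy CTC decoding.

    frame_scores[t][k] is the score of class k at frame t, where
    class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Pick the highest-scoring class for every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge consecutive repeats, then 3. drop blanks.
    chars: List[str] = []
    previous = BLANK
    for idx in best_path:
        if idx != BLANK and idx != previous:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical scores for 5 frames over the classes (blank, 'o', 'k').
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(scores, alphabet="ok"))
```

The blank symbol is what lets the model emit a genuinely doubled letter: two identical characters separated by a blank frame survive the merge step, while an unbroken run collapses to one.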

Capabilities and limitations

A modern OCR system can handle printed text in dozens of scripts, cursive handwriting with reduced accuracy, structured forms, multi-column layouts, mixed-language documents, low-resolution photographs, and scene text from natural images. The technology remains imperfect on heavily degraded documents, unusual fonts, complex mathematical notation, and tightly handwritten free-form text. Tables with merged cells, footnotes, and forms with overlapping fields continue to challenge even leading systems.

Document AI and structured extraction

OCR is often a component of a broader document-AI pipeline rather than the end product. Enterprise systems increasingly combine OCR with named entity recognition, key-value extraction, and large language model post-processing to convert documents into structured records — invoices into rows in an accounting system, identity documents into customer profiles, contracts into negotiable clauses.
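
As a rough illustration of key-value extraction, the sketch below turns raw OCR text from an invoice into a structured record using regular expressions. The field labels, the date format, the `InvoiceRecord` type, and the sample invoice are all hypothetical. Production systems typically rely on layout-aware models or language models rather than hand-written patterns, which break as soon as a supplier changes its template.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class InvoiceRecord:
    invoice_number: Optional[str]
    invoice_date: Optional[str]
    total: Optional[Decimal]


# Assumed label wording; real invoices vary widely.
INVOICE_NO = re.compile(
    r"Invoice\s*(?:Number|No\.?)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
INVOICE_DATE = re.compile(
    r"\bDate\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE)
# \b stops the pattern from matching inside "Subtotal".
TOTAL = re.compile(
    r"\bTotal\s*:?\s*(?:RM\s*)?([\d,]+\.\d{2})", re.IGNORECASE)


def extract_invoice(text: str) -> InvoiceRecord:
    """Pull a few labelled fields out of OCR text; missing fields are None."""
    number = INVOICE_NO.search(text)
    date = INVOICE_DATE.search(text)
    total = TOTAL.search(text)
    return InvoiceRecord(
        invoice_number=number.group(1) if number else None,
        invoice_date=date.group(1) if date else None,
        total=Decimal(total.group(1).replace(",", "")) if total else None,
    )


if __name__ == "__main__":
    ocr_text = """\
Invoice No: INV-2025-0042
Date: 03/01/2025
Subtotal: RM 1,200.00
Tax: RM 50.00
Total: RM 1,250.00
"""
    print(extract_invoice(ocr_text))
```

Amounts are parsed into `Decimal` rather than `float` so that currency values keep their exact two-decimal representation when written to an accounting system.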

| Provider | Notable OCR offering |
|----------|---------------------|
| Google | Document AI, Cloud Vision OCR |
| AWS | Textract |
| Microsoft | Azure AI Document Intelligence |
| ABBYY | FineReader, Vantage |
| Open-source | Tesseract, PaddleOCR, TrOCR |

In December 2025, Mistral AI released Mistral OCR 3, a smaller open-weight OCR model designed for structured document understanding at scale. The OCR market is projected to exceed forty-three billion US dollars by 2032.

Open-source landscape

Tesseract, originally developed at HP Labs, later sponsored by Google, and now maintained as a community open-source project, remains the most widely used open-source OCR engine for printed text. PaddleOCR, developed by Baidu, has overtaken Tesseract in many production deployments thanks to stronger Asian-language support and better handwriting handling. EasyOCR, docTR, and Surya provide lighter-weight Python-native alternatives. Transformer-based models such as TrOCR and Donut are increasingly preferred for difficult or structured documents.

Malaysian Context — Banking, Government, and Multilingual Documents

OCR is one of the most widely deployed AI technologies in Malaysia, in part because the country's multilingual document environment, which combines Bahasa Malaysia, English, Chinese, and Tamil text as well as Jawi script, creates strong demand for capable multilingual recognition systems. Banks including Maybank, CIMB, RHB, Public Bank, and Hong Leong Bank use OCR for cheque clearing, know-your-customer onboarding, and loan document processing under Bank Negara Malaysia's Risk Management in Technology (RMiT) policy.

Public-sector adoption is substantial. The Inland Revenue Board (Lembaga Hasil Dalam Negeri, LHDN) uses OCR for paper tax-form intake and audit document processing. The Road Transport Department (JPJ), the National Registration Department (JPN), and the Immigration Department of Malaysia (Jabatan Imigresen Malaysia) operate OCR pipelines for identity documents, MyKad, passports, and vehicle registration certificates. Pos Malaysia uses OCR for postal address recognition, with particular complexity introduced by the mixture of Roman and Jawi script on rural addresses in Kelantan and Terengganu.

The Royal Malaysian Customs Department uses OCR for import-export declaration processing. Hospital networks including IHH Healthcare, KPJ Healthcare, and Sunway Medical Centre use OCR to digitise paper medical records, lab reports, and referral letters. The Malaysian Communications and Multimedia Commission (MCMC) and the Malaysia Digital Economy Corporation (MDEC) have promoted OCR-enabled e-invoicing under the LHDN MyInvois mandate, whose phased rollout for Malaysian businesses extends through 2025 and 2026.

OCR vendors active in the Malaysian market include ABBYY through local partners, Microsoft Azure AI Document Intelligence, AWS Textract, Google Document AI, and homegrown specialists working with MyKad and Jawi script. Malaysian service providers, including AITG Sdn Bhd through its Teragrid Ai Platform, integrate OCR into broader document-AI workflows for Malaysian enterprise clients.

References

Smith, R. (2007). An Overview of the Tesseract OCR Engine. ICDAR 2007.
Li, M., et al. (2023). TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. AAAI 2023.
Mistral AI. (2025). Mistral OCR 3 Model Card. Mistral AI.
Inland Revenue Board of Malaysia. (2024). MyInvois e-Invoicing Implementation Guidelines. LHDN.
Bank Negara Malaysia. (2023). Electronic Know-Your-Customer (e-KYC) Policy Document. BNM.

Tags: ocr, computer-vision, document-ai, deep-learning

Type	Computer vision application
Inputs	Scanned images, photos, PDFs
Outputs	Machine-readable text and structure
Modern approach	Deep learning, vision transformers
Typical accuracy (2025)	Around 96.5%
