What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics, including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, the local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Object Detection

Object detection is a computer vision task that involves identifying the location and category of one or more objects within an image or video frame, producing bounding boxes and class labels for each detected instance.

6 min read | Last updated May 2026 | Applications

Object detection is a core computer vision task that combines object classification — determining what category an object belongs to — with object localisation — determining where in an image the object is located. Unlike image classification, which assigns a single label to an entire image, object detection systems must identify and bound all object instances of interest within a scene, even when multiple objects of different classes overlap or occlude one another. The global computer vision market, driven substantially by object detection applications, reached USD 19.82 billion in 2024 and is projected to surpass USD 58 billion by 2030.[^1]

Problem Formulation

Given an input image, an object detection model produces a set of detections, where each detection consists of:

A bounding box — typically represented as (x_min, y_min, x_max, y_max) or (centre_x, centre_y, width, height) — enclosing the detected object.
A class label identifying the object's category (e.g., "car", "person", "defect").
A confidence score representing the model's certainty about the detection.
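The two box encodings listed above carry the same information. A minimal Python sketch of converting between them follows, assuming plain float tuples in pixel coordinates. The function names and the sample detection are illustrative, not taken from any particular library:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_cxcywh(box: Box) -> Box:
    """Convert (x_min, y_min, x_max, y_max) to (centre_x, centre_y, width, height)."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return (x_min + width / 2, y_min + height / 2, width, height)


def cxcywh_to_xyxy(box: Box) -> Box:
    """Convert (centre_x, centre_y, width, height) back to corner format."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A hypothetical detection: box, class label and confidence score.
detection = {"box": (10.0, 20.0, 110.0, 70.0), "label": "car", "score": 0.91}
print(xyxy_to_cxcywh(detection["box"]))  # (60.0, 45.0, 100.0, 50.0)
```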

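Post-processing steps such as Non-Maximum Suppression (NMS) remove redundant detections. Overlap between two boxes is measured by Intersection over Union (IoU), the area of their intersection divided by the area of their union. When several predictions overlap significantly (IoU above a chosen threshold, such as 0.5), NMS keeps only the one with the highest confidence score and discards the rest. In most detectors this is applied separately for each class.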
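The IoU measure and the greedy suppression loop described above can be sketched in plain Python. This is a minimal illustration assuming corner-format boxes, a single class, and an illustrative IoU threshold of 0.5; production libraries use vectorised, usually class-aware versions of the same idea:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two corner-format boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # most confident remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The first two boxes have an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed, while the distant third box survives.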

Architectural Approaches

Two-Stage Detectors

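Two-stage detectors, pioneered by the R-CNN family, first generate a set of candidate regions of interest (RoIs) and then classify and refine each region. The original R-CNN obtained its proposals with selective search. Faster R-CNN replaced this with a learned region proposal network (RPN) that shares convolutional features with the detection head, and Mask R-CNN extends Faster R-CNN with a branch that predicts a segmentation mask for each region. These detectors tend to be highly accurate but slower than single-stage detectors, which makes them better suited to offline analysis than to real-time applications.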
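In two-stage detectors, the "refine" step is typically box regression, in which the second stage predicts offsets that adjust each proposal. The sketch below assumes the centre-shift and log-scale delta parameterisation used in the R-CNN papers. The helper name `apply_deltas` is hypothetical, and real implementations also scale and clip the deltas:

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def apply_deltas(proposal: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Refine a proposal with R-CNN-style deltas (dx, dy, dw, dh).

    The centre moves by (dx * width, dy * height); width and height are
    multiplied by exp(dw) and exp(dh)."""
    x_min, y_min, x_max, y_max = proposal
    w, h = x_max - x_min, y_max - y_min
    cx, cy = x_min + w / 2, y_min + h / 2
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - new_w / 2, new_cy - new_h / 2, new_cx + new_w / 2, new_cy + new_h / 2)


# Shift a 100 x 50 proposal right by half its width, keeping its size.
print(apply_deltas((0.0, 0.0, 100.0, 50.0), (0.5, 0.0, 0.0, 0.0)))  # (50.0, 0.0, 150.0, 50.0)
```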

Single-Stage Detectors

Single-stage detectors perform detection in a single pass over the image without a separate proposal stage, which makes them significantly faster. The YOLO (You Only Look Once) family, introduced by Joseph Redmon et al. in 2015 and published at CVPR 2016, is the most prominent example.[^2] The original YOLO divides the image into an S × S grid, and each grid cell directly predicts bounding boxes, confidence scores, and class probabilities for objects whose centre falls inside it. Successive versions have progressively improved accuracy and speed. YOLOv8 (Ultralytics, 2023) became widely adopted in practice, while YOLOv12 (February 2025) introduced an attention-centric architecture that integrates efficient attention mechanisms with convolutional operations to capture global context while maintaining real-time speed.
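Under the original YOLO formulation, each box's centre is encoded as an offset inside its grid cell, and its width and height as fractions of the image. Assigning the responsible cell and decoding a prediction can be sketched as follows. Later YOLO versions use different parameterisations, so treat this as a YOLOv1-style illustration; the helper names are hypothetical:

```python
from typing import Tuple


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float, s: int) -> Tuple[int, int]:
    """Return (row, col) of the S x S grid cell containing the box centre (cx, cy)."""
    col = min(int(cx / img_w * s), s - 1)  # clamp centres lying on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp centres lying on the bottom edge
    return row, col


def decode_cell_prediction(row: int, col: int, x_off: float, y_off: float,
                           w_rel: float, h_rel: float,
                           img_w: float, img_h: float, s: int) -> Tuple[float, float, float, float]:
    """Map a YOLOv1-style cell prediction back to (centre_x, centre_y, width, height) in pixels.

    x_off, y_off in [0, 1] locate the centre inside the cell;
    w_rel, h_rel are fractions of the full image size."""
    cx = (col + x_off) / s * img_w
    cy = (row + y_off) / s * img_h
    return cx, cy, w_rel * img_w, h_rel * img_h


# A 448 x 448 input with a 7 x 7 grid, the configuration of the original YOLO paper.
print(responsible_cell(200.0, 300.0, 448, 448, 7))  # (4, 3)
```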

Transformer-Based Detectors

DETR (DEtection TRansformer), introduced by Facebook AI Research in 2020, treats detection as a direct set-prediction problem. A transformer encoder-decoder outputs a fixed-size set of predictions, which are matched one-to-one with the ground-truth objects by bipartite (Hungarian) matching during training. This removes the need for hand-engineered components such as anchor boxes and NMS post-processing.[^3] RT-DETR and RF-DETR (2024–2025) extend this approach to real-time performance while keeping accuracy competitive with YOLO models, and they are among the strongest current real-time detection architectures.
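Bipartite matching is what allows DETR to drop NMS. Each ground-truth object is assigned to exactly one prediction, and the leftover predictions are trained to output "no object". The sketch below uses a simplified cost, the negative class probability plus a weighted L1 box distance. DETR's actual cost also includes a generalised-IoU term. For simplicity, the sketch uses exhaustive search in place of the Hungarian algorithm, which is only practical for a handful of boxes. The weight of 5.0 and all names are illustrative assumptions:

```python
from itertools import permutations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (centre_x, centre_y, width, height), normalised to [0, 1]


def match_cost(pred_prob: float, pred_box: Box, gt_box: Box, box_weight: float = 5.0) -> float:
    """Simplified matching cost: lower when the prediction gives high probability to
    the ground-truth class and its box is close in L1 distance (no GIoU term)."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return -pred_prob + box_weight * l1


def match_predictions(probs: List[List[float]], pred_boxes: List[Box],
                      gt_labels: List[int], gt_boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return (prediction_index, gt_index) pairs of the minimum-cost one-to-one assignment.

    Exhaustive search over permutations; only feasible for tiny sets."""
    best_cost, best_pairs = float("inf"), []
    # Each permutation picks a distinct prediction for every ground-truth object.
    for chosen in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(match_cost(probs[p][gt_labels[g]], pred_boxes[p], gt_boxes[g])
                   for g, p in enumerate(chosen))
        if cost < best_cost:
            best_cost, best_pairs = cost, [(p, g) for g, p in enumerate(chosen)]
    return best_pairs


# Three predictions over two classes, two ground-truth objects.
probs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
pred_boxes = [(0.2, 0.2, 0.1, 0.1), (0.7, 0.7, 0.2, 0.2), (0.5, 0.5, 0.3, 0.3)]
gt_labels = [0, 1]
gt_boxes = [(0.7, 0.7, 0.2, 0.2), (0.2, 0.2, 0.1, 0.1)]
print(match_predictions(probs, pred_boxes, gt_labels, gt_boxes))  # [(1, 0), (0, 1)]
```

Prediction 2 is left unmatched, so in DETR-style training it would be supervised towards the "no object" class rather than removed by NMS.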

| Model family | Stage | Speed | Accuracy | Notable for |
|---|---|---|---|---|
| Faster R-CNN | Two-stage | Moderate | High | Accuracy-focused tasks |
| YOLOv8 | Single-stage | Fast | High | Balance of speed and accuracy |
| YOLOv12 | Single-stage | Very fast | Very high | Attention-centric design (2025) |
| DETR / RT-DETR | Single-stage (set prediction) | Moderate (DETR) to fast (RT-DETR) | Very high | Transformer-based, no NMS |

Training and Evaluation

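Object detection models are typically trained on large annotated datasets such as COCO (Common Objects in Context, about 330,000 images across 80 object categories) and Open Images. The standard evaluation metric is mean Average Precision (mAP). Average Precision (AP) is the area under the precision–recall curve for one category at one IoU threshold. A detection counts as a true positive only if its IoU with a not-yet-matched ground-truth box of the same class meets that threshold. mAP then averages AP across all categories and, in the COCO protocol, across ten IoU thresholds from 0.50 to 0.95 in steps of 0.05.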
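Suppose the detections for one class have already been labelled as true or false positives at a chosen IoU threshold. AP can then be computed as below. This sketch uses all-point interpolation, the Pascal VOC style from 2010 onward. COCO instead samples the interpolated curve at 101 recall points, so its results differ slightly. The function name is hypothetical, and the function assumes at least one ground-truth box:

```python
from typing import List


def average_precision(scores: List[float], is_tp: List[bool], num_gt: int) -> float:
    """AP for one class at one IoU threshold, using all-point interpolation.

    scores: confidence of each detection; is_tp: whether it matched a ground-truth
    box at the chosen IoU threshold; num_gt: number of ground-truth boxes (> 0)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Area under the step curve: add a rectangle wherever recall increases.
    return sum((mrec[k + 1] - mrec[k]) * mpre[k + 1] for k in range(len(mrec) - 1))


# Four detections for one class, two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2)
print(round(ap, 4))  # 0.8333
```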

Data annotation is a significant cost and bottleneck. Labelling tools (e.g., Label Studio, CVAT, Roboflow) allow annotators to draw bounding boxes, while semi-automatic approaches use pre-trained models to propose initial annotations that humans then verify.
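A semi-automatic pre-labelling step can be as simple as turning confident model outputs into draft boxes for annotators. In this sketch, the two thresholds, the `Detection` and `DraftAnnotation` types, and the "pre-filled" and "flagged" statuses are all hypothetical. Every draft is still meant to be checked by a human; the status only helps prioritise review:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str
    score: float


@dataclass
class DraftAnnotation:
    box: Tuple[float, float, float, float]
    label: str
    status: str  # "pre-filled" (likely correct) or "flagged" (needs a closer look)


def pre_annotate(detections: List[Detection], keep_threshold: float = 0.3,
                 confident_threshold: float = 0.8) -> List[DraftAnnotation]:
    """Turn model proposals into draft annotations for human verification."""
    drafts: List[DraftAnnotation] = []
    for det in detections:
        if det.score < keep_threshold:
            continue  # too uncertain to be worth showing to an annotator
        status = "pre-filled" if det.score >= confident_threshold else "flagged"
        drafts.append(DraftAnnotation(det.box, det.label, status))
    return drafts


dets = [
    Detection((0, 0, 10, 10), "defect", 0.95),
    Detection((5, 5, 20, 20), "defect", 0.50),
    Detection((30, 30, 40, 40), "defect", 0.10),
]
print([d.status for d in pre_annotate(dets)])  # ['pre-filled', 'flagged']
```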

Applications

Object detection underpins a wide range of deployed systems. In autonomous driving, vehicles use real-time detection of pedestrians, vehicles, cyclists, and traffic signs from camera and lidar feeds. In industrial quality control, cameras on production lines detect surface defects, misaligned components, or foreign objects. In retail, shelf-monitoring systems track stock levels. In healthcare, detection models identify anatomical structures or lesions in radiology images. In surveillance and public safety, detection identifies people, vehicles, and prohibited items.

Malaysian Context — Object Detection in Industry and Smart City Initiatives

Object detection technology has seen substantial deployment and research interest in Malaysia, spanning smart city infrastructure, manufacturing, agriculture, and public safety. MDEC's Malaysia Digital Acceleration Grant (MDAG-AI) programme, which allocated RM2.9 million in 2025, has funded AI startups developing computer vision solutions including object detection for the manufacturing and retail sectors.

In manufacturing, Malaysia's position as a major hub for electronics and semiconductor production — particularly in Penang, Selangor, and Johor — has driven demand for automated visual inspection systems. Companies in the electronics supply chain use object detection models to identify defective components on PCBs, inspect solder quality, and verify component placement at production-line speeds. Local AI companies such as Arkmind and AI-related units within Inari Amertron have explored these deployments, and multinational manufacturers operating in Penang's free trade zones have integrated YOLO-based inspection systems.

In agriculture, Felda Global Ventures and the wider palm oil industry have piloted object detection for precision agriculture: detecting and counting oil palm fronds, identifying diseased trees from drone imagery, and estimating fruit ripeness on the bunch. MARDI (Malaysian Agricultural Research and Development Institute) has published research applying computer vision to crop monitoring.

Smart city projects under the MyDigital Blueprint have incorporated object detection in traffic management and public safety contexts. The National Cyber Security Agency (NACSA) and law enforcement agencies have evaluated video analytics platforms that use detection models for crowd monitoring and incident response. Traffic counting and vehicle classification at toll plazas operated by PLUS Expressways use detection-based systems that have been upgraded to deep learning approaches.

In academia, Universiti Putra Malaysia (UPM), Universiti Teknologi Malaysia (UTM), and Universiti Tunku Abdul Rahman (UTAR) have published work on YOLO adaptations for local use cases, including detection of Malaysian road signs, wildlife monitoring in Borneo, and medical imaging applications at Malaysian hospitals. The growth of Malaysia's AI ecosystem, supported by Khazanah Nasional investments and regional hyperscaler infrastructure, positions the country to build deeper capability in applied computer vision research.

References

MarketsandMarkets. (2024). Computer Vision Market — Global Forecast to 2030. MarketsandMarkets Research.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. CVPR 2016, 779–788.
Carion, N., Massa, F., Synnaeve, G., et al. (2020). End-to-end object detection with transformers. ECCV 2020.
Ultralytics. (2025). YOLOv12: Attention-centric real-time object detectors. Ultralytics Documentation.

Tags: object detection, computer vision, YOLO, deep learning, image recognition

Type	Computer vision task
Output	Bounding boxes, class labels, confidence scores
Key models	YOLO series, DETR, Faster R-CNN, RF-DETR
Key use	Autonomous vehicles, surveillance, manufacturing QC
Related	Computer vision, image segmentation, convolutional neural network