Edge AI

Edge AI is the deployment of artificial intelligence algorithms and inference workloads directly on local devices or edge computing nodes rather than in centralised cloud data centres, enabling low-latency, privacy-preserving, and bandwidth-efficient AI applications.

7 min read | Last updated May 2026 | Infrastructure

Edge AI refers to the deployment and execution of artificial intelligence models — in particular inference, the process of generating predictions or decisions from a trained model — on edge devices or edge computing nodes located close to the source of data, rather than sending data to centralised cloud servers for processing. Edge devices include smartphones, IoT sensors, industrial controllers, cameras, drones, autonomous vehicles, and purpose-built edge servers installed at the periphery of networks.

The distinction between edge and cloud AI is primarily one of where inference occurs. Training large models typically remains a cloud workload due to its computational demands. Edge AI focuses on efficient inference: taking a model that has been trained in the cloud and running it locally, often under tight constraints on compute, memory, power, and connectivity.

Motivation

Several practical factors drive the adoption of edge AI over purely cloud-based approaches.

Latency requirements are the most fundamental. Round-trip communication to a cloud server introduces delays of tens to hundreds of milliseconds depending on network conditions. Many edge applications cannot tolerate this latency. Autonomous vehicle perception systems must process sensor data and react within milliseconds. Industrial safety systems that detect equipment failures or worker hazards must respond faster than cloud round-trips permit. Medical monitoring devices that detect critical physiological events require immediate local action.
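
The effect of latency can be made concrete with simple arithmetic. The sketch below estimates how far a moving vehicle travels while a system waits for a result. The speed and both latency figures are illustrative assumptions, not measurements from any particular deployment.

```python
def distance_travelled_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance in metres covered at speed_kmh during latency_ms."""
    speed_ms = speed_kmh * 1000.0 / 3600.0   # km/h -> m/s
    return speed_ms * (latency_ms / 1000.0)  # ms -> s


# Illustrative values only: a vehicle at 90 km/h, comparing an assumed
# 100 ms cloud round-trip with an assumed 10 ms on-device inference.
cloud = distance_travelled_m(90, 100)
local = distance_travelled_m(90, 10)
print(f"cloud round-trip: {cloud:.2f} m, on-device: {local:.2f} m")
```

Under these assumptions the vehicle covers ten times more ground before the cloud result arrives, which is why perception and safety functions are usually kept on the device.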

Bandwidth and cost are significant in deployments with large fleets of sensors or cameras generating continuous data streams. Transmitting raw video or sensor data from thousands of IoT devices to the cloud is expensive and often infeasible over constrained wireless links. Edge inference allows only actionable events or summarised metadata to be sent upstream, reducing data volume by orders of magnitude.
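
A rough calculation shows where the reduction in data volume comes from. The fleet size, video bitrate, and event size below are assumed values chosen for illustration; real deployments will differ.

```python
def daily_upload_gb(devices: int, bytes_per_second: float) -> float:
    """Total data uploaded per day by a fleet, in gigabytes (10**9 bytes)."""
    seconds_per_day = 24 * 60 * 60
    return devices * bytes_per_second * seconds_per_day / 1e9


# Assumed figures for illustration: 1,000 cameras, each either streaming
# compressed video at 2 Mbit/s or sending one 200-byte event every 10 s.
CAMERAS = 1000
video_bytes_per_s = 2_000_000 / 8   # 2 Mbit/s -> 250,000 bytes/s
event_bytes_per_s = 200 / 10        # 20 bytes/s on average

raw = daily_upload_gb(CAMERAS, video_bytes_per_s)
events = daily_upload_gb(CAMERAS, event_bytes_per_s)
print(f"streaming video: {raw:,.0f} GB/day")
print(f"events only:     {events:.2f} GB/day")
print(f"reduction:       {raw / events:,.0f}x")
```

With these inputs the upstream traffic falls by roughly four orders of magnitude, although the actual ratio depends entirely on how often events occur and how much metadata each one carries.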

Privacy and data sovereignty concerns motivate edge processing in applications involving sensitive personal data. Processing medical images, biometric data, or confidential industrial data locally avoids the need to transmit it to external cloud infrastructure, reducing privacy risk and simplifying regulatory compliance.

Offline and intermittent connectivity is a practical reality for many deployments. Agricultural sensors in rural areas, maritime vessels, and mining equipment often operate in environments with unreliable or no network connectivity. Edge AI enables these systems to function autonomously during periods of disconnection.
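
One common pattern for intermittent links is store-and-forward: the device keeps running inference locally, buffers its results, and uploads them once a connection returns. The class below is a minimal sketch of that idea. The names `StoreAndForward`, `record`, and `flush`, and the `send` callback, are hypothetical and do not come from any specific framework.

```python
from collections import deque


class StoreAndForward:
    """Buffers inference results locally and uploads them when online."""

    def __init__(self, send, max_buffered=1000):
        self._send = send                          # callable that uploads one event
        self._buffer = deque(maxlen=max_buffered)  # oldest events dropped when full

    def record(self, event, online):
        """Queue an event; try to flush if a connection is available."""
        self._buffer.append(event)
        if online:
            self.flush()

    def flush(self):
        """Send buffered events in order, stopping at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return  # keep remaining events for the next attempt
            self._buffer.popleft()

    def pending(self):
        """Number of events still waiting to be uploaded."""
        return len(self._buffer)


uploaded = []
node = StoreAndForward(send=uploaded.append)
node.record({"alert": "pump vibration high"}, online=False)  # disconnected
node.record({"alert": "soil moisture low"}, online=False)
node.record({"alert": "gate open"}, online=True)             # link restored
print(len(uploaded), node.pending())
```

The bounded buffer is a deliberate trade-off: on a device with little storage, losing the oldest events during a long outage is usually preferable to running out of memory.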

Model Optimisation for Edge Deployment

Full-sized deep learning models trained in the cloud are often too large and computationally demanding to run efficiently on edge hardware. A standard computer vision model such as ResNet-50 has roughly 25 million parameters, which amounts to about 100 megabytes of 32-bit floating-point weights, and requires several billion floating-point operations per inference, far exceeding the resources of a microcontroller or a low-power IoT chip.
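
The size gap can be checked with a back-of-the-envelope calculation. The parameter count is the approximate published figure for ResNet-50, and the 1 MiB flash budget is an assumed value for a hypothetical microcontroller. The result covers weights only; activations and runtime overhead add to it.

```python
RESNET50_PARAMS = 25_600_000       # approximate parameter count (assumption)
BYTES_PER_FP32 = 4                 # one 32-bit float per weight
MCU_FLASH_BYTES = 1 * 1024 * 1024  # assumed 1 MiB flash, hypothetical device

model_bytes = RESNET50_PARAMS * BYTES_PER_FP32
print(f"FP32 weights: {model_bytes / 1e6:.1f} MB")
print(f"Ratio to the assumed flash budget: {model_bytes / MCU_FLASH_BYTES:.0f}x")
```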

Several techniques reduce model size and computational requirements for edge deployment. Quantisation converts model weights and activations from 32-bit floating-point (FP32) to lower-precision formats such as 16-bit floating-point (FP16), 8-bit integer (INT8), or even 4-bit integer (INT4), reducing both memory footprint and arithmetic cost. Accuracy loss is typically small at FP16 and INT8, while more aggressive formats such as INT4 usually need careful calibration or quantisation-aware training to stay accurate. Pruning removes redundant weights or entire neurons from a trained network, producing a smaller or sparser model; unstructured sparsity only translates into faster inference when the runtime or hardware can exploit it. Knowledge distillation trains a smaller student model to mimic the behaviour of a larger teacher model, capturing much of the teacher's performance at a fraction of the cost. Efficient architectures such as MobileNet, EfficientNet, and SqueezeNet are designed specifically for resource-constrained inference and achieve strong accuracy-efficiency trade-offs.
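
To illustrate quantisation, the sketch below applies affine (asymmetric) INT8 quantisation to a handful of made-up weights using only the Python standard library. It shows one common scheme under simplifying assumptions (a single scale for the whole list, no calibration data) and is not the exact procedure used by any particular toolchain. The function names are hypothetical.

```python
def quantise_int8(values):
    """Affine (asymmetric) quantisation of a list of floats to INT8."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)    # make sure 0.0 is representable
    scale = (hi - lo) / 255.0 or 1.0       # avoid a zero scale for all-zero input
    zero_point = round(-128 - lo / scale)  # integer code that maps back to 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantise(q, scale, zero_point):
    """Map INT8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-0.62, -0.10, 0.0, 0.27, 0.91]  # made-up FP32 weights
q, scale, zp = quantise_int8(weights)
restored = dequantise(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"scale={scale:.4f} zero_point={zp} max error={max_err:.4f}")
```

Each weight now occupies one byte instead of four, and the reconstruction error stays within about one quantisation step (`scale`), which is the basic trade-off that quantisation makes.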

Edge Hardware

Specialised edge AI accelerators have emerged to execute neural network inference efficiently within tight power and thermal budgets. Neural processing units (NPUs) are dedicated silicon blocks optimised for the matrix multiply operations that dominate neural network computation. They are now integrated into mobile application processors (Apple Neural Engine, Qualcomm Hexagon NPU, MediaTek APU) and purpose-built industrial edge chips.

For IoT endpoints, microcontroller-class hardware from ARM (Cortex-M series) and devices from Nordic Semiconductor, STMicroelectronics, and Renesas supports the TinyML paradigm, running stripped-down ML models within kilobytes of memory and milliwatts of power. Edge servers — more powerful nodes deployed at factory floors, base stations, or retail sites — use NVIDIA Jetson modules or Intel OpenVINO-compatible hardware to run more demanding inference workloads such as multi-camera video analytics.

Frameworks and Toolchains

Deploying models to edge devices requires conversion and optimisation toolchains. TensorFlow Lite converts TensorFlow and Keras models to a compact FlatBuffer format optimised for mobile and embedded inference. ONNX (Open Neural Network Exchange) provides an interoperability format for exchanging models between frameworks (PyTorch, TensorFlow, scikit-learn) and edge runtimes. Apple Core ML enables deployment of models to Apple devices with acceleration on the Apple Neural Engine. Intel OpenVINO optimises models for Intel CPUs, integrated graphics, and Intel edge accelerators such as Movidius VPUs. NVIDIA TensorRT optimises models for deployment on NVIDIA edge GPUs and Jetson modules.

Applications

Edge AI has found adoption across numerous sectors. In manufacturing, edge inference enables real-time visual defect detection on production lines, predictive maintenance from vibration and acoustic sensor data, and robotic arm guidance without cloud round-trips. In retail, smart shelf systems and customer analytics cameras process video locally to avoid transmitting footage externally. In smart cities, edge-processed traffic cameras monitor flow and incidents locally. In agriculture, edge devices on tractors and drones perform crop health assessment from multispectral imagery in the field.

Malaysian Context — Edge AI in Malaysian Industry and Smart Cities

Malaysia's manufacturing sector is a significant driver of edge AI adoption, particularly in Penang, which is a major electronics manufacturing hub hosting facilities operated by Intel, AMD, Bosch, Motorola Solutions, and many other multinational companies. These facilities have been early adopters of AI-powered quality inspection and predictive maintenance, deploying edge inference on production lines to achieve the sub-100 millisecond response times required for defect detection without cloud latency.

Agilent Technologies' Penang manufacturing facility has been recognised by the World Economic Forum and McKinsey as a global advanced manufacturing lighthouse, achieving a 40 percent productivity increase and 48 percent reduction in delivery lead times through the integration of AI, digital twins, and edge computing. The facility's transformation has been cited as a model for Industry 4.0 adoption in the region.

The Malaysian government's Industry4WRD initiative, coordinated by the Ministry of Investment, Trade and Industry (MITI, known until 2023 as the Ministry of International Trade and Industry), provides readiness assessments and financial incentives to help Malaysian manufacturers adopt Industry 4.0 technologies including edge AI. SmartCONNECT and the Smart Manufacturing Centre at MIMOS Berhad support manufacturers in implementing edge computing and AI in their operations.

Malaysia's ambition to build smart cities, including the ongoing smart city pilots in Putrajaya, Cyberjaya, and Iskandar Malaysia in Johor, relies on edge AI for applications such as intelligent traffic management, smart parking, and public safety monitoring. The nationwide 5G network rolled out by Digital Nasional Berhad (DNB), over which operators such as Telekom Malaysia (TM) provide services, offers the low-latency connectivity that makes edge AI deployments more practical across urban and semi-urban areas.

In the semiconductor and electronics manufacturing ecosystem, Penang-based companies including Globetronics and Inari Amertron have moved up the value chain toward AI-enabled testing and quality assurance, deploying edge inference systems for automated optical inspection (AOI). Malaysia's relatively low electricity costs and established manufacturing infrastructure make it an attractive location for edge AI hardware deployment at scale.

References

Li, E. et al. (2020). Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing. IEEE Transactions on Wireless Communications, 19(1), 447-457.
Warden, P., and Situnayake, D. (2019). TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers. O'Reilly Media.
MITI. (2023). Industry4WRD Implementation Framework Report. Ministry of International Trade and Industry Malaysia, Kuala Lumpur.
World Economic Forum. (2024). Global Lighthouse Network: Insights from the Forefront of the Fourth Industrial Revolution. Geneva: WEF.
Deng, S. et al. (2020). Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence. IEEE Internet of Things Journal, 7(8), 7457-7469.

Tags: edge AI, on-device AI, IoT, inference, TinyML, embedded AI

Type	AI deployment paradigm
Contrast with	Cloud AI, centralised inference
Key technologies	Model compression, quantisation, TinyML, NPUs
Primary benefit	Low latency, offline operation, data privacy
Applications	Industrial IoT, smart cameras, mobile AI, robotics
Related	TinyML, ONNX, quantisation, knowledge distillation