What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

AI Safety

AI safety is a field of research and practice concerned with the development of artificial intelligence systems that behave reliably, avoid harmful outputs, and remain aligned with human values, especially as systems become more capable.

6 min readLast updated May 2026Applications

AI safety is a multidisciplinary field concerned with ensuring that artificial intelligence systems behave in ways that are beneficial, predictable, and aligned with human values. The field encompasses technical research into the reliability, robustness, interpretability, and alignment of AI systems; policy work on governance, standards, and international cooperation; and operational practices such as evaluation, red teaming, deployment controls, and incident response. AI safety draws from machine learning, computer security, control theory, philosophy, cognitive science, and law, and has grown significantly in prominence since the public release of large language models in 2022–2023.

Scope of Concerns

Near-term Risks

Near-term AI safety addresses risks present in systems being deployed today. These include hallucination — confident production of factually incorrect statements; biased outputs that disadvantage protected groups; failure modes under distribution shift, where models perform well on test data but poorly in production; security vulnerabilities such as prompt injection, training data poisoning, and model extraction; and misuse for fraud, harassment, surveillance, generation of child sexual abuse material, generation of weapons-relevant information, or facilitation of cyberattacks.

Frontier and Long-term Risks

Frontier AI safety concerns risks that may emerge from highly capable future systems. These include catastrophic misuse, where powerful models could meaningfully assist non-state actors in producing weapons of mass destruction; loss of human oversight, where systems take actions misaligned with human intent; deceptive alignment, where systems behave well during evaluation but pursue different goals when deployed; and societal-scale impacts such as labour displacement, concentration of power, and erosion of epistemic ecosystems.

Technical Research Areas

Alignment

Alignment research investigates how to train AI systems whose objectives, reasoning, and behaviour reliably reflect human intent. Techniques include reinforcement learning from human feedback (RLHF), constitutional AI, direct preference optimisation (DPO), debate, and scalable oversight methods such as recursive reward modelling. Constitutional AI, developed by Anthropic, uses an explicit set of principles and AI feedback to train models that critique and revise their own outputs.

Interpretability

Mechanistic interpretability seeks to reverse-engineer the internal computations of neural networks, identifying circuits and features that explain model behaviour. Sparse autoencoders, probing, and activation patching are among the methods used. Interpretability supports safety by enabling external verification of model reasoning, anomaly detection, and the discovery of misaligned representations before deployment.

Robustness and Evaluation

Robustness research studies how models behave under adversarial inputs, distribution shifts, and corner cases. Evaluation practices include capability evaluations to measure what models can do, behaviour evaluations to measure how they respond to specific prompts, and red teaming — structured adversarial probing by human teams or automated agents — to elicit harmful behaviour before public release. Standardised benchmarks include MMLU, GPQA, SWE-bench, BBH, and biosecurity- and cybersecurity-specific evaluation suites.

Misuse Prevention

Misuse prevention combines content filtering, refusal training, watermarking of generated outputs, rate limiting, identity verification, and post-deployment monitoring. Frontier model developers maintain abuse teams that detect and respond to attempts to use models for prohibited purposes.

Institutional and Policy Landscape

AI Safety Institutes

Several jurisdictions established AI Safety Institutes from 2023 onward, including the UK AI Safety Institute (UK AISI), the US AI Safety Institute (US AISI) within NIST, the Japan AI Safety Institute, the Singapore AI Safety Institute, the European AI Office, and analogous bodies in Canada, South Korea, India, France, and others. These institutes conduct technical evaluations of frontier models, develop methodologies, and coordinate internationally.

Voluntary Commitments and Policies

Voluntary commitments — such as the White House Voluntary AI Commitments, the Seoul Frontier AI Safety Commitments, and the AI Safety Summit Bletchley Declaration — formalised industry pledges around responsible scaling, evaluation, and transparency. Companies including Anthropic, OpenAI, Google DeepMind, Microsoft, Meta, Amazon, and others maintain their own safety policies; Anthropic's Responsible Scaling Policy (RSP) and OpenAI's Preparedness Framework define capability thresholds that trigger additional safeguards.

Regulatory Frameworks

Binding regulation includes the EU AI Act, which classifies systems by risk level and imposes obligations on providers and deployers of high-risk and general-purpose AI; sectoral regulations issued by financial, healthcare, and aviation regulators; and emerging standards from ISO/IEC, IEEE, and NIST.

Industry Practice

AI safety is increasingly embedded in industry development workflows. Frontier laboratories run multi-stage evaluation pipelines that include automated evaluations, structured red teaming, third-party audits, and government evaluations under voluntary access agreements. Deployment is staged, with limited release, controlled access, and monitoring preceding broad availability.

Malaysian Context — AI Safety in National Policy and Industry

Malaysia is integrating AI safety into its national AI agenda through a combination of guidelines, sectoral oversight, and international engagement. The Ministry of Science, Technology and Innovation (MOSTI) published the National Guidelines on Artificial Intelligence Governance and Ethics in 2024, articulating seven principles including fairness, reliability, safety, and human oversight. The National AI Office, operating under the Ministry of Digital, coordinates implementation across ministries and engages with international AI safety initiatives.

The National Cyber Security Agency (NACSA) and CyberSecurity Malaysia work on the security dimensions of AI safety, addressing risks such as model poisoning, prompt injection in critical infrastructure deployments, and AI-enabled cyberattacks. Bank Negara Malaysia (BNM) has issued policy documents on the use of artificial intelligence and machine learning in the financial sector, requiring regulated entities to assess model risks, maintain human oversight of consequential decisions, and validate models throughout their lifecycle. The Securities Commission Malaysia (SC) addresses AI safety in capital markets contexts including algorithmic trading and robo-advisory.

Malaysia is an active participant in international AI safety dialogues, including the AI Safety Summit process initiated at Bletchley Park in 2023 and continued at Seoul in 2024 and Paris in 2025, the OECD AI working groups, and the ASEAN Working Group on AI Governance. The ASEAN Guide on AI Governance and Ethics, finalised in 2024, includes safety, transparency, and accountability as core principles, with Malaysia and Singapore among the leading contributors. Malaysian universities and research centres, including those at Universiti Malaya, Universiti Sains Malaysia, and the International Centre for Education in Islamic Finance (INCEIF), conduct research at the intersection of AI safety, ethics, and regulation, contributing to a growing regional capability in AI assurance and audit.

References

Bengio, Y. et al. (2024). International Scientific Report on the Safety of Advanced AI: Interim Report. UK Department for Science, Innovation and Technology.
Anthropic. (2024). Responsible Scaling Policy v2. San Francisco: Anthropic PBC.
OpenAI. (2023). Preparedness Framework. San Francisco: OpenAI.
European Union. (2024). Regulation (EU) 2024/1689 (AI Act). Official Journal of the European Union.
MOSTI Malaysia. (2024). National Guidelines on Artificial Intelligence Governance and Ethics. Putrajaya: MOSTI.

Tags:ai safety alignment responsible ai ai governance

Scope	Research and engineering discipline
Core concerns	Alignment, robustness, interpretability, misuse
Key institutions	UK AISI, US AISI, Anthropic, Google DeepMind, OpenAI
Major frameworks	RSP, Frontier AI Safety Commitments, NIST AI RMF
Related	AI ethics, Alignment, Red teaming, Constitutional AI