What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

AI Alignment

AI alignment is the field of research dedicated to ensuring that artificial intelligence systems pursue goals, values, and behaviours that are consistent with human intentions.

5 min read · Last updated May 2026 · Foundations

AI alignment is the subfield of artificial intelligence safety concerned with designing systems whose decisions, recommendations, and actions reliably reflect the intentions of their principals — typically the developers, operators, and end users — and the broader values of the societies in which they are deployed. As machine learning models have grown in capability and autonomy, alignment has shifted from a theoretical concern raised by researchers such as Stuart Russell and Nick Bostrom into a practical engineering problem now addressed in the training pipelines of major laboratories.

Core problem

Modern AI systems, particularly large language models and reinforcement learning agents, are trained to optimise objectives that are imperfect proxies for what humans actually want. A model trained to maximise a reward signal may discover behaviours that score highly on the metric while violating the underlying intent — a phenomenon known as specification gaming or reward hacking. Alignment research seeks to close this gap by improving how objectives are specified, how models are trained to follow them, and how their behaviour is verified.
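
The gap between a proxy metric and the underlying intent can be seen in a toy example. The sketch below is purely illustrative: the keyword list, the two candidate summaries, and both scoring functions are assumptions invented for this example and do not represent a real reward model.

```python
# Toy illustration of specification gaming (all names and data are hypothetical).
# The proxy reward counts keyword hits; the intent is a readable summary that
# mentions each key topic once.

KEYWORDS = {"revenue", "growth", "risk"}

def tokens(text: str) -> list:
    """Lower-case words with trailing punctuation removed."""
    return [word.strip(".,") for word in text.lower().split()]

def proxy_reward(text: str) -> int:
    """Proxy metric: total number of keyword occurrences."""
    return sum(1 for word in tokens(text) if word in KEYWORDS)

def intended_quality(text: str) -> float:
    """Stand-in for human intent: credit distinct topics, penalise repetition."""
    distinct = len(KEYWORDS & set(tokens(text)))
    repeats = proxy_reward(text) - distinct
    return distinct - 0.5 * repeats

candidates = {
    "honest": "Revenue rose on strong growth, though currency risk remains.",
    "gamed": "revenue revenue revenue growth growth growth risk risk risk",
}

best_by_proxy = max(candidates, key=lambda name: proxy_reward(candidates[name]))
best_by_intent = max(candidates, key=lambda name: intended_quality(candidates[name]))
print(best_by_proxy, best_by_intent)
```

An optimiser that only sees `proxy_reward` prefers the keyword-stuffed candidate, while the stand-in for intent prefers the readable one.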

The problem has two widely discussed dimensions. Outer alignment concerns whether the stated objective itself captures human intent. Inner alignment concerns whether the learned model robustly pursues that objective rather than a correlated but distinct internal goal that happened to perform well during training.

Key techniques

Several techniques have moved from research papers into production training stacks.

Reinforcement learning from human feedback (RLHF)

RLHF fine-tunes a base language model in two stages: a reward model is trained on human comparisons of candidate outputs, and the policy is then optimised against that reward model. It is used in training many commercial assistants and serves as the baseline against which newer approaches are measured.
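
The reward-model stage can be sketched with a pairwise (Bradley-Terry) loss, a common formulation for learning from comparisons. This is a minimal sketch, not a production pipeline: the linear reward model, the two hand-made features (helpfulness, rudeness), and the comparison data are all assumptions made for illustration. The later policy-optimisation stage is not shown.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    """Hypothetical linear reward model: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(comparisons, dim, lr=0.1, epochs=200):
    """Fit weights so that human-preferred outputs receive higher reward."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss -log(sigmoid(margin)) with respect to margin.
            grad = sigmoid(margin) - 1.0
            for i in range(dim):
                weights[i] -= lr * grad * (chosen[i] - rejected[i])
    return weights

# Each entry is (features of preferred output, features of rejected output).
# Features are illustrative: [helpfulness, rudeness].
COMPARISONS = [
    ([0.9, 0.1], [0.4, 0.8]),
    ([0.7, 0.0], [0.8, 0.9]),
    ([0.6, 0.2], [0.2, 0.3]),
]

weights = train_reward_model(COMPARISONS, dim=2)
print(weights)
```

After training, the learned weights should rank every preferred output above its rejected counterpart, with a negative weight on the rudeness feature.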

Constitutional AI

Developed at Anthropic, Constitutional AI replaces a portion of human feedback with model-generated critiques and revisions guided by a written set of principles. It scales supervision and makes the value specification more explicit and auditable.
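
The critique-and-revision step can be sketched as a simple loop. The code below is an assumption-laden outline rather than Anthropic's implementation: `generate` stands for any text-generation callable, `echo_model` is a placeholder, and the principles and prompt wording are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical principles; a real constitution is longer and carefully worded.
PRINCIPLES = [
    "Do not provide instructions that facilitate fraud.",
    "Be honest about uncertainty.",
]

def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[str],
) -> Dict[str, str]:
    """Draft a response, then critique and rewrite it once per principle."""
    initial = generate(prompt)
    draft = initial
    for principle in principles:
        critique = generate(
            f"Critique the response against this principle: {principle}\n"
            f"Response: {draft}"
        )
        draft = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\n"
            f"Response: {draft}"
        )
    # The (prompt, revised) pairs would later serve as fine-tuning data.
    return {"prompt": prompt, "initial": initial, "revised": draft}

def echo_model(text: str) -> str:
    """Placeholder model so the sketch runs without any external service."""
    return "[model output for] " + text.splitlines()[-1]

example = critique_and_revise("How do I verify a payment request?", echo_model, PRINCIPLES)
print(example["revised"])
```

Because the model produces both the critique and the revision, the human contribution is concentrated in writing the principles.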

Direct preference optimisation (DPO) and variants

DPO and related methods such as IPO and KTO eliminate the separate reward model used in RLHF and optimise the policy directly on preference data: chosen/rejected pairs for DPO and IPO, and unpaired desirable or undesirable labels for KTO. This simplifies the pipeline and often improves training stability.
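
For a single preference pair, the DPO objective from Rafailov et al. (2023) can be written in a few lines. The sketch assumes the sequence log-probabilities under the policy and a frozen reference model have already been computed; the numeric values and the default `beta` are illustrative only.

```python
import math

def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log sigmoid(beta * (chosen shift - rejected shift))."""
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_shift - rejected_shift)
    # Numerically stable form of -log(sigmoid(logits)).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))

# Policy identical to the reference: no preference has been learned yet.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy has raised the chosen output and lowered the rejected one.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

The loss falls as the policy raises the probability of the chosen output relative to the reference and lowers that of the rejected one, which is why no explicit reward model is needed.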

Interpretability and monitoring

Mechanistic interpretability research attempts to reverse-engineer the internal computations of neural networks so that misaligned reasoning can be detected before it produces harmful output. Activation steering, probing, and circuit analysis are active research areas.
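
Activation steering is often introduced through a difference-of-means construction: average the hidden activations for two contrasting sets of prompts, subtract, and add the resulting vector back at inference time. The sketch below uses tiny hand-written vectors in place of real model activations; the values and the `scale` parameter are assumptions for illustration.

```python
from typing import List

Vector = List[float]

def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]

def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Direction from the 'negative' behaviour towards the 'positive' one."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - n for p, n in zip(pos_mean, neg_mean)]

def steer(activation: Vector, direction: Vector, scale: float = 1.0) -> Vector:
    """Shift an activation along the steering direction."""
    return [a + scale * d for a, d in zip(activation, direction)]

# Hypothetical activations recorded on contrasting prompts.
honest_acts = [[1.0, 2.0], [3.0, 4.0]]
evasive_acts = [[0.0, 0.0], [2.0, 2.0]]

direction = steering_vector(honest_acts, evasive_acts)
shifted = steer([0.0, 0.0], direction, scale=2.0)
print(direction, shifted)
```

In a real model the same arithmetic would be applied to a chosen layer's residual activations, and the effect on behaviour would need to be checked empirically.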

Red teaming and evaluations

Structured adversarial testing identifies failure modes before deployment. General benchmarks such as MMLU and HELM measure knowledge and broader model behaviour, while dedicated alignment evaluations measure honesty, harmlessness, and refusal calibration.
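
Refusal calibration can be summarised with two rates: how often the model complies with harmful prompts, and how often it refuses benign ones. The sketch below assumes each evaluation case has already been labelled; the metric names and the sample data are hypothetical and not taken from any published benchmark.

```python
from typing import Dict, List, Tuple

# Each case is (prompt_is_harmful, model_refused).
Case = Tuple[bool, bool]

def refusal_rates(cases: List[Case]) -> Dict[str, float]:
    """Return under-refusal and over-refusal rates for labelled eval cases."""
    harmful = [refused for is_harmful, refused in cases if is_harmful]
    benign = [refused for is_harmful, refused in cases if not is_harmful]
    # Under-refusal: harmful prompts the model answered anyway.
    under = sum(not refused for refused in harmful) / len(harmful) if harmful else 0.0
    # Over-refusal: benign prompts the model declined.
    over = sum(benign) / len(benign) if benign else 0.0
    return {"under_refusal": under, "over_refusal": over}

# Illustrative results: 4 harmful prompts and 4 benign prompts.
sample_cases = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False), (False, True),
]

rates = refusal_rates(sample_cases)
print(rates)
```

Reporting both rates together guards against a model that looks safe only because it refuses everything.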

Risks and open problems

Alignment research distinguishes between near-term risks — bias, factual hallucination, jailbreaks, misuse for fraud or disinformation — and longer-term risks tied to highly capable systems that might pursue instrumental goals such as self-preservation or resource acquisition. Researchers debate the probability and timing of such risks, but most major laboratories now publish safety policies, dangerous-capability evaluations, and responsible-scaling commitments.

Open problems include scalable oversight (how humans can supervise systems that exceed them in some domains), deceptive alignment (a model that appears aligned during training but defects later), and value pluralism (whose values a system should reflect when stakeholders disagree).

Governance interaction

Alignment intersects with regulation. Policy initiatives including the European Union AI Act, United States executive orders on AI safety, the work of the United Kingdom AI Safety Institute, and the international AI Safety Summits at Bletchley Park and Seoul have all treated the safety and alignment of frontier models as a concern to be addressed before deployment. Voluntary commitments from leading laboratories include pre-deployment evaluations, watermarking research, and external red teaming.

Malaysian Context — Alignment within national AI governance

Malaysia treats AI alignment primarily through the lens of responsible deployment rather than fundamental safety research. The National Artificial Intelligence Office (NAIO), established under the Ministry of Digital, coordinates implementation of the Malaysia AI Governance Framework released by MDEC. The framework lists fairness, accountability, transparency, and human oversight as principles, which correspond closely to the goals of international alignment work.

Sectoral regulators have begun translating these principles into binding guidance. Bank Negara Malaysia (BNM) has issued exposure drafts on the responsible use of AI in financial services, requiring licensed institutions such as Maybank, CIMB, RHB, and Public Bank to maintain model governance committees and document the human-in-the-loop controls applied to credit, fraud, and customer-service systems. The Securities Commission Malaysia (SC) mirrors these expectations for capital-market participants.

CyberSecurity Malaysia and NACSA address adversarial misuse of generative models, including deepfake detection and disinformation mitigation, while the Personal Data Protection Department (JPDP) evaluates how the consent and transparency duties under the amended PDPA apply to training data and model outputs.

Industry adoption of alignment tooling is led by partners deploying Amazon Bedrock, Google Vertex AI, and Azure AI in Cyberjaya and Kuala Lumpur, where managed guardrail features, content filtering, and prompt-injection defences are used in customer-facing chatbots for telcos including TM, Maxis, and CelcomDigi. HRD Corp-funded courses on AI ethics and governance support upskilling.

References

Russell, S. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking.
Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. Anthropic.
Christiano, P. et al. (2017). Deep Reinforcement Learning from Human Preferences. NeurIPS.
Rafailov, R. et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS.
MDEC. (2024). Malaysia AI Governance Framework. Malaysia Digital Economy Corporation.

Tags: alignment, ai-safety, rlhf, governance

Type	Research field within AI safety
Core concern	Goal and value alignment
Key techniques	RLHF, Constitutional AI, interpretability
Notable labs	Anthropic, OpenAI, DeepMind, MIRI
Related	AI safety, RLHF, Constitutional AI