What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics, including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, the local AI vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Constitutional AI

Constitutional AI is an alignment method developed by Anthropic that trains language models to follow a set of written ethical principles by using the model itself to critique and revise its own outputs, reducing dependence on human feedback for harmlessness.

Foundations · Last updated May 2026

Constitutional AI (CAI) is an alignment methodology developed by Anthropic and first described in a paper published in December 2022. The technique trains large language models to behave according to a predefined set of principles, referred to as a "constitution", by using the model's own generative capabilities to critique and revise its responses, rather than relying exclusively on human annotators to label harmful content.[^1] It represents a significant departure from pure Reinforcement Learning from Human Feedback (RLHF) pipelines, in which every safety-relevant signal must be produced by a human rater. Constitutional AI is a core part of the training process for Anthropic's Claude model family, where it is used together with human feedback.

Motivation

The primary challenge Constitutional AI was designed to address is the scalability bottleneck inherent in human-feedback-based alignment. As language models become more capable, the volume and subtlety of harmful outputs they are capable of generating grows substantially, making it increasingly difficult and expensive for human reviewers to identify and label all such content reliably. Additionally, human raters introduce inconsistency: different reviewers hold different values, apply different standards across contexts, and may be reluctant to engage with extremely disturbing content. Constitutional AI attempts to reduce this bottleneck by using a capable language model to apply a set of explicitly stated principles at scale.[^2]

A secondary motivation is transparency. Because the principles governing model behaviour are written down in natural language, Constitutional AI makes the normative commitments of the training process legible to external observers in a way that RLHF reward models — which encode human preferences implicitly — do not.

The Constitutional Training Process

Phase 1: Supervised Learning with Self-Critique (SL-CAI)

In the first phase, a model that has already been trained to be helpful is presented with a potentially harmful prompt (for example, a red-teaming prompt) and asked to generate a response. A principle is then drawn from the constitution in the form of a critique request, for example asking the model to identify specific ways in which its last response is harmful, unethical, or dangerous, and the model critiques its own response in light of that principle. Prompted by a matching revision request, the model then rewrites the response based on its critique. This critique-revise cycle may be repeated several times, drawing a different principle at each step. The final revised responses form a supervised fine-tuning dataset, and a model fine-tuned on this data becomes the SL-CAI model.[^3]
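The critique-revise loop can be sketched in a few lines of Python. This is a minimal sketch under explicit assumptions, not Anthropic's implementation: `generate` is a hypothetical stand-in for a call to a language model, the principle pairs are paraphrases written for illustration rather than text from Anthropic's constitution, and the prompt formatting is invented. `toy_generate` is a canned stub so the example runs without a real model.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: takes a prompt string, returns a completion string.
Generate = Callable[[str], str]

# Illustrative (critique request, revision request) pairs. These are paraphrases
# written for this sketch, not Anthropic's published wording.
PRINCIPLES: List[Tuple[str, str]] = [
    ("Identify specific ways in which the response is harmful, unethical, or dangerous.",
     "Rewrite the response to remove any harmful, unethical, or dangerous content."),
    ("Identify any ways in which the response is dishonest or misleading.",
     "Rewrite the response so that it is honest and not misleading."),
]


def critique_and_revise(prompt: str, generate: Generate, n_rounds: int = 2,
                        rng: Optional[random.Random] = None) -> str:
    """Draft a response, then critique and revise it n_rounds times."""
    if rng is None:
        rng = random.Random()
    response = generate(prompt)
    for _ in range(n_rounds):
        # Sample one principle per round so different principles are applied.
        critique_request, revision_request = rng.choice(PRINCIPLES)
        # The model critiques its own current response against the principle.
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique request: {critique_request}"
        )
        # The model rewrites the response in light of its critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            f"Revision request: {revision_request}"
        )
    return response


def build_sl_dataset(prompts: List[str], generate: Generate, n_rounds: int = 2,
                     seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each prompt with its final revision; the pairs become fine-tuning data."""
    rng = random.Random(seed)
    return [(p, critique_and_revise(p, generate, n_rounds, rng)) for p in prompts]


def toy_generate(text: str) -> str:
    """Stand-in 'model' that returns canned text depending on the request type."""
    if "Revision request:" in text:
        return "revised response"
    if "Critique request:" in text:
        return "critique of the response"
    return "initial response"


if __name__ == "__main__":
    data = build_sl_dataset(["Can you help me hack into my neighbour's Wi-Fi?"], toy_generate)
    print(data)
```

Each round costs two model calls (one critique, one revision) on top of the initial draft. Seeding the random generator in `build_sl_dataset` makes the sampled sequence of principles, and therefore the dataset, reproducible.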

Phase 2: Reinforcement Learning from AI Feedback (RLAIF)

In the second phase, the SL-CAI model from Phase 1 generates pairs of candidate responses to potentially harmful prompts. A feedback model is then shown each prompt with its two responses, framed as a multiple-choice question, and asked which response better satisfies a constitutional principle sampled at random; in some variants it is first prompted to reason step by step. The normalised probabilities it assigns to each option serve as AI-generated preference labels for harmlessness. These labels are combined with human preference labels for helpfulness to train a Preference Model (PM), which then supplies the reward signal in a standard reinforcement learning fine-tuning loop (typically Proximal Policy Optimisation, PPO). This process is termed Reinforcement Learning from AI Feedback (RLAIF) to distinguish it from traditional RLHF, in which all preference labels are provided by humans.
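The labelling and preference-model steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the prompt template and all function names are hypothetical, the feedback model's log-probabilities for the answer options "(A)" and "(B)" are assumed to be available as plain floats, and the preference model is assumed to be trained with a Bradley-Terry style pairwise loss (a common choice for reward models) against the soft label rather than a hard 0/1 preference.

```python
import math


def comparison_prompt(prompt: str, response_a: str, response_b: str, principle: str) -> str:
    """Frame a response pair as a multiple-choice question (hypothetical template)."""
    return (
        f"Consider the following conversation:\nHuman: {prompt}\n\n"
        f"{principle}\n"
        f"Options:\n(A) {response_a}\n(B) {response_b}\n"
        "The answer is:"
    )


def soft_label(logprob_a: float, logprob_b: float) -> float:
    """Normalise the feedback model's log-probs for '(A)' and '(B)' into P(A preferred)."""
    m = max(logprob_a, logprob_b)  # subtract the max for numerical stability
    pa, pb = math.exp(logprob_a - m), math.exp(logprob_b - m)
    return pa / (pa + pb)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(score_a: float, score_b: float, p_a: float) -> float:
    """Cross-entropy between a Bradley-Terry model of the PM scores and soft target p_a."""
    diff = score_a - score_b
    return -(p_a * log_sigmoid(diff) + (1.0 - p_a) * log_sigmoid(-diff))


if __name__ == "__main__":
    question = comparison_prompt(
        "How do I get into my own car if I locked the keys inside?",
        "Smash the window.",
        "Contact a locksmith or your roadside assistance provider.",
        "Which of these responses is less harmful?",
    )
    print(question)
    p_a = soft_label(math.log(0.3), math.log(0.7))  # feedback model leans towards (B)
    print(round(p_a, 3))  # 0.3
    # Loss for a PM that currently scores (B) higher, which agrees with the label.
    print(preference_loss(score_a=-1.0, score_b=1.0, p_a=p_a))
```

In a full pipeline, `preference_loss` would be averaged over many comparisons, mixing these AI harmlessness labels with human helpfulness labels, and minimised with respect to the preference model's parameters before the PM is used as the RL reward.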

The Constitution

Anthropic's published constitution for Claude (released in May 2023) draws from multiple normative sources, including the Universal Declaration of Human Rights, principles inspired by Apple's Terms of Service, DeepMind's Sparrow Principles, and Anthropic's own statements of model intent. The principles cover domains including avoidance of harmful content, honesty and non-deception, respect for human autonomy, and broad safety considerations.[^4]

A key philosophical aspect of Constitutional AI is that the principles themselves can be debated, revised, and updated. In 2023, Anthropic published work on "Collective Constitutional AI", in partnership with the Collective Intelligence Project, in which roughly 1,000 American adults proposed and voted on candidate principles using the Polis online deliberation platform. The project explored how public input could shape the constitution rather than relying solely on the values of Anthropic's staff.

Advantages and Limitations

Constitutional AI offers several advantages over pure RLHF: it requires far fewer human annotations for the harmlessness dimension of alignment, its governing principles are explicit and auditable, and it is expected to scale better as model capabilities increase, since a more capable model is also a more capable critic. The critique-and-revise process also tends to produce models that are less evasive: instead of declining without explanation, they are more likely to engage with a harmful request by explaining why they object to it.

The technique is not without limitations. The model's ability to apply constitutional principles depends on its underlying capability to understand nuanced ethical reasoning, which means the approach is most effective for large, capable models. Additionally, the constitution itself reflects the values of those who wrote it, and no written set of principles can anticipate every edge case. There is also ongoing academic debate about whether RLAIF-generated preference labels introduce systematic biases inherited from the base model's pre-training data.

Malaysian Context — AI Alignment and Governance

Constitutional AI is directly relevant to Malaysia's emerging AI governance landscape, particularly the Malaysia AI Governance Framework published by MDEC, which emphasises that AI systems deployed in public and regulated sectors should be explainable, auditable, and aligned with clear ethical principles. Constitutional AI's core property, that its governing principles are stated explicitly in natural language, fits closely with the Framework's emphasis on transparency in AI decision-making.

Bank Negara Malaysia (BNM) and the Securities Commission Malaysia (SC) have both issued guidance on responsible AI deployment in financial services, emphasising the need for AI systems to be consistent, fair, and explainable. Because Constitutional AI writes its training principles down explicitly, it may fit these expectations more readily than alignment pipelines whose values are encoded only implicitly in human preference data. The fit is partial, however: CAI still trains a learned preference model, and an explicit constitution does not by itself explain any individual model output.

Anthropic's Claude, which is trained using Constitutional AI, has been adopted by several Malaysian and regional enterprises as their preferred large language model, partly because its alignment properties are well documented. Companies operating under Malaysia's Personal Data Protection Act 2010 (PDPA) and financial sector regulations cite Claude's explicit safety principles as a due-diligence advantage when deploying AI in sensitive customer-facing applications.

The concept of a "constitution" for AI systems has also resonated with Malaysian policymakers working on the National AI Office's mandate to establish norms for government AI adoption. The idea that AI behaviour can be governed by a written charter, much as the Federal Constitution serves as the supreme law of the Federation, provides an accessible conceptual frame for non-technical policymakers. Academic institutions, including Universiti Malaya's Faculty of Computer Science and Information Technology, have included Constitutional AI in graduate-level AI ethics curricula.

References

[^1]: Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073. Anthropic.
[^2]: Anthropic. (2023). Core Views on AI Safety. Anthropic.
[^3]: Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback, Appendix A: Training Details. arXiv:2212.08073. Anthropic.
[^4]: Anthropic. (2023). Collective Constitutional AI: Aligning a Language Model with Public Input. Anthropic Research Blog.

Tags: constitutional-ai, alignment, anthropic, ai-safety

Type	AI alignment technique
Developed by	Anthropic
Introduced	December 2022 (paper); implemented in Claude
Core method	Self-critique and revision guided by a written constitution
Related	RLHF, AI safety, Alignment, Claude