What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

AI Red Teaming

A structured adversarial evaluation practice in which testers attempt to elicit harmful, unsafe, or policy-violating behaviour from AI systems in order to surface risks before deployment.

5 min readLast updated May 2026Applications

AI red teaming is a structured adversarial evaluation practice in which dedicated testers — human, automated, or both — attempt to elicit harmful, unsafe, biased, or policy-violating behaviour from artificial intelligence systems before those systems are deployed. The term is borrowed from military exercises and from offensive cybersecurity, where a "red team" simulates an adversary against a "blue team" defending the system. Applied to AI, red teaming has become a central component of frontier model release processes at Anthropic, OpenAI, Google DeepMind, Meta AI and Microsoft, and is embedded in the NIST AI Risk Management Framework and EU AI Act conformity assessment expectations.

Scope and objectives

Modern AI red teaming addresses risks across several dimensions. Capability risks include the elicitation of dangerous knowledge in domains such as chemical, biological, radiological and nuclear (CBRN) weapons, offensive cybersecurity, and election interference. Alignment risks include deceptive behaviour, scheming, sycophancy and reward hacking. Application risks include prompt injection through untrusted tool outputs, data exfiltration in retrieval-augmented systems, jailbreaks that bypass refusal training and content-policy violations affecting children, minorities, or other protected categories. Multimodal red teaming additionally covers image, audio and video inputs and outputs.

Methodologies

Red teaming combines manual probing, automated attack generation and structured evaluation. Manual probing engages domain experts — biosecurity researchers, lawyers, doctors, intelligence analysts — to attempt to surface risks specific to their field. Automated red teaming uses adversarial language models to generate jailbreak prompts, fuzz tool calls and search the input space efficiently. Structured evaluations apply standardised attack suites such as Microsoft's PyRIT and the AI Red Teaming Agent, released in April 2025 and integrated with Azure AI Foundry. The open-source DeepTeam framework, released in November 2025, brings adversarial testing to organisations without dedicated security teams.

Anthropic's approach

Anthropic operates a Frontier Red Team comprising roughly fifteen full-time researchers reporting through its policy organisation, deliberately separated from the teams developing model defences so that attackers and defenders do not share incentives to minimise findings. The team publishes evaluation results, attack scenarios and technical analyses on a dedicated blog launched in August 2025. Anthropic's evaluations focus on four domains: biology (with external collaborations including SecureBio's Virology Capability Test and Sepal AI's bioterrorism planning experiments), cybersecurity, autonomy and CBRN. Each Claude model receives a system card describing red-team findings and mitigations before release.

OpenAI's approach

OpenAI publishes detailed system cards for each major release, including the ChatGPT Agent System Card published in July 2025. The company combines an internal red team with external networks of contracted experts and conducts pre-deployment evaluations against capability thresholds defined in its Preparedness Framework. Microsoft has red-teamed more than one hundred generative AI products and open-sourced PyRIT, contributing to transparency across the industry.

Standards and ecosystem

Industry-wide methodology converges around four principles: regular evaluation cycles, rapid response to newly discovered attacks, adaptive methodology that evolves with model capabilities and cross-team integration with safety, policy and engineering functions. The MITRE ATLAS framework catalogues adversarial machine learning techniques, and the NIST AI Risk Management Framework provides governance language adopted in many enterprise procurement processes.

Limitations

Red teaming surfaces risks but does not exhaustively characterise them. Persistent adversaries with budget and time will routinely defeat current safety training, leading some researchers to argue that defence-in-depth — combining model-level training with deployment-time monitoring, input filtering and output post-processing — is required for production deployments. Red teaming also faces ethical tensions: documented attack techniques can be misused, and access to dangerous capabilities must be carefully managed during evaluation.

Malaysian Context — Red teaming under the Malaysia AI Governance Framework

The Malaysia AI Governance Framework, published in 2024 by the Ministry of Digital and the National AI Office Malaysia (NAIO), explicitly references adversarial testing and red teaming as expected practice for high-risk AI systems. The Framework, together with the Personal Data Protection Act (PDPA) 2010 (as amended 2024) and the National Cyber Security Agency (NACSA) directives, sets baseline expectations for risk assessment before deploying AI in regulated sectors.

Bank Negara Malaysia (BNM) has issued guidance under its Risk Management in Technology (RMiT) policy that requires financial institutions including Maybank, CIMB, RHB and Public Bank to evaluate AI systems for prompt injection, model inversion and adversarial robustness before production use. The Securities Commission Malaysia (SC) has issued comparable expectations for fintech sandbox participants.

CyberSecurity Malaysia (CSM) and NACSA have established AI security working groups and run capability-building exercises with vendors including Microsoft, Google Cloud, AWS and IBM through the Malaysia Digital Economy Corporation (MDEC). Local cybersecurity firms LGMS, Securemetric and Firmus have expanded their service portfolios to include AI red teaming, supported by HRD Corp grants for technical reskilling. Universiti Teknologi Malaysia and Universiti Sains Islam Malaysia have published research on adversarial robustness and content safety evaluation in Bahasa Malaysia contexts, where mainstream English-language red teaming corpora often fail to surface region-specific harms.

References

Anthropic. (2025). Frontier Red Team: Methodology and Findings. Anthropic Red Blog.
OpenAI. (2025). ChatGPT Agent System Card. OpenAI.
Microsoft. (2025). PyRIT and the AI Red Teaming Agent. Microsoft Security.
National Institute of Standards and Technology. (2023). AI Risk Management Framework (AI RMF 1.0).
Ministry of Digital Malaysia. (2024). Malaysia AI Governance Framework.

Tags:red-teaming AI safety security evaluation alignment

Type	Adversarial safety evaluation
Borrowed from	Military and cybersecurity red teaming
Targets	LLMs, multimodal models, agents, content filters
Key frameworks	Microsoft PyRIT, DeepTeam, NIST AI RMF, MITRE ATLAS
Related	AI safety, alignment, jailbreaking