Adversarial Machine Learning

Adversarial machine learning is the study of attacks that exploit weaknesses in machine learning models, such as crafted inputs that cause misclassification, and of the defences designed to make models more robust.

Last updated June 2026

Adversarial machine learning is the study of how machine learning systems can be deliberately manipulated, and of the techniques used to defend against such manipulation. A central finding of the field is that models which perform well on ordinary data can be highly vulnerable to inputs that have been crafted to deceive them. In the best-known example, adding a small, carefully computed perturbation to an image, often imperceptible to a human, can cause an image classifier to confidently assign the wrong label.

These crafted inputs are called adversarial examples, and they reveal that the patterns a model learns are not always the robust, human-like features one might assume. The field has become increasingly important as machine learning is deployed in security-sensitive settings such as autonomous vehicles, fraud detection, medical diagnosis, and content moderation.

Categories of attack

Adversarial attacks are commonly grouped by their goal and the stage of the pipeline they target. Evasion attacks occur at inference time, modifying an input so the model produces an incorrect output; an attacker might subtly alter network traffic so that malicious activity appears normal to an intrusion-detection system. Poisoning attacks corrupt the training data so that the resulting model behaves incorrectly or contains a hidden backdoor. Model extraction attacks query a deployed model repeatedly to steal a functional copy of it. Inference attacks, including membership inference, attempt to recover information about the training data.
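As a rough illustration of the poisoning case, the sketch below trains a toy nearest-centroid classifier on one-dimensional data, then retrains it after an attacker has injected a few mislabelled points. The classifier, the data values, and the names `train_centroids`, `classify`, and `target` are all assumptions chosen for illustration; real poisoning attacks target far larger models and datasets.

```python
# Toy data-poisoning sketch: every value and name here is illustrative.

def train_centroids(samples):
    """Return {label: mean feature value} for a list of (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

clean_data = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
              (0.9, 1), (1.0, 1), (1.1, 1), (1.2, 1)]

# The attacker adds points that look like class 1 but carry label 0,
# dragging the class-0 centroid toward class 1.
poison = [(1.2, 0)] * 4

target = 0.8  # an input the clean model assigns to class 1

clean_label = classify(train_centroids(clean_data), target)
poisoned_label = classify(train_centroids(clean_data + poison), target)

print("clean model:   ", clean_label)
print("poisoned model:", poisoned_label)
```

With these particular values the clean model assigns the target to class 1 and the poisoned model assigns it to class 0, because the injected points move the class-0 centroid from 0.15 to 0.675.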

A further distinction is between white-box attacks, where the attacker knows the model's internal parameters and can compute perturbations directly, and black-box attacks, where the attacker can only observe inputs and outputs and must estimate how to fool the model.
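The difference in attacker knowledge can be made concrete with a small sketch. In the white-box setting the attacker would read the gradient straight from the model; in the black-box setting below, the attacker only calls a hypothetical `query_model` function that returns a score, and estimates the gradient by finite differences. The hidden weights, the step size `h`, and the budget `eps` are assumptions for illustration, and practical black-box attacks use more query-efficient estimators.

```python
import math

# Hidden from the attacker; stands in for a deployed model's parameters.
_HIDDEN_W = [2.0, -1.0, 0.5]
_HIDDEN_B = 0.0
query_count = 0

def query_model(x):
    """Hypothetical black-box API: returns only the score for class 1."""
    global query_count
    query_count += 1
    z = sum(w * xi for w, xi in zip(_HIDDEN_W, x)) + _HIDDEN_B
    return 1.0 / (1.0 + math.exp(-z))

def estimate_gradient(query, x, h=1e-4):
    """Central finite differences: two queries per input dimension."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((query(up) - query(down)) / (2 * h))
    return grad

def black_box_evasion(query, x, eps):
    """Lower the class-1 score by stepping against the estimated gradient sign."""
    grad = estimate_gradient(query, x)
    return [xi - eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

x = [0.3, 0.1, 0.2]
x_adv = black_box_evasion(query_model, x, eps=0.3)
queries_used = query_count  # queries spent on gradient estimation

score_before = query_model(x)
score_after = query_model(x_adv)
print(score_before, score_after, queries_used)
```

The attacker never touches `_HIDDEN_W` directly; the cost of that ignorance is the extra queries, which grow linearly with the input dimension in this naive estimator.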

Methods and defences

Two classic techniques for generating adversarial examples are the Fast Gradient Sign Method (FGSM), which perturbs an input in a single step along the sign of the gradient of the loss with respect to that input, and Projected Gradient Descent (PGD), a stronger iterative method that takes several smaller steps of the same kind, projects the result back into the allowed perturbation range after each one, and is widely used to benchmark robustness. Defending against these attacks is difficult. Adversarial training is generally regarded as the most effective general-purpose approach: the model is trained on adversarial examples alongside normal data so that it learns to resist them, though this raises training cost and can reduce accuracy on clean inputs. Other defences include input preprocessing, detection of anomalous inputs, and ensemble methods, but no defence is universally robust, and the field remains an ongoing contest between attacks and countermeasures.
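The two attacks can be sketched on a model small enough to differentiate by hand. The example below assumes a logistic-regression classifier with made-up weights `w` and bias `b`, a binary cross-entropy loss, and an L-infinity budget `eps`; none of these come from a particular system. FGSM computes `x + eps * sign(gradient)` once, while PGD repeats a smaller step and clips the result back to within `eps` of the original input.

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient_wrt_input(w, b, x, y):
    """For binary cross-entropy with a logistic model, dL/dx = (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the loss gradient."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def pgd(w, b, x, y, eps, step, iters):
    """Repeated small sign-gradient steps, each projected into the eps-ball."""
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_gradient_wrt_input(w, b, x_adv, y)
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: keep every coordinate within eps of the original input.
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Illustrative values only.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.2], 1
eps = 0.3

x_fgsm = fgsm(w, b, x, y, eps)
x_pgd = pgd(w, b, x, y, eps, step=0.1, iters=5)

print("clean:", predict_proba(w, b, x))
print("FGSM: ", predict_proba(w, b, x_fgsm))
print("PGD:  ", predict_proba(w, b, x_pgd))
```

On a linear model like this one the gradient sign never changes between iterations, so PGD ends at essentially the same corner of the perturbation box as FGSM. The two methods diverge on nonlinear models, where re-evaluating the gradient at each step lets PGD find stronger perturbations.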

Relevance in 2025 and 2026

As of 2025, adversarial machine-learning research concentrates on automotive systems, healthcare, electrical power and energy systems, and large language models. Attacks on vision-language models and on the vision-LiDAR fusion used in autonomous driving have drawn growing attention, and large language models face their own adversarial pressures through prompt injection and jailbreaking. Standards bodies, including the United States National Institute of Standards and Technology, have published taxonomies of adversarial threats to guide defenders.

| Attack | Stage | Goal |
|--------|-------|------|
| Evasion | Inference | Cause misclassification |
| Poisoning | Training | Corrupt or backdoor the model |
| Extraction | Inference | Steal the model |
| Membership inference | Inference | Reveal training data |

References

  1. Goodfellow, I., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. ICLR.
  2. Madry, A., et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR.
  3. ISACA. (2025). Combating the Threat of Adversarial Machine Learning to AI-Driven Cybersecurity. isaca.org.
  4. Springer. (2025). Adversarial Machine Learning: A Review of Methods, Tools, and Critical Industry Sectors. Artificial Intelligence Review.