AI Drug Discovery

AI drug discovery applies machine learning, deep learning, and generative modelling to accelerate the identification, design, and optimisation of therapeutic compounds across the pharmaceutical pipeline.

Last updated June 2026

AI drug discovery refers to the application of artificial intelligence techniques, including machine learning, deep learning, generative modelling, and graph neural networks, to accelerate and improve the pharmaceutical drug discovery and development process. Traditional small-molecule drug discovery is expensive, slow, and characterised by high attrition: by commonly cited estimates, only about 1 in 5,000 to 10,000 compounds evaluated in early discovery eventually reaches market approval. AI methods aim to compress timelines, reduce costs, and improve the probability of clinical success by making the discovery process more data-driven and systematic.

The Drug Discovery Pipeline

The pharmaceutical pipeline from target identification to approved drug typically spans 10–15 years and, by widely cited estimates that include the cost of failed candidates, costs over USD 2 billion per approved therapy. AI is being applied at multiple stages of this pipeline.

Target identification and validation involve finding biological molecules, typically proteins, whose activity is implicated in a disease. AI methods applied here include network biology approaches that analyse protein-protein interaction graphs using graph neural networks, deep learning on transcriptomic data to identify disease-associated gene expression patterns, and natural language processing to mine biomedical literature for novel target hypotheses.

Hit identification and virtual screening involve searching chemical space for small molecules that bind to and modulate the target protein. AI-based virtual screening models predict the binding affinity between candidate molecules and the target, replacing or augmenting slow and expensive physical high-throughput screening assays. Deep learning models trained on known ligand-protein interaction data, drawn from databases such as ChEMBL and PubChem, can score millions of candidate compounds in hours on GPU clusters.
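
To make the idea of scoring and ranking a compound library concrete, the following is a minimal sketch of ligand-based screening using Tanimoto similarity between binary fingerprints. This is a simple classical baseline, not the deep learning affinity models described above. The compound names and fingerprint bits are hypothetical toy data, and in practice fingerprints would be computed by a cheminformatics toolkit rather than written by hand.

```python
from typing import Dict, FrozenSet, List, Tuple

# A binary fingerprint stored as the set of bit indices that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by total distinct bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_library(query: Fingerprint,
                 library: Dict[str, Fingerprint]) -> List[Tuple[str, float]]:
    """Return (compound_id, similarity) pairs, most similar first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: the bit indices are invented for illustration.
known_active = frozenset({1, 4, 7, 9})
toy_library = {
    "cmpd_A": frozenset({1, 4, 7, 9}),  # same bits as the known active
    "cmpd_B": frozenset({1, 4, 8}),     # partial overlap
    "cmpd_C": frozenset({2, 3, 5}),     # no overlap
}

ranking = rank_library(known_active, toy_library)
for compound_id, score in ranking:
    print(f"{compound_id}: {score:.2f}")
```

A learned affinity model would take the place of `tanimoto` as the scoring function, while the ranking step would stay essentially the same.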

Lead optimisation involves improving the potency, selectivity, metabolic stability, and toxicity profile of initial hit compounds through iterative chemical modification. Predictive models for absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties allow in silico assessment of these properties before expensive wet-lab synthesis and testing.
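
As a small illustration of in silico property assessment, the sketch below applies Lipinski's rule of five, a long-standing heuristic for oral absorption, to precomputed molecular descriptors. It is a rule-based filter rather than a learned ADMET model, and the descriptor values are assumptions supplied by hand: the first entry uses approximate values for an aspirin-like molecule, and the second is entirely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors; in practice these come from a cheminformatics toolkit."""
    mol_weight: float      # molecular weight in daltons
    logp: float            # octanol-water partition coefficient (log scale)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four thresholds the molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """A common convention tolerates at most one violation."""
    return rule_of_five_violations(d) <= max_violations


# Approximate values for an aspirin-like molecule (illustrative only).
aspirin_like = Descriptors(mol_weight=180.2, logp=1.2,
                           h_bond_donors=1, h_bond_acceptors=4)
# Hypothetical large, lipophilic compound invented for this example.
heavy_compound = Descriptors(mol_weight=720.0, logp=6.3,
                             h_bond_donors=6, h_bond_acceptors=12)

print(passes_rule_of_five(aspirin_like))
print(passes_rule_of_five(heavy_compound))
```

Predictive ADMET models play a similar gatekeeping role but estimate properties such as metabolic stability or toxicity from data instead of applying fixed thresholds.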

Generative molecular design proposes novel molecular structures with desired property profiles, rather than simply scoring compounds from a pre-existing library. It relies on generative models such as variational autoencoders, generative adversarial networks, and diffusion models, which are trained to navigate chemical space and generate molecules that satisfy multiple constraints simultaneously.
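
One way to express "multiple constraints simultaneously" as a single number is a desirability score, which maps each property onto a 0 to 1 scale and combines the results with a geometric mean. The sketch below is an illustration under stated assumptions, not the objective used by any particular generative model: the property names, target ranges, and candidate values are all hypothetical.

```python
import math
from typing import Dict, Tuple


def ramp(value: float, worst: float, best: float) -> float:
    """Linear desirability: 0 at `worst`, 1 at `best`, clipped outside that range.

    Passing worst > best handles properties where lower values are preferred.
    """
    if worst == best:
        raise ValueError("worst and best must differ")
    fraction = (value - worst) / (best - worst)
    return min(1.0, max(0.0, fraction))


def desirability(properties: Dict[str, float],
                 targets: Dict[str, Tuple[float, float]]) -> float:
    """Geometric mean of per-property desirabilities.

    Any property with desirability 0 drives the combined score to 0.
    """
    scores = [ramp(properties[name], worst, best)
              for name, (worst, best) in targets.items()]
    return math.prod(scores) ** (1.0 / len(scores))


# Hypothetical (worst, best) ranges for three predicted properties.
targets = {
    "potency_pIC50": (5.0, 8.0),     # higher is better
    "logp": (6.0, 3.0),              # lower is better
    "solubility_logS": (-6.0, -3.0)  # higher is better
}

balanced = {"potency_pIC50": 6.5, "logp": 4.5, "solubility_logS": -4.5}
potent_but_greasy = {"potency_pIC50": 9.0, "logp": 7.0, "solubility_logS": -4.0}

balanced_score = desirability(balanced, targets)
greasy_score = desirability(potent_but_greasy, targets)
print(round(balanced_score, 3), round(greasy_score, 3))
```

The geometric mean is chosen here because a candidate that fails one constraint outright cannot compensate with excellent values elsewhere, which is the behaviour a multi-constraint design objective usually needs.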

AlphaFold and Protein Structure Prediction

AlphaFold 2, developed by DeepMind (now Google DeepMind), was first demonstrated at the CASP14 assessment in 2020 and published with open-source code in 2021, marking a watershed moment for AI in biology. For many proteins it predicted three-dimensional structures from amino acid sequences with accuracy competitive with experimental methods, an advance widely described as solving a 50-year-old grand challenge in computational biology. The AlphaFold Protein Structure Database, developed with EMBL-EBI and expanded in 2022 to over 200 million predicted structures, has made structural information freely available for nearly every catalogued protein, substantially widening the structural data available to drug discovery.

AlphaFold 3, released in 2024, extended structure prediction to complexes of proteins with other proteins, DNA, RNA, and small molecules. This allows direct computational prediction of how a drug candidate might interact with its biological target, a task that previously typically required expensive and difficult X-ray crystallography or cryo-electron microscopy experiments.

Molecular Representation Learning

A fundamental challenge in AI-driven drug discovery is learning useful representations of molecules. Molecules can be represented as SMILES strings (a linearised text notation), 2D molecular graphs, 3D atomic point clouds, or pharmacophore fingerprints. Graph neural networks (GNNs) operate directly on molecular graphs, treating atoms as nodes and bonds as edges, and have shown strong performance on molecular property prediction benchmarks. 3D equivariant neural networks, whose outputs are invariant or equivariant to rotations and translations of the input (and, in some architectures, reflections), are increasingly used when three-dimensional geometry is important, particularly in protein-ligand interaction modelling.
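
The atoms-as-nodes, bonds-as-edges view can be sketched with a single round of message passing in plain Python. This is a deliberately stripped-down illustration: real GNNs apply learned weight matrices and nonlinearities at each step, which are omitted here, and the two-element one-hot atom encoding is an assumption made for the example. The molecule is the heavy-atom skeleton of ethanol (C-C-O).

```python
from typing import List, Tuple

Vector = List[float]


def add(u: Vector, v: Vector) -> Vector:
    """Element-wise sum of two feature vectors."""
    return [a + b for a, b in zip(u, v)]


def message_passing_round(features: List[Vector],
                          bonds: List[Tuple[int, int]]) -> List[Vector]:
    """Each atom adds the (pre-update) features of its bonded neighbours to its own."""
    updated = [list(h) for h in features]
    for i, j in bonds:
        updated[i] = add(updated[i], features[j])
        updated[j] = add(updated[j], features[i])
    return updated


def readout(features: List[Vector]) -> Vector:
    """Sum over atoms gives a molecule-level vector independent of atom ordering."""
    total = [0.0] * len(features[0])
    for h in features:
        total = add(total, h)
    return total


# One-hot atom features over the assumed vocabulary [carbon, oxygen].
CARBON, OXYGEN = [1.0, 0.0], [0.0, 1.0]

# Ethanol heavy atoms numbered C0-C1-O2.
ethanol = readout(message_passing_round([CARBON, CARBON, OXYGEN],
                                        [(0, 1), (1, 2)]))
# The same molecule with atoms renumbered O0-C1-C2.
ethanol_renumbered = readout(message_passing_round([OXYGEN, CARBON, CARBON],
                                                   [(0, 1), (1, 2)]))

print(ethanol, ethanol_renumbered)
```

Both orderings yield the same molecule-level vector, which is the permutation invariance that makes graph representations attractive compared with order-dependent encodings such as SMILES strings.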

Clinical Progress

As of mid-2026, more than thirty AI-originated therapeutic programmes have entered human clinical trials. Insilico Medicine's AI-designed candidate for idiopathic pulmonary fibrosis entered Phase I trials in 2022 and reached Phase II in 2023, a rapid timeline achieved in part through AI-accelerated discovery and optimisation. Recursion Pharmaceuticals (which merged with Exscientia in 2024), Insilico Medicine, and Schrödinger are among the companies running AI-native drug discovery pipelines at scale. Major pharmaceutical companies including Pfizer, Novartis, AstraZeneca, and Merck have established dedicated AI research centres and formed partnerships with AI-native companies to apply machine learning across their pipelines.

Challenges

AI drug discovery faces several persistent challenges. Data scarcity and quality are pervasive: publicly available datasets of experimental biological activity measurements are large in aggregate but sparse relative to the vast space of possible drug-like molecules. Distribution shift between training data and novel chemical scaffolds limits model generalisation. Interpretability of deep learning models is limited, making it difficult to provide mechanistic justifications for AI-generated predictions to regulators and medicinal chemists. Integration of AI tools into existing pharmaceutical workflows requires significant organisational change.
