Machine Learning and Artificial Intelligence in Antibody Discovery: Breakthroughs, Blind Spots, and the Road Ahead

To date, Machine Learning (ML) has proved itself a powerful tool for improving antibody discovery workflows and timelines, for instance in candidate ranking, structure modelling, and the optimization of known sequences. Nevertheless, the holy grail of de novo AI prediction of therapeutic-grade candidates has not yet been achieved. The biggest bottleneck appears to be the availability of high-quality, standardized empirical datasets and robust benchmarking. Synthetic libraries and fully in vitro workflows can help here, as they generate cleaner experimental outputs that are ideal input for ML and, who knows, may eventually enable a discovery pipeline led entirely by Artificial Intelligence (AI).

1. The current state of AI and ML in Antibody Discovery

The biotech industry can currently use Machine Learning to prioritize candidates and support structure-based reasoning:

Optimization of known binders (affinity maturation, stability engineering, liability reduction) is increasingly ML-friendly (1);
De novo antibody design is advancing rapidly, with diffusion-based approaches (e.g., RFdiffusion) being explored to generate antibodies with defined structural features (2).

In both cases, ML delivers the most robust value when coupled with high-throughput experimental pipelines: models learn from large, structured experimental datasets, and high-throughput pipelines remain indispensable both to train those models and to validate their predictions (1).

The picture changes when we talk about therapeutic-grade prediction. Immunogenicity, multi-specificity, manufacturability, aggregation, and in vivo performance depend on multiple experimental readouts that are often measured under different assay conditions and judged against different success criteria across laboratories. Unsurprisingly, models that look impressive on internal benchmarks frequently fail to generalise when the protocol or the dataset shifts (2).

2. What recent industry signals are telling us and why it matters

PEGS Europe’s 2025 Machine Learning program provided a good snapshot of industry attention moving beyond “AI will replace discovery” towards a more grounded vision of AI integrated with protein engineering, validation, manufacturability, and reproducible benchmarking.

Two themes stood out:

Protein engineering focus: sequence/structure optimization and multi-specific design, rather than the ambition of fully automated end-to-end prediction (3);
Validation and reproducibility as top priorities: clear benchmarks, cross-lab comparability, and experimentally grounded iteration (3).

This direction reflects a more sustainable trajectory for the field. It future-proofs the space against “model demos” that don’t survive contact with real discovery constraints.

3. The wider AI landscape that impacts antibody discovery

Not all AI breakthroughs translate into improvements in antibody discovery services. The ones that matter change what is feasible in structure prediction, sequence reasoning, and scientific workflow acceleration.

A clear example of a tool that shifts the baseline is DeepMind’s AlphaFold 3, which substantially improves structure prediction for biomolecular complexes and their interfaces, including antibody–antigen interactions (4). It doesn’t solve developability or clinical translation, but it strengthens structure-informed filtering and hypothesis generation.

One aspect that is easy to underestimate is the role of AI tools in the context of IP and data privacy constraints. The high commercial value of antibodies is often built on sensitive sequences and proprietary assay outputs; many teams will therefore prefer tools that can be run in controlled environments, with clear governance, rather than sending sequences into systems they cannot audit.

4. Where AI already helps in antibody discovery

The most defensible applications of AI in antibody discovery are pragmatic and tightly coupled to experimental workflows. One of the clearest areas of impact is earlier and better-informed screening. ML is increasingly used to prioritize candidates based on developability-critical properties, such as specificity, stability, viscosity, and manufacturability, so that the molecules progressing into costly downstream stages are more likely to be of candidate quality.

In practice, this value only materializes when ML is part of an iterative discovery loop: high-throughput wet lab workflows generate structured training data and provide experimental validation, while computational design and ranking indicate which variants should be tested next. This workflow enables more effective antibody engineering by concentrating experimental effort where it is most likely to pay off.
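To make the loop concrete, here is a minimal Python sketch under loudly stated assumptions: the variant library, the simulated `run_assay` readout, and the simple additive model are all hypothetical placeholders for a real wet-lab assay and a real ML model. Each round, the model is refit on everything measured so far, ranks the untested variants, and the top batch is sent for (simulated) testing.

```python
"""Toy design-test-learn loop (a sketch; every name here is hypothetical).

Assumptions: variants are short strings over a reduced residue alphabet,
`run_assay` is a simulated stand-in for a wet-lab readout, and the "model" is
a deliberately simple additive per-position score, not a production ML model.
"""
import itertools
import random
from collections import defaultdict
from statistics import mean

ALPHABET = "AGSY"  # hypothetical reduced residue alphabet
LENGTH = 4         # hypothetical length of the randomized region


def run_assay(variant: str) -> float:
    """Simulated assay: hidden per-residue effects, weighted by position."""
    hidden_effects = {"A": 0.0, "G": -0.5, "S": 0.3, "Y": 1.0}
    return sum(hidden_effects[res] * (pos + 1) for pos, res in enumerate(variant))


def fit_additive_model(data: dict[str, float]) -> dict[tuple[int, str], float]:
    """Learn: score each (position, residue) by its mean deviation from the global mean."""
    global_mean = mean(data.values())
    deviations = defaultdict(list)
    for variant, score in data.items():
        for pos, res in enumerate(variant):
            deviations[(pos, res)].append(score - global_mean)
    return {key: mean(vals) for key, vals in deviations.items()}


def predict(model: dict[tuple[int, str], float], variant: str) -> float:
    # (position, residue) pairs never measured contribute 0 (no evidence either way).
    return sum(model.get((pos, res), 0.0) for pos, res in enumerate(variant))


def design_test_learn(rounds: int = 3, batch_size: int = 8, seed: int = 0):
    rng = random.Random(seed)
    library = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]
    # Round 0: an unbiased random screen provides the first training data.
    measured = {v: run_assay(v) for v in rng.sample(library, batch_size)}
    for _ in range(rounds):
        model = fit_additive_model(measured)                        # learn
        untested = [v for v in library if v not in measured]
        ranked = sorted(untested, key=lambda v: predict(model, v), reverse=True)
        for variant in ranked[:batch_size]:                         # propose
            measured[variant] = run_assay(variant)                  # test
    best = max(measured, key=measured.get)
    return best, measured[best], len(measured)


best, score, n_tested = design_test_learn()
print(f"best variant {best} scored {score:.1f} after {n_tested} assays")
```

This sketch is purely greedy: it always tests the top-ranked variants. Real campaigns would typically swap in a better-validated model and reserve part of each batch for exploration, so the loop does not lock onto the model's early blind spots.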

AI can also facilitate structure-assisted triage. Predicted antibody–antigen complexes and assessments of structural plausibility can be used to prioritize variants and guide epitope mapping. This helps shape experimental strategy without attempting to replace experimental validation (5).

Beyond early antibody screening, AI also plays a supporting role in literature mining and knowledge synthesis. These tools speed up hypothesis generation, competitive-landscape scanning, and experimental design, making scientific progress more efficient, even though they have no capability for clinical prediction (6).

In short, AI’s greatest benefit to science, at the moment, lies in surfacing options and effectively supporting optimised scientific workflows.

5. Limitations and blind spots

Despite its increasing relevance in antibody discovery, machine learning still has limitations that are tightly linked to the quality of the data used for model training. The lack of large, standardised, transferable datasets has been identified as a core barrier.

Data, however, is not the only source of restrictions. There are inherent biological constraints: therapeutic-grade antibodies must clear multiple, sometimes conflicting, criteria (affinity, specificity, manufacturability, immunogenicity, stability). Today’s models only partially capture these trade-offs, especially for de novo cases and when the target class, format, or assay context changes (1).
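One common way to respect such trade-offs, rather than collapsing them into a single score, is Pareto filtering: keep every candidate that no other candidate beats on all criteria at once. The Python sketch below assumes each criterion has already been expressed so that higher is better (for example, immunogenicity as a hypothetical `low_immunogenicity` score); the clone names and values are invented for illustration.

```python
"""Sketch: keep only Pareto-optimal candidates across conflicting criteria.

Assumption: every criterion is scored so that higher is better.
All clone names and values are hypothetical.
"""
from typing import Dict, List

Candidate = Dict[str, float]


def dominates(a: Candidate, b: Candidate, criteria: List[str]) -> bool:
    """True if `a` is at least as good as `b` on every criterion and strictly better on one."""
    at_least_as_good = all(a[c] >= b[c] for c in criteria)
    strictly_better = any(a[c] > b[c] for c in criteria)
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Candidate], criteria: List[str]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores, criteria)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


candidates = {
    "clone_1": {"affinity": 0.9, "stability": 0.4, "low_immunogenicity": 0.7},
    "clone_2": {"affinity": 0.6, "stability": 0.8, "low_immunogenicity": 0.6},
    "clone_3": {"affinity": 0.5, "stability": 0.3, "low_immunogenicity": 0.5},
    "clone_4": {"affinity": 0.7, "stability": 0.7, "low_immunogenicity": 0.9},
}
criteria = ["affinity", "stability", "low_immunogenicity"]
print(pareto_front(candidates, criteria))  # → ['clone_1', 'clone_2', 'clone_4']
```

Only clone_3 is dropped, because clone_1 beats it on every criterion; the remaining three represent genuinely different trade-offs, and choosing among them stays a scientific decision rather than a modelling one.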

The takeaway: ML works, but it performs best where criteria are well defined, measurements are repeatable, and experimental baselines are consistent.

6. The data dilemma

Most ML roadmaps quietly assume empirical feedback that is fast, consistent, and scalable (1). Synthetic antibody libraries and fully in vitro selection workflows are powerful levers here. Compared with approaches heavily shaped by immunization variability, in vitro systems support:

Controlled, consistent, reproducible diversity to systematically explore sequence–function relationships
Repeatable selection conditions that reduce noise and drift
Faster test/learn cycles, with more iterations per unit time

With that in mind, the format simplicity of VHHs (single-domain binders with no heavy/light chain pairing to model), especially when provided as fully synthetic, diverse, and robust libraries, makes them an ideal candidate for reliable AI-assisted wins.

7. The future of ML in Antibody Discovery

If the goal is to turn ML into AI that is reliably useful within the next 3 to 5 years, the next steps are clear:

1. Standardized datasets and evaluation: shared measurement practices, clearer labels, and unified benchmarks that stress-test generalisation (2,7);

2. Hybrid workflows by design: wet lab and ML built as a loop (models propose, experiments test, models update) that uses the strengths of each (1);

3. Consortia-level effort: meaningful progress requires coordinated investment across experimental platforms and computational methods, rather than overfunding one side and hoping it compensates for the other.
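Stress-testing generalisation, as called for in the first point, largely comes down to how benchmark data are split. A minimal Python sketch, assuming each record carries a hypothetical `group` label (such as target, assay protocol, or sequence cluster): entire groups are held out for testing, so the model is scored on a shifted context rather than on near-duplicates of its training data. This mirrors the idea behind grouped cross-validation utilities such as scikit-learn's `GroupKFold`, but uses only the standard library.

```python
"""Sketch: group-held-out split for benchmarking (all names hypothetical).

Whole groups go to either train or test, never both. With very few groups,
the training set can end up small or empty, so check group counts first.
"""
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def group_holdout_split(
    records: List[Record], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    # Sort before shuffling so the split depends only on the seed, not on input order.
    groups = sorted({r["group"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r["group"] not in test_groups]
    test = [r for r in records if r["group"] in test_groups]
    return train, test


# Hypothetical dataset: 12 antibodies measured against 4 targets.
records = [{"id": f"ab_{i}", "group": f"target_{i % 4}"} for i in range(12)]
train, test = group_holdout_split(records, test_fraction=0.25)
print(sorted({r["group"] for r in test}), len(train), len(test))
```

A random record-level split of the same data would typically place records from every target in both sets, which tends to flatter a model; comparing scores under both splits is a quick way to see how much of a benchmark result depends on that overlap.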

Why not speak to our experts to discuss your program and identify early technical risks? We’ll listen to what “success” looks like for your project and help you outline the most efficient path to your Target Product Profile.


References

Matsunaga, R. & Tsumoto, K. (2025). Accelerating antibody discovery and optimization with high-throughput experimentation and machine learning. Journal of Biomedical Science, 32: 46. doi:10.1186/s12929-025-01141-x
Zheng, J., Wang, Y., Liang, Q., Cui, L. & Wang, L. (2024). The Application of Machine Learning on Antibody Discovery and Optimization. Molecules, 29(24): 5923. doi:10.3390/molecules29245923
Joubbi, S., Micheli, A., Milazzo, P., Maccari, G., Ciano, G., Cardamone, D. & Medini, D. (2024). Antibody design using deep learning: from sequence and structure design to affinity maturation. Briefings in Bioinformatics, 25(4): bbae307. doi:10.1093/bib/bbae307
Abramson, J., Adler, J., Dunger, J. et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630: 493-500. doi:10.1038/s41586-024-07487-w
Zhou, Y., Huang, Z., Li, W., Wei, J., Jiang, Q., Yang, W. & Huang, J. (2023). Deep learning in preclinical antibody drug discovery and development. Methods, 218: 57-71. doi:10.1016/j.ymeth.2023.07.003
Kim, J., McFee, M., Fang, Q., Abdin, O. & Kim, P.M. (2023). Computational and artificial intelligence-based methods for antibody development. Trends in Pharmacological Sciences, 44(3): 175-189. doi:10.1016/j.tips.2022.12.005
Wossnig, L., Furtmann, N., Buchanan, A., Kumar, S. & Greiff, V. (2024). Best practices for machine learning in antibody discovery and development. Drug Discovery Today, 29(7): 104025. doi:10.1016/j.drudis.2024.104025