By linking sequence with structure, the S2ALM model opens a new era in antibody research, accelerating the design of powerful, targeted therapies against some of the world’s deadliest diseases.

Research: S2ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning.
Antibodies, also known as immunoglobulins, are specialized proteins produced by the body's immune system to fight harmful invaders such as viruses and other pathogens. Because each antibody must bind to a particular target, its structure is specific to that target. This specificity, together with a low rate of adverse effects, has made antibodies widely explored as therapeutic drugs.
From Wet Labs to AI Models
While antibodies have traditionally been studied using laborious wet-lab methods, molecular scientists are now turning to computational models to design them with greater precision in less time. In a step toward wider use of AI in antibody design, a team of researchers from China has developed an AI model called S2ALM (Sequence-Structure multi-level pre-trained Antibody Language Model), which can analyze antibodies, predict their properties, and design new ones using structure-specific details.
The study was led by Professor Tingjun Hou and Professor Chang-Yu Hsieh from the College of Pharmaceutical Sciences, Zhejiang University, in collaboration with Assistant Professor Jintai Chen from AI Thrust, Information Hub, HKUST (Guangzhou), and Professor Jian Wu from the Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence. The findings were published in the journal Research.
"The molecular basis of any antibody protein lies in its amino acid sequence," explains Prof. Hou. "The sequence decides its 3D structure, and the structure decides its biological function."
How the S2ALM Model Works
While most existing AI models focus only on the amino acid sequence, S2ALM is the first of its kind to integrate both sequence and structure, offering a more complete picture of how antibodies function. To build the model, the researchers trained it on a large dataset of 75 million antibody and protein sequences and 11.7 million 3D structures, including both experimentally determined and computer-predicted structures.
They also introduced two innovative learning strategies in a hierarchical pre-training paradigm (a stepwise AI training approach). The first, Sequence-Structure Matching (SSM), helps the model link sequence data with corresponding structures. The second, Cross-Level Reconstruction (CLR), enables the model to predict missing information by leveraging both sequence and structural clues.
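The two objectives can be pictured with a toy data-preparation sketch. This is not the authors' implementation: the function names (`make_ssm_pairs`, `mask_tokens`, `make_clr_example`), the mask rate, and the choice to write a structure as one discrete token per residue are all assumptions made for illustration. The sketch only shows what kind of training examples each objective might consume: matched versus mismatched sequence-structure pairs for SSM, and partially hidden sequence and structure tokens for CLR.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


def make_ssm_pairs(sequences, structures, rng):
    """Build (sequence, structure, label) triples for a matching task.

    label 1: the structure belongs to the sequence.
    label 0: the structure was swapped in from a different antibody.
    """
    n = len(sequences)
    if n < 2 or n != len(structures):
        raise ValueError("need at least two aligned sequence/structure entries")
    pairs = []
    for i in range(n):
        pairs.append((sequences[i], structures[i], 1))
        # Pick the structure of a different entry as a negative example.
        j = rng.choice([k for k in range(n) if k != i])
        pairs.append((sequences[i], structures[j], 0))
    return pairs


def mask_tokens(tokens, mask_rate, rng):
    """Hide a random subset of tokens; return the masked copy and the hidden targets."""
    masked, targets = [], {}
    for pos, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[pos] = tok
        else:
            masked.append(tok)
    return masked, targets


def make_clr_example(seq_tokens, struct_tokens, mask_rate, rng):
    """Mask both levels so a model could rebuild each one with help from the other."""
    masked_seq, seq_targets = mask_tokens(seq_tokens, mask_rate, rng)
    masked_struct, struct_targets = mask_tokens(struct_tokens, mask_rate, rng)
    return {
        "masked_sequence": masked_seq,
        "masked_structure": masked_struct,
        "sequence_targets": seq_targets,
        "structure_targets": struct_targets,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    seqs = ["EVQLV", "QVQLQ", "DIQMT"]        # made-up amino acid fragments
    structs = ["aabbc", "abbcc", "ccbaa"]     # made-up per-residue structure tokens
    for seq, struct, label in make_ssm_pairs(seqs, structs, rng):
        print(seq, struct, label)
    print(make_clr_example(list(seqs[0]), list(structs[0]), 0.3, rng))
```

In a real pipeline, a neural network would be trained to classify the pair labels and to fill in the masked positions; the sketch stops at preparing the examples.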
Key Results and Therapeutic Potential
The results of this combined strategy were impressive. According to the study, S2ALM outperformed other leading models on several key tasks in antibody research and drug development. These included predicting antigen-binding capacity, tracking B cell maturation (for antibody development), identifying antibody paratopes (the specific antigen-binding regions), predicting antibody-antigen binding strength (affinity), and designing new antibody sequences.
One of the most striking outcomes was the model's ability to generate entirely new antibody candidates that could target pathogens such as SARS-CoV-2, Ebola virus, and influenza B virus. Computational structure predictions indicated that these AI-designed antibodies could fold into stable, functional 3D shapes suitable for binding their targets.
"The success of S2ALM is three-fold; firstly, it learns from a comprehensive dataset of antibody representations; secondly, its unique learning approach incorporates detailed structural information with biological features; and thirdly, it exceeds state-of-the-art performance on extensive tasks, even in designing new antibodies," remarks Prof. Wu.
While the development of S2ALM marks a milestone in antibody research, its applications also offer real-world potential for therapeutic innovation. By reducing reliance on trial and error in the laboratory, the model could accelerate the development of next-generation antibodies, bringing faster, more reliable, and more cost-effective immune-based therapies one step closer.
Journal reference:
- Mingze Yin, Hanjing Zhou, Jialu Wu, Yiheng Zhu, Yuxuan Zhan, Zitai Kong, Hongxia Xu, Chang-Yu Hsieh, Jintai Chen, Tingjun Hou, et al. S2ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning. Research. 2025;8:0721. DOI: 10.34133/research.0721, https://spj.science.org/doi/10.34133/research.0721