A Bochum team has taught AI to decode evolution by guiding neural networks with phylogenetic trees, unlocking more biologically meaningful insights from complex data.

Research: A quartet-based approach for inferring phylogenetically informative features from genomic and phenomic data. Image Credit: Ruhr-Universität-Bochum
Artificial intelligence now outperforms humans at recognizing many kinds of patterns, but it has long struggled with evolutionary relationships. To address this, a team from the Bioinformatics Department at Ruhr University Bochum, Germany, led by Professor Axel Mosig, has developed and tested a proof-of-concept method that trains a neural network to incorporate phylogenetic information. The AI can relate data from different species in an evolutionary context and identify which characteristics developed over the course of evolution, and how. “Our approach lets artificial intelligence look at data through the lens of evolution, in a way,” explains Vivian Brandenburg, lead author of the paper published in the Computational and Structural Biotechnology Journal on August 22, 2025.
Providing Prior Knowledge of Ancestry
“Most previous AI algorithms have a hard time analyzing biological data through an evolutionary lens, because they don’t know what to look for and get confused by random patterns,” says Axel Mosig. The team in Bochum has provided its AI with prior knowledge of the phylogenetic trees of the species being analyzed. During training, the AI learns to sort groups of four species, known as quartets, into the arrangement given by the presumed correct ancestral tree, which encodes both close and distant relationships. “If all groups of four are correctly arranged, the entire ancestry tree can come into place like a puzzle,” explains Luis Hack, who also worked on the study. “The AI can then look in the sequences to identify patterns that have evolved throughout this tree.”
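To make the "groups of four" idea concrete, here is a minimal Python sketch of how quartet arrangements can be read off a reference tree. It is an illustration only, not the team's code: the nested-tuple tree, the species names A to E, and the helper functions are all hypothetical, and the tree is assumed to be fully resolved (binary). It relies on a standard property of trees: among the three ways to pair up four species, the pairing with the smallest summed path distance is the one the tree supports.

```python
from itertools import combinations

def leaf_paths(tree, prefix=()):
    """Map each leaf name to its path of child indices from the root."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths

def tree_distance(pa, pb):
    """Number of edges between two leaves, given their root paths."""
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return len(pa) + len(pb) - 2 * shared

def quartet_topology(paths, a, b, c, d):
    """Return the pairing of a, b, c, d that the tree supports.

    Assumes a binary tree, so exactly one pairing has the smallest sum.
    """
    def dist(x, y):
        return tree_distance(paths[x], paths[y])
    candidates = {
        ((a, b), (c, d)): dist(a, b) + dist(c, d),
        ((a, c), (b, d)): dist(a, c) + dist(b, d),
        ((a, d), (b, c)): dist(a, d) + dist(b, c),
    }
    return min(candidates, key=candidates.get)

# Hypothetical reference tree with five species, written as nested tuples.
reference_tree = ((("A", "B"), "C"), ("D", "E"))
paths = leaf_paths(reference_tree)

# One training label per group of four species.
quartets = {
    q: quartet_topology(paths, *q)
    for q in combinations(sorted(paths), 4)
}
for species, split in quartets.items():
    print(species, "->", split)
```

Each entry in `quartets` would serve as one training label in a setup like the one described: the arrangement a model should reproduce for that group of four.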
Novel Neural Network Loss Function
This is achieved by using a novel quartet-based loss function that enforces tree-like structure in the neural network’s latent feature space, combined with a siamese loss term that stabilizes training and prevents model collapse.
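The article does not spell out the loss function, so the following Python sketch shows only one plausible way such a loss could look, not the formulation used in the paper. It assumes Euclidean distances between latent vectors and scores the three possible pairings of a quartet with a softmax, penalizing the model when the reference pairing is not the most likely one. The `contrastive_loss` function is a generic textbook stand-in for a siamese-style term; the function names, the margin value, and the example vectors are all assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def quartet_loss(za, zb, zc, zd):
    """Cross-entropy for the split ab|cd (assumed formulation).

    The caller orders the arguments so that ab|cd is the reference
    topology. A smaller summed distance means a more likely pairing.
    """
    sums = [
        euclidean(za, zb) + euclidean(zc, zd),  # ab|cd (reference)
        euclidean(za, zc) + euclidean(zb, zd),  # ac|bd
        euclidean(za, zd) + euclidean(zb, zc),  # ad|bc
    ]
    logits = [-s for s in sums]
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_norm)

def contrastive_loss(z1, z2, same_species, margin=1.0):
    """Generic siamese-style term (a stand-in, not the paper's term).

    Pulls embeddings of the same species together and pushes different
    species at least `margin` apart.
    """
    d = euclidean(z1, z2)
    if same_species:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical 2-D embeddings in which A, B and C, D form two tight pairs.
za, zb, zc, zd = (0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)

matching = quartet_loss(za, zb, zc, zd)     # reference agrees with geometry
conflicting = quartet_loss(za, zc, zb, zd)  # reference contradicts geometry
print(f"matching: {matching:.4f}, conflicting: {conflicting:.4f}")
```

In this toy setting the loss is close to zero when the latent geometry agrees with the reference tree and large when it contradicts it, which is the kind of pressure a quartet-based loss would place on the feature space.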
Beyond DNA: Versatile Applications
The kicker: in principle, the method is not limited to genetic sequence data and could be applied to other types of data, such as image data or structural patterns of biomolecules from various species. In the current study, the team demonstrated the approach on 16S rRNA sequence data from bacteria and on simulated sequences, and they propose future extensions to other modalities. Having established the approach for DNA sequence data in the current work, the bioinformaticians from RUB are already exploring its applicability to image data. “For example, you could reconstruct hypothetical images of evolutionary predecessor species,” says Hack, explaining the method’s potential for future projects.
Ensuring Biological Relevance
To ensure the model truly captures biologically meaningful features, the researchers also applied interpretability techniques such as in silico mutagenesis and DeepLIFT, which helped reveal which sequence elements drove the model’s phylogenetic decisions. They note, however, that the approach assumes the traits studied follow the species tree and that careful choice of reference trees will be important when applying the method to other types of biological data.
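In silico mutagenesis, one of the techniques named above, can be sketched in a few lines of Python: each position in a sequence is swapped for every alternative base, and the change in the model's output is recorded. The example below is a simplified illustration rather than the researchers' pipeline. The `toy_score` function is a hypothetical stand-in for a trained network (it merely counts a made-up motif, `GGA`), and the sequence is invented.

```python
def in_silico_mutagenesis(sequence, score_fn, alphabet="ACGU"):
    """Return, per position, the largest score change any substitution causes."""
    baseline = score_fn(sequence)
    effects = []
    for i, original in enumerate(sequence):
        deltas = []
        for base in alphabet:
            if base == original:
                continue
            # Substitute a single base and re-score the mutated sequence.
            mutant = sequence[:i] + base + sequence[i + 1:]
            deltas.append(abs(score_fn(mutant) - baseline))
        effects.append(max(deltas))
    return effects

def toy_score(seq):
    """Hypothetical stand-in for a model output: counts the motif GGA."""
    return float(seq.count("GGA"))

seq = "AUGGACUU"  # invented example sequence
effects = in_silico_mutagenesis(seq, toy_score)
for position, (base, effect) in enumerate(zip(seq, effects)):
    print(position, base, effect)
```

Positions where a substitution changes the score the most are the ones the scoring function depends on. Applied to a real model, the same procedure points to the sequence elements behind its decisions.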