1
Zhu Y, Li Y, Li C, Shen XX, Zhou X. A critical evaluation of deep-learning based phylogenetic inference programs using simulated datasets. J Genet Genomics 2025; 52:714-717. [PMID: 39824436 DOI: 10.1016/j.jgg.2025.01.006] [Received: 11/02/2024] [Revised: 01/08/2025] [Accepted: 01/09/2025] [Indexed: 01/20/2025]
Affiliation(s)
- Yixiao Zhu
  - College of Agriculture and Biotechnology and Centre for Evolutionary & Organismal Biology, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yonglin Li
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Chuhao Li
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Xing-Xing Shen
  - College of Agriculture and Biotechnology and Centre for Evolutionary & Organismal Biology, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Xiaofan Zhou
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, Guangdong 510642, China
2
Tinh NH, Vinh LS. An efficient deep learning method for amino acid substitution model selection. J Evol Biol 2025; 38:129-139. [PMID: 39548851 DOI: 10.1093/jeb/voae141] [Received: 03/23/2024] [Revised: 10/03/2024] [Accepted: 11/13/2024] [Indexed: 11/18/2024]
Abstract
Amino acid substitution models play an important role in studying the evolutionary relationships among species from protein sequences. An amino acid substitution model consists of a large number of parameters; therefore, it is estimated from hundreds or thousands of alignments. Both general models and clade-specific models have been estimated and widely used in phylogenetic analyses. The maximum likelihood method is normally used to select the best-fit model for the specific protein alignment under study. A number of studies have discussed theoretical concerns as well as the computational burden of maximum likelihood methods in model selection. Recently, machine learning methods have been proposed for selecting nucleotide models. In this article, we propose a method to measure substitution rates among amino acids (called summary statistics) from protein alignments to efficiently train a deep learning network, called ModelDetector, for detecting amino acid substitution models. The ModelDetector network was trained on 2,246,400 alignments on a computer with eight cores (without GPU) in about 3.3 hr. Experiments on simulated data showed that the accuracy of ModelDetector was comparable with that of the maximum likelihood method ModelFinder. It was orders of magnitude faster than the maximum likelihood method in inferring amino acid substitution models and was able to analyze genome alignments with millions of sites in minutes. The results indicate that the deep learning network can serve as a promising tool for amino acid substitution model selection.
Affiliation(s)
- Nguyen Huy Tinh
  - Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam
- Le Sy Vinh
  - Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, 10000 Hanoi, Vietnam
3
Kulikov N, Derakhshandeh F, Mayer C. Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments. Mol Phylogenet Evol 2024; 200:108181. [PMID: 39209046 DOI: 10.1016/j.ympev.2024.108181] [Received: 09/30/2023] [Revised: 07/15/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is phylogenetic tree reconstruction based on the Maximum Likelihood method. In this study, we present neural networks to predict the best model of sequence evolution and the correct topology for four-sequence alignments of nucleotide or amino acid sequence data. We trained neural networks with different architectures using simulated alignments for a wide range of evolutionary models, model parameters and branch lengths. By comparing the accuracy of model and topology prediction of the trained neural networks with Maximum Likelihood and Neighbour Joining methods, we show that for quartet trees, the neural network classifier outperforms the Neighbour Joining method and is in most cases as good as the Maximum Likelihood method at inferring the best model of sequence evolution and the best tree topology. These results are consistent for nucleotide and amino acid sequence data. We also show that our method is superior for model selection to previously published methods based on convolutional networks. Furthermore, we found that neural network classifiers are much faster than the IQ-TREE implementation of the Maximum Likelihood method. Our results show that neural networks could become a true competitor for the Maximum Likelihood method in phylogenetic reconstructions.
Affiliation(s)
- Nikita Kulikov
  - Molecular Evolutionary Biology, Department of Biology, Hamburg University, Germany; Leibniz Institute for the Analysis of Biodiversity Change (LIB), Germany
- Fatemeh Derakhshandeh
  - Leibniz Institute for the Analysis of Biodiversity Change (LIB), Germany; Medical Faculty, Heidelberg University, Germany
- Christoph Mayer
  - Leibniz Institute for the Analysis of Biodiversity Change (LIB), Germany
4
Mo YK, Hahn MW, Smith ML. Applications of machine learning in phylogenetics. Mol Phylogenet Evol 2024; 196:108066. [PMID: 38565358 DOI: 10.1016/j.ympev.2024.108066] [Received: 10/12/2023] [Revised: 02/16/2024] [Accepted: 03/21/2024] [Indexed: 04/04/2024]
Abstract
Machine learning has increasingly been applied to a wide range of questions in phylogenetic inference. Supervised machine learning approaches that rely on simulated training data have been used to infer tree topologies and branch lengths, to select substitution models, and to perform downstream inferences of introgression and diversification. Here, we review how researchers have used several promising machine learning approaches to make phylogenetic inferences. Despite the promise of these methods, several barriers prevent supervised machine learning from reaching its full potential in phylogenetics. We discuss these barriers and potential paths forward. In the future, we expect that the application of careful network designs and data encodings will allow supervised machine learning to accommodate the complex processes that continue to confound traditional phylogenetic methods.
Affiliation(s)
- Yu K Mo
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA; Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Megan L Smith
  - Department of Biological Sciences, Mississippi State University, Starkville, MS 39762, USA
5
Ecker N, Huchon D, Mansour Y, Mayrose I, Pupko T. A machine-learning-based alternative to phylogenetic bootstrap. Bioinformatics 2024; 40:i208-i217. [PMID: 38940166 PMCID: PMC11211842 DOI: 10.1093/bioinformatics/btae255] [Indexed: 06/29/2024]
Abstract
MOTIVATION: Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance.

RESULTS: Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets.

AVAILABILITY AND IMPLEMENTATION: The data supporting this work are available in the Figshare repository at https://doi.org/10.6084/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https://github.com/noaeker/bootstrap_repo.
Affiliation(s)
- Noa Ecker
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Dorothée Huchon
  - School of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
  - The Steinhardt Museum of Natural History and National Research Center, Tel Aviv University, Tel Aviv 6997801, Israel
- Yishay Mansour
  - The Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Itay Mayrose
  - School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
6
Zou Y, Zhang Z, Zeng Y, Hu H, Hao Y, Huang S, Li B. Common Methods for Phylogenetic Tree Construction and Their Implementation in R. Bioengineering (Basel) 2024; 11:480. [PMID: 38790347 PMCID: PMC11117635 DOI: 10.3390/bioengineering11050480] [Received: 04/04/2024] [Revised: 05/04/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
Phylogenetic trees reflect the evolutionary relationships between species or gene families, and they play a critical role in modern biological research. In this review, we summarize common methods for constructing phylogenetic trees, including distance methods, maximum parsimony, maximum likelihood, Bayesian inference, and tree-integration methods (supermatrix and supertree). We discuss the advantages, shortcomings, and applications of each method and provide relevant code to construct phylogenetic trees from molecular data using packages and algorithms in R. This review aims to provide comprehensive guidance and reference for researchers seeking to construct phylogenetic trees while also promoting further development and innovation in this field. By offering a clear and concise overview of the different methods available, we hope to enable researchers to select the most appropriate approach for their specific research questions and datasets.
Affiliation(s)
- Yue Zou
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Zixuan Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Yujie Zeng
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Hanyue Hu
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Sheng Huang
  - Animal Nutrition Institute, Chongqing Academy of Animal Science, Chongqing 402460, China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
7
Williams TA, Davin AA, Szánthó LL, Stamatakis A, Wahl NA, Woodcroft BJ, Soo RM, Eme L, Sheridan PO, Gubry-Rangin C, Spang A, Hugenholtz P, Szöllősi GJ. Phylogenetic reconciliation: making the most of genomes to understand microbial ecology and evolution. ISME J 2024; 18:wrae129. [PMID: 39001714 PMCID: PMC11293204 DOI: 10.1093/ismejo/wrae129] [Received: 05/06/2024] [Revised: 07/01/2024] [Accepted: 07/12/2024] [Indexed: 07/15/2024]
Abstract
In recent years, phylogenetic reconciliation has emerged as a promising approach for studying microbial ecology and evolution. The core idea is to model how gene trees evolve along a species tree and to explain differences between them via evolutionary events including gene duplications, transfers, and losses. Here, we describe how phylogenetic reconciliation provides a natural framework for studying genome evolution and highlight recent applications including ancestral gene content inference, the rooting of species trees, and the insights into metabolic evolution and ecological transitions they yield. Reconciliation analyses have elucidated the evolution of diverse microbial lineages, from Chlamydiae to Asgard archaea, shedding light on ecological adaptation, host-microbe interactions, and symbiotic relationships. However, there are many opportunities for broader application of the approach in microbiology. Continuing improvements to make reconciliation models more realistic and scalable, and integration of ecological metadata such as habitat, pH, temperature, and oxygen use offer enormous potential for understanding the rich tapestry of microbial life.
Affiliation(s)
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol BS81TQ, United Kingdom
- Adrian A Davin
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 113-0033 Tokyo, Japan
- Lénárd L Szánthó
  - MTA-ELTE “Lendület” Evolutionary Genomics Research Group, Eötvös University, 1117 Budapest, Hungary
  - Model-Based Evolutionary Genomics Unit, Okinawa Institute of Science and Technology Graduate University, 904-0495 Okinawa, Japan
- Alexandros Stamatakis
  - Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology Hellas, 70013 Heraklion, Greece
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
- Noah A Wahl
  - Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology Hellas, 70013 Heraklion, Greece
- Ben J Woodcroft
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, QLD 4102, Australia
- Rochelle M Soo
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Laura Eme
  - Unité d’Ecologie, Systématique et Evolution, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Paul O Sheridan
  - School of Biological and Chemical Sciences, University of Galway, Galway H91 TK33, Ireland
- Cecile Gubry-Rangin
  - School of Biological Sciences, University of Aberdeen, Aberdeen AB24 3FX, United Kingdom
- Anja Spang
  - Department of Marine Microbiology and Biogeochemistry, NIOZ, Royal Netherlands Institute for Sea Research, PO Box 59, 1790 AB Den Burg, The Netherlands
  - Department of Evolutionary & Population Biology, Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
- Philip Hugenholtz
  - Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Gergely J Szöllősi
  - MTA-ELTE “Lendület” Evolutionary Genomics Research Group, Eötvös University, 1117 Budapest, Hungary
  - Model-Based Evolutionary Genomics Unit, Okinawa Institute of Science and Technology Graduate University, 904-0495 Okinawa, Japan
  - Institute of Evolution, HUN REN Centre for Ecological Research, 1121 Budapest, Hungary