1
Brazão JM, Foster PG, Cox CJ. Data-specific substitution models improve protein-based phylogenetics. PeerJ 2023; 11:e15716. [PMID: 37576497 PMCID: PMC10416777 DOI: 10.7717/peerj.15716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 06/16/2023] [Indexed: 08/15/2023] Open
Abstract
Calculating amino-acid substitution models that are specific for individual protein data sets is often difficult due to the computational burden of estimating large numbers of rate parameters. In this study, we tested the computational efficiency and accuracy of five methods used to estimate substitution models, namely Codeml, FastMG, IQ-TREE, P4 (maximum likelihood), and P4 (Bayesian inference). Data-specific substitution models were estimated from simulated alignments (with different lengths) that were generated from a known simulation model and simulation tree. Each of the resulting data-specific substitution models was used to calculate the maximum likelihood score of the simulation tree and simulated data that was used to calculate the model, and compared with the maximum likelihood scores of the known simulation model and simulation tree on the same simulated data. Additionally, the commonly-used empirical models, cpREV and WAG, were assessed similarly. Data-specific models performed better than the empirical models, which under-fitted the simulated alignments, had the highest difference to the simulation model maximum-likelihood score, clustered further from the simulation model in principal component analysis ordination, and inferred less accurate trees. Data-specific models and the simulation model shared statistically indistinguishable maximum-likelihood scores, indicating that the five methods were reasonably accurate at estimating substitution models by this measure. Nevertheless, tree statistics showed differences between optimal maximum likelihood trees. Unlike other model estimating methods, trees inferred using data-specific models generated with IQ-TREE and P4 (maximum likelihood) were not significantly different from the trees derived from the simulation model in each analysis, indicating that these two methods alone were the most accurate at estimating data-specific models. 
To show the benefits of using data-specific protein models several published data sets were reanalysed using IQ-TREE-estimated models. These newly estimated models were a better fit to the data than the empirical models that were used by the original authors, often inferred longer trees, and resulted in different tree topologies in more than half of the re-analysed data sets. The results of this study show that software availability and high computation burden are not limitations to generating better-fitting data-specific amino-acid substitution models for phylogenetic analyses.
Affiliation(s)
- João M. Brazão
  - Centro de Ciências do Mar, Universidade do Algarve, Faro, Algarve, Portugal
- Peter G. Foster
  - Department of Life Sciences, Natural History Museum, London, United Kingdom
- Cymon J. Cox
  - Centro de Ciências do Mar, Universidade do Algarve, Faro, Algarve, Portugal
2
Prillo S, Deng Y, Boyeau P, Li X, Chen PY, Song YS. CherryML: scalable maximum likelihood estimation of phylogenetic models. Nat Methods 2023; 20:1232-1236. [PMID: 37386188 PMCID: PMC10644697 DOI: 10.1038/s41592-023-01917-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Accepted: 05/18/2023] [Indexed: 07/01/2023]
Abstract
Phylogenetic models of molecular evolution are central to numerous biological applications spanning diverse timescales, from hundreds of millions of years involving orthologous proteins to just tens of days relating to single cells within an organism. A fundamental problem in these applications is estimating model parameters, for which maximum likelihood estimation is typically employed. Unfortunately, maximum likelihood estimation is a computationally expensive task, in some cases prohibitively so. To address this challenge, we here introduce CherryML, a broadly applicable method that achieves several orders of magnitude speedup by using a quantized composite likelihood over cherries in the trees. The massive speedup offered by our method should enable researchers to consider more complex and biologically realistic models than previously possible. Here we demonstrate CherryML's utility by applying it to estimate a general 400 × 400 rate matrix for residue-residue coevolution at contact sites in three-dimensional protein structures; we estimate that using current state-of-the-art methods such as the expectation-maximization algorithm for the same task would take >100,000 times longer.
Affiliation(s)
- Sebastian Prillo
  - Computer Science Division, University of California, Berkeley, CA, USA
- Yun Deng
  - Graduate Group in Computational Biology, University of California, Berkeley, CA, USA
- Pierre Boyeau
  - Computer Science Division, University of California, Berkeley, CA, USA
- Xingyu Li
  - Computer Science Division, University of California, Berkeley, CA, USA
- Po-Yen Chen
  - Computer Science Division, University of California, Berkeley, CA, USA
- Yun S Song
  - Computer Science Division, University of California, Berkeley, CA, USA
  - Department of Statistics, University of California, Berkeley, CA, USA
3
Dang CC, Vinh LS. Estimating amino acid substitution models for metazoan evolutionary studies. J Evol Biol 2023; 36:499-506. [PMID: 36598184 DOI: 10.1111/jeb.14147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 10/12/2022] [Accepted: 11/21/2022] [Indexed: 01/05/2023]
Abstract
Amino acid substitution models represent the substitution rates among amino acids during the evolution of protein sequences. The models are a prerequisite for maximum likelihood or Bayesian methods to analyse the phylogenetic relationships among species based on their protein sequences. Estimating amino acid substitution models requires large protein datasets and intensive computation. In this paper, we presented the estimation of both time-reversible model (Q.met) and time non-reversible model (NQ.met) for multicellular animals (Metazoa). Analyses showed that the Q.met and NQ.met models were significantly better than existing models in analysing metazoan protein sequences. Moreover, the time non-reversible model NQ.met enables us to reconstruct the rooted phylogenetic tree for Metazoa. We recommend researchers to employ the Q.met and NQ.met models in analysing metazoan protein sequences.
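The distinction between time-reversible and time non-reversible models mentioned in this abstract can be illustrated with a minimal sketch on a toy four-state alphabet (real amino-acid models such as Q.met use 20 states). All numbers and the helper names `reversible_rate_matrix` and `is_time_reversible` are illustrative assumptions and are not taken from the paper. A reversible matrix is built as q_ij = r_ij * pi_j from symmetric exchangeabilities r, which guarantees detailed balance (pi_i * q_ij = pi_j * q_ji); the cyclic matrix below has a uniform stationary distribution but violates detailed balance.

```python
from itertools import product


def reversible_rate_matrix(exch, freqs):
    """Build Q with q_ij = r_ij * pi_j for i != j and rows summing to zero.

    exch  -- symmetric exchangeabilities r_ij (diagonal entries are ignored)
    freqs -- stationary frequencies pi, assumed to sum to 1
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        # The diagonal is still 0.0 here, so this is minus the off-diagonal sum.
        q[i][i] = -sum(q[i])
    return q


def is_time_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: pi_i * q_ij == pi_j * q_ji for all i, j."""
    n = len(freqs)
    return all(
        abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) <= tol
        for i, j in product(range(n), repeat=2)
    )


# Toy values (hypothetical, chosen only for illustration).
freqs = [0.1, 0.2, 0.3, 0.4]
exch = [
    [0, 1, 2, 3],
    [1, 0, 4, 5],
    [2, 4, 0, 6],
    [3, 5, 6, 0],
]
q_rev = reversible_rate_matrix(exch, freqs)

# A one-directional cycle 0 -> 1 -> 2 -> 3 -> 0: its stationary distribution
# is uniform, but substitutions never run backwards, so it is non-reversible.
uniform = [0.25, 0.25, 0.25, 0.25]
q_cyc = [
    [-1.0, 1.0, 0.0, 0.0],
    [0.0, -1.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 1.0],
    [1.0, 0.0, 0.0, -1.0],
]

print(is_time_reversible(q_rev, freqs))
print(is_time_reversible(q_cyc, uniform))
```

A reversible model of this form needs only the symmetric exchangeabilities plus the frequencies, whereas a general non-reversible matrix has a free rate for every ordered pair of states, which is one reason estimating it is described as requiring large datasets and intensive computation.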
Affiliation(s)
- Cuong Cao Dang
  - University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
- Le Sy Vinh
  - University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
4
Dang CC, Minh BQ, McShea H, Masel J, James JE, Vinh LS, Lanfear R. OUP accepted manuscript. Syst Biol 2022; 71:1110-1123. [PMID: 35139203 PMCID: PMC9366462 DOI: 10.1093/sysbio/syac007] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/14/2021] [Accepted: 01/30/2022] [Indexed: 11/12/2022] Open
Affiliation(s)
- Cuong Cao Dang
  - Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, Hanoi 10000, Vietnam
- Bui Quang Minh
  - Computational Phylogenomics Lab, School of Computing, Australian National University, Canberra, Australian Capital Territory 2601, Australia
- Hanon McShea
  - Department of Earth System Science, School of Earth, Energy, and Environmental Sciences, Stanford University, Palo Alto, CA 94305, USA
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Jennifer Eleanor James
  - Department of Ecology and Genetics, Plant Ecology and Evolution, Evolutionary Biology Center, Uppsala University, Uppsala, SE-752 36, Sweden
- Le Sy Vinh
  - Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi 10000, Vietnam (correspondence to be sent to this address)
- Robert Lanfear
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia

Cuong Cao Dang and Bui Quang Minh contributed equally to the work.
5
Yang C, Bai Y, Dong K, Yang J, Lai XH, Lu S, Zhang G, Cheng Y, Jin D, Zhang S, Lv X, Huang Y, Xu J. Actinomyces marmotae sp. nov. and Actinomyces procaprae sp. nov. isolated from wild animals and reclassification of Actinomyces liubingyangii and Actinomyces tangfeifanii as Boudabousia liubingyangii comb. nov. and Boudabousia tangfeifanii comb. nov., respectively. Int J Syst Evol Microbiol 2021; 71. [PMID: 33560201 DOI: 10.1099/ijsem.0.004696] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022] Open
Abstract
Four Gram-stain-positive, catalase-negative, non-spore-forming, rod-shaped bacterial strains (zg-325T, zg329, dk561T and dk752) were isolated from the respiratory tract of marmot (Marmota himalayana) and the faeces of Tibetan gazelle (Procapra picticaudata) from the Qinghai-Tibet Plateau of PR China. The results of 16S rRNA gene sequence-based phylogenetic analyses indicated that strains zg-325T and dk561T represent members of the genus Actinomyces, most similar to Actinomyces denticolens DSM 20671T and Actinomyces ruminicola B71T, respectively. The DNA G+C contents of strains zg-325T and dk561T were 71.6 and 69.3 mol%, respectively. The digital DNA-DNA hybridization values of strains zg-325T and dk561T with their most closely related species were below the 70 % threshold for species demarcation. The four strains grew best at 35 °C in air containing 5 % CO2 on brain heart infusion (BHI) agar with 5 % sheep blood. All four strains had C18:1ω9c and C16:0 as the major cellular fatty acids. MK-8 and MK-9 were the major menaquinones in zg-325T while MK-10 was predominant in dk561T. The major polar lipids included diphosphatidylglycerol and phosphatidylinositol. On the basis of several lines of evidence from phenotypic and phylogenetic analyses, zg-325T and dk561T represent novel species of the genus Actinomyces, for which the name Actinomyces marmotae sp. nov. and Actinomyces procaprae sp. nov. are proposed. The type strains are zg-325T (=GDMCC 1.1724T=JCM 34091T) and dk561T (=CGMCC 4.7566T=JCM 33484T). We also propose, on the basis of the phylogenetic results herein, the reclassification of Actinomyces liubingyangii and Actinomyces tangfeifanii as Boudabousia liubingyangii comb. nov. and Boudabousia tangfeifanii comb. nov., respectively.
Affiliation(s)
- Caixin Yang
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
- Yibo Bai
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
- Kui Dong
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
- Jing Yang
  - Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing 102206, PR China
  - Shanghai Institute for Emerging and Re-emerging Infectious Diseases, Shanghai Public Health Clinical Center, Shanghai 201508, PR China
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
- Xin-He Lai
  - Henan Key Laboratory of Biomolecular Recognition and Sensing, College of Chemistry and Chemical Engineering, Henan Joint International Research Laboratory of Chemo/Biosensing and Early Diagnosis of Major Diseases, Shangqiu Normal University, Shangqiu 476000, PR China
- Shan Lu
  - Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing 102206, PR China
  - Shanghai Institute for Emerging and Re-emerging Infectious Diseases, Shanghai Public Health Clinical Center, Shanghai 201508, PR China
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
- Gui Zhang
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
- Yanpeng Cheng
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
- Dong Jin
  - Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing 102206, PR China
  - Shanghai Institute for Emerging and Re-emerging Infectious Diseases, Shanghai Public Health Clinical Center, Shanghai 201508, PR China
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
- Sihui Zhang
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
- Xianglian Lv
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
- Ying Huang
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
- Jianguo Xu
  - Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi Province 030001, PR China
  - State Key Laboratory of Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, PR China
  - Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing 102206, PR China
  - Shanghai Institute for Emerging and Re-emerging Infectious Diseases, Shanghai Public Health Clinical Center, Shanghai 201508, PR China
  - Institute of Public Health, Nankai University, Tianjin, PR China
6
Chang H, Nie Y, Zhang N, Zhang X, Sun H, Mao Y, Qiu Z, Huang Y. MtOrt: an empirical mitochondrial amino acid substitution model for evolutionary studies of Orthoptera insects. BMC Evol Biol 2020; 20:57. [PMID: 32429841 PMCID: PMC7236349 DOI: 10.1186/s12862-020-01623-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/08/2020] [Accepted: 05/05/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Amino acid substitution models play an important role in inferring phylogenies from proteins. Although different amino acid substitution models have been proposed, only a few were estimated from mitochondrial protein sequences for specific taxa such as the mtArt model for Arthropoda. The increasing of mitochondrial genome data from broad Orthoptera taxa provides an opportunity to estimate the Orthoptera-specific mitochondrial amino acid empirical model.
RESULTS: We sequenced complete mitochondrial genomes of 54 Orthoptera species, and estimated an amino acid substitution model (named mtOrt) by maximum likelihood method based on the 283 complete mitochondrial genomes available currently. The results indicated that there are obvious differences between mtOrt and the existing models, and the new model can better fit the Orthoptera mitochondrial protein datasets. Moreover, topologies of trees constructed using mtOrt and existing models are frequently different. MtOrt does indeed have an impact on likelihood improvement as well as tree topologies. The comparisons between the topologies of trees constructed using mtOrt and existing models show that the new model outperforms the existing models in inferring phylogenies from Orthoptera mitochondrial protein data.
CONCLUSIONS: The new mitochondrial amino acid substitution model of Orthoptera shows obvious differences from the existing models, and outperforms the existing models in inferring phylogenies from Orthoptera mitochondrial protein sequences.
Affiliation(s)
- Huihui Chang
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Yimeng Nie
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Nan Zhang
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Xue Zhang
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Huimin Sun
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Ying Mao
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
- Zhongying Qiu
  - School of Basic Medical Sciences & Shaanxi Key Laboratory of Brain Disorders, Xi'an Medical University, Xi'an, 710021, China
- Yuan Huang
  - College of Life Sciences, Shaanxi Normal University, No. 620, West Chang'an Avenue, Xi'an, 710119, Shaanxi, China
7
FLAVI: An Amino Acid Substitution Model for Flaviviruses. J Mol Evol 2020; 88:445-452. [PMID: 32356020 DOI: 10.1007/s00239-020-09943-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/11/2019] [Accepted: 04/15/2020] [Indexed: 10/24/2022]
Abstract
Amino acid substitution models represent substitution rates among amino acids during the evolution. The models play an important role in analyzing protein sequences, especially inferring phylogenies. The rapid evolution of flaviviruses is expanding the threat in public health. A number of models have been estimated for some viruses, however, they are unable to properly represent amino acid substitution patterns of flaviviruses. In this study, we collected protein sequences from the flavivirus genus to specifically estimate an amino acid substitution model, called FLAVI, for flaviviruses. Experiments showed that the collected dataset was sufficient to estimate a stable model. More importantly, the FLAVI model was remarkably better than other existing models in analyzing flavivirus protein sequences. We recommend researchers to use the FLAVI model when studying protein sequences of flaviviruses or closely related viruses.
8
Kuzminkova AA, Sokol AD, Ushakova KE, Popadin KY, Gunbin KV. mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria. BMC Evol Biol 2019; 19:47. [PMID: 30813887 PMCID: PMC6391778 DOI: 10.1186/s12862-019-1371-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Heterotachy is the variation in the evolutionary rate of aligned sites in different parts of the phylogenetic tree. It occurs mainly due to epistatic interactions among the substitutions, which are highly complex and make it difficult to study protein evolution. The vast majority of computational evolutionary approaches for studying these epistatic interactions or their evolutionary consequences in proteins require high computational time. However, recently, it has been shown that the evolution of residue solvent accessibility (RSA) is tightly linked with changes in protein fitness and intra-protein epistatic interactions. This provides a computationally fast alternative, based on comparison of evolutionary rates of amino acid replacements with the rates of RSA evolutionary changes in order to recognize any shifts in epistatic interaction.
RESULTS: Based on RSA information, data randomization and phylogenetic approaches, we constructed a software pipeline, which can be used to analyze the evolutionary consequences of intra-protein epistatic interactions with relatively low computational time. We analyzed the evolution of 512 protein families tightly linked to mitochondrial function in Vertebrates and created "mtProtEvol", the web resource with data on protein evolution. In strict agreement with lifespan and metabolic rate data, we demonstrated that different functional categories of mitochondria-related proteins subjected to selection on accelerated and decelerated RSA rates in rodents and primates. For example, accelerated RSA evolution in rodents has been shown for Krebs cycle enzymes, respiratory chain and reactive oxygen species metabolism, while in primates these functions are stress-response, translation and mtDNA integrity. Decelerated RSA evolution in rodents has been demonstrated for translational machinery and oxidative stress response components.
CONCLUSIONS: mtProtEvol is an interactive resource focused on evolutionary analysis of epistatic interactions in protein families involved in Vertebrata mitochondria function and available at http://bioinfodbs.kantiana.ru/mtProtEvol/. This resource and the devised software pipeline may be useful tool for researchers in area of protein evolution.
Affiliation(s)
- Anastasia A. Kuzminkova
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Anastasia D. Sokol
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Kristina E. Ushakova
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Konstantin Yu. Popadin
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Konstantin V. Gunbin
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
  - Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
9
Le VS, Dang CC, Le QS. Improved mitochondrial amino acid substitution models for metazoan evolutionary studies. BMC Evol Biol 2017; 17:136. [PMID: 28606055 PMCID: PMC5469158 DOI: 10.1186/s12862-017-0987-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/13/2017] [Accepted: 06/03/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Amino acid substitution models play an essential role in inferring phylogenies from mitochondrial protein data. However, only few empirical models have been estimated from restricted mitochondrial protein data of a hundred species. The existing models are unlikely to represent appropriately the amino acid substitutions from hundred thousands metazoan mitochondrial protein sequences.
RESULTS: We selected 125,935 mitochondrial protein sequences from 34,448 species in the metazoan kingdom to estimate new amino acid substitution models targeting metazoa, vertebrates and invertebrate groups. The new models help to find significantly better likelihood phylogenies in comparison with the existing models. We noted remarkable distances from phylogenies with the existing models to the maximum likelihood phylogenies that indicate a considerable number of incorrect bipartitions in phylogenies with the existing models. Finally, we used the new models and mitochondrial protein data to certify that Testudines, Aves, and Crocodylia form one separated clade within amniotes.
CONCLUSIONS: We introduced new mitochondrial amino acid substitution models for metazoan mitochondrial proteins. The new models outperform the existing models in inferring phylogenies from metazoan mitochondrial protein data. We strongly recommend researchers to use the new models in analysing metazoan mitochondrial protein data.
Affiliation(s)
- Vinh Sy Le
  - University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam
- Cuong Cao Dang
  - University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam
- Quang Si Le
  - School of Pharmacy and Biomedical Sciences, University of Portsmouth, Winston Churchill Avenue Portsmouth, Portsmouth, PO1 2UP, UK