1
Dumm W, Barker M, Howard-Snyder W, DeWitt WS III, Matsen FA IV. Representing and extending ensembles of parsimonious evolutionary histories with a directed acyclic graph. J Math Biol 2023; 87:75. [PMID: 37878119 PMCID: PMC10600060 DOI: 10.1007/s00285-023-02006-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 09/12/2023] [Accepted: 09/26/2023] [Indexed: 10/26/2023]
Abstract
In many situations, it would be useful to know not just the best phylogenetic tree for a given data set, but the collection of high-quality trees. This goal is typically addressed using Bayesian techniques; however, current Bayesian methods do not scale to large data sets. Furthermore, for large data sets with relatively low signal one cannot even store every good tree individually, especially when the trees are required to be bifurcating. In this paper, we develop a novel object called the "history subpartition directed acyclic graph" (or "history sDAG" for short) that compactly represents an ensemble of trees with labels (e.g. ancestral sequences) mapped onto the internal nodes. The history sDAG can be built efficiently and can also be efficiently trimmed to only represent maximally parsimonious trees. We show that the history sDAG allows us to find many additional equally parsimonious trees, extending combinatorially beyond the ensemble used to construct it. We argue that this object could be useful as the "skeleton" of a more complete uncertainty quantification.
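The abstract above refers to maximally parsimonious trees. As a minimal sketch of what a parsimony score measures (this is the classical Fitch algorithm for a single site, not the history sDAG construction described in the paper), the following counts the minimum number of state changes on a rooted binary tree. The nested-tuple tree encoding and the names `fitch_score`, `tree`, and `states` are assumptions made for illustration.

```python
def fitch_score(tree, states):
    """Return (candidate state set, minimum number of changes) for one site.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    `states` maps each leaf name to its observed character state.
    Both encodings are illustrative assumptions.
    """
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("A", "B"), ("C", "D"))
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
_, changes = fitch_score(tree, site)
print(changes)
```

Two trees are equally parsimonious for a data set when their scores, summed over all sites, are equal.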
Affiliation(s)
- Will Dumm
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Howard Hughes Medical Institute, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Mary Barker
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Howard Hughes Medical Institute, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- William Howard-Snyder
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
- William S DeWitt III
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA
- Frederick A Matsen IV
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Howard Hughes Medical Institute, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
  - Department of Statistics, University of Washington, Seattle, Washington, USA
2
Wong EB, Kamaruddin N, Mokhtar M, Yusof N, Khairuddin RFR. Assessing sequence heterogeneity in Chlorellaceae DNA barcode markers for phylogenetic inference. J Genet Eng Biotechnol 2023; 21:104. [PMID: 37851281 PMCID: PMC10584744 DOI: 10.1186/s43141-023-00550-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Phylogenetic inference is an important approach that allows the recovery of the evolutionary history and the origin of the Chlorellaceae species. Despite the species' potential for biofuel feedstock production, their high phenotypic plasticity and similar morphological structures among the species have muddled the taxonomy and identification of the Chlorellaceae species. This study aimed to decipher Chlorellaceae DNA barcode marker heterogeneity by examining the sequence divergence and genomic properties of 18S rRNA, ITS (ITS1-5.8S rRNA-ITS2-28S rRNA), and rbcL from 655 orthologous sequences of 64 species across 31 genera in the Chlorellaceae family. The study assessed the distinct evolutionary properties of the DNA markers that may have caused the discordance between individual trees in the phylogenetic inference using the Robinson-Foulds distance and the Shimodaira-Hasegawa test. Our findings suggest that using the supermatrix approach improves the congruency between trees by reducing stochastic error and increasing the confidence of the inferred Chlorellaceae phylogenetic tree. This study also found that the phylogenies inferred through the supermatrix approach might not always be well supported by all markers. The study highlights that assessing sequence heterogeneity prior to the phylogenetic inference could allow the approach to accommodate sequence evolutionary properties and support species identification from the most congruent phylogeny, which can better represent the evolution of Chlorellaceae species.
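The abstract above uses the Robinson-Foulds distance to quantify discordance between trees. As a rough sketch of the idea, assuming rooted trees encoded as nested tuples of leaf names (the study's actual software and settings are not stated here), the distance can be computed as the number of clades found in one tree but not the other. The names `clades` and `rf_distance` are hypothetical.

```python
def clades(tree):
    """Return (leaf set, set of non-trivial clades) for a rooted tree.

    `tree` is a leaf name (str) or a tuple of subtrees; this encoding
    is an assumption made for illustration.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    # Record the clade defined by this internal node.
    found.add(leaves)
    return leaves, found


def rf_distance(tree_a, tree_b):
    """Rooted (clade-based) Robinson-Foulds distance between two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    # Count the clades that appear in exactly one of the two trees.
    return len(clades_a ^ clades_b)


balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")
print(rf_distance(balanced, ladder))
```

A distance of zero means the two trees have identical topologies; larger values indicate more conflicting groupings, which is the kind of discordance between single-marker trees that the study examines.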
Affiliation(s)
- Ee Bhei Wong
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Nurhaida Kamaruddin
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Marina Mokhtar
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Norjan Yusof
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
- Raja Farhana R Khairuddin
  - Department of Biology, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  - Centre of Research for Computational Sciences and Informatics for Biology, Bioindustry, Environment, Agriculture, and Healthcare (CRYSTAL), Universiti Malaya, Kuala Lumpur, Malaysia
3
Ito Y, Tanaka N. Phylogeny of Alisma (Alismataceae) revisited: implications for polyploid evolution and species delimitation. J Plant Res 2023; 136:613-629. [PMID: 37402089 DOI: 10.1007/s10265-023-01477-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 06/27/2023] [Indexed: 07/05/2023]
Abstract
Alisma L. is a genus of aquatic and wetland plants belonging to family Alismataceae. At present, it is thought to contain ten species. Variation in ploidy level is known in the genus, with diploids, tetraploids and hexaploids recorded. Previous molecular phylogenetic studies of Alisma have generated a robust backbone that reveals important aspects of the evolutionary history of this cosmopolitan genus, yet questions remain unresolved about the formation of the polyploid taxa and the taxonomy of one particularly challenging, widely distributed species complex. Here we directly sequenced, or cloned and sequenced, nuclear DNA (nrITS and phyA) and chloroplast DNA (matK, ndhF, psbA-trnH and rbcL) of multiple samples of six putative species and two varieties, and conducted molecular phylogenetic analyses. Alisma canaliculatum and its two varieties known in East Asia and A. rariflorum endemic to Japan possess closely related but heterogeneous genomes, strongly indicating that the two species were generated from two diploid progenitors, and are possibly siblings of one another. This evolutionary event may have occurred in Japan. Alisma canaliculatum var. canaliculatum is segregated into two types, each of which is geographically slightly differentiated in Japan. We reconstructed a single phylogeny based on the multi-locus data using Homologizer and then applied species delimitation analysis (STACEY). This allowed us to discern A. orientale as apparently endemic to the Southeast Asian Massif and distinct from the widespread A. plantago-aquatica. The former species was most likely formed through parapatric speciation at the southern edge of the distribution of the latter.
Affiliation(s)
- Yu Ito
  - Faculty of Pharmaceutical Sciences, Setsunan University, 45-1 Nagaotoge-Cho, Hirakata, Osaka, 573-0101, Japan
- Norio Tanaka
  - Department of Botany, National Museum of Nature and Science, Tsukuba, 305-0005, Japan
4
Manter DK, Hamm AK, Deel HL. Community structure and abundance of ACC deaminase containing bacteria in soils with 16S-PICRUSt2 inference or direct acdS gene sequencing. J Microbiol Methods 2023:106740. [PMID: 37301376 DOI: 10.1016/j.mimet.2023.106740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 05/17/2023] [Accepted: 05/24/2023] [Indexed: 06/12/2023]
Abstract
Bacteria containing the enzyme 1-aminocyclopropane-1-carboxylate deaminase (ACCD+) can reduce plant ethylene levels and increase root development and elongation, resulting in increased resiliency to drought and other plant stressors. Although these bacteria are ubiquitous in the soil, non-culture-based methods for their enumeration and identification are not well developed. In this study we compare two culture-independent approaches for identifying ACCD+ bacteria: first, quantitative PCR (qPCR) and direct acdS sequencing with newly designed gene-specific primers; and second, phylogenetic reconstruction from 16S rRNA amplicon libraries with the PICRUSt2 tool. Using soils from eastern Colorado, we showed complementary yet differing results in ACCD+ abundance and community structure responding to water availability. Across all sites, gene abundances estimated from qPCR with the acdS gene-specific primers and phylogenetic reconstruction using PICRUSt2 were significantly correlated. However, PICRUSt2 identified members of the Acidobacteria, Proteobacteria, and Bacteroidetes phyla (now known as Acidobacteriota, Pseudomonadota, and Bacteroidota according to the International Code of Nomenclature of Prokaryotes) as ACCD+ bacteria, whereas the acdS primers amplified only members of the Proteobacteria phylum. Despite these differences, both measures showed that the abundance of ACCD+ bacteria decreased as soil water content decreased along a potential evapotranspiration (PET) gradient at three sites in eastern Colorado. One major advantage of using 16S sequencing and PICRUSt2 in metagenomic studies is the ability to get a potential functional profile of all known KEGG (Kyoto Encyclopedia of Genes and Genomes) enzymes within the bacterial community of a single soil sample. The 16S-PICRUSt2 method paints a broader picture of the biological and biochemical function of the soil microbiome compared to direct acdS sequencing; however, phylogenetic analysis based on 16S gene relatedness may not reflect that of the functional gene of interest.
Affiliation(s)
- Daniel K Manter
  - United States Department of Agriculture, Agricultural Research Service, Soil Management and Sugarbeet Research Unit, 2150 Centre Ave Bldg D, Fort Collins, CO 80526, USA
- Alison K Hamm
  - United States Department of Agriculture, Agricultural Research Service, Soil Management and Sugarbeet Research Unit, 2150 Centre Ave Bldg D, Fort Collins, CO 80526, USA
- Heather L Deel
  - United States Department of Agriculture, Agricultural Research Service, Soil Management and Sugarbeet Research Unit, 2150 Centre Ave Bldg D, Fort Collins, CO 80526, USA
5
Dai M, Li J, Li J, Lu H, Huang C, Lv S, Huang H, Xin R. Genetic characteristics of a novel HIV-1 recombinant lineage (CRF103_01B) and its prevalence in northern China. Virus Genes 2023:10.1007/s11262-023-01994-0. [PMID: 37079189 DOI: 10.1007/s11262-023-01994-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 04/07/2023] [Indexed: 04/21/2023]
Abstract
During routine surveillance of HIV-1 pretreatment drug resistance in Beijing, five men who have sex with men (MSM) and one woman were found to be infected with a newly identified CRF103_01B strain. To elucidate its genetic characteristics, the near full-length genome (NFLG) was obtained. Phylogenetic inference indicated that the CRF103_01B NFLG was composed of six mosaic segments. Segments IV and V of CRF103_01B were located within the subtype B and CRF01_AE (group 5) clusters, respectively. The CRF103_01B strain was inferred to have originated in the Beijing MSM population around 2002.3-2006.4 and to have continued to spread among the MSM population at a low level, and then to the general population via heterosexual contact in northern China. Molecular epidemiological surveillance of CRF103_01B should be reinforced.
Affiliation(s)
- Man Dai
  - China Medical University, Shenyang, 110122, China
  - Beijing Center for Disease Prevention and Control, Beijing, 100013, China
- Jia Li
  - Beijing Center for Disease Prevention and Control, Beijing, 100013, China
- Jie Li
  - Beijing Center for Disease Prevention and Control, Beijing, 100013, China
- Hongyan Lu
  - Beijing Center for Disease Prevention and Control, Beijing, 100013, China
- Chun Huang
  - Beijing Center for Disease Prevention and Control, Beijing, 100013, China
- Shiyun Lv
  - Beijing Youan Hospital, Capital Medical University, Beijing, 100069, China
- Huihuang Huang
  - The Fifth Medical Center of PLA General Hospital, Beijing, 100039, China
- Ruolei Xin
  - Beijing Center for Disease Prevention and Control, Beijing, 100013, China
6
Wascher M, Kubatko LS. On the effects of selection and mutation on species tree inference. Mol Phylogenet Evol 2023; 179:107650. [PMID: 36441104 DOI: 10.1016/j.ympev.2022.107650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 11/24/2022]
Abstract
The effect of selection acting on regions of the genome on the accuracy of species-level phylogenetic inference using methods that do not explicitly model selection is an open question that is relevant to most, if not all, phylogenomic studies. To address this, we derive a mathematical approximation to the Wright-Fisher model with mutation and selection in the limit as the population size becomes large. In contrast to previous approximations based on diffusion processes, our approximation can be used to study the distribution of coalescent times for an arbitrary number of lineages, allowing calculation of the probability distribution of gene genealogies under the coalescent model. We use these calculations to show that direct selection at strengths typically encountered in practice has only a small effect on the distribution of coalescent times, and hence on the distribution of gene trees. This implies that many coalescent-based methods for estimating the species tree topology will be robust to the presence of selection in a subset of the underlying genes. Selection will, however, bias the estimation of speciation times, causing them to be underestimated. Our model captures the effects of selection on the genealogies that generate the observed sequence data, but does not model selective pressures that act only on the subsequent sequences or that negatively impact gene tree estimation.
7
Dougherty K, Hudak KA. Phylogeny and domain architecture of plant ribosome inactivating proteins. Phytochemistry 2022; 202:113337. [PMID: 35934106 DOI: 10.1016/j.phytochem.2022.113337] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2022] [Revised: 07/01/2022] [Accepted: 07/20/2022] [Indexed: 06/15/2023]
Abstract
Ribosome inactivating proteins (RIPs) are rRNA N-glycosylases (EC 3.2.2.22) best known for hydrolyzing an adenine base from the conserved sarcin/ricin loop of ribosomal RNA. Protein translation is inhibited by ribosome depurination; therefore, RIPs are generally considered toxic to cells. The expression of some RIPs is upregulated by biotic and abiotic stress, though the connection between RNA depurination and defense response is not well understood. Despite their prevalence in approximately one-third of flowering plant orders, our knowledge of RIPs stems primarily from biochemical analyses of individuals or genomics-scale analyses of small datasets from a limited number of species. Here, we performed an unbiased search for proteins with RIP domains and identified several-fold more RIPs than previously known - more than 800 from 120 species, many with novel associated domains and physicochemical characteristics. Based on protein domain configuration, we established 15 distinct groups, suggesting diverse functionality. Surprisingly, most of these RIPs lacked a signal peptide, indicating they may be localized to the nucleocytoplasm of cells, raising questions regarding their toxicity against conspecific ribosomes. Our phylogenetic analysis significantly extends previous models for RIP evolution in plants, predicting an original single-domain RIP that later evolved to acquire a signal peptide and different protein domains. We show that RIPs are distributed throughout 21 plant orders with many species maintaining genes for more than one RIP group. Our analyses provide the foundation for further characterization of these new RIP types, to understand how these enzymes function in plants.
Affiliation(s)
- Kyra Dougherty
  - Department of Biology, York University, Toronto, Canada
8
Dougherty K, Hudak KA. Computational curation and analysis of publicly available protein sequence data from a single protein family. MethodsX 2022; 9:101846. [PMID: 36164433 PMCID: PMC9508561 DOI: 10.1016/j.mex.2022.101846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022]
Abstract
The wealth of sequence data available on public databases is increasing at an exponential rate, and while tremendous efforts are being made to make access to these resources easier, these data can be challenging for researchers to reuse because submissions are made from numerous laboratories with different biological objectives, resulting in inconsistent naming conventions and sequence content. Researchers can manually inspect each sequence and curate a dataset by hand but automating some of these steps will reduce this burden. This paper is a step-by-step guide describing how to identify all proteins containing a specific domain with the Conserved Protein Domain Architecture Retrieval Tool, download all associated amino acid sequences from NCBI Entrez, tabulate, and clean the data. I will also describe how to extract the full taxonomic information and computationally predict some physicochemical properties of the proteins based on amino acid sequence. The resulting data are applicable to a wide range of bioinformatic analyses where publicly available data are utilized.
- Step-by-step guide to gathering, cleaning, and parsing data from publicly available databases for computational analysis, plus supplementation of taxonomic data and physicochemical characteristics from sequence data.
- This strategy allows for reuse of existing large-scale publicly available data for different downstream applications to answer novel biological questions.
9
Hickson J, Athayde LFA, Miranda TG, Junior PAS, Dos Santos AC, da Cunha Galvão LM, da Câmara ACJ, Bartholomeu DC, de Souza RDCM, Murta SMF, Nahum LA. Trypanosoma cruzi iron superoxide dismutases: insights from phylogenetics to chemotherapeutic target assessment. Parasit Vectors 2022; 15:194. [PMID: 35668508 PMCID: PMC9169349 DOI: 10.1186/s13071-022-05319-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Accepted: 05/10/2022] [Indexed: 11/25/2022]
Abstract
Background: Components of the antioxidant defense system in Trypanosoma cruzi are potential targets for new drug development. Superoxide dismutases (SODs) constitute key components of antioxidant defense systems, removing excess superoxide anions by converting them into oxygen and hydrogen peroxide. The main goal of the present study was to investigate the genes coding for iron superoxide dismutase (FeSOD) in T. cruzi strains from an evolutionary perspective.
Methods: In this study, molecular biology methods and phylogenetic studies were combined with drug assays. The FeSOD-A and FeSOD-B genes of 35 T. cruzi strains, belonging to six discrete typing units (TcI–TcVI), from different hosts and geographical regions were amplified by PCR and sequenced using the Sanger method. Evolutionary trees were reconstructed based on Bayesian inference and maximum likelihood methods. Drugs that potentially interacted with T. cruzi FeSODs were identified and tested against the parasites.
Results: Our results suggest that T. cruzi FeSOD types are members of distinct families. Gene copies of FeSOD-A (n = 2), FeSOD-B (n = 4) and FeSOD-C (n = 4) were identified in the genome of the T. cruzi reference clone CL Brener. Phylogenetic inference supported the presence of two functional variants of each FeSOD type across the T. cruzi strains. Phylogenetic trees revealed a monophyletic group of FeSOD genes of T. cruzi TcIV strains in both distinct genes. Altogether, our results support the hypothesis that gene duplication followed by divergence shaped the evolution of T. cruzi FeSODs. Two drugs, mangafodipir and polaprezinc, that potentially interact with T. cruzi FeSODs were identified and tested in vitro against amastigotes and trypomastigotes: mangafodipir had a low trypanocidal effect and polaprezinc was inactive.
Conclusions: Our study contributes to a better understanding of the molecular biodiversity of T. cruzi FeSODs. Herein we provide a successful approach to the study of gene/protein families as potential drug targets.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13071-022-05319-2.
Affiliation(s)
- Jéssica Hickson
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
- Lucas Felipe Almeida Athayde
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Thainá Godinho Miranda
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Policarpo Ademar Sales Junior
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
- Anderson Coqueiro Dos Santos
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Lúcia Maria da Cunha Galvão
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
  - Department of Clinical and Toxicological Analysis, Federal University of Rio Grande do Norte State, Natal, Rio Grande do Norte, Brazil
- Antônia Cláudia Jácome da Câmara
  - Department of Clinical and Toxicological Analysis, Federal University of Rio Grande do Norte State, Natal, Rio Grande do Norte, Brazil
- Daniella Castanheira Bartholomeu
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Rita de Cássia Moreira de Souza
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
- Silvane Maria Fonseca Murta
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
- Laila Alves Nahum
  - René Rachou Institute, Oswaldo Cruz Foundation (Functional genomics of parasites group; Biosystems informatics, bioengineering and genomic group), Belo Horizonte, Minas Gerais, Brazil
  - Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
  - Promove College of Technology, Belo Horizonte, Minas Gerais, Brazil
10
Allman ES, Baños H, Rhodes JA. Identifiability of species network topologies from genomic sequences using the logDet distance. J Math Biol 2022; 84:35. [PMID: 35385988 DOI: 10.1007/s00285-022-01734-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/03/2021] [Revised: 01/12/2022] [Accepted: 03/02/2022] [Indexed: 10/18/2022]
Abstract
Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that may be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods, by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case. This result is obtained under the Network Multispecies Coalescent model combined with a mixture of General Time-Reversible sequence evolution models across individual gene trees. It applies to both unlinked site data, such as for SNPs, and to sequence data in which many contiguous sites may have evolved on a common tree, such as concatenated gene sequences. Thus under standard stochastic models statistically justifiable inference of network relationships from sequences can be accomplished without consideration of individual genes or gene trees.
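The abstract above builds on logDet distances between pairs of sequences. As a minimal sketch, assuming one common normalization, d = -(1/4)(ln|det F| - (1/2) ln(g_a g_b)), where F is the 4x4 matrix of joint base frequencies and g_a, g_b are the products of its row and column sums, the distance for two aligned DNA sequences could be computed as follows. The function names are hypothetical, and the paper may use a different normalization.

```python
import math

BASES = "ACGT"


def joint_frequencies(seq_a, seq_b):
    """Return the 4x4 matrix of joint base frequencies over aligned sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = [[0.0] * 4 for _ in range(4)]
    used = 0
    for x, y in zip(seq_a, seq_b):
        # Skip gaps and ambiguous characters.
        if x in BASES and y in BASES:
            counts[BASES.index(x)][BASES.index(y)] += 1
            used += 1
    if used == 0:
        raise ValueError("no comparable sites")
    return [[c / used for c in row] for row in counts]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(seq_a, seq_b):
    """logDet distance under the normalization assumed in the lead-in."""
    f = joint_frequencies(seq_a, seq_b)
    det = abs(determinant(f))
    row_product = math.prod(sum(row) for row in f)
    col_product = math.prod(sum(f[i][j] for i in range(4)) for j in range(4))
    if det == 0 or row_product == 0 or col_product == 0:
        raise ValueError("logDet distance is undefined for these sequences")
    return -(math.log(det) - 0.5 * math.log(row_product * col_product)) / 4


print(logdet_distance("ACGT" * 4 + "ACGT", "ACGT" * 4 + "CAGT"))
```

Because the distance depends only on pairwise site-pattern frequencies, it can be computed from concatenated or unlinked site data without estimating individual gene trees, which is the setting the abstract describes.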
11
Abstract
Three competing 'methods' have been endorsed for inferring phylogenetic hypotheses: parsimony, likelihood, and Bayesianism. The latter two have been claimed superior because they take into account rates of sequence substitution. Can rates of substitution be justified in their own right in inferences of explanatory hypotheses? Answering this question requires addressing four issues: (1) the aim of scientific inquiry, (2) the nature of why-questions, (3) explanatory hypotheses as answers to why-questions, and (4) acknowledging that neither parsimony, likelihood, nor Bayesianism is an inferential action leading to explanatory hypotheses. The aim of scientific inquiry is to acquire causal understanding of effects. Observation statements of organismal characters lead to implicit or explicit why-questions. Those questions, conveyed in data matrices, assume the truth of observation statements, which is contrary to subsequently invoking substitution rates within inferences to phylogenetic hypotheses. Inferences of explanatory hypotheses are abductive in form, such that some version of one or more evolutionary theories is included or implied. If rates of sequence evolution are to be considered, it must be done prior to, rather than within, abduction, which requires renaming those putatively shared nucleotides subject to substitution rates. There are, however, no epistemic grounds for renaming characters to accommodate rates, calling into question the legitimacy of causally accounting for sequence data.
Affiliation(s)
- Kirk Fitzhugh
  - Natural History Museum of Los Angeles County, 900 Exposition Blvd, Los Angeles, CA, 90007, USA
12
Sousa-Paula LCD, da Silva LG, da Silva Junior WJ, Figueirêdo Júnior CAS, Costa CHN, Pessoa FAC, Dantas-Torres F. Genetic structure of allopatric populations of Lutzomyia longipalpis sensu lato in Brazil. Acta Trop 2021; 222:106031. [PMID: 34224718 DOI: 10.1016/j.actatropica.2021.106031] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/03/2021] [Revised: 06/04/2021] [Accepted: 06/26/2021] [Indexed: 11/29/2022]
Abstract
Lutzomyia longipalpis sensu lato is a complex of phlebotomine sand fly species, which are widespread in the Neotropics. They have great medico-veterinary importance due to their role as vectors of Leishmania infantum, the causative agent of visceral leishmaniasis. Morphological variations of Lu. longipalpis s.l. males were reported in the late 1960s in Brazil. Male populations can present either one pair of spots on the third abdominal tergite or two pairs on the third and fourth ones, namely the 1S and 2S phenotypes, respectively. Since then, there has been much interest in the taxonomic status of Lu. longipalpis s.l. Several lines of evidence have been congruent in suggesting the existence of an uncertain number of cryptic species within Lu. longipalpis s.l. in Brazil. Herein, a 525-bp fragment of the period gene was used for assessing the genetic structure and phylogenetic relationships of Lu. longipalpis s.l. populations in Brazil. We performed two sets of analyses. First, we sequenced three populations (Passira, Santarém and Teresina) of Lu. longipalpis s.l. and compared them. Thereafter, we performed a global analysis that added to our dataset three other pairs of sympatric populations of Lu. longipalpis s.l. from three Brazilian localities available in GenBank. Sharing of fixed single nucleotide polymorphisms (SNPs), maximum likelihood inference, genetic structure and haplotype analyses revealed the presence of two genetic groups, one composed of the Teresina population, and the other encompassing the Passira and Santarém populations. The global analysis mirrored the first one, and two prominent groups were observed: clade I comprising Teresina 1S, Bodocó 1S, Caririaçu 1S and Sobral 1S; and clade II encompassing Passira 2S, Santarém 1S, Bodocó 2S, Caririaçu 2S and Sobral 2S. Genetic differentiation data suggested limited gene flow between populations of clade I versus clade II. Our results disclosed the presence of two prominent genetic groups, which could reasonably represent populations of Lu. longipalpis s.l. whose males produce the same courtship song.
Affiliation(s)
- Lucas Christian de Sousa-Paula
- Laboratory of Immunoparasitology, Department of Immunology, Aggeu Magalhães Institute, Oswaldo Cruz Foundation (Fiocruz Pernambuco), Avenida Professor Moraes Rego, s/n, Recife, Pernambuco 50740465, Brazil
- Wilson José da Silva Junior
- Laboratory of Bioinformatics and Evolutionary Biology, Department of Genetics, Federal University of Pernambuco, Recife, Pernambuco, Brazil
- Felipe Arley Costa Pessoa
- Laboratório de Ecologia e Doenças Transmissíveis na Amazônia, Leônidas e Maria Deane Institute, Oswaldo Cruz Foundation (FIOCRUZ), Manaus, Amazonas, Brazil
- Filipe Dantas-Torres
- Laboratory of Immunoparasitology, Department of Immunology, Aggeu Magalhães Institute, Oswaldo Cruz Foundation (Fiocruz Pernambuco), Avenida Professor Moraes Rego, s/n, Recife, Pernambuco 50740465, Brazil.

13
Abstract
Inference of the evolutionary histories of species, commonly represented by a species tree, is complicated by the divergent evolutionary history of different parts of the genome. Different loci on the genome can have different histories from the underlying species tree (and each other) due to processes such as incomplete lineage sorting (ILS), gene duplication and loss, and horizontal gene transfer. The multispecies coalescent is a commonly used model for performing inference on species and gene trees in the presence of ILS. This paper introduces Lily-T and Lily-Q, two new methods for species tree inference under the multispecies coalescent. We then compare them to two frequently used methods, SVDQuartets and ASTRAL, using simulated and empirical data. Both methods generally showed improvement over SVDQuartets, and Lily-Q was superior to Lily-T for most simulation settings. The comparison to ASTRAL was more mixed: Lily-Q tended to be better than ASTRAL when the length of recombination-free loci was short, when the coalescent population parameter [Formula: see text] was small, or when the internal branch lengths were longer.
Affiliation(s)
- Andrew Richards
- Department of Statistics, The Ohio State University, Columbus, USA
- Laura Kubatko
- Department of Statistics, The Ohio State University, Columbus, USA.
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, USA.

14
Partida VGS, Dias HM, Corcino DSM, Van Sluys MA. Sucrose-phosphate phosphatase from sugarcane reveals an ancestral tandem duplication. BMC Plant Biol 2021; 21:23. [PMID: 33413115 PMCID: PMC7792115 DOI: 10.1186/s12870-020-02795-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/14/2019] [Accepted: 12/14/2020] [Indexed: 05/30/2023]
Abstract
BACKGROUND Sugarcane can store large amounts of sucrose in the culm at maturity and has therefore become a major source of sucrose for the food and renewable energy industries. Sucrose, the main disaccharide produced by photosynthesis, is mainly stored in the vacuole of the cells of non-photosynthetic tissues. Two pathways are known to release free sucrose in plant cells: one is de novo synthesis dependent on sucrose phosphate synthase (SPS) and sucrose phosphate phosphatase (S6PP), while the other is regulatory and dependent on sucrose synthase (SuSy) activity. The molecular understanding of the genes that give rise to the expression of sucrose phosphate phosphatase, the enzyme responsible for the release of sucrose in the last synthetic step, lags behind that of the regulatory SuSy gene. RESULTS The sugarcane genome sequencing effort disclosed the existence of a tandem duplication, and the present work further supports that both the S6PP.1 and S6PP_2D isoforms are actively transcribed in young sugarcane plants but significantly less at maturity. Two commercial hybrids (SP80-3280 and R570) and both Saccharum spontaneum (IN84-58) and S. officinarum (BADILLA) exhibit transcriptional activity of the tandem S6PP_2D in three-month-old plants in leaves, culm, meristem and root system, with a cultivar-specific distribution. Moreover, this tandem duplication is shared with other grasses and is ancestral in the group. CONCLUSION Detection of a new isoform of S6PP resulting from the translation of a 14-exon-containing transcript (S6PP_2D) will contribute to the knowledge of sucrose metabolism in plants. In addition, expression varies along plant development and between sugarcane cultivars and parental species.

15
Fahmi M, Kharisma VD, Ansori ANM, Ito M. Retrieval and Investigation of Data on SARS-CoV-2 and COVID-19 Using Bioinformatics Approach. Adv Exp Med Biol 2021; 1318:839-857. [PMID: 33973215 DOI: 10.1007/978-3-030-63761-3_47] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/20/2023]
Abstract
The sudden emergence and rapid outbreak of SARS-CoV-2, accompanied by a devastating impact on the economy and public health, has driven extensive scientific mobilization to study and elucidate the various concerns associated with SARS-CoV-2. Bioinformatics plays a crucial role in addressing and providing solutions to questions about SARS-CoV-2. It helps shorten the vaccine development process and the discovery of potential clinical interventions through simulation and information retrieval, and through the development of well-ordered information hubs and resources, which are essential to derive data and meaningful findings from the current massive volume of information about SARS-CoV-2. Advanced algorithms in this field also provide approaches that are essential to elucidate the relationships, origin, and evolutionary process of SARS-CoV-2. Here, we report essential bioinformatics entities, such as database and platform development, molecular evolution and phylogenetic analyses, and vaccine designs, that are useful to solve the SARS-CoV-2 conundrum.
Affiliation(s)
- Muhamad Fahmi
- Advanced Life Sciences Program, Graduate School of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan; Systematic Review and Meta-analysis Expert Group (SRMEG), Universal Scientific Education and Research Network (USERN), Kusatsu, Japan
- Viol Dhea Kharisma
- Master Program in Biology, Department of Biology, Faculty of Mathematic and Natural Sciences, Universitas Brawijaya, Malang, Indonesia; Computational Virology and Complexity Science Research Unit, Division of Molecular Biology and Genetics, Generasi Biologi Indonesia (GENBINESIA) Foundation, Gresik, Indonesia; Systematic Review and Meta-analysis Expert Group (SRMEG), Universal Scientific Education and Research Network (USERN), Malang, Indonesia
- Arif Nur Muhammad Ansori
- Doctoral Program in Veterinary Science, Faculty of Veterinary Medicine, Universitas Airlangga, Kampus C Universitas Airlangga, Surabaya, Indonesia; Systematic Review and Meta-analysis Expert Group (SRMEG), Universal Scientific Education and Research Network (USERN), Surabaya, Indonesia
- Masahiro Ito
- Advanced Life Sciences Program, Graduate School of Life Sciences, Ritsumeikan University, Kusatsu, Shiga, Japan; Systematic Review and Meta-analysis Expert Group (SRMEG), Universal Scientific Education and Research Network (USERN), Kusatsu, Japan.

16
Villalobos-Cid M, Salinas F, Inostroza-Ponta M. Total evidence or taxonomic congruence? A comparison of methods for combining biological evidence. J Bioinform Comput Biol 2020; 18:2050040. [PMID: 33155874 DOI: 10.1142/s0219720020500407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Phylogenetic inference proposes an evolutionary hypothesis for a group of taxa, which is usually represented as a phylogenetic tree. The use of several distinct types of biological evidence has been shown to produce more resolved phylogenies than single-evidence approaches. Currently, two conflicting paradigms are applied to combine biological evidence: taxonomic congruence (TC) and total evidence (TE). Although the literature recommends the application of these paradigms depending on the congruence of the input data, the resultant evolutionary hypotheses could vary according to the strategy used to combine the biological evidence, biasing the resultant topologies of the trees. In this work, we evaluate the ability of different strategies associated with both paradigms to produce integrated evolutionary hypotheses by considering different features of the data: missing biological evidence, diversity among sequences, complexity, and congruence. Using datasets from the literature, we compare the resultant trees with reference hypotheses obtained by applying two inference criteria: maximum parsimony and likelihood. The results show that methods associated with the TE paradigm are more robust than TC methods, obtaining trees whose topologies are more similar to those of the reference trees. These results are obtained regardless of (1) the features of the data, (2) the estimated evolutionary rates, and (3) the criteria used to infer the reference evolutionary hypotheses.
Affiliation(s)
- Manuel Villalobos-Cid
- Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Avenida Ecuador #3659, Estación Central 9170124, Chile
- Francisco Salinas
- Instituto de Bioquímica y Microbiología, Facultad de Ciencias, Universidad Austral de Chile, Campus Isla Teja, Valdivia, Chile; Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Mario Inostroza-Ponta
- Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Avenida Ecuador #3659, Estación Central 9170124, Chile

17
Reydon TAC. Taxa hold little information about organisms: Some inferential problems in biological systematics. Hist Philos Life Sci 2019; 41:40. [PMID: 31591647 DOI: 10.1007/s40656-019-0281-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/15/2019] [Accepted: 09/17/2019] [Indexed: 06/10/2023]
Abstract
The taxa that appear in biological classifications are commonly seen as representing information about the traits of their member organisms. This paper examines in what way taxa feature in the storage and retrieval of such information. I will argue that taxa do not actually store much information about the traits of their member organisms. Rather, I want to suggest, taxa should be understood as functioning to localize organisms in the genealogical network of life on Earth. Taxa store information about where organisms are localized in the network, which is important background information when it comes to establishing knowledge about organismal traits, but it is not itself information about these traits. The view of species and higher taxa that is proposed here follows from examining three problems that occur in contemporary biological systematics and are discussed here: the problem of generalization over taxa, the problem of phylogenetic inference, and the problematic nature of the Tree of Life.
Affiliation(s)
- Thomas A C Reydon
- Institute of Philosophy & Centre for Ethics and Law in the Life Sciences (CELLS), Leibniz University Hannover, Im Moore 21, 30167, Hannover, Germany.

18
Zhang LN, Ma PF, Zhang YX, Zeng CX, Zhao L, Li DZ. Using nuclear loci and allelic variation to disentangle the phylogeny of Phyllostachys (Poaceae, Bambusoideae). Mol Phylogenet Evol 2019; 137:222-235. [PMID: 31112779 DOI: 10.1016/j.ympev.2019.05.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/07/2018] [Revised: 05/16/2019] [Accepted: 05/17/2019] [Indexed: 11/18/2022]
Abstract
With the development of sequencing technologies, the use of multiple nuclear genes has become conventional for resolving difficult phylogenies. However, this technique also presents challenges due to gene-tree discordance, as a result of incomplete lineage sorting (ILS) and reticulate evolution. Although alleles can show sequence variation within individuals, which contains information regarding the evolution of organisms, they continue to be ignored in almost all phylogenetic analyses using randomly phased genome sequences. Here, we tried to incorporate alleles from multiple nuclear loci to study the phylogeny of the economically important bamboo genus Phyllostachys (Poaceae, Bambusoideae). Obtaining a total of 3926 sequences, we documented extensive allelic variation for 61 genes from 39 sampled species. Using datasets consisting of selected alleles, we demonstrated substantial discordance among phylogenetic relationships inferred from different alleles, as well as between concatenation and coalescent methods. Furthermore, ILS and hybridization were suggested to be underlying causes of the discordant phylogenetic signals. Taking these possible causes for conflicting phylogenetic results into consideration, we recovered the monophyly of Phyllostachys and its two morphology-defined sections. Our study also suggests that alleles deserve more attention in phylogenetic studies, since ignoring them can yield highly supported but spurious phylogenies. Meanwhile, alleles are helpful for unraveling complex evolutionary processes, particularly hybridization.
Affiliation(s)
- Li-Na Zhang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Peng-Fei Ma
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- Yu-Xiao Zhang
- Yunnan Academy of Biodiversity, Southwest Forestry University, Kunming, Yunnan 650224, China
- Chun-Xia Zeng
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- Lei Zhao
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- De-Zhu Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China.

19
Molaro A, Drinnenberg IA. Studying the Evolution of Histone Variants Using Phylogeny. Methods Mol Biol 2018; 1832:273-91. [PMID: 30073533 DOI: 10.1007/978-1-4939-8663-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
Histones wrap DNA to form nucleosomes that package eukaryotic genomes. Histone variants have evolved for diverse functions including gene expression, DNA repair, epigenetic silencing, and chromosome segregation. With the rapid increase in newly sequenced genomes, the repertoire of histone variants expands, demonstrating a great diversification of these proteins across eukaryotes. In this chapter, we provide guidelines for the computational characterization and annotation of histone variants. We describe methods to predict the characteristic histone fold domain and list features specific to known histone variants that can be used to categorize newly identified histone fold proteins. We then describe procedures to retrieve additional related histone variants for comparative sequence analyses and phylogenetic reconstructions to refine the annotation and to determine the evolutionary trajectories of the variant in question.

20
Carriço JA, Crochemore M, Francisco AP, Pissis SP, Ribeiro-Gonçalves B, Vaz C. Fast phylogenetic inference from typing data. Algorithms Mol Biol 2018; 13:4. [PMID: 29467814 PMCID: PMC5815242 DOI: 10.1186/s13015-017-0119-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 10/31/2017] [Accepted: 12/22/2017] [Indexed: 11/10/2022] Open
Abstract
Background Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to the development of several novel methods and to several databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, storing thousands of different profiles. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is also important to note that most genetic evolutionary distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. Results We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method, and how it can be used to speed up querying local phylogenetic patterns over large typing databases.
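As a rough illustration of the thresholded pairwise Hamming distance computation described in the abstract above, the following Python sketch compares allelic profiles pair by pair and abandons a comparison as soon as the threshold is exceeded. It is a naive quadratic baseline, not the average-case linear-time algorithm proposed by the authors. The profile names and values, and the function names `hamming_within` and `pairs_within`, are hypothetical.

```python
from itertools import combinations


def hamming_within(a, b, threshold):
    """Return the Hamming distance of a and b, or None if it exceeds threshold."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    distance = 0
    for x, y in zip(a, b):
        if x != y:
            distance += 1
            if distance > threshold:
                return None  # stop early: this pair is too distant
    return distance


def pairs_within(profiles, threshold):
    """Map each pair of profile names to its distance when within threshold."""
    result = {}
    for (name_a, prof_a), (name_b, prof_b) in combinations(profiles.items(), 2):
        d = hamming_within(prof_a, prof_b, threshold)
        if d is not None:
            result[(name_a, name_b)] = d
    return result


# Hypothetical seven-locus allelic profiles (illustrative values only).
profiles = {
    "st1": (1, 3, 4, 2, 1, 7, 5),
    "st2": (1, 3, 4, 2, 1, 7, 6),
    "st3": (2, 3, 1, 2, 4, 7, 6),
}
close_pairs = pairs_within(profiles, threshold=2)
print(close_pairs)
```

With these toy profiles only the pair (st1, st2) falls within the threshold; the other two comparisons are cut short once a third mismatch is seen.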

21
Hoang DT, Vinh LS, Flouri T, Stamatakis A, von Haeseler A, Minh BQ. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol Biol 2018; 18:11. [PMID: 29390973 PMCID: PMC5796505 DOI: 10.1186/s12862-018-1131-3] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 08/28/2017] [Accepted: 01/25/2018] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. RESULTS To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein data showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*, but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. CONCLUSIONS MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy to use, open-source and available at http://www.cibiv.at/software/mpboot.
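For readers unfamiliar with the standard nonparametric bootstrap that the abstract above takes as its baseline, the following Python sketch shows its two generic ingredients: resampling alignment columns with replacement, and reporting the fraction of replicate trees that contain a given split. It is not MPBoot's approximation; tree inference on each replicate is omitted, and the toy alignment, the replicate splits, and the function names are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def split_support(replicate_splits, split):
    """Fraction of replicate trees whose split set contains the given split."""
    hits = sum(1 for splits in replicate_splits if split in splits)
    return hits / len(replicate_splits)


# Hypothetical four-taxon alignment (illustrative values only).
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "ATGTTC", "D": "ATGAAC"}
replicate = bootstrap_alignment(alignment, random.Random(0))

# Splits recovered from four hypothetical replicate trees; in practice a tree
# would be inferred from each replicate, a step not shown here.
replicate_splits = [
    {frozenset("AB")},
    {frozenset("AC")},
    {frozenset("AB")},
    {frozenset("AB")},
]
support_ab = split_support(replicate_splits, frozenset("AB"))
print(replicate, support_ab)
```

The expensive part in practice is the tree search repeated for every replicate, which is the cost that approximation schemes of the kind described in the abstract aim to reduce.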
Affiliation(s)
- Diep Thi Hoang
- University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
- Le Sy Vinh
- University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
- Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT UK
- Alexandros Stamatakis
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Karlsruhe Institute of Technology, Institute for Theoretical Informatics, Karlsruhe, Germany
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University Vienna, Campus Vienna Biocenter 5, A-1030 Vienna, Austria
- Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
- Bui Quang Minh
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University Vienna, Campus Vienna Biocenter 5, A-1030 Vienna, Austria

22
Margos G, Notter I, Fingerle V. Species Identification and Phylogenetic Analysis of Borrelia burgdorferi Sensu Lato Using Molecular Biological Methods. Methods Mol Biol 2018; 1690:13-33. [PMID: 29032533 DOI: 10.1007/978-1-4939-7383-5_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
Bacterial species identification is required in different disciplines and, depending on the purpose, levels of specificity or resolution of typing may vary. Nowadays, molecular methods are the mainstay for bacterial identification, and sequence-based analyses are of ever-growing importance. For diagnostics, immediate results are needed and real-time PCR of one or two loci is often the method of choice, while for epidemiological or evolutionary studies sequence data from several loci improve phylogenetic resolution to the required levels. Multilocus sequence typing (MLST) and multilocus sequence analyses (MLSA) utilize sequence information from several housekeeping loci (eight for Borrelia) to distinguish between species. This method has been widely used for bacterial species and strain identification and will be described in this chapter. As more and more diversity is being detected in the Borrelia burgdorferi sensu lato species complex, the importance of accurate species and strain typing has come to the fore. This is particularly significant with a view to differentiating human pathogenic and non-pathogenic strains or species and understanding the epidemiology, ecology, population structure, and evolution of species.
Affiliation(s)
- Gabriele Margos
- Bavarian Health and Food Safety Authority, National Reference Center for Borrelia, Veterinärstr. 2, 85764, Oberschleissheim, Germany.
- Isabell Notter
- Bavarian Health and Food Safety Authority, National Reference Center for Borrelia, Veterinärstr. 2, 85764, Oberschleissheim, Germany
- Volker Fingerle
- Bavarian Health and Food Safety Authority, National Reference Center for Borrelia, Veterinärstr. 2, 85764, Oberschleissheim, Germany

23
Abstract
The early stages of phylogenetic inference from morphological data involve a sequence of choices about which analytical methods to employ. At each stage, the selection of one method over another can dramatically impact tree inference. Phylogenetic hypotheses are sensitive to decisions relating to which taxa and characters to select for analysis, whether and how to delimit character states, which taxa to use as outgroups, and how to account for character dependence. Using extant hominoids as a test case, I quantify the degree to which phylogenetic inferences are sensitive to the choice of method used to transform continuously scaled variables into categorical traits. I demonstrate that the character coding strategy significantly impacts hypotheses of character state identity and phylogenetic branching patterns. To avoid biasing evolutionary hypotheses, I recommend that continuously scaled characters be analyzed without prior discretization.
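To make concrete how the choice of discretization method can change character coding, the following Python sketch codes the same continuous measurements with two common schemes, equal-width bins and equal-frequency (rank-based) bins. Neither scheme is claimed to be one of the methods evaluated in the abstract above; the measurements and function names are hypothetical.

```python
def equal_width_codes(values, n_states):
    """Code values into n_states bins of equal width across the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_states
    if width == 0:
        return [0] * len(values)  # all values identical: a single state
    return [min(int((v - lo) / width), n_states - 1) for v in values]


def equal_frequency_codes(values, n_states):
    """Code values by rank so that each state holds about the same number of taxa."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [0] * len(values)
    for rank, i in enumerate(order):
        codes[i] = rank * n_states // len(values)
    return codes


# Hypothetical measurements of one continuous trait in six taxa.
measurements = [1.0, 1.2, 1.4, 2.0, 5.0, 9.0]
width_codes = equal_width_codes(measurements, 3)
frequency_codes = equal_frequency_codes(measurements, 3)
print(width_codes)      # [0, 0, 0, 0, 1, 2]
print(frequency_codes)  # [0, 0, 1, 1, 2, 2]
```

The two schemes disagree on the state assigned to three of the six taxa, so a tree search run on each coding starts from different hypotheses of character state identity.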
Affiliation(s)
- Steven Worthington
- Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA

24
Kaehler BD. Full reconstruction of non-stationary strand-symmetric models on rooted phylogenies. J Theor Biol 2017; 420:144-151. [PMID: 28286217 DOI: 10.1016/j.jtbi.2017.03.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/14/2016] [Revised: 03/06/2017] [Accepted: 03/08/2017] [Indexed: 10/20/2022]
Abstract
Understanding the evolutionary relationship among species is of fundamental importance to the biological sciences. The location of the root in any phylogenetic tree is critical as it gives an order to evolutionary events. None of the popular models of nucleotide evolution currently used in likelihood or Bayesian methods are able to infer the location of the root without exogenous information. It is known that the most general Markov models of nucleotide substitution also cannot identify the location of the root or be fitted to multiple sequence alignments with fewer than three sequences. We prove that the location of the root and the full model can be identified and statistically consistently estimated for a non-stationary, strand-symmetric substitution model given a multiple sequence alignment with two or more sequences. We also generalise earlier work to provide a practical means of overcoming the computationally intractable problem of labelling hidden states in a phylogenetic model.
Affiliation(s)
- Benjamin D Kaehler
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia.

25
Abstract
Molecular evolution can reveal the relationship between sets of homologous sequences and the patterns of change that occur during their evolution. An important aspect of these studies is the inference of a phylogenetic tree, which explicitly describes evolutionary relationships between homologous sequences. This chapter provides an introduction to evolutionary trees and how to infer them from sequence data using some commonly used inferential methodology. It focuses on statistical methods for inferring trees and how to assess the confidence one should have in any resulting tree, with a particular emphasis on the underlying assumptions of the methods and how they might affect the tree estimate. There is also some discussion of the underlying algorithms used to perform tree search and recommendations regarding the performance of different algorithms. Finally, there are a few practical guidelines, including how to combine multiple software packages to improve inference, and a comparison between Bayesian and Maximum likelihood phylogenetics.
Affiliation(s)
- Simon Whelan
- Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden.
- David A Morrison
- Department of Organism Biology, Uppsala University, Uppsala, Sweden

26
Ito Y, Tanaka N, Albach DC, Barfod AS, Oxelman B, Muasya AM. Molecular phylogeny of the cosmopolitan aquatic plant genus Limosella (Scrophulariaceae) with a particular focus on the origin of the Australasian L. curdieana. J Plant Res 2017; 130:107-116. [PMID: 27864639 DOI: 10.1007/s10265-016-0872-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/19/2016] [Accepted: 09/28/2016] [Indexed: 06/06/2023]
Abstract
Limosella is a small aquatic genus of Scrophulariaceae of twelve species, of which one is distributed in northern circumpolar regions, two in southern circumpolar regions, two in the Americas, one endemic to Australia, and six in tropical or southern Africa or both. The Australasian L. curdieana has always been considered distinct but its close phylogenetic relationships have never been inferred. Here, we investigated the following alternative phylogenetic hypotheses based on comparative leaf morphology and habitat preferences or floral morphology: (1) L. curdieana is sister to the African L. grandiflora; or (2) it is closely related to a group of other African species and the northern circumpolar L. aquatica. We tested these hypotheses in a phylogenetic framework using DNA sequence data from four plastid DNA regions and the nuclear ITS region. These were analyzed using maximum parsimony and Bayesian inference. We obtained moderately resolved, partially conflicting phylogenies, supporting that accessions of L. grandiflora form the sister group to the rest of the genus and that L. curdieana groups with the African taxa, L. africana and L. major, and L. aquatica. Thus, the molecular evidence supports the second hypothesis. A biogeographic analysis suggests an out-of-southern Africa scenario and several dispersal events in the Southern Hemisphere. Past dispersal from southern Africa to Australasia is suggested, yet it cannot be excluded that a route via tropical Africa and temperate Asia has existed.
Affiliation(s)
- Yu Ito
- Biological Sciences, University of Canterbury, Christchurch, 8020, New Zealand.
- Xishuangbanna Tropical Botanical Garden, The Chinese Academy of Sciences, Kunming, 650223, People's Republic of China.
- Norio Tanaka
- Tsukuba Botanical Garden, National Museum of Nature and Science, Tokyo, Japan
- Dirk C Albach
- Institute of Biology and Environmental Sciences (IBU), Carl von Ossietzky-University Oldenburg, 26111, Oldenburg, Germany
- Anders S Barfod
- Department of Bioscience, Aarhus University, 8000, Aarhus C, Denmark
- Bengt Oxelman
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- A Muthama Muasya
- Department of Biological Sciences, University of Cape Town, Private Bag X3, Rondebosch, Cape Town, 7701, South Africa

27
Abstract
BACKGROUND Molecular evolution studies involve many different hard computational problems that are solved, in most cases, with heuristic algorithms that provide a nearly optimal solution. Hence, diverse software tools exist for the different stages involved in a molecular evolution workflow. RESULTS We present MEvoLib, the first molecular evolution library for Python, providing a framework to work with different tools and methods involved in the common tasks of molecular evolution workflows. In contrast with existing bioinformatics libraries, MEvoLib is focused on the stages involved in molecular evolution studies, enclosing the set of tools with a common purpose in a single high-level interface with fast access to their frequent parameterizations. Gene clustering from partial or complete sequences has been improved with a new method that integrates accessible external information (e.g. GenBank's features data). Moreover, MEvoLib adjusts the fetching process from NCBI databases to optimize download bandwidth usage. In addition, it has been implemented using parallelization techniques to cope with even large-scale scenarios. CONCLUSIONS MEvoLib is the first library for Python designed to facilitate molecular evolution research for both expert and novice users. Its single interface for each common task comprises several tools with their most used parameterizations. It also includes a method that takes advantage of biological knowledge to improve the gene partition of sequence datasets. Additionally, its implementation incorporates parallelization techniques to reduce computational costs when handling very large input datasets.
Affiliation(s)
- Jorge Álvarez-Jarreta
- Depto. de Informática e Ingeniería de Sistemas (DIIS), Universidad de Zaragoza, María de Luna 1, Zaragoza, 50018, Spain; Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Mariano Esquillor s/n, Zaragoza, 50018, Spain
| | - Eduardo Ruiz-Pesini
- Depto. de Bioquímica, Biología Molecular y Celular, Universidad de Zaragoza, Miguel Server 177, Zaragoza, 50013, Spain; Instituto de Investigación Sanitaria de Aragón (IIS Aragón), San Juan Bosco 13, Zaragoza, 50009, Spain; CIBER de enfermedades raras, Instituto de Salud Carlos III, Monforte de Lemos 5, Madrid, 28029, Spain; Fundación ARAID, María de Luna 11, Zaragoza, 50018, Spain
| |
|
28
|
Hejase HA, Liu KJ. A scalability study of phylogenetic network inference methods using empirical datasets and simulations involving a single reticulation. BMC Bioinformatics 2016; 17:422. [PMID: 27737628 PMCID: PMC5064893 DOI: 10.1186/s12859-016-1277-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 02/17/2016] [Accepted: 09/22/2016] [Indexed: 01/15/2023]
Abstract
BACKGROUND Branching events in phylogenetic trees reflect bifurcating and/or multifurcating speciation and splitting events. In the presence of gene flow, a phylogeny cannot be described by a tree but is instead a directed acyclic graph known as a phylogenetic network. Both phylogenetic trees and networks are typically reconstructed using computational analysis of multi-locus sequence data. The advent of high-throughput sequencing technologies has brought about two main scalability challenges: (1) dataset size in terms of the number of taxa and (2) the evolutionary divergence of the taxa in a study. The impact of both dimensions of scale on phylogenetic tree inference has been well characterized by recent studies; in contrast, the scalability limits of phylogenetic network inference methods are largely unknown. RESULTS In this study, we quantify the performance of state-of-the-art phylogenetic network inference methods on large-scale datasets using empirical data sampled from natural mouse populations and a range of simulations using model phylogenies with a single reticulation. We find that, as in the case of phylogenetic tree inference, the performance of leading network inference methods is negatively impacted by both dimensions of dataset scale. In general, we found that topological accuracy degrades as the number of taxa increases; a similar effect was observed with increased sequence mutation rate. The most accurate methods were probabilistic inference methods which maximize either likelihood under coalescent-based models or pseudo-likelihood approximations to the model likelihood. The improved accuracy obtained with probabilistic inference methods comes at a computational cost in terms of runtime and main memory usage, which become prohibitive as dataset size grows past twenty-five taxa. None of the probabilistic methods completed analyses of datasets with 30 taxa or more after many weeks of CPU runtime. 
CONCLUSIONS We conclude that the state of the art of phylogenetic network inference lags well behind the scope of current phylogenomic studies. New algorithmic development is critically needed to address this methodological gap.
Affiliation(s)
- Hussein A. Hejase
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, MI, USA
| | - Kevin J. Liu
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, MI, USA
| |
|
29
|
Abstract
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the squared distance to the trees in the sample might differ between parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods.
Affiliation(s)
- Alex Gavryushkin
- Centre for Computational Evolution, The University of Auckland, New Zealand.
| | - Alexei J Drummond
- Centre for Computational Evolution, The University of Auckland, New Zealand
| |
|
30
|
García-Pereira MJ, Carvajal-Rodríguez A, Whelan S, Caballero A, Quesada H. Impact of deep coalescence and recombination on the estimation of phylogenetic relationships among species using AFLP markers. Mol Phylogenet Evol 2014; 76:102-9. [PMID: 24631855 DOI: 10.1016/j.ympev.2014.03.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/08/2013] [Revised: 02/26/2014] [Accepted: 03/04/2014] [Indexed: 10/25/2022]
Abstract
Deep coalescence and the nongenealogical pattern of descent caused by recombination have emerged as a common problem for phylogenetic inference at the species level. Here we use computer simulations to assess whether AFLP-based phylogenies are robust to the uncertainties introduced by these factors. Our results indicate that phylogenetic signal can prevail even in the face of extensive deep coalescence, allowing recovery of the correct species tree topology. The impact of recombination on tree accuracy was related to total tree depth and species effective population size. The correct tree topology could be recovered under many simulation settings due to a trade-off between the conflicting signals resulting from intra-locus recombination and the benefits of the joint consideration of unlinked loci that better matched overall the true species tree. Errors in tree topology were not only determined by deep coalescence, but also by the timing of divergence and the tree-building errors arising from an insufficient number of characters. DNA sequences generally outperformed AFLPs under every simulated scenario, but this difference in performance was nearly negligible when a sufficient number of AFLP characters were sampled. Our simulations suggest that the impact of deep coalescence and intra-locus recombination on the reliability of AFLP trees could be minimal for effective population sizes equal to or lower than 10,000 (typical of many vertebrates and tree plants) given tree depths above 0.02 substitutions per site.
Affiliation(s)
- María Jesús García-Pereira
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
| | - Antonio Carvajal-Rodríguez
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
| | - Simon Whelan
- Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala 75236-SE, Sweden.
| | - Armando Caballero
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
| | - Humberto Quesada
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
| |
|