1. Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564; DOI: 10.1007/978-1-0716-3838-5_11]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratory of Genetics and Molecular Cardiology, Heart Institute (Instituto do Coração), Hospital das Clínicas, University of São Paulo Medical School, São Paulo, SP, Brazil
- Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, São Paulo, SP, Brazil.
2. Gu X. Genome distance and phylogenetic inference accommodating gene duplication, loss and new gene input. Mol Phylogenet Evol 2023; 189:107916. [PMID: 37742882; DOI: 10.1016/j.ympev.2023.107916] [Received: 03/22/2023; Revised: 08/06/2023; Accepted: 09/04/2023]
Abstract
With the rapid growth of whole-genome data, phylogenomics focuses on analyzing the evolutionary histories and relationships of species, i.e., the tree of life. For decades it has been recognized that genome-wide phylogenetic inference can be based on the dynamic pattern of gene content (the presence/absence of gene families) or of extended gene content (absence, presence as a single copy, or presence as duplicates). These methods, conceptually or technically, invoke the birth-and-death process to model genome evolution (gene duplication or gene loss). A common drawback is that the mechanism of new gene input, including the de novo origin of new genes and lateral gene transfer, has not been explicitly considered. In this paper, the author developed a new genome distance approach for genome phylogeny inference under the origin-birth-death stochastic process. The model takes gene duplication, gene loss and new gene input into account simultaneously. Computer simulations showed that, with the two-genome approach, it is statistically difficult to distinguish between the two proliferation parameters, i.e., the rate of gene duplication and the rate of new gene input. Nevertheless, the simulations also demonstrated the statistical feasibility of using the loss-genome distance to infer the genome phylogeny, which avoids the large-sampling problem. A strategy for studying the universal tree of life is discussed and illustrated with an example.
Affiliation(s)
- Xun Gu
- The Laurence H. Baker Center in Bioinformatics on Biological Statistics, Department of Genetics, Development and Cell Biology, Program of Ecological and Evolutionary Biology, Iowa State University, Ames, IA 50011, USA.
3. Hördt A, López MG, Meier-Kolthoff JP, Schleuning M, Weinhold LM, Tindall BJ, Gronow S, Kyrpides NC, Woyke T, Göker M. Analysis of 1,000+ Type-Strain Genomes Substantially Improves Taxonomic Classification of Alphaproteobacteria. Front Microbiol 2020; 11:468. [PMID: 32373076; PMCID: PMC7179689; DOI: 10.3389/fmicb.2020.00468] [Received: 08/22/2019; Accepted: 03/04/2020]
Abstract
The class Alphaproteobacteria comprises a diverse assemblage of Gram-negative bacteria that includes organisms of varying morphologies, physiologies and habitat preferences, many of which are of clinical and ecological importance. Classification of the Alphaproteobacteria has proved difficult, not least when taxonomic decisions rested heavily on a limited number of phenotypic features and on the interpretation of poorly resolved 16S rRNA gene trees. Despite progress in recent years regarding the classification of bacteria assigned to the class, there remains a need to further clarify taxonomic relationships. Here, draft genome sequences of more than 1000 Alphaproteobacteria and outgroup type strains were used to infer phylogenetic trees from genome-scale data using principles drawn from phylogenetic systematics. The majority of taxa were found to be monophyletic, but several orders, families and genera (including taxa recognized as problematic long ago as well as quite recent taxa) and a few species were shown to be in need of revision. Corresponding proposals are made for the recognition of new orders, families and genera, as well as for the transfer of a variety of species to other genera and of a variety of genera to other families. In addition, emended descriptions are given for many species, mainly involving information on DNA G+C content and (approximate) genome size, both of which are confirmed as valuable taxonomic markers. Similarly, analysis of gene content was shown to provide valuable taxonomic insights within the class. No significant incongruities between 16S rRNA gene trees and whole-genome trees were found in the class. The incongruities that became obvious when comparing the results of the present study with existing classifications appeared to be caused mainly by insufficiently resolved 16S rRNA gene trees or incomplete taxon sampling.
Another probable cause of misclassifications in the past is the partially low overall fit of phenotypic characters to the sequence-based tree. Even though a significant degree of phylogenetic conservation was detected in all characters investigated, the overall fit to the tree varied considerably.
Affiliation(s)
- Anton Hördt
- Department of Bioinformatics, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Marina García López
- Department of Bioinformatics, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Jan P. Meier-Kolthoff
- Department of Bioinformatics, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Marcel Schleuning
- Department of Bioinformatics, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Lisa-Maria Weinhold
- Department of Bioinformatics, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czechia
- Brian J. Tindall
- Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Sabine Gronow
- Department of Microorganisms, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
- Nikos C. Kyrpides
- Department of Energy, Joint Genome Institute, Berkeley, CA, United States
- Tanja Woyke
- Department of Energy, Joint Genome Institute, Berkeley, CA, United States
- Markus Göker
- Department of Bioinformatics, Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Brunswick, Germany
4. Pett W, Adamski M, Adamska M, Francis WR, Eitel M, Pisani D, Wörheide G. The Role of Homology and Orthology in the Phylogenomic Analysis of Metazoan Gene Content. Mol Biol Evol 2019; 36:643-649. [PMID: 30690573; DOI: 10.1093/molbev/msz013]
Abstract
Resolving the relationships of animals (Metazoa) is crucial to our understanding of the origin of key traits such as muscles, guts, and nerves. However, a broadly accepted metazoan consensus phylogeny has yet to emerge. In part, this is because the genomes of deeply diverging and fast-evolving lineages may undergo significant gene turnover, reducing the number of orthologs shared with related phyla. This can limit the usefulness of traditional phylogenetic methods that rely on alignments of orthologous sequences. Phylogenetic analysis of gene content has the potential to circumvent this orthology requirement, with binary presence/absence of homologous gene families representing a source of phylogenetically informative characters. Applying binary substitution models to the gene content of 26 complete animal genomes, we demonstrate that patterns of gene conservation differ markedly depending on whether gene families are defined by orthology or homology, that is, whether paralogs are excluded or included. We conclude that the placement of some deeply diverging lineages may exceed the limit of resolution afforded by the current methods based on comparisons of orthologous protein sequences, and novel approaches are required to fully capture the evolutionary signal from genes within genomes.
Affiliation(s)
- Walker Pett
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA
- Marcin Adamski
- Computational Biology and Bioinformatics Unit, Research School of Biology, The Australian National University, Canberra, Australia
- Maja Adamska
- Computational Biology and Bioinformatics Unit, Research School of Biology, The Australian National University, Canberra, Australia
- Warren R Francis
- Department of Earth & Environmental Sciences & GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Michael Eitel
- Department of Earth & Environmental Sciences & GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Davide Pisani
- School of Earth Sciences, University of Bristol, Bristol, United Kingdom; School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- Gert Wörheide
- Department of Earth & Environmental Sciences & GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany; SNSB-Bayerische Staatssammlung für Paläontologie und Geologie, München, Germany
5.
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- Joaquim Martins
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil.
6.
Abstract
The patchy distribution of genes across prokaryotes may be caused by multiple gene losses or by lateral transfer. Probabilistic models of gene gain and loss are needed to distinguish between these possibilities. Existing models allow only single genes to be gained or lost, despite the empirical evidence for multi-gene events. We compare birth-death models (currently the only widely used models, in which only one gene can be gained or lost at a time) to blocks models (allowing gain and loss of multiple genes within a family). We analyze two pairs of genomes: two E. coli strains, and the distantly related Archaeoglobus fulgidus (archaea) and Bacillus subtilis (Gram-positive bacteria). Blocks models describe the data much better than birth-death models. Our models suggest that lateral transfers of multiple genes from the same family are rare (although transfers of single genes are probably common). For both pairs, the estimated median time that a gene will remain in the genome is not much greater than the time separating the common ancestors of the archaea and bacteria. Deep phylogenetic reconstruction from sequence data will therefore depend on choosing genes likely to remain in the genome for a long time. Phylogenies based on the blocks model are more biologically plausible than phylogenies based on the birth-death model.
Affiliation(s)
- Matthew Spencer
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
7. McInerney J, Pisani D, O'Connell MJ. The ring of life hypothesis for eukaryote origins is supported by multiple kinds of data. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140323. [PMID: 26323755; DOI: 10.1098/rstb.2014.0323]
Abstract
The literature is replete with manuscripts describing the origin of eukaryotic cells. Most models of eukaryogenesis are either autogenous (sometimes called slow-drip) or symbiogenic (sometimes called big-bang). In this article, we use large and diverse suites of 'omics' and other data to infer that autogenous hypotheses are a very poor fit to the data and that the origin of eukaryotic cells occurred in a single symbiosis.
Affiliation(s)
- James McInerney
- Department of Biology, National University of Ireland Maynooth, Co. Kildare, Republic of Ireland; Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Davide Pisani
- School of Biological Sciences and School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TG, UK
- Mary J O'Connell
- School of Biotechnology, Dublin City University, Glasnevin, Dublin 9, Republic of Ireland
8. Rosenfeld JA, Foox J, DeSalle R. Insect genome content phylogeny and functional annotation of core insect genomes. Mol Phylogenet Evol 2016; 97:224-232. [DOI: 10.1016/j.ympev.2015.10.014] [Received: 06/10/2015; Revised: 09/02/2015; Accepted: 10/13/2015]
9. Murray GGR, Weinert LA, Rhule EL, Welch JJ. The Phylogeny of Rickettsia Using Different Evolutionary Signatures: How Tree-Like is Bacterial Evolution? Syst Biol 2015; 65:265-79. [PMID: 26559010; PMCID: PMC4748751; DOI: 10.1093/sysbio/syv084] [Received: 05/11/2015; Accepted: 11/04/2015]
Abstract
Rickettsia is a genus of intracellular bacteria whose hosts and transmission strategies are both impressively diverse, and this is reflected in a highly dynamic genome. Some previous studies have described the evolutionary history of Rickettsia as non-tree-like, due to incongruity between phylogenetic reconstructions using different portions of the genome. Here, we reconstruct the Rickettsia phylogeny using whole-genome data, including two new genomes from previously unsampled host groups. We find that a single topology, which is supported by multiple sources of phylogenetic signal, well describes the evolutionary history of the core genome. We do observe extensive incongruence between individual gene trees, but analyses of simulations over a single topology and interspersed partitions of sites show that this is more plausibly attributed to systematic error than to horizontal gene transfer. Some conflicting placements also result from phylogenetic analyses of accessory genome content (i.e., gene presence/absence), but we argue that these are also due to systematic error, stemming from convergent genome reduction, which cannot be accommodated by existing phylogenetic methods. Our results show that, even within a single genus, tests for gene exchange based on phylogenetic incongruence may be susceptible to false positives.
Affiliation(s)
- Gemma G R Murray
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Lucy A Weinert
- Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK
- Emma L Rhule
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- John J Welch
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
10. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 2014; 517:77-80. [PMID: 25317564; PMCID: PMC4285555; DOI: 10.1038/nature13805] [Received: 06/04/2014; Accepted: 08/28/2014]
Abstract
The mechanisms that underlie the origin of major prokaryotic groups are poorly understood. In principle, the origin of both species and higher taxa among prokaryotes should entail similar mechanisms: ecological interactions with the environment paired with natural genetic variation involving lineage-specific gene innovations and lineage-specific gene acquisitions [1-4]. To investigate the origin of higher taxa in archaea, we have determined gene distributions and gene phylogenies for the 267,568 protein-coding genes of 134 sequenced archaeal genomes in the context of their homologs from 1,847 reference bacterial genomes. Archaea-specific gene families define 13 traditionally recognized archaeal higher taxa in our sample. Here we report that the origins of these 13 groups unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are more than 5-fold more frequent than vice versa. Gene transfers identified at major evolutionary transitions among prokaryotes specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.
11. Grant JR, Katz LA. Phylogenomic study indicates widespread lateral gene transfer in Entamoeba and suggests a past intimate relationship with parabasalids. Genome Biol Evol 2014; 6:2350-60. [PMID: 25146649; PMCID: PMC4217692; DOI: 10.1093/gbe/evu179] [Accepted: 08/14/2014]
Abstract
Lateral gene transfer (LGT) has impacted the evolutionary history of eukaryotes, though to a lesser extent than in bacteria and archaea. Detecting LGT and distinguishing it from single-gene-tree artifacts is difficult, particularly when considering very ancient events (i.e., over hundreds of millions of years). Here, we use two independent lines of evidence (a taxon-rich phylogenetic approach and an assessment of the patterns of gene presence/absence) to evaluate the extent of LGT in the parasitic amoebozoan genus Entamoeba. Previous work has suggested that a number of genes in the genome of Entamoeba spp. were acquired by LGT. Our approach, using an automated phylogenomic pipeline to build taxon-rich gene trees, suggests that LGT is more extensive than previously thought. Our analyses reveal that genes have frequently entered the Entamoeba genome via nonvertical events, including at least 116 genes acquired directly from bacteria or archaea, plus an additional 22 genes in which Entamoeba plus one other eukaryote are nested among bacteria and/or archaea. These genes may make good candidates for novel therapeutics, as drugs targeting these genes are less likely to impact the human host. Although we recognize the challenges of inferring intradomain transfers given systematic errors in gene trees, we find 109 genes supporting LGT from a eukaryote to Entamoeba spp., and 178 genes unique to Entamoeba spp. and one other eukaryotic taxon (i.e., presence/absence data). Inspection of these intradomain LGTs provides evidence of a common sister relationship between genes of Entamoeba (Amoebozoa) and parabasalids (Excavata). We speculate that this indicates a past close relationship (e.g., symbiosis) between ancestors of these extant lineages.
Affiliation(s)
- Jessica R Grant
- Department of Biological Sciences, Smith College, Northampton, MA
- Laura A Katz
- Department of Biological Sciences, Smith College, Northampton, MA; Program in Organismic and Evolutionary Biology, University of Massachusetts
12. Shifman A, Ninyo N, Gophna U, Snir S. Phylo SI: a new genome-wide approach for prokaryotic phylogeny. Nucleic Acids Res 2013; 42:2391-404. [PMID: 24243847; PMCID: PMC3936750; DOI: 10.1093/nar/gkt1138]
Abstract
The evolutionary history of all life forms is usually represented as a vertical, tree-like process. In prokaryotes, however, the vertical signal is partly obscured by the massive influence of horizontal gene transfer (HGT). HGT creates widespread discordance between the evolutionary histories of different genes, as genomes become mosaics of gene histories. Thus, the Tree of Life (TOL) has been questioned as an appropriate representation of the evolution of prokaryotes. Nevertheless, a common hypothesis is that prokaryotic evolution is primarily tree-like, and routine efforts are made to place new isolates in their appropriate location in the TOL. Moreover, it appears desirable to exploit non–tree-like evolutionary processes for the task of microbial classification. In this work, we present a novel technique that builds on the straightforward observation that gene order conservation (‘synteny’) decreases over time as a result of gene mobility. This is particularly true in prokaryotes, mainly due to HGT. Using a ‘synteny index’ (SI) that measures the average synteny between a pair of genomes, we developed the phylogenetic reconstruction tool ‘Phylo SI’. Phylo SI offers several attractive properties, such as easy bootstrapping, high sensitivity in cases where the phylogenetic signal is weak, and computational efficiency. Phylo SI was tested both on simulated data and on two bacterial data sets and compared with two well-established phylogenetic methods. Phylo SI is particularly efficient at short evolutionary distances, where synteny footprints remain detectable but the nucleotide substitution signal is too weak for reliable sequence-based phylogenetic reconstruction. The method is publicly available at http://research.haifa.ac.il/ssagi/software/PhyloSI.zip.
Affiliation(s)
- Anton Shifman
- Department of Evolutionary & Environmental Biology, University of Haifa, Haifa 31905, Israel; Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv 69978, Israel; National Evolutionary Synthesis Center, 2024 W. Main Street A200, Durham, NC 27705, USA
13. Romance of the three domains: how cladistics transformed the classification of cellular organisms. Protein Cell 2013; 4:664-76. [PMID: 23873078; PMCID: PMC4875529; DOI: 10.1007/s13238-013-3050-9] [Received: 06/21/2013; Accepted: 07/01/2013]
Abstract
Cladistics is a biological philosophy that uses genealogical relationships among species and an inferred sequence of divergence as the basis of classification. This review critically surveys the chronological development of biological classification from Aristotle to our postgenomic era, with a central focus on cladistics. In 1957, Julian Huxley coined the term cladogenesis to denote splitting from subspeciation. In 1966, the English translation of Willi Hennig’s 1950 work, Phylogenetic Systematics, was published; it received strong opposition from pheneticists, such as the numerical taxonomists Peter Sneath and Robert Sokal, and from the evolutionary taxonomist Ernst Mayr, and sparked acrimonious debates in 1960–1980. In 1977–1990, Carl Woese pioneered the use of small-subunit rRNA gene sequences to delimit the three domains of cellular life and established the major prokaryotic phyla. Cladistics has since dominated taxonomy. Although cladistics is compatible with modern microbiological observations, namely organisms with unusual phenotypes, restricted expression of characteristics, and occasional unculturability, the increasing recognition of the pervasiveness and abundance of horizontal gene transfer has challenged its relevance and validity. The mosaic nature of eukaryotic and prokaryotic genomes was also gradually discovered. In the mid-2000s, high-throughput and whole-genome sequencing became routine, and the complex genealogies of organisms have led to the proposal of a reticulated web of life. While genomics only indirectly leads to an understanding of functional adaptations to ecological niches, computational modeling of entire organisms is underway, and the gap between genomics and phenetics may soon be bridged. Controversies are not expected to settle, as taxonomic classifications will remain subjective, serving the human scientist rather than the classified.
14. [Phylogenetic application and analysis of horizontal transfer based on the prokaryote eno gene]. Yi Chuan (Hereditas) 2012; 34:907-18. [PMID: 22805218; DOI: 10.3724/sp.j.1005.2012.00907]
Abstract
Conflict among gene trees has become a notable and difficult problem, and using multiple genes to reconstruct phylogenies has become widespread practice in phylogenetic studies. Enolase is a key glycolytic enzyme, and enolases from a large variety of organisms, including archaebacteria, eubacteria and eukaryotes, were studied. We downloaded eno sequences from the completely sequenced genomes of bacteria and archaea. A comprehensive homology search and phylogenetic analysis of eno identified nineteen horizontally transferred genes. These genes differed in many features from the genome-wide gene set, including nucleotide composition, gene combination, codon usage bias, and selection pressure, which reconfirmed that the horizontally transferred genes are exogenous. The results also showed that prokaryotic eno genes are highly conserved and medium-sized, making them good material for phylogenetic analysis. This work may serve as a reference for studies of the lifestyles and evolutionary histories of donor and recipient organisms, and of enolase structure and function.
15. Woodhams M, Steane DA, Jones RC, Nicolle D, Moulton V, Holland BR. Novel Distances for Dollo Data. Syst Biol 2012; 62:62-77. [DOI: 10.1093/sysbio/sys071]
Affiliation(s)
- Michael Woodhams
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
| | - Dorothy A. Steane
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Rebecca C. Jones
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Dean Nicolle
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Vincent Moulton
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Barbara R. Holland
- School of Mathematics and Physics; 2 CRC for Forestry; 3 School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia; 4 Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia; 5 Currency Creek Arboretum, P.O. Box 808, Melrose Park, South Australia 5039, Australia; 6 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
16
Meinel T, Krause A. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling. Evol Bioinform Online 2012; 8:489-525. [PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/ebo.s9642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.
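The profile idea can be illustrated with a minimal sketch. It assumes each analyzed phylogeny has already been scored by eye, with one topology-alternative label assigned per clan. The clan names, the labels, and the mismatch count used to compare two profiles are all hypothetical. They are not the catalogue or the comparison procedure used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical ordered list of bacterial clans (the paper's actual list differs).
CLANS: List[str] = ["clanA", "clanB", "clanC"]

def topology_profile(assignments: Dict[str, str]) -> Tuple[str, ...]:
    """Return the topology alternative of each clan in CLANS order.

    `assignments` maps a clan to the label of the topology alternative seen
    for it in one phylogeny; unscored clans are recorded as "?".
    """
    return tuple(assignments.get(clan, "?") for clan in CLANS)

def profile_distance(p: Tuple[str, ...], q: Tuple[str, ...]) -> int:
    """Count clans whose topology alternative differs between two profiles,
    ignoring clans left unscored in either one (an assumed comparison rule)."""
    return sum(1 for a, b in zip(p, q) if a != "?" and b != "?" and a != b)

tree1 = topology_profile({"clanA": "T1", "clanB": "T2", "clanC": "T1"})
tree2 = topology_profile({"clanA": "T1", "clanB": "T3"})
print(tree1, tree2, profile_distance(tree1, tree2))
```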
Affiliation(s)
- Thomas Meinel
- Charité-University Medicine Berlin, Institute for Physiology, Structural Bioinformatics Group, Thielallee 71, 14195 Berlin, Germany
17
Abstract
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167–181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301–316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60–76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763–766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255–260).
Affiliation(s)
- Janet S. Sinsheimer
- Human Genetics Department, University of California, Los Angeles
- Biomathematics Department, University of California, Los Angeles
- Biostatistics Department, University of California, Los Angeles
- James A. Lake
- Human Genetics Department, University of California, Los Angeles
- Molecular, Cell and Developmental Biology, University of California, Los Angeles
- *Corresponding author: E-mail:
18
Rosenfeld JA, DeSalle R. E value cutoff and eukaryotic genome content phylogenetics. Mol Phylogenet Evol 2012; 63:342-50. [PMID: 22306824 DOI: 10.1016/j.ympev.2012.01.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 05/22/2011] [Revised: 01/02/2012] [Accepted: 01/03/2012] [Indexed: 10/14/2022]
Abstract
Genome content analysis has been used as a source of phylogenetic information in large prokaryotic tree of life studies. Recently the sequencing of many eukaryotic genomes has allowed for the similar use of genome content analysis for these organisms too. In this communication we examine the utility of genome content analysis for recovering phylogenetic patterns in several eukaryotic groups. By constructing multiple matrices using different e value cutoffs we examine the dynamics of altering the e value cutoff on five eukaryotic genome data sets. Our analysis indicates that the e value cutoff that is used as a criterion in the construction of the genome content matrix is a critical factor in both the accuracy and information content of the analysis. Strikingly, genome content by itself is not a reliable or accurate source of characters for phylogenetic analysis of the taxa in the five data sets we analyzed. We discuss two problems--small genome attraction and genome duplications as being involved in the rather poor performance of genome content data in recovering eukaryotic phylogeny.
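How the e value cutoff shapes a genome content matrix can be sketched as follows. The sketch assumes each genome's best similarity-search hit to every gene family has already been reduced to a single e value. A family is then coded present (1) when that e value is at or below the cutoff, and absent (0) otherwise. The genomes, families, and e values are hypothetical, and this is not the authors' exact pipeline.

```python
from typing import Dict, List

# Hypothetical best-hit e values: genome -> {gene family -> best e value}.
best_hits: Dict[str, Dict[str, float]] = {
    "genome1": {"fam1": 1e-50, "fam2": 1e-8, "fam3": 0.5},
    "genome2": {"fam1": 1e-30, "fam3": 1e-12},
    "genome3": {"fam2": 1e-3, "fam3": 1e-40},
}

def content_matrix(hits: Dict[str, Dict[str, float]], cutoff: float) -> Dict[str, List[int]]:
    """Code every gene family as present (1) or absent (0) in each genome.

    A family counts as present only if its best hit is at or below `cutoff`,
    so a stricter (smaller) cutoff converts more cells into absences.
    """
    families = sorted({fam for row in hits.values() for fam in row})
    return {
        genome: [1 if row.get(fam, float("inf")) <= cutoff else 0 for fam in families]
        for genome, row in hits.items()
    }

for cutoff in (1e-5, 1e-20):
    print(cutoff, content_matrix(best_hits, cutoff))
```

Tightening the cutoff from 1e-5 to 1e-20 changes two cells, one in genome1 and one in genome2, from presence to absence. Matrices built at different cutoffs can therefore support different trees.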
Affiliation(s)
- Jeffrey A Rosenfeld
- IST/High Performance and Research Computing, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103, United States.
19
Anderson CNK, Liu L, Pearl D, Edwards SV. Tangled trees: the challenge of inferring species trees from coalescent and noncoalescent genes. Methods Mol Biol 2012; 856:3-28. [PMID: 22399453 DOI: 10.1007/978-1-61779-585-5_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 05/31/2023]
Abstract
Phylogenies based on different genes can produce conflicting phylogenies; methods that resolve such ambiguities are becoming more popular, and offer a number of advantages for phylogenetic analysis. We review so-called species tree methods and the biological forces that can undermine them by violating important aspects of the underlying models. Such forces include horizontal gene transfer, gene duplication, and natural selection. We review ways of detecting loci influenced by such forces and offer suggestions for identifying or accommodating them. The way forward involves identifying outlier loci, as is done in population genetic analysis of neutral and selected loci, and removing them from further analysis, or developing more complex species tree models that can accommodate such loci.
Affiliation(s)
- Christian N K Anderson
- Department of Organismic and Evolutionary Biology & Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
20
McInerney JO, Martin WF, Koonin EV, Allen JF, Galperin MY, Lane N, Archibald JM, Embley TM. Planctomycetes and eukaryotes: a case of analogy not homology. Bioessays 2011; 33:810-7. [PMID: 21858844 PMCID: PMC3795523 DOI: 10.1002/bies.201100045] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 04/04/2011] [Revised: 07/13/2011] [Accepted: 07/15/2011] [Indexed: 11/11/2022]
Abstract
Planctomycetes, Verrucomicrobia and Chlamydia are prokaryotic phyla, sometimes grouped together as the PVC superphylum of eubacteria. Some PVC species possess interesting attributes, in particular, internal membranes that superficially resemble eukaryotic endomembranes. Some biologists now claim that PVC bacteria are nucleus-bearing prokaryotes and are considered evolutionary intermediates in the transition from prokaryote to eukaryote. PVC prokaryotes do not possess a nucleus and are not intermediates in the prokaryote-to-eukaryote transition. Here we summarise the evidence that shows why all of the PVC traits that are currently cited as evidence for aspiring eukaryoticity are either analogous (the result of convergent evolution), not homologous, to eukaryotic traits; or else they are the result of horizontal gene transfers.
Affiliation(s)
- James O McInerney
- Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland.
21
Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K. Statistics and truth in phylogenomics. Mol Biol Evol 2011; 29:457-72. [PMID: 21873298 DOI: 10.1093/molbev/msr202] [Citation(s) in RCA: 176] [Impact Index Per Article: 12.6] [Indexed: 01/23/2023]
Abstract
Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that the assessment of the robustness of results to biological factors, that may systematically mislead (bias) the outcomes of statistical estimation, will be a key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
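The gap between effect size and P value can be made concrete with a short sketch. A fixed per-site difference, for example in log-likelihood between two competing trees, is tested with a normal-approximation z test. The numbers are hypothetical, and the z test is an illustrative stand-in, not the analysis in the paper. The standardized effect size never changes as sites are added, yet the P value shrinks toward zero.

```python
import math
from typing import Tuple

def z_test(mean_diff: float, sd_diff: float, n_sites: int) -> Tuple[float, float]:
    """Return (standardized effect size, two-sided P value).

    Normal approximation: z = mean_diff / (sd_diff / sqrt(n_sites)).
    The effect size mean_diff / sd_diff does not depend on n_sites.
    """
    effect = mean_diff / sd_diff
    z = mean_diff / (sd_diff / math.sqrt(n_sites))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return effect, p_value

# Hypothetical per-site difference and spread; only the number of sites varies.
for n in (100, 10_000, 1_000_000):
    effect, p = z_test(mean_diff=0.01, sd_diff=0.5, n_sites=n)
    print(f"sites={n:>9}  effect={effect:.3f}  P={p:.3g}")
```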
Affiliation(s)
- Sudhir Kumar
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Arizona, USA.
22
McInerney JO, Pisani D, Bapteste E, O'Connell MJ. The Public Goods Hypothesis for the evolution of life on Earth. Biol Direct 2011; 6:41. [PMID: 21861918 PMCID: PMC3179745 DOI: 10.1186/1745-6150-6-41] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 04/17/2011] [Accepted: 08/23/2011] [Indexed: 02/01/2023]
Abstract
It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate in order to describe biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect that we do not see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. 
We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just like classical mechanics and euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis.
Affiliation(s)
- James O McInerney
- Molecular Evolution and Bioinformatics Unit, Department of Biology, National University of Ireland Maynooth, County Kildare, Ireland.
23
Kurt Lienau E, DeSalle R, Allard M, Brown EW, Swofford D, Rosenfeld JA, Sarkar IN, Planet PJ. The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life. Cladistics 2011; 27:417-427. [PMID: 34875790 DOI: 10.1111/j.1096-0031.2010.00337.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.
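A combined data-type matrix can be sketched as a simple concatenation of partitions, giving one row per genome. The taxa, the aligned amino acid block, and the presence/absence block below are hypothetical. Taxa missing from a partition are coded as missing data ('?'), which is an assumed convention. The sketch does not reproduce the homology definitions or the coding used for the published mega-matrix.

```python
from typing import Dict, List

def concatenate(partitions: List[Dict[str, str]], taxa: List[str]) -> Dict[str, str]:
    """Join per-partition character strings into one row per taxon.

    Each partition maps taxon -> characters of a fixed width; a taxon absent
    from a partition receives '?' (missing data) across that whole block.
    """
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for part in partitions:
        width = len(next(iter(part.values())))
        for taxon in taxa:
            rows[taxon].append(part.get(taxon, "?" * width))
    return {taxon: "".join(chunks) for taxon, chunks in rows.items()}

# Hypothetical data: one aligned protein family and three presence/absence characters.
amino_acids = {"taxonA": "MKV-", "taxonB": "MRVL"}
presence_absence = {"taxonA": "101", "taxonB": "110", "taxonC": "011"}
matrix = concatenate([amino_acids, presence_absence], ["taxonA", "taxonB", "taxonC"])
print(matrix)
```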
Affiliation(s)
- E Kurt Lienau
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA; Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA; Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA
- Marc Allard
- Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Eric W Brown
- Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- David Swofford
- Duke Institute for Genomes and Science Policy, 366 BioSci, Duke University, Durham, NC 27708, USA
- Jeffrey A Rosenfeld
- Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA
- Indra N Sarkar
- Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
- Paul J Planet
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA; Department of Pediatrics, Children's Hospital of New York, Columbia University, College of Physicians and Surgeons, New York, NY 10032, USA
24
Abstract
BACKGROUND Genome sequencing has revolutionized our view of the relationships among genomes, particularly in revealing the confounding effects of lateral genetic transfer (LGT). Phylogenomic techniques have been used to construct purported trees of microbial life. Although such trees are easily interpreted and allow the use of a subset of genomes as "proxies" for the full set, LGT and other phenomena impact the positioning of different groups in genome trees, confounding and potentially invalidating attempts to construct a phylogeny-based taxonomy of microorganisms. Network and graph approaches can reveal complex sets of relationships, but applying these techniques to large data sets is a significant challenge. Notwithstanding the question of what exactly it might represent, generating and interpreting a Tree or Network of All Genomes will only be feasible if current algorithms can be improved upon. RESULTS Complex relationships among even the most-similar genomes demonstrate that proxy-based approaches to simplifying large sets of genomes are not alone sufficient to solve the analysis problem. A phylogenomic analysis of 1173 sequenced bacterial and archaeal genomes generated phylogenetic trees for 159,905 distinct homologous gene sets. The relationships inferred from this set can be heavily dependent on the inclusion of other taxa: for example, phyla such as Spirochaetes, Proteobacteria and Firmicutes are recovered as cohesive groups or split depending on the presence of other specific lineages. Furthermore, named groups such as Acidithiobacillus, Coprothermobacter and Brachyspira show a multitude of affiliations that are more consistent with their ecology than with small subunit ribosomal DNA-based taxonomy. Network and graph representations can illustrate the multitude of conflicting affinities, but all methods impose constraints on the input data and create challenges of construction and interpretation. 
CONCLUSIONS These complex relationships highlight the need for an inclusive approach to genomic data, and current methods with minor alterations will likely scale to allow the analysis of data sets with 10,000 or more genomes. The main challenges lie in the visualization and interpretation of genomic relationships, and the redefinition of microbial taxonomy when subsets of genomic data are so evidently in conflict with one another, and with the "canonical" molecular taxonomy.
Affiliation(s)
- Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5 Canada.
25
Leigh JW, Lapointe FJ, Lopez P, Bapteste E. Evaluating phylogenetic congruence in the post-genomic era. Genome Biol Evol 2011; 3:571-87. [PMID: 21712432 PMCID: PMC3156567 DOI: 10.1093/gbe/evr050] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Accepted: 05/27/2011] [Indexed: 12/04/2022]
Abstract
Congruence is a broadly applied notion in evolutionary biology used to justify multigene phylogeny or phylogenomics, as well as in studies of coevolution, lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence using comparison of trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited for the growing number of available genome sequences, most of which are from prokaryotes and viruses, either for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life and support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
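Comparing trees rather than characters can be illustrated with the Robinson-Foulds distance. This is a standard count of the bipartitions (splits) found in exactly one of two trees. It is shown here only as a familiar example of tree comparison, not as one of the new methods the authors call for. The sketch assumes both trees are given as lists of nontrivial splits over the same taxon set. That shared-taxon-set requirement is one reason such pairwise measures cope poorly with markers that have missing taxa.

```python
from typing import FrozenSet, Iterable, List, Set

def canonical_split(side: Iterable[str], taxa: FrozenSet[str]) -> FrozenSet[str]:
    """Represent a bipartition by whichever side excludes a fixed reference
    taxon, so a split and its complement compare equal."""
    part = frozenset(side)
    reference = min(taxa)
    return taxa - part if reference in part else part

def robinson_foulds(splits1: List[Set[str]], splits2: List[Set[str]], taxa: Iterable[str]) -> int:
    """Number of nontrivial bipartitions present in exactly one of the two trees."""
    all_taxa = frozenset(taxa)
    a = {canonical_split(s, all_taxa) for s in splits1}
    b = {canonical_split(s, all_taxa) for s in splits2}
    return len(a ^ b)

# ((A,B),C,(D,E)) versus ((A,C),B,(D,E)): only the split DE | ABC is shared.
taxa = ["A", "B", "C", "D", "E"]
tree1 = [{"A", "B"}, {"D", "E"}]
tree2 = [{"A", "C"}, {"D", "E"}]
print(robinson_foulds(tree1, tree2, taxa))
```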
Affiliation(s)
- Jessica W Leigh
- Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand.
26
Sangaralingam A, Susko E, Bryant D, Spencer M. On the artefactual parasitic eubacteria clan in conditioned logdet phylogenies: heterotachy and ortholog identification artefacts as explanations. BMC Evol Biol 2010; 10:343. [PMID: 21062453 PMCID: PMC2992526 DOI: 10.1186/1471-2148-10-343] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/07/2010] [Accepted: 11/09/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Phylogenetic reconstruction methods based on gene content often place all the parasitic and endosymbiotic eubacteria (parasites for short) together in a clan. Many other lines of evidence point to this parasites clan being an artefact. This artefact could be a consequence of the methods used to construct ortholog databases (due to some unknown bias), the methods used to estimate the phylogeny, or both. We test the idea that the parasites clan is an ortholog identification artefact by analyzing three different ortholog databases (COG, TRIBES, and OFAM), which were constructed using different methods, and are thus unlikely to share the same biases. In each case, we estimate a phylogeny using an improved version of the conditioned logdet distance method. If the parasites clan appears in trees from all three databases, it is unlikely to be an ortholog identification artefact. Accelerated loss of a subset of gene families in parasites (a form of heterotachy) may contribute to the difficulty of estimating a phylogeny from gene content data. We test the idea that heterotachy is the underlying reason for the estimation of an artefactual parasites clan by applying two different mixture models (phylogenetic and non-phylogenetic), in combination with conditioned logdet. In these models, there are two categories of gene families, one of which has accelerated loss in parasites. Distances are estimated separately from each category by conditioned logdet. This should reduce the tendency for tree estimation methods to group the parasites together, if heterotachy is the underlying reason for estimation of the parasites clan. RESULTS The parasites clan appears in conditioned logdet trees estimated from all three databases. This makes it less likely to be an artefact of database construction. The non-phylogenetic mixture model gives trees without a parasites clan. However, the phylogenetic mixture model still results in a tree with a parasites clan.
Thus, it is not entirely clear whether heterotachy is the underlying reason for the estimation of a parasites clan. Simulation studies suggest that the phylogenetic mixture model approach may be unsuccessful because the model of gene family gain and loss it uses does not adequately describe the real data. CONCLUSIONS The most successful methods for estimating a reliable phylogenetic tree for parasitic and endosymbiotic eubacteria from gene content data are still ad-hoc approaches such as the SHOT distance method. However, the improved conditioned logdet method we developed here may be useful for non-parasites and can be accessed at http://www.liv.ac.uk/~cgrbios/cond_logdet.html.
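For readers unfamiliar with logdet distances, here is a minimal sketch of the plain (unconditioned) LogDet, or paralinear, distance between two genomes coded as 0/1 gene-family presence/absence profiles. It uses the general form d = -(1/k)[ln det F - (1/2) ln(det Px det Py)] with k = 2 states. Here F is the joint frequency matrix, and Px and Py are diagonal matrices of the marginal frequencies. The conditioning step and the authors' improvements are not reproduced, and the example profiles are hypothetical.

```python
import math
from typing import Sequence

def logdet_binary(x: Sequence[int], y: Sequence[int]) -> float:
    """Paralinear (LogDet) distance between two 0/1 presence/absence profiles.

    d = -(1/2) * [ln det F - 0.5 * ln(det Px * det Py)], where F is the 2x2
    joint frequency matrix and Px, Py hold the marginal state frequencies.
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    f = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(x, y):
        f[a][b] += 1.0 / n
    det_f = f[0][0] * f[1][1] - f[0][1] * f[1][0]
    px = [f[0][0] + f[0][1], f[1][0] + f[1][1]]  # row (genome x) marginals
    py = [f[0][0] + f[1][0], f[0][1] + f[1][1]]  # column (genome y) marginals
    if det_f <= 0 or min(px + py) == 0:
        raise ValueError("distance undefined for these profiles")
    return -0.5 * (math.log(det_f) - 0.5 * math.log(px[0] * px[1] * py[0] * py[1]))

print(logdet_binary([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

For this hypothetical pair, F has 1/3 on the diagonal and 1/6 off it, so the distance works out to (1/2) ln 3.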
Affiliation(s)
- Ajanthah Sangaralingam
- Centre of Haemato-Oncology, Institute of Cancer, Bart's and the London School of Medicine (QMUL), Charterhouse Square, London EC1M 6BQ, UK
27
Gribaldo S, Poole AM, Daubin V, Forterre P, Brochier-Armanet C. The origin of eukaryotes and their relationship with the Archaea: are we at a phylogenomic impasse? Nat Rev Microbiol 2010; 8:743-52. [PMID: 20844558 DOI: 10.1038/nrmicro2426] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.6] [Indexed: 11/09/2022]
Abstract
The origin of eukaryotes and their evolutionary relationship with the Archaea is a major biological question and the subject of intense debate. In the context of the classical view of the universal tree of life, the Archaea and the Eukarya have a common ancestor, the nature of which remains undetermined. Alternative views propose instead that the Eukarya evolved directly from a bona fide archaeal lineage. Several recent large-scale phylogenomic studies using an array of approaches are divided in supporting either one or the other scenario, despite analysing largely overlapping data sets of universal genes. We examine the reasons for such a lack of consensus and consider how alternative approaches may enable progress in answering this fascinating and as-yet-unresolved question.
28
En route to a genome-based classification of Archaea and Bacteria? Syst Appl Microbiol 2010; 33:175-82. [PMID: 20409658 DOI: 10.1016/j.syapm.2010.03.003] [Citation(s) in RCA: 250] [Impact Index Per Article: 16.7] [Received: 10/06/2009] [Revised: 03/10/2010] [Accepted: 03/17/2010] [Indexed: 11/23/2022]
Abstract
Given the considerable promise whole-genome sequencing offers for phylogeny and classification, it is surprising that microbial systematics and genomics have not yet been reconciled. This might be due to the intrinsic difficulties in inferring reasonable phylogenies from genomic sequences, particularly in the light of the significant amount of lateral gene transfer in prokaryotic genomes. However, recent studies indicate that the species tree and the hierarchical classification based on it are still meaningful concepts, and that state-of-the-art phylogenetic inference methods are able to provide reliable estimates of the species tree to the benefit of taxonomy. Conversely, we suspect that the current lack of completely sequenced genomes for many of the major lineages of prokaryotes and for most type strains is a major obstacle in progress towards a genome-based classification of microorganisms. We conclude that phylogeny-driven microbial genome sequencing projects such as the Genomic Encyclopaedia of Archaea and Bacteria (GEBA) project are likely to rectify this situation.
29
Cavalier-Smith T. Origin of the cell nucleus, mitosis and sex: roles of intracellular coevolution. Biol Direct 2010; 5:7. [PMID: 20132544 PMCID: PMC2837639 DOI: 10.1186/1745-6150-5-7] [Citation(s) in RCA: 139] [Impact Index Per Article: 9.3] [Received: 12/16/2009] [Accepted: 02/04/2010] [Indexed: 12/18/2022]
Abstract
BACKGROUND The transition from prokaryotes to eukaryotes was the most radical change in cell organisation since life began, with the largest ever burst of gene duplication and novelty. According to the coevolutionary theory of eukaryote origins, the fundamental innovations were the concerted origins of the endomembrane system and cytoskeleton, subsequently recruited to form the cell nucleus and coevolving mitotic apparatus, with numerous genetic eukaryotic novelties inevitable consequences of this compartmentation and novel DNA segregation mechanism. Physical and mutational mechanisms of origin of the nucleus are seldom considered beyond the long-standing assumption that it involved wrapping pre-existing endomembranes around chromatin. Discussions on the origin of sex typically overlook its association with protozoan entry into dormant walled cysts and the likely simultaneous coevolutionary, not sequential, origin of mitosis and meiosis. RESULTS I elucidate nuclear and mitotic coevolution, explaining the origins of dicer and small centromeric RNAs for positionally controlling centromeric heterochromatin, and how 27 major features of the cell nucleus evolved in four logical stages, making both mechanisms and selective advantages explicit: two initial stages (origin of 30 nm chromatin fibres, enabling DNA compaction; and firmer attachment of endomembranes to heterochromatin) protected DNA and nascent RNA from shearing by novel molecular motors mediating vesicle transport, division, and cytoplasmic motility. Then octagonal nuclear pore complexes (NPCs) arguably evolved from COPII coated vesicle proteins trapped in clumps by Ran GTPase-mediated cisternal fusion that generated the fenestrated nuclear envelope, preventing lethal complete cisternal fusion, and allowing passive protein and RNA exchange. Finally, plugging NPC lumens by an FG-nucleoporin meshwork and adopting karyopherins for nucleocytoplasmic exchange conferred compartmentation advantages. 
These successive changes took place in naked growing cells, probably as indirect consequences of the origin of phagotrophy. The first eukaryote had 1-2 cilia and also walled resting cysts; I outline how encystation may have promoted the origin of meiotic sex. I also explain why many alternative ideas are inadequate. CONCLUSION Nuclear pore complexes are evolutionary chimaeras of endomembrane- and mitosis-related chromatin-associated proteins. The keys to understanding eukaryogenesis are a proper phylogenetic context and understanding organelle coevolution: how innovations in one cell component caused repercussions on others.
30
Cohen O, Pupko T. Inference and characterization of horizontally transferred gene families using stochastic mapping. Mol Biol Evol 2009; 27:703-13. [PMID: 19808865 PMCID: PMC2822287 DOI: 10.1093/molbev/msp240] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 11/23/2022]
Abstract
Macrogenomic events, in which genes are gained and lost, play a pivotal evolutionary role in microbial evolution. Nevertheless, probabilistic-evolutionary models describing such events and methods for their robust inference are considerably less developed than existing methodologies for analyzing site-specific sequence evolution. Here, we present a novel method for the inference of gains and losses of gene families. First, we develop probabilistic-evolutionary models describing the dynamics of gene-family content, which are more biologically realistic than previously suggested models. In our likelihood-based models, gains and losses are represented by transitions between presence and absence, given an underlying phylogeny. We employ a mixture-model approach in which we allow both the gain rate and the loss rate to vary among gene families. Second, we use these models together with the analytic implementation of stochastic mapping to infer branch-specific events. Our novel methodology allows us to infer and quantify horizontal gene transfer (HGT) events. This enables us to rank various gene families and lineages according to their propensity to undergo gains and losses. Applying our methodology to 4,873 gene families shows that: 1) the novel mixture models describe the observed variability in gene-family content among microbes significantly better than previous models; 2) The stochastic mapping approach enables accurate inference of gain and loss events based on simulations; 3) At least 34% of the gene families analyzed are inferred to have experienced HGT at least once during their evolution; and 4) Gene families that were inferred to experience HGT are both enriched and depleted with respect to specific functional categories.
Affiliation(s)
- Ofir Cohen
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
31
Abstract
Endosymbioses have dramatically altered eukaryotic life, but are thought to have negligibly affected prokaryotic evolution. Here, by analysing the flows of protein families, I present evidence that the double-membrane, gram-negative prokaryotes were formed as the result of a symbiosis between an ancient actinobacterium and an ancient clostridium. The resulting taxon has been extraordinarily successful, and has profoundly altered the evolution of life by providing endosymbionts necessary for the emergence of eukaryotes and by generating Earth's oxygen atmosphere. Their double-membrane architecture and the observed genome flows into them suggest a common evolutionary mechanism for their origin: an endosymbiosis between a clostridium and actinobacterium.
Affiliation(s)
- James A Lake
- Department of Molecular, Cellular and Developmental Biology, University of California, Los Angeles, California 90095, USA.
32
Cohen O, Rubinstein ND, Stern A, Gophna U, Pupko T. A likelihood framework to analyse phyletic patterns. Philos Trans R Soc Lond B Biol Sci 2009; 363:3903-11. [PMID: 18852099 DOI: 10.1098/rstb.2008.0177] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
Probabilistic evolutionary models revolutionized our capability to extract biological insights from sequence data. While these models accurately describe the stochastic processes of site-specific substitutions, single-base substitutions represent only a fraction of all the events that shape genomes. Specifically, in microbes, events in which entire genes are gained (e.g. via horizontal gene transfer) and lost play a pivotal evolutionary role. In this research, we present a novel likelihood-based evolutionary model for gene gains and losses, and use it to analyse genome-wide patterns of the presence and absence of gene families. The model assumes a Markovian stochastic process, where gains and losses are represented by the transition between presence and absence, respectively, given an underlying phylogenetic tree. To account for differences in the rates of gain and loss of different gene families, we assume among-gene family rate variability, thus allowing for more accurate description of the data. Using the Bayesian approach, we estimated an evolutionary rate for each gene family. Simulation studies demonstrated that our methodology accurately infers these rates. Our methodology was applied to analyse a large corpus of data, consisting of 4873 gene families spanning 63 species and revealed novel insights regarding the evolutionary nature of genome-wide gain and loss dynamics.
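The gain/loss model described above treats the presence or absence of a gene family as a two-state continuous-time Markov chain running along a fixed tree. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses a single gain rate and a single loss rate, a hypothetical three-taxon tree written as nested tuples, Felsenstein's pruning algorithm, and the stationary distribution at the root. It omits the among-family rate variation, the Bayesian rate estimation, and any correction for unobservable all-absent patterns that the published models include.

```python
import itertools
import math

# Hypothetical tree format: (name, [(child_node, branch_length), ...]); leaves have no children.


def transition_matrix(gain, loss, t):
    """P(t) for a two-state CTMC; state 0 = absent, state 1 = present."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi0, pi1 = loss / total, gain / total  # stationary absence/presence frequencies
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],  # starting from absent
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],  # starting from present
    ]


def partial_likelihood(node, pattern, gain, loss):
    """Felsenstein pruning: entry s is P(leaf data below node | node in state s)."""
    name, children = node
    if not children:
        return [1.0 if s == pattern[name] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, branch_length in children:
        child_lik = partial_likelihood(child, pattern, gain, loss)
        p = transition_matrix(gain, loss, branch_length)
        for s in (0, 1):
            result[s] *= sum(p[s][x] * child_lik[x] for x in (0, 1))
    return result


def pattern_likelihood(root, pattern, gain, loss):
    """Likelihood of one presence/absence pattern, root drawn from the stationary distribution."""
    total = gain + loss
    root_freqs = (loss / total, gain / total)
    root_lik = partial_likelihood(root, pattern, gain, loss)
    return sum(root_freqs[s] * root_lik[s] for s in (0, 1))


# Hypothetical tree ((A:0.1,B:0.2):0.05,C:0.3) and hypothetical rates.
tree = ("root", [
    (("AB", [(("A", []), 0.1), (("B", []), 0.2)]), 0.05),
    (("C", []), 0.3),
])
GAIN, LOSS = 0.5, 1.5

print(pattern_likelihood(tree, {"A": 1, "B": 1, "C": 0}, GAIN, LOSS))
all_patterns = [dict(zip("ABC", bits)) for bits in itertools.product((0, 1), repeat=3)]
print(sum(pattern_likelihood(tree, p, GAIN, LOSS) for p in all_patterns))  # 1.0 up to rounding
```

Because the model is a proper probability distribution over leaf patterns, the likelihoods of all 2^3 presence/absence patterns sum to 1, which makes a convenient sanity check.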
Affiliation(s)
- Ofir Cohen
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
33
New methods for selective isolation of bacterial DNA from human clinical specimens. Anaerobe 2009; 16:47-53. [PMID: 19463963 DOI: 10.1016/j.anaerobe.2009.04.009] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Received: 09/01/2008] [Revised: 04/20/2009] [Accepted: 04/30/2009] [Indexed: 11/23/2022]
Abstract
Separation of bacterial DNA from human DNA in clinical samples may have an important impact on downstream applications, involving microbial diagnostic systems. We evaluated two commercially available reagents (MolYsis, Molzym GmbH & Co. KG, Bremen; and Pureprove, SIRS-Lab GmbH, Jena; both Germany) for their potential to isolate and purify bacterial DNA from human DNA. We chose oral samples, which usually contain very high amounts of both human and bacterial cells. Three different DNA preparations each were made from eight caries and eight periodontal specimens using the two reagents above and a conventional DNA extraction strategy as reference. Based on target-specific real-time quantitative PCR assays, we compared the reduction of human DNA versus the loss of bacterial DNA. Human DNA was monitored by targeting the beta-2-microglobulin gene, while bacteria were monitored by targeting 16S rDNA (total bacteria and Porphyromonas gingivalis) or the glycosyltransferase gene (Streptococcus mutans). We found that in most cases at least 90% of human DNA could successfully be removed, with complete removal in eight of 16 cases using MolYsis, and two (of 16) cases using Pureprove. Conversely, detection of bacterial DNA was possible in all cases with a recovery rate generally ranging from 35% to 50%. In conclusion, both strategies have the potential to reduce background interference from the host DNA, which may be of remarkable value for nucleic-acid-based microbial diagnostic systems.
34
Spencer M, Sangaralingam A. A phylogenetic mixture model for gene family loss in parasitic bacteria. Mol Biol Evol 2009; 26:1901-8. [PMID: 19435739 DOI: 10.1093/molbev/msp102] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Gene families are frequently gained and lost from prokaryotic genomes. It is widely believed that the rate of loss was accelerated for some but not all gene families in lineages that became parasites or endosymbionts. This leads to a form of heterotachy that may be responsible for the poor performance of phylogeny estimation based on gene content. We describe a mixture model that accounts for this heterotachy. We show that this model fits data on the distribution of gene families across bacteria from the COG database much better than previous models. However, it still favors an artifactual tree topology in which parasites form a clade over the more plausible 16S topology. In contrast to a previous model of genome dynamics, our model suggests that the ancestral bacterium had a small genome. We suggest that models of gene family gain and loss are likely to be more useful for understanding genome dynamics than for estimating phylogenetic trees.
Affiliation(s)
- Matthew Spencer
- School of Biological Sciences, University of Liverpool, Liverpool, UK.
35
Creevey CJ, McInerney JO. Trees from trees: construction of phylogenetic supertrees using clann. Methods Mol Biol 2009; 537:139-161. [PMID: 19378143 DOI: 10.1007/978-1-59745-251-9_7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 05/27/2023]
Abstract
Supertree methods combine multiple phylogenetic trees to produce the overall best "supertree." They can be used to combine phylogenetic information from datasets only partially overlapping and from disparate sources (like molecular and morphological data), or to break down problems thought to be computationally intractable. Some of the longest standing phylogenetic conundrums are now being brought to light using supertree approaches. We describe the most widely used supertree methods implemented in the software program "clann" and provide a step by step tutorial for investigating phylogenetic information and reconstructing the best supertree. Clann is freely available for Windows, Mac and Unix/Linux operating systems under the GNU public licence at (http://bioinf.nuim.ie/software/clann).
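Matrix representation with parsimony (MRP) is one widely used supertree method. The Python sketch below shows only its coding step, under stated assumptions: source trees are rooted and written as nested tuples, and each non-root clade becomes one binary column (1 inside the clade, 0 outside it, ? for taxa missing from that source tree). It is not clann's implementation or input format, and the resulting matrix would still need a parsimony search to produce a supertree.

```python
# Hypothetical tree format: a taxon name (str) or a tuple of subtrees, read as rooted.


def clades(tree):
    """Return (leaf set of tree, list of leaf sets of every internal node)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for sub in tree:
        sub_leaves, sub_clades = clades(sub)
        leaves |= sub_leaves
        found.extend(sub_clades)
    found.append(leaves)
    return leaves, found


def mrp_matrix(source_trees):
    """Matrix representation of source trees: one binary column per non-root clade.

    1 = taxon inside the clade, 0 = taxon in that source tree but outside the clade,
    ? = taxon absent from that source tree.
    """
    all_taxa = sorted(set().union(*(clades(t)[0] for t in source_trees)))
    columns = []
    for tree in source_trees:
        tree_taxa, tree_clades = clades(tree)
        for clade in tree_clades:
            if len(clade) < len(tree_taxa):  # the root clade carries no grouping information
                columns.append({
                    taxon: ("1" if taxon in clade else "0") if taxon in tree_taxa else "?"
                    for taxon in all_taxa
                })
    return {taxon: "".join(col[taxon] for col in columns) for taxon in all_taxa}


# Two hypothetical, partially overlapping source trees.
source_trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]
for taxon, row in mrp_matrix(source_trees).items():
    print(taxon, row)
```

The "?" entries are what let MRP combine source trees that share only some taxa, which is the partially overlapping case the abstract mentions.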
36
Beiko RG, Doolittle WF, Charlebois RL. The Impact of Reticulate Evolution on Genome Phylogeny. Syst Biol 2008; 57:844-56. [DOI: 10.1080/10635150802559265] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 10/21/2022]
Affiliation(s)
- Robert G. Beiko
- Faculty of Computer Science, Dalhousie University, and Institute for Molecular Bioscience/ARC Centre for Bioinformatics, Brisbane, Australia
- W. Ford Doolittle
- Genome Atlantic, Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Robert L. Charlebois
- Genome Atlantic, Department of Biochemistry & Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
37
McCann A, Cotton JA, McInerney JO. The tree of genomes: an empirical comparison of genome-phylogeny reconstruction methods. BMC Evol Biol 2008; 8:312. [PMID: 19014489 PMCID: PMC2592249 DOI: 10.1186/1471-2148-8-312] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 04/07/2008] [Accepted: 11/12/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND In the past decade or more, the emphasis for reconstructing species phylogenies has moved from the analysis of a single gene to the analysis of multiple genes and even completed genomes. The simplest method of scaling up is to use familiar analysis methods on a larger scale and this is the most popular approach. However, duplications and losses of genes along with horizontal gene transfer (HGT) can lead to a situation where there is only an indirect relationship between gene and genome phylogenies. In this study we examine five widely-used approaches and their variants to see if indeed they are more-or-less saying the same thing. In particular, we focus on Conditioned Reconstruction as it is a method that is designed to work well even if HGT is present. RESULTS We confirm a previous suggestion that this method has a systematic bias. We show that no two methods produce the same results and most current methods of inferring genome phylogenies produce results that are significantly different to other methods. CONCLUSION We conclude that genome phylogenies need to be interpreted differently, depending on the method used to construct them.
Affiliation(s)
- Angela McCann
- Bioinformatics laboratory, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland.
38
Ding G, Yu Z, Zhao J, Wang Z, Li Y, Xing X, Wang C, Liu L, Li Y. Tree of life based on genome context networks. PLoS One 2008; 3:e3357. [PMID: 18852873 PMCID: PMC2566592 DOI: 10.1371/journal.pone.0003357] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/09/2008] [Accepted: 09/11/2008] [Indexed: 11/18/2022]
Abstract
Efforts in phylogenomics have greatly improved our understanding of the backbone tree of life. However, due to the systematic error in sequence data, a sequence-based phylogenomic approach leads to well-resolved but statistically significant incongruence. Thus, an independent test of current phylogenetic knowledge is required. Here, we have devised a distance-based strategy to reconstruct a highly resolved backbone tree of life, on the basis of the genome context networks of 195 fully sequenced representative species. Along with strongly supporting the monophylies of three superkingdoms and most taxonomic sub-divisions, the derived tree also suggests some intriguing results, such as high G+C gram positive origin of Bacteria, classification of Symbiobacterium thermophilum and Alcanivorax borkumensis in Firmicutes. Furthermore, simulation analyses indicate that addition of more gene relationships with high accuracy can greatly improve the resolution of the phylogenetic tree. Our results demonstrate the feasibility of the reconstruction of highly resolved phylogenetic tree with extensible gene networks across all three domains of life. This strategy also implies that the relationships between the genes (gene network) can define what kind of species it is.
Affiliation(s)
- Guohui Ding
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Shanghai, People's Republic of China
- Zhonghao Yu
- College of Life Science & Biotechnology, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Jing Zhao
- College of Life Science & Biotechnology, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
- Zhen Wang
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Shanghai, People's Republic of China
- Yun Li
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Shanghai, People's Republic of China
- Xiaobin Xing
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Graduate School of the Chinese Academy of Sciences, Shanghai, People's Republic of China
- Chuan Wang
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Lei Liu
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
- Yixue Li
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- College of Life Science & Biotechnology, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
39
[Methods for the identification of horizontal gene transfer (HGT) events and progress in related fields]. YI CHUAN = HEREDITAS 2008; 30:1108-14. [PMID: 18779166 DOI: 10.3724/sp.j.1005.2008.01108] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
Abstract
Horizontal gene transfer is the exchange of genes between different organisms or different organelles, and it occurs frequently in prokaryotes. Many newly identified horizontal transfer events in eukaryotes indicate that it is a common phenomenon in all organisms. This paper describes the concept of horizontal gene transfer; the criteria for identifying horizontal gene transfer events; the characteristics, modes, and routes of horizontal gene transfer; and its impact on gene and genome evolution. Phylogenetic tree analysis, base composition, selection pressure, intron sequence comparison, inserted special sequences, and biased nucleotide substitution are the methods most commonly used in previous research. Accumulated evidence shows that transposable sequences are the most likely to undergo horizontal transfer. Transformation, conjugation, and transduction are the main forms of horizontal gene transfer in prokaryotes, but the mechanism of horizontal gene transfer in eukaryotes remains unclear. Horizontal gene transfer plays a special role in genetic, genomic, and biological evolution.
40
Abstract
The increasing recognition that symbioses have greatly altered evolution through genome fusions is creating a need for algorithms that can reliably detect and reconstruct fusions. Here, we generalize the bootstrappers gambit algorithm (a quartet method) in order to permit it to analyze both bifurcations and fusions under a single mathematical model, and thereby detect past genomic branchings and endosymbioses. This new method, 3-dimensional parsimony, can be applied to aligned sequences, such as gene, indel, or other genomic presence/absence sequences. It also provides a statistical measure of support for each possible graph. The usefulness of this method is demonstrated by applying it to the ring of life.
Affiliation(s)
- James A Lake
- Department of Molecular, Cellular, and Developmental Biology, University of California, Los Angeles, USA
41
Abstract
The availability of whole-genome data has created the extraordinary opportunity to reconstruct the 'tree of life' in fine detail. Such a comprehensive effort promises to unravel the enigmatic evolutionary relationships between prokaryotes and eukaryotes. Traditionally, biologists have represented the evolutionary relationships of all organisms by a bifurcating phylogenetic tree. But recent analyses of completely sequenced genomes using conditioned reconstruction (CR), a newly developed gene-content algorithm, suggest that a cycle graph or 'ring' rather than a 'tree' is a better representation of the evolutionary relationships between prokaryotes and eukaryotes. CR is the first phylogenetic-reconstruction method to provide precise evidence about the origin of the eukaryotes. This review summarizes how the CR analyses of complete genomes provide evidence for a fusion origin of the eukaryotes.
Affiliation(s)
- Maria C Rivera
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Trani Center for Life Sciences, 1000 West Cary Street, P.O. Box 842030, Richmond, VA 23284-0333, USA.
42
Egel R, Penny D. On the Origin of Meiosis in Eukaryotic Evolution: Coevolution of Meiosis and Mitosis from Feeble Beginnings. RECOMBINATION AND MEIOSIS 2007. [DOI: 10.1007/7050_2007_036] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
43
Struck TH. Data congruence, paedomorphosis and salamanders. Front Zool 2007; 4:22. [PMID: 17974010 PMCID: PMC2234405 DOI: 10.1186/1742-9994-4-22] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/22/2006] [Accepted: 10/31/2007] [Indexed: 12/20/2022]
Abstract
BACKGROUND The retention of ancestral juvenile characters by adult stages of descendants is called paedomorphosis. However, this process can mislead phylogenetic analyses based on morphological data, even in combination with molecular data, because it is difficult to assess whether a character is primarily absent or secondarily lost. Thus, the detection of incongruence between morphological and molecular data is necessary to investigate the reliability of simultaneous analyses. Different methods have been proposed to detect data congruence or incongruence. Five of them (PABA, PBS, NDI, LILD, DRI) are used herein to assess incongruence between morphological and molecular data in a case study addressing salamander phylogeny, which comprises several supposedly paedomorphic taxa. For this purpose, previously published data sets were compiled. Furthermore, two strategies for ameliorating the effects of paedomorphosis on phylogenetic studies were tested with statistical rigor. Additionally, the efficiency of the different methods for assessing incongruence was analyzed using this empirical data set. Finally, a test statistic is presented for all these methods except DRI. RESULTS The addition of morphological data to molecular data results in both different positions of three of the four paedomorphic taxa and strong incongruence, but treating the morphological data with strategies that ameliorate the negative impact of paedomorphosis reverses these changes and minimizes the conflict. Of these strategies, simply excluding paedomorphic character traits seems to be the most beneficial. Of the three molecular partitions analyzed herein, the RAG1 partition seems to be the most suitable for resolving deep salamander phylogeny. The rRNA and mtDNA partitions are too conserved and too variable, respectively. Of the different methods for detecting incongruence, the NDI and PABA approaches are more conservative in indicating incongruence than LILD and PBS.
CONCLUSION Paedomorphosis induces strong conflicts and can mislead phylogenetic analyses, even combined analyses. However, different strategies efficiently minimize these problems. Although exploring several methods to detect incongruence is preferable, NDI and PABA are more conservative than the others, and NDI is computationally less intensive than PABA.
Affiliation(s)
- Torsten H Struck
- Department of Biology/Chemistry, University of Osnabrück, Barbarastr. 11, Osnabrück, D-49076, Germany.
44
Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nat Genet 2007; 39:1361-8. [PMID: 17922013 DOI: 10.1038/ng.2007.9] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.9] [Received: 04/26/2007] [Accepted: 08/07/2007] [Indexed: 01/22/2023]
Abstract
Human segmental duplications are hotspots for nonallelic homologous recombination leading to genomic disorders, copy-number polymorphisms and gene and transcript innovations. The complex structure and history of these regions have precluded a global evolutionary analysis. Combining a modified A-Bruijn graph algorithm with comparative genome sequence data, we identify the origin of 4,692 ancestral duplication loci and use these to cluster 437 complex duplication blocks into 24 distinct groups. The sequence-divergence data between ancestral-derivative pairs and a comparison with the chimpanzee and macaque genome support a 'punctuated' model of evolution. Our analysis reveals that human segmental duplications are frequently organized around 'core' duplicons, which are enriched for transcripts and, in some cases, encode primate-specific genes undergoing positive selection. We hypothesize that the rapid expansion and fixation of some intrachromosomal segmental duplications during great-ape evolution has been due to the selective advantage conferred by these genes and transcripts embedded within these core duplications.
45
Spencer M, Bryant D, Susko E. Conditioned genome reconstruction: how to avoid choosing the conditioning genome. Syst Biol 2007; 56:25-43. [PMID: 17366135 DOI: 10.1080/10635150601156313] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
Abstract
Genome phylogenies can be inferred from data on the presence and absence of genes across taxa. Logdet distances may be a good method, because they allow expected genome size to vary across the tree. Recently, Lake and Rivera proposed conditioned genome reconstruction (calculation of logdet distances using only those genes present in a conditioning genome) to deal with unobservable genes that are absent from every taxon of interest. We prove that their method can consistently estimate the topology for almost any choice of conditioning genome. Nevertheless, the choice of conditioning genome is important for small samples. For real bacterial genome data, different choices of conditioning genome can result in strong bootstrap support for different tree topologies. To overcome this problem, we developed supertree methods that combine information from all choices of conditioning genome. One of these methods, based on the BIONJ algorithm, performs well on simulated data and may have applications to other supertree problems. However, an analysis of 40 bacterial genomes using this method supports an incorrect clade of parasites. This is a common feature of model-based gene content methods and is due to parallel gene loss.
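The abstract describes conditioned reconstruction as computing LogDet distances from only those genes present in a conditioning genome. A minimal Python sketch follows, under assumptions that are not taken from the paper: binary presence/absence vectors, the paralinear member of the LogDet family written for two states as d = -(1/2)[ln det J - (1/2) ln(det D_x det D_y)], where J is the 2x2 joint frequency matrix and D_x, D_y are the diagonal matrices of marginal frequencies, and a hypothetical toy matrix. It does not implement the BIONJ-based supertree that the authors use to combine all choices of conditioning genome.

```python
import math


def paralinear_distance(x, y):
    """Paralinear (LogDet-family) distance between two 0/1 presence/absence vectors."""
    n = len(x)
    joint = [[0.0, 0.0], [0.0, 0.0]]  # joint[i][j]: fraction of genes with state i in x, j in y
    for a, b in zip(x, y):
        joint[a][b] += 1.0 / n
    det_joint = joint[0][0] * joint[1][1] - joint[0][1] * joint[1][0]
    # The marginal matrices are diagonal, so each determinant is a product of marginals.
    det_dx = (joint[0][0] + joint[0][1]) * (joint[1][0] + joint[1][1])
    det_dy = (joint[0][0] + joint[1][0]) * (joint[0][1] + joint[1][1])
    if det_joint <= 0 or det_dx == 0 or det_dy == 0:
        raise ValueError("distance undefined for these vectors")
    return -0.5 * (math.log(det_joint) - 0.5 * math.log(det_dx * det_dy))


def conditioned_distances(matrix, conditioning):
    """Pairwise distances using only genes present in the conditioning genome.

    The conditioning genome itself is skipped: restricted to its own genes it is all ones,
    so its marginal determinant would be zero.
    """
    keep = [k for k, present in enumerate(matrix[conditioning]) if present]
    taxa = [t for t in matrix if t != conditioning]
    distances = {}
    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            distances[(a, b)] = paralinear_distance(
                [matrix[a][k] for k in keep], [matrix[b][k] for k in keep]
            )
    return distances


# Hypothetical presence (1) / absence (0) matrix over eight gene families.
genomes = {
    "C": [1, 1, 1, 1, 1, 1, 0, 0],
    "A": [1, 1, 0, 0, 1, 0, 1, 0],
    "B": [1, 0, 0, 0, 1, 1, 0, 1],
    "D": [1, 0, 1, 0, 1, 0, 1, 1],
}
for pair, d in conditioned_distances(genomes, "C").items():
    print(pair, round(d, 4))
```

Calling `conditioned_distances` with a different conditioning genome drops a different set of gene columns, which is a concrete way to see why, as the abstract notes, the choice can matter for small samples.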
Affiliation(s)
- Matthew Spencer
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada.
46
Gupta RS, Sneath PHA. Application of the character compatibility approach to generalized molecular sequence data: branching order of the proteobacterial subdivisions. J Mol Evol 2006; 64:90-100. [PMID: 17160641 DOI: 10.1007/s00239-006-0082-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 04/05/2006] [Accepted: 08/28/2006] [Indexed: 10/23/2022]
Abstract
The character compatibility approach, which removes all homoplasic characters and involves finding the largest clique of compatible characters in a dataset, in principle, provides a powerful means for obtaining correct topology in difficult to resolve cases. However, the usefulness of this approach to generalized molecular sequence data for phylogeny determination has not been studied in the past. We have used this approach to determine the topology of 23 proteobacterial species (6 each of alpha-, beta- and gamma-, 3 delta-, and 2 epsilon-proteobacteria) using sequence data for 10 conserved proteins (Hsp60, Hsp70, EF-Tu, EF-G, alanyl-tRNA synthetase, RecA, GyrA, GyrB, RpoB and RpoC). All sites in the sequence alignments of these proteins where only two amino acids were found, with each amino acid present in at least two species, were selected. Mutual compatibility determination on these binary state sites was carried out by two means. In one case, all of these sites were combined into a large dataset (Set A; 957 characters) prior to compatibility analysis. In the second case, compatibility analysis was carried out on characters from individual proteins and all compatible sites were combined into a large dataset (Set B; 398 characters) for further studies. Upon compatibility analyses, the largest cliques that were obtained from Sets A and B consisted of 337 and 323 compatible characters, respectively. In these cliques, all proteobacterial subgroups were clearly distinguished and branching orders of most of the species were also resolved. The epsilon-proteobacteria exhibited the earliest branching, whereas the beta- and gamma-subgroups were found to have emerged last. The relative placement of the alpha- and delta-subgroups, however, was not resolved. The topology of these species was also determined based on 16S rRNA sequences and a concatenated dataset of sequences for all 10 proteins by means of neighbor-joining, maximum likelihood, and maximum parsimony methods. 
In the protein trees, all proteobacterial groups were reliably resolved and they branched in the following order: (epsilon(delta(alpha(beta,gamma)))). However, in the rRNA trees, the gamma- and beta-subgroups exhibited polyphyletic branching and many internal nodes were not resolved. These results indicate that the character compatibility analysis using generalized molecular sequence data provides a powerful means for evolutionary studies. Based on molecular sequences, it should be possible to obtain very large datasets of compatible characters that should prove very helpful in clarifying difficult to resolve phylogenetic relationships.
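The compatibility procedure in this abstract can be sketched in Python under a few assumptions: the alignment is a dict of equal-length, gap-free strings (gap handling is not described in the abstract), two binary characters count as compatible when at most three of the four state combinations occur (the standard pairwise test for two-state characters), and the largest clique is found by exhaustive search, which is only feasible for a handful of sites. The toy alignment is hypothetical and is not the authors' data.

```python
from itertools import combinations


def binary_sites(alignment, min_count=2):
    """Columns with exactly two residues, each present in at least min_count taxa, recoded 0/1."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    sites = []
    for col in range(length):
        column = [alignment[t][col] for t in taxa]
        states = sorted(set(column))
        if len(states) == 2 and all(column.count(s) >= min_count for s in states):
            sites.append(tuple(0 if c == states[0] else 1 for c in column))
    return sites


def compatible(a, b):
    """Two binary characters are compatible unless all four state pairs occur."""
    return len(set(zip(a, b))) < 4


def largest_clique(sites):
    """Exhaustive search for the largest set of pairwise-compatible characters."""
    for size in range(len(sites), 0, -1):
        for subset in combinations(range(len(sites)), size):
            if all(compatible(sites[i], sites[j]) for i, j in combinations(subset, 2)):
                return [sites[i] for i in subset]
    return []


alignment = {  # hypothetical five-taxon protein alignment
    "t1": "AKLMA",
    "t2": "AKLMV",
    "t3": "SRLMV",
    "t4": "SRIAV",
    "t5": "SKIAA",
}
sites = binary_sites(alignment)
clique = largest_clique(sites)
print(len(sites), "binary sites;", len(clique), "in the largest compatible clique")
```

Exhaustive search grows exponentially with the number of sites, so datasets of the size reported in the abstract (hundreds of characters) call for a dedicated maximum-clique algorithm rather than this brute-force loop.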
Affiliation(s)
- Radhey S Gupta
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Canada L8N 3Z5.
47
Dohm JC, Vingron M, Staub E. Horizontal Gene Transfer in Aminoacyl-tRNA Synthetases Including Leucine-Specific Subtypes. J Mol Evol 2006; 63:437-47. [PMID: 16955236 DOI: 10.1007/s00239-005-0094-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 04/21/2005] [Accepted: 04/19/2006] [Indexed: 10/24/2022]
Abstract
Aminoacyl-tRNA synthetases catalyze a fundamental reaction for the flow of genetic information from RNA to protein. Their presence in all organisms known today highlights their important role in the early evolution of life. We investigated the evolutionary history of aminoacyl-tRNA synthetases on the basis of sequence data from more than 200 Archaea, Bacteria, and Eukaryota. Phylogenetic profiles are in agreement with previous observations that many genes for aminoacyl-tRNA synthetases were transferred horizontally between species from all domains of life. We extended these findings by a detailed analysis of the history of leucyl-tRNA synthetases. Thereby, we identified a previously undetected case of horizontal gene transfer from Bacteria to Archaea based on phylogenetic profiles, trees, and networks. This means that, finally, the last subfamily of aminoacyl-tRNA synthetases has lost its exceptional position as the sole subfamily that is devoid of horizontal gene transfer. Furthermore, the leucyl-tRNA synthetase phylogenetic tree suggests a dichotomy of the archaeal/eukaryotic-cytosolic and bacterial/eukaryotic-mitochondrial proteins. We argue that the traditional division of life into Prokaryota (non-chimeric) and Eukaryota (chimeric) is favorable compared to Woese's trichotomy into Archaea/Bacteria/Eukaryota.
Affiliation(s)
- Juliane C Dohm
- Max Planck Institute for Molecular Genetics, Department of Computational Molecular Biology, AG Protein Families and Cellular Evolution, Ihnestrasse 63-73, 14195, Berlin, Germany
48
Lienau EK, DeSalle R, Rosenfeld JA, Planet PJ. Reciprocal illumination in the gene content tree of life. Syst Biol 2006; 55:441-53. [PMID: 16861208 DOI: 10.1080/10635150600697416] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 10/24/2022]
Abstract
Phylogenies based on gene content rely on statements of primary homology to characterize gene presence or absence. These statements (hypotheses) are usually determined by techniques based on threshold similarity or distance measurements between genes. This fundamental but problematic step can be examined by evaluating each homology hypothesis by the extent to which it is corroborated by the rest of the data. Here we test the effects of varying the stringency for making primary homology statements using a range of similarity (e-value) cutoffs in 166 fully sequenced and annotated genomes spanning the tree of life. By evaluating each resulting data set with tree-based measurements of character consistency and information content, we find a set of homology statements that optimizes overall corroboration. The resulting data set produces well-resolved and well-supported trees of life and greatly ameliorates previously noted inconsistencies such as the misclassification of small genomes. The method presented here, which can be used to test any technique for recognizing primary homology, provides an objective framework for evaluating phylogenetic hypotheses and data sets for the tree of life. It also can serve as a technique for identifying well-corroborated sets of homologous genes for functional genomic applications.
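The cutoff sweep described in this abstract can be sketched as building one gene presence/absence matrix per e-value threshold. The input format, (family, genome, e-value) tuples from a similarity search, and the function names are hypothetical. The tree-based consistency and information-content scoring used to choose among the resulting matrices is not shown.

```python
def presence_absence(hits, families, genomes, cutoff):
    """Gene-content matrix at one e-value cutoff.

    hits: iterable of (family, genome, evalue) tuples (hypothetical format).
    A family is scored present in a genome if any hit meets the cutoff.
    """
    present = {(f, g) for f, g, e in hits if e <= cutoff}
    return {f: [1 if (f, g) in present else 0 for g in genomes] for f in families}


def sweep_cutoffs(hits, families, genomes, cutoffs):
    """One presence/absence matrix per cutoff, for downstream tree-based scoring."""
    hits = list(hits)  # allow a generator to be reused across cutoffs
    return {c: presence_absence(hits, families, genomes, c) for c in cutoffs}


# Hypothetical hits for two gene families in two genomes.
hits = [("F1", "G1", 1e-50), ("F1", "G2", 1e-5), ("F2", "G2", 1e-20)]
matrices = sweep_cutoffs(hits, ["F1", "F2"], ["G1", "G2"], [1e-10, 1e-3])
```

Loosening the cutoff from 1e-10 to 1e-3 turns the weak F1 hit in G2 into a presence call, which is exactly the kind of primary homology statement the study evaluates against the rest of the data.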
Affiliation(s)
- E Kurt Lienau
- American Museum of Natural History, Molecular Laboratories, Central Park West at 79th Street, (P.J.P.), New York, New York 10024, USA
49
Hao W, Golding GB. The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res 2006; 16:636-43. [PMID: 16651664 PMCID: PMC1457040 DOI: 10.1101/gr.4746406] [Citation(s) in RCA: 128] [Impact Index Per Article: 6.7] [Indexed: 11/24/2022]
Abstract
Large-scale genome arrangement plays an important role in bacterial genome evolution. A substantial number of genes can be inserted into, deleted from, or rearranged within genomes during evolution. Detecting or inferring gene insertions/deletions is of interest because such information provides insights into bacterial genome evolution and speciation. However, efficient inference of genome events is difficult because genome comparisons alone do not generally supply enough information to distinguish insertions, deletions, and other rearrangements. In this study, homologous genes from the complete genomes of 13 closely related bacteria were examined. The presence or absence of genes from each genome was cataloged, and a maximum likelihood method was used to infer insertion/deletion rates according to the phylogenetic history of the taxa. Whole-gene insertions/deletions were found to occur at rates comparable to or greater than the rate of nucleotide substitution, and higher insertion/deletion rates were often inferred at the tips of the phylogeny, with lower rates on more ancient interior branches. Recently transferred genes undergo faster and more relaxed evolution than more ancient genes. Together, these results imply that many lineage-specific insertions are lost quickly during evolution and that perhaps a few of the genes inserted by lateral transfer are niche specific.
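Maximum likelihood estimates of gene insertion and deletion rates typically rest on a two-state continuous-time Markov model of gene presence and absence; whether the study used exactly this parameterization is an assumption here. The sketch gives the model's closed-form transition probabilities and the log-likelihood of known parent/child states on a single branch. A full analysis does not know ancestral states and would instead sum over them across the whole tree (for example with Felsenstein's pruning algorithm).

```python
import math


def transition_matrix(gain: float, loss: float, t: float) -> list[list[float]]:
    """Two-state presence/absence model: P[i][j] = Pr(state j at time t | state i).

    State 0 = gene absent, 1 = gene present. gain and loss are instantaneous
    rates (assumed constant), t is branch length in the same time units.
    """
    total = gain + loss
    if total == 0:
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)   # absent -> present (insertion)
    p10 = loss / total * (1.0 - decay)   # present -> absent (deletion)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def branch_log_likelihood(pairs, gain, loss, t):
    """Log-likelihood of (parent_state, child_state) pairs on one branch.

    Simplifying assumption: parent states are known, unlike in a real analysis.
    """
    p = transition_matrix(gain, loss, t)
    return sum(math.log(p[a][b]) for a, b in pairs)
```

As t grows, each row converges to the stationary frequencies loss/(gain+loss) and gain/(gain+loss), so long branches carry little information about which events occurred on them.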
Affiliation(s)
- Weilong Hao
- Department of Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario, Canada L8S 4K1
- Corresponding author. Fax: (905) 522-6066
50
Abstract
The eukaryotic genome is a mosaic of eubacterial and archaeal genes in addition to those unique to itself. The mosaic may have arisen as the result of two prokaryotes merging their genomes, or from genes acquired from an endosymbiont of eubacterial origin. A third possibility is that the eukaryotic genome arose from successive events of lateral gene transfer over long periods of time. This theory does not exclude the endosymbiont, but questions whether it is necessary to explain the peculiar set of eukaryotic genes. We use phylogenetic studies and reconstructions of ancestral first appearances of genes on the prokaryotic phylogeny to assess evidence for the lateral gene transfer scenario. We find that phylogenies advanced to support fusion can also arise from a succession of lateral gene transfer events. Our reconstructions of ancestral first appearances of genes reveal that the various genes that make up the eukaryotic mosaic arose at different times and in diverse lineages on the prokaryotic tree, and were not available in a single lineage. Successive events of lateral gene transfer can explain the unusual mosaic structure of the eukaryotic genome, with its content linked to the immediate adaptive value of the genes it acquired. Progress in understanding eukaryotes may come from identifying ancestral features, such as the eukaryotic spliceosome, that could explain why this lineage invaded, or created, the eukaryotic niche.
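One simple way to place a gene's first appearance on a rooted prokaryotic tree is at the last common ancestor of the genomes that carry it. This is a minimal sketch under that assumption (single origin, vertical inheritance, losses allowed); it is not the reconstruction procedure of the study. The tree, node names, and function names are hypothetical. Lateral transfer, the process this abstract argues for, would push such placements deeper (older) than the true origin, because a transferred copy in a distant lineage drags the common ancestor toward the root.

```python
def path_to_root(node: str, parent: dict[str, str]) -> list[str]:
    """Node followed by its ancestors up to the root (the node with no parent)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def first_appearance(carriers: list[str], parent: dict[str, str]) -> str:
    """Deepest node ancestral to every genome that carries the gene.

    Assumes a single origin with vertical inheritance (losses allowed),
    so the origin is placed at the last common ancestor of the carriers.
    """
    if not carriers:
        raise ValueError("need at least one carrier genome")
    common = set(path_to_root(carriers[0], parent))
    for genome in carriers[1:]:
        common &= set(path_to_root(genome, parent))
    # Walk up from the first carrier; the first shared node is the deepest one.
    for node in path_to_root(carriers[0], parent):
        if node in common:
            return node
    raise ValueError("carriers do not share a root")


# Hypothetical rooted tree given as child -> parent links.
parent = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "root", "Y": "root"}
```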
Affiliation(s)
- Leo Lester
- School of Animal and Microbial Sciences, The University of Reading, UK