1
Abstract
Recent studies have suggested that network methods should supplant tree building as the basis of genealogical analysis. This proposition rests on two arguments. The first is the observation that bacterial and archaeal lineages experience processes that oppose bifurcation, and hence that representing the evolutionary process in a tree-like structure is illogical. The second is the argument that tree-building approaches are circular (you ask for a tree and you get one), which pins a verificationist label on tree building that, if correct, should be the end of phylogenetic analysis as we currently know it. In this review, we examine these questions and suggest that rumors of the death of the bacterial tree of life are exaggerated at best.
Affiliation(s)
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
- Margaret Riley
- Department of Biology, University of Massachusetts Amherst, 116 North Pleasant Street, Amherst, MA 01003, USA
2
Oppenheim SJ, Rosenfeld JA, DeSalle R. Genome content analysis yields new insights into the relationship between the human malaria parasite Plasmodium falciparum and its anopheline vectors. BMC Genomics 2017; 18:205. [PMID: 28241792 PMCID: PMC5327517 DOI: 10.1186/s12864-017-3590-0] [Received: 11/17/2016] [Accepted: 02/13/2017] [Indexed: 11/24/2022]
Abstract
Background: The persistent and growing gap between the availability of sequenced genomes and the ability to assign functions to sequenced genes led us to explore ways to maximize the information content of automated annotation for studies of anopheline mosquitoes. Specifically, we use genome content analysis of a large number of previously sequenced anopheline mosquitoes to follow the loss and gain of protein families over the evolutionary history of this group. The importance of this endeavor lies in the potential for comparative genomic studies between Anopheles and closely related non-vector species to reveal ancestral genome content dynamics involved in vector competence. In addition, comparisons within Anopheles could identify genome content changes responsible for variation in the vectorial capacity of this family of important parasite vectors.
Results: The competence and capacity of P. falciparum vectors do not appear to be phylogenetically constrained within the Anophelinae. Instead, using ancestral reconstruction methods, we suggest that a previously unexamined component of vector biology, anopheline nucleotide metabolism, may contribute to the unique status of anophelines as P. falciparum vectors. While the fitness effects of nucleotide co-option by P. falciparum parasites on their anopheline hosts are not yet known, our results suggest that anopheline genome content may be responding to selection pressure from P. falciparum. Whether this response is defensive, in an attempt to redress improper nucleotide balance resulting from P. falciparum infection, or perhaps symbiotic, resulting from an as-yet-unknown mutualism between anophelines and P. falciparum, is an open question that deserves further study.
Conclusions: Clearly, there is a wealth of functional information to be gained from detailed manual genome annotation, yet the rapid increase in the number of available sequences means that most researchers will not have the time or resources to manually annotate all the sequence data they generate. We believe that efforts to maximize the amount of information obtained from automated annotation can help address the functional annotation deficit that most evolutionary biologists now face, and here demonstrate the value of such an approach.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3590-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sara J Oppenheim
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024, USA
- Jeffrey A Rosenfeld
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024, USA; Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024, USA
3
Rosenfeld JA, Oppenheim S, DeSalle R. A whole genome gene content phylogenetic analysis of anopheline mosquitoes. Mol Phylogenet Evol 2017; 107:266-9. [PMID: 27866013 DOI: 10.1016/j.ympev.2016.11.006] [Received: 08/11/2016] [Revised: 10/25/2016] [Accepted: 11/09/2016] [Indexed: 11/21/2022]
Abstract
Stringent gene content matrices were constructed for 21 anopheline mosquito species and strains and four outgroup species. The presence/absence matrix, built using e-75 as a cutoff in single linkage clustering, had over 17,000 ortholog groups. We used the gene content matrix to generate a phylogenetic hypothesis that is in general agreement with gene sequence-based phylogenies. In addition to establishing a congruent gene content phylogeny, we examined the consistency of three methods for analyzing presence/absence data: unweighted parsimony, Dollo parsimony and maximum likelihood using a BINGAMMA model. An examination of the chromosomal location of the gains and losses in the presence/absence matrix revealed a low frequency of gains and losses at centromeres and tips of chromosomes.
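As an illustration only, the clustering step mentioned in this abstract can be sketched in a few lines of Python. This is not the authors' pipeline. It is a minimal sketch that assumes pairwise similarity hits are already available as `(gene_a, gene_b, evalue)` tuples and that gene identifiers follow a hypothetical `taxon|gene` format. The function names `single_linkage_groups` and `presence_absence` are invented for this example.

```python
from collections import defaultdict


def single_linkage_groups(hits, cutoff=1e-75):
    """Group genes by single linkage: one hit at or below the cutoff merges two groups."""
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b, evalue in hits:
        # Both genes are registered even when the hit is too weak to merge them.
        root_a, root_b = find(gene_a), find(gene_b)
        if evalue <= cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in parent:
        groups[find(gene)].add(gene)
    # Sort for a deterministic column order.
    return sorted(groups.values(), key=lambda group: sorted(group))


def presence_absence(groups, taxa):
    """Return {taxon: [0/1 per group]}; 1 means the taxon has a gene in that group."""
    return {
        taxon: [
            int(any(gene.split("|")[0] == taxon for gene in group))
            for group in groups
        ]
        for taxon in taxa
    }


# Hypothetical toy input: three taxa (A, B, C) and three pairwise hits.
hits = [
    ("A|g1", "B|g1", 1e-90),
    ("B|g1", "C|g1", 1e-80),
    ("A|g2", "C|g2", 1e-20),  # weaker than the 1e-75 cutoff, so not merged
]
groups = single_linkage_groups(hits)
matrix = presence_absence(groups, ["A", "B", "C"])
```

Relaxing `cutoff` (for example to `1e-10`) merges the two `g2` genes into one group, which changes the number of columns in the matrix. This shows in miniature why the choice of cutoff matters for gene content analyses.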
4
Rosenfeld JA, Foox J, DeSalle R. Insect genome content phylogeny and functional annotation of core insect genomes. Mol Phylogenet Evol 2016; 97:224-232. [DOI: 10.1016/j.ympev.2015.10.014] [Received: 06/10/2015] [Revised: 09/02/2015] [Accepted: 10/13/2015] [Indexed: 10/22/2022]
5
Hamilton CA, Hendrixson BE, Bond JE. Taxonomic revision of the tarantula genus Aphonopelma Pocock, 1901 (Araneae, Mygalomorphae, Theraphosidae) within the United States. Zookeys 2016; 560:1-340. [PMID: 27006611 PMCID: PMC4768370 DOI: 10.3897/zookeys.560.6264] [Received: 08/21/2015] [Accepted: 11/24/2015] [Indexed: 11/12/2022]
Abstract
This systematic study documents the taxonomy, diversity, and distribution of the tarantula spider genus Aphonopelma Pocock, 1901 within the United States. By employing phylogenomic, morphological, and geospatial data, we evaluated all 55 nominal species in the United States to examine the evolutionary history of Aphonopelma and the group's taxonomy by implementing an integrative approach to species delimitation. Based on our analyses, we now recognize only 29 distinct species in the United States. We propose 33 new synonymies (Aphonopelma apacheum, Aphonopelma minchi, Aphonopelma rothi, Aphonopelma schmidti, Aphonopelma stahnkei = Aphonopelma chalcodes; Aphonopelma arnoldi = Aphonopelma armada; Aphonopelma behlei, Aphonopelma vogelae = Aphonopelma marxi; Aphonopelma breenei = Aphonopelma anax; Aphonopelma chambersi, Aphonopelma clarum, Aphonopelma cryptethum, Aphonopelma sandersoni, Aphonopelma sullivani = Aphonopelma eutylenum; Aphonopelma clarki, Aphonopelma coloradanum, Aphonopelma echinum, Aphonopelma gurleyi, Aphonopelma harlingenum, Aphonopelma odelli, Aphonopelma waconum, Aphonopelma wichitanum = Aphonopelma hentzi; Aphonopelma heterops = Aphonopelma moderatum; Aphonopelma jungi, Aphonopelma punzoi = Aphonopelma vorhiesi; Aphonopelma brunnius, Aphonopelma chamberlini, Aphonopelma iviei, Aphonopelma lithodomum, Aphonopelma smithi, Aphonopelma zionis = Aphonopelma iodius; Aphonopelma phanum, Aphonopelma reversum = Aphonopelma steindachneri), 14 new species (Aphonopelma atomicum sp. n., Aphonopelma catalina sp. n., Aphonopelma chiricahua sp. n., Aphonopelma icenoglei sp. n., Aphonopelma johnnycashi sp. n., Aphonopelma madera sp. n., Aphonopelma mareki sp. n., Aphonopelma moellendorfi sp. n., Aphonopelma parvum sp. n., Aphonopelma peloncillo sp. n., Aphonopelma prenticei sp. n., Aphonopelma saguaro sp. n., Aphonopelma superstitionense sp. n., and Aphonopelma xwalxwal sp. n.), and seven nomina dubia (Aphonopelma baergi, Aphonopelma cratium, Aphonopelma hollyi, Aphonopelma mordax, Aphonopelma radinum, Aphonopelma rusticum, Aphonopelma texense). Our proposed species tree based on Anchored Enrichment data delimits five major lineages: a monotypic group confined to California, a western group, an eastern group, a group primarily distributed in high-elevation areas, and a group that comprises several miniaturized species. Multiple species are distributed throughout two biodiversity hotspots in the United States (i.e., California Floristic Province and Madrean Pine-Oak Woodlands). Keys are provided for identification of both males and females. By conducting the most comprehensive sampling of a single theraphosid genus to date, this research significantly broadens the scope of prior molecular and morphological investigations, finally bringing a modern understanding of species delimitation in this dynamic and charismatic group of spiders.
Affiliation(s)
- Chris A. Hamilton
- Department of Biological Sciences and Auburn University Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- Jason E. Bond
- Department of Biological Sciences and Auburn University Museum of Natural History, Auburn University, Auburn, AL 36849, USA
6
Rosenfeld JA, Reeves D, Brugler MR, Narechania A, Simon S, Durrett R, Foox J, Shianna K, Schatz MC, Gandara J, Afshinnekoo E, Lam ET, Hastie AR, Chan S, Cao H, Saghbini M, Kentsis A, Planet PJ, Kholodovych V, Tessler M, Baker R, DeSalle R, Sorkin LN, Kolokotronis SO, Siddall ME, Amato G, Mason CE. Genome assembly and geospatial phylogenomics of the bed bug Cimex lectularius. Nat Commun 2016; 7:10164. [PMID: 26836631 PMCID: PMC4740774 DOI: 10.1038/ncomms10164] [Received: 04/02/2015] [Accepted: 11/10/2015] [Indexed: 01/21/2023]
Abstract
The common bed bug (Cimex lectularius) has been a persistent pest of humans for thousands of years, yet the genetic basis of the bed bug's basic biology and adaptation to dense human environments is largely unknown. Here we report the assembly, annotation and phylogenetic mapping of the 697.9-Mb Cimex lectularius genome, with an N50 of 971 kb, using both long and short read technologies. An RNA-seq time course across all five developmental stages and male and female adults generated 36,985 coding and noncoding gene models. The most pronounced change in gene expression during the life cycle occurs after feeding on human blood and includes genes from the Wolbachia endosymbiont, which shows a simultaneous and coordinated host/commensal response to haematophagous activity. These data provide a rich genetic resource for mapping activity and density of C. lectularius across human hosts and cities, which can help track, manage and control bed bug infestations.
Affiliation(s)
- Jeffrey A Rosenfeld
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA; Cancer Institute of New Jersey, Rutgers University, New Brunswick, New Jersey 08908, USA
- Darryl Reeves
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10065, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York 10065, USA
- Mercer R Brugler
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA; Biological Sciences Department, NYC College of Technology (CUNY), Brooklyn, New York 11201, USA
- Apurva Narechania
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Sabrina Simon
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Biosystematics, Wageningen University, Wageningen 6708 PB, The Netherlands
- Russell Durrett
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10065, USA
- Jonathan Foox
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Kevin Shianna
- Illumina Inc., 5200 Illumina Way, San Diego, California 92122, USA
- Michael C Schatz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Jorge Gandara
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10065, USA
- Ebrahim Afshinnekoo
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10065, USA
- Ernest T Lam
- BioNanoGenomics Inc., 9640 Towne Centre Drive Ste. 100, San Diego, California 92121, USA
- Alex R Hastie
- BioNanoGenomics Inc., 9640 Towne Centre Drive Ste. 100, San Diego, California 92121, USA
- Saki Chan
- BioNanoGenomics Inc., 9640 Towne Centre Drive Ste. 100, San Diego, California 92121, USA
- Han Cao
- BioNanoGenomics Inc., 9640 Towne Centre Drive Ste. 100, San Diego, California 92121, USA
- Michael Saghbini
- BioNanoGenomics Inc., 9640 Towne Centre Drive Ste. 100, San Diego, California 92121, USA
- Alex Kentsis
- Molecular Pharmacology and Chemistry Program, Department of Pediatrics, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA; Department of Pediatrics, Memorial Sloan Kettering Cancer Center, Weill Cornell Medical College, Cornell University, New York, New York 10065, USA
- Paul J Planet
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA; Division of Pediatric Infectious Diseases, College of Physicians and Surgeons, Columbia University, New York, New York 10032, USA
- Vladyslav Kholodovych
- High Performance and Research Computing Group, Rutgers Biomedical and Health Sciences, Newark, New Jersey 07103, USA
- Michael Tessler
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Richard Baker
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Louis N Sorkin
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Sergios-Orestis Kolokotronis
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA; Department of Biological Sciences, Fordham University, Bronx, New York 10458, USA
- Mark E Siddall
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- George Amato
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York 10024, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, New York 10065, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York 10065, USA; The Feil Family Brain and Mind Research Institute, New York, New York 10065, USA
7
Kim KM, Nasir A, Caetano-Anollés G. The importance of using realistic evolutionary models for retrodicting proteomes. Biochimie 2014; 99:129-37. [DOI: 10.1016/j.biochi.2013.11.019] [Received: 08/26/2013] [Accepted: 11/22/2013] [Indexed: 01/16/2023]
8
Abstract
Bacterial genomes are remarkably stable from one generation to the next but are plastic on an evolutionary time scale, substantially shaped by horizontal gene transfer, genome rearrangement, and the activities of mobile DNA elements. This implies the existence of a delicate balance between the maintenance of genome stability and the tolerance of genome instability. In this review, we describe the specialized genetic elements and the endogenous processes that contribute to genome instability. We then discuss the consequences of genome instability at the physiological level, where cells have harnessed instability to mediate phase and antigenic variation, and at the evolutionary level, where horizontal gene transfer has played an important role. Indeed, this ability to share DNA sequences has played a major part in the evolution of life on Earth. The evolutionary plasticity of bacterial genomes, coupled with the vast numbers of bacteria on the planet, substantially limits our ability to control disease.
9
Lienau EK, Blazar JM, Wang C, Brown EW, Stones R, Musser S, Allard MW. Phylogenomic analysis identifies gene gains that define Salmonella enterica subspecies I. PLoS One 2013; 8:e76821. [PMID: 24204679 PMCID: PMC3810377 DOI: 10.1371/journal.pone.0076821] [Received: 03/07/2012] [Accepted: 09/04/2013] [Indexed: 11/29/2022]
Abstract
Comparative methods for analyzing whole genome sequence (WGS) data enable us to assess the genetic information available for reconstructing the evolutionary history of pathogens. We used the comparative approach to determine diagnostic genes for Salmonella enterica subspecies I. S. enterica subsp. I strains are known to infect warm-blooded organisms regularly, while their close relatives tend to infect only cold-blooded organisms. We found 71 genes gained by the common ancestor of Salmonella enterica subspecies I and not subsequently lost by any member of this subspecies sequenced to date. These genes included many putative functional phenotypes. Twenty-seven of these genes are found only in Salmonella enterica subspecies I; we designed primers to test these genes for use as diagnostic sequence targets and data-mined the NCBI Sequence Read Archive (SRA) database for draft genomes that carried these genes. We found that the sequence specificity and variability of these amplicons can be used to detect and discriminate among 317 different serovars and strains of Salmonella enterica subspecies I.
Affiliation(s)
- E. Kurt Lienau
- Office of the Center Director, Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, Maryland, United States of America
- Evolution Industries LLC, Frederick, Maryland, United States of America
- Jeffrey M. Blazar
- Office of the Center Director, Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, Maryland, United States of America
- Department of Biology, University of Maryland, College Park, Maryland, United States of America
- Charles Wang
- Office of the Center Director, Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, Maryland, United States of America
- Eric W. Brown
- Office of the Center Director, Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, Maryland, United States of America
- Robert Stones
- The Food and Environment Research Agency, Sand Hutton, York, United Kingdom
- Steven Musser
- Office of the Center Director, Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, Maryland, United States of America
- Marc W. Allard
- Office of the Center Director, Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, Maryland, United States of America
10
Caetano-Anollés G, Wang M, Caetano-Anollés D. Structural phylogenomics retrodicts the origin of the genetic code and uncovers the evolutionary impact of protein flexibility. PLoS One 2013; 8:e72225. [PMID: 23991065 PMCID: PMC3749098 DOI: 10.1371/journal.pone.0072225] [Received: 03/28/2013] [Accepted: 07/07/2013] [Indexed: 11/18/2022]
Abstract
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the 'operational' RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Derek Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
11
Affiliation(s)
- Nico M. Franz
- School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ 85287-4501, USA
12
Meinel T, Krause A. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling. Evol Bioinform Online 2012; 8:489-525. [PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/ebo.s9642] [Indexed: 12/04/2022]
Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models, together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes, has led to a huge diversity in published phylogenies. Comparing those phylogenies and, moreover, identifying the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies, we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on a total of 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.
Affiliation(s)
- Thomas Meinel
- Charité-University Medicine Berlin, Institute for Physiology, Structural Bioinformatics Group, Thielallee 71, 14195 Berlin, Germany
13
Rosenfeld JA, DeSalle R. E value cutoff and eukaryotic genome content phylogenetics. Mol Phylogenet Evol 2012; 63:342-50. [PMID: 22306824 DOI: 10.1016/j.ympev.2012.01.003] [Received: 05/22/2011] [Revised: 01/02/2012] [Accepted: 01/03/2012] [Indexed: 10/14/2022]
Abstract
Genome content analysis has been used as a source of phylogenetic information in large prokaryotic tree of life studies. Recently, the sequencing of many eukaryotic genomes has allowed similar use of genome content analysis for these organisms. In this communication we examine the utility of genome content analysis for recovering phylogenetic patterns in several eukaryotic groups. By constructing multiple matrices using different E value cutoffs, we examine the effect of altering the E value cutoff on five eukaryotic genome data sets. Our analysis indicates that the E value cutoff used as a criterion in the construction of the genome content matrix is a critical factor in both the accuracy and information content of the analysis. Strikingly, genome content by itself is not a reliable or accurate source of characters for phylogenetic analysis of the taxa in the five data sets we analyzed. We discuss two problems, small genome attraction and genome duplications, as contributors to the rather poor performance of genome content data in recovering eukaryotic phylogeny.
Affiliation(s)
- Jeffrey A Rosenfeld
- IST/High Performance and Research Computing, University of Medicine and Dentistry of New Jersey, Newark, NJ 07103, United States.
14
Abstract
Large-scale databases are available that contain homologous gene families constructed from hundreds of complete genome sequences from across the three domains of life. Here, we discuss the approaches of increasing complexity aimed at extracting information on the pattern and process of gene family evolution from such datasets. In particular, we consider the models that invoke processes of gene birth (duplication and transfer) and death (loss) to explain the evolution of gene families. First, we review birth-and-death models of family size evolution and their implications in light of the universal features of family size distribution observed across different species and the three domains of life. Subsequently, we proceed to recent developments on models capable of more completely considering information in the sequences of homologous gene families through the probabilistic reconciliation of the phylogenetic histories of individual genes with the phylogenetic history of the genomes in which they have resided. To illustrate the methods and results presented, we use data from the HOGENOM database, demonstrating that the distribution of homologous gene family sizes in the genomes of the eukaryota, archaea, and bacteria exhibits remarkably similar shapes. We show that these distributions are best described by models of gene family size evolution, where for individual genes the death (loss) rate is larger than the birth (duplication and transfer) rate but new families are continually supplied to the genome by a process of origination. Finally, we use probabilistic reconciliation methods to take into consideration additional information from gene phylogenies, and find that, for prokaryotes, the majority of birth events are the result of transfer.
15
Kurt Lienau E, DeSalle R, Allard M, Brown EW, Swofford D, Rosenfeld JA, Sarkar IN, Planet PJ. The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life. Cladistics 2011; 27:417-427. [PMID: 34875790 DOI: 10.1111/j.1096-0031.2010.00337.x] [Indexed: 11/29/2022]
Abstract
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.
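The idea of a combined data-type matrix can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it codes gene family membership as presence/absence characters and concatenates that partition with an aligned amino acid partition, padding taxa that lack a partition with a missing-data symbol. The function names, the toy taxa, and the `?` missing symbol are all assumptions made for the example.

```python
def presence_absence(families, taxa):
    """Code each gene family as one 0/1 character per taxon."""
    return {
        taxon: "".join("1" if taxon in family else "0" for family in families)
        for taxon in taxa
    }


def concatenate_partitions(partitions, taxa, missing="?"):
    """Join per-taxon character strings from several partitions into one row each.

    Each partition maps taxon -> equal-length character string. Taxa absent
    from a partition are filled with the `missing` symbol.
    """
    rows = {taxon: [] for taxon in taxa}
    for partition in partitions:
        widths = {len(chars) for chars in partition.values()}
        if len(widths) != 1:
            raise ValueError("each partition must be non-empty and aligned")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(partition.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in rows.items()}


if __name__ == "__main__":
    taxa = ["A", "B", "C"]
    amino_acids = {"A": "MK", "B": "ML"}   # toy alignment; taxon C lacks this gene
    families = [{"A", "B"}, {"C"}]          # toy gene family membership
    matrix = concatenate_partitions(
        [amino_acids, presence_absence(families, taxa)], taxa
    )
    for taxon in taxa:
        print(taxon, matrix[taxon])
```

Keeping the two data types as separate partitions until the final join makes it easy to analyse either one alone and compare it with the combined result.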
Affiliation(s)
- E Kurt Lienau
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA; Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA; Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA
- Marc Allard
- Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Eric W Brown
- Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- David Swofford
- Duke Institute for Genomes and Science Policy, 366 BioSci, Duke University, Durham, NC 27708, USA
- Jeffrey A Rosenfeld
- Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA
- Indra N Sarkar
- Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
- Paul J Planet
- Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA; Department of Pediatrics, Children's Hospital of New York, Columbia University, College of Physicians and Surgeons, New York, NY 10032, USA
|
16
|
Vishnoi A, Roy R, Prasad HK, Bhattacharya A. Anchor-based whole genome phylogeny (ABWGP): a tool for inferring evolutionary relationship among closely related microorganisms [corrected]. PLoS One 2010; 5:e14159. [PMID: 21152403 PMCID: PMC2994773 DOI: 10.1371/journal.pone.0014159] [Received: 04/19/2010] [Accepted: 10/21/2010]
Abstract
Phenotypic behavior of a group of organisms can be studied using a range of molecular evolutionary tools that help to determine evolutionary relationships. Traditionally a gene or a set of gene sequences was used for generating phylogenetic trees. Incomplete evolutionary information in a few selected genes causes problems in phylogenetic tree construction. Whole genomes are used as a remedy. Now, the task is to identify the suitable parameters to extract the hidden information from whole genome sequences that truly represent evolutionary information. In this study we explored a random anchor (a stretch of 100 nucleotides) based approach (ABWGP) for finding the distance between any two genomes, and used the distance estimates to compute evolutionary trees. A number of strains and species of Mycobacteria were used for this study. Anchor-derived parameters, such as cumulative normalized score, anchor order and indels were computed in a pair-wise manner, and the scores were used to compute distance/phylogenetic trees. The strength of branching was determined by bootstrap analysis. The terminal branches are clearly discernible using the distance estimates described here. In general, different measures gave similar trees except the trees based on indels. Overall the tree topology reflected the known biology of the organisms. This was also true for different strains of Escherichia coli. A new whole genome-based approach has been described here for studying evolutionary relationships among bacterial strains and species.
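The anchor idea can be illustrated with a deliberately simplified sketch. It is not the ABWGP scoring scheme: instead of the cumulative normalized score, anchor order, or indel measures named in the abstract, it samples random fixed-length anchors from one genome and uses the fraction with no exact match in the other genome as a crude distance. The function name, the exact-match criterion, the default sample size, and the fixed seed are all assumptions for illustration; note that this toy measure is not symmetric in its two arguments.

```python
import random


def anchor_distance(genome_a, genome_b, anchor_len=100, n_anchors=50, seed=0):
    """Fraction of random anchors from genome_a with no exact match in genome_b.

    Simplified stand-in for an anchor-based distance: 0.0 means every sampled
    anchor was found, 1.0 means none was.
    """
    if len(genome_a) < anchor_len:
        raise ValueError("genome_a is shorter than one anchor")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    last_start = len(genome_a) - anchor_len
    hits = 0
    for _ in range(n_anchors):
        start = rng.randint(0, last_start)  # inclusive on both ends
        anchor = genome_a[start:start + anchor_len]
        if anchor in genome_b:
            hits += 1
    return 1.0 - hits / n_anchors


if __name__ == "__main__":
    reference = "ACGT" * 100
    print(anchor_distance(reference, reference, anchor_len=20))    # identical toy genomes
    print(anchor_distance("A" * 400, "C" * 400, anchor_len=20))    # no shared anchors
```

Pairwise values from a measure like this could be collected into a distance matrix and passed to a standard distance-based tree method; a real implementation would tolerate mismatches within anchors rather than require exact matches.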
Affiliation(s)
- Anchal Vishnoi
- School of Information Technology, Center for Computational Biology and Bioinformatics, Jawaharlal Nehru University, New Delhi, India
- Rahul Roy
- Indian Statistical Institute, New Delhi, India
- Hanumanthappa K. Prasad
- Department of Biotechnology, All India Institute of Medical Sciences (AIIMS), New Delhi, India
- Alok Bhattacharya
- School of Information Technology, Center for Computational Biology and Bioinformatics, Jawaharlal Nehru University, New Delhi, India
- School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
|
17
|
Abstract
BACKGROUND Taxonomy is the biological discipline that identifies, describes, classifies and names extant and extinct species and other taxa. Nowadays, species taxonomy is confronted with the challenge to fully incorporate new theory, methods and data from disciplines that study the origin, limits and evolution of species. RESULTS Integrative taxonomy has been proposed as a framework to bring together these conceptual and methodological developments. Here we review perspectives for an integrative taxonomy that directly bear on what species are, how they can be discovered, and how much diversity is on Earth. CONCLUSIONS We conclude that taxonomy needs to be pluralistic to improve species discovery and description, and to develop novel protocols to produce the much-needed inventory of life in a reasonable time. To cope with the large number of candidate species revealed by molecular studies of eukaryotes, we propose a classification scheme for those units that will facilitate the subsequent assembly of data sets for the formal description of new species under the Linnaean system, and will ultimately integrate the activities of taxonomists and molecular biologists.
Affiliation(s)
- José M Padial
- Department of Evolution Genomics and Systematics, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvägen 18D, Uppsala 75236, Sweden
- Aurélien Miralles
- Department of Evolutionary Biology, Zoological Institute, Technical University of Braunschweig, Spielmannstrasse 8, 38106 Braunschweig, Germany
- Ignacio De la Riva
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales, CSIC, C/José Gutiérrez Abascal 2, Madrid 28006, Spain
- Miguel Vences
- Department of Evolutionary Biology, Zoological Institute, Technical University of Braunschweig, Spielmannstrasse 8, 38106 Braunschweig, Germany
|
18
|
|
19
|
|
20
|
Abstract
We examine three critical aspects of Popper's formulation of the 'Logic of Scientific Discovery' (evidence, content and degree of corroboration) and place these concepts in the context of the Tree of Life (ToL) problem with particular reference to molecular systematics. Content, in the sense discussed by Popper, refers to the breadth and scope of existence that a hypothesis purports to explain. Content, in conjunction with the amount of available and relevant evidence, determines the testability, or potential degree of corroboration, of a statement; content distinguishes scientific hypotheses from metaphysical assertions. Degree of corroboration refers to the relative and tentative confidence assigned to one hypothesis over another, based upon the performance of each under critical tests. Here we suggest that systematists attempt to maximize content and evidence to increase the potential degree of corroboration in all phylogenetic endeavors. Discussion of this "total evidence" approach leads to several interesting conclusions about generating ToL hypotheses.
Affiliation(s)
- E Kurt Lienau
- Department of Biology, New York University, New York, NY 10003, USA.
|
21
|
Abstract
OrthologID (http://nypg.bio.nyu.edu/orthologid/) allows for the rapid and accurate identification of gene orthology within a character-based phylogenetic framework. The Web application has two functions - an orthologous group search and a query orthology classification. The former determines orthologous gene sets for complete genomes and identifies diagnostic characters that define each orthologous gene set; and the latter allows for the classification of unknown query sequences to orthology groups. The first module of the Web application, the gene family generator, uses an E-value based approach to sort genes into gene families. An alignment constructor then aligns members of gene families and the resulting gene family alignments are submitted to the tree builder to obtain gene family guide trees. Finally, the diagnostics generator extracts diagnostic characters from guide trees and these diagnostics are used to determine gene orthology for query sequences.
Affiliation(s)
- Mary Egan
- Department of Biology, Montclair State University, Montclair, NJ, USA
|
22
|
|
23
|
Abstract
Willi Hennig conceptualized the systematic character as an inherited transformation event whereas he operationalized it in terms of the similarity of objects. A causal evolutionary explanation underlies his conceptualization; however, the operational definition is denied that explication because similarity is an abstraction: the similarity of objects is a function of their intensionally defined properties. These opposing treatments of evidence involve other problems that affect the coherence of systematics. For example, the uses of similarity, and the taxonomic relationships it defines, lead to category errors when the particulars of evolution are inferred. Not only is there no requirement in evolutionary theory that similarity relations be explained as homologs, reification occurs when the similarity-defined class or set of organisms is claimed to be a part of lineage system history. Also worth noting, it is the transcendentalism put forward in the early nineteenth century by Étienne Geoffroy Saint-Hilaire, and not the evolutionary theory of Charles Darwin, that provides the theoretical justification for using object similarities in comparative biology. All of these problems are removed by considering the transformation event, instead of the object, as the thing being explained in the operational definition of character. Ostension or description of objects by extension are consistent with evolutionary theory, and may be used in phylogenetic practice, as when delimiting the accompanying states of a particular transformation series event. In addition to a coherent evolutionary epistemology being achieved with this operational redefinition of character, event hypotheses as evidence provide severe and critical objective tests of phylogenetic relationships. With this redefinition, morphology is expected to retake its place among the most rigorous theoretical sciences. This reconsideration of the systematic character, the elimination of all prescriptive references to object similarity and its accompanying transcendentalism, is also seen as a significant step towards completing the neo-Darwinian synthesis of evolution.
|
24
|
Abstract
Darwin claimed that a unique inclusively hierarchical pattern of relationships between all organisms based on their similarities and differences [the Tree of Life (TOL)] was a fact of nature, for which evolution, and in particular a branching process of descent with modification, was the explanation. However, there is no independent evidence that the natural order is an inclusive hierarchy, and incorporation of prokaryotes into the TOL is especially problematic. The only data sets from which we might construct a universal hierarchy including prokaryotes, the sequences of genes, often disagree and can seldom be proven to agree. Hierarchical structure can always be imposed on or extracted from such data sets by algorithms designed to do so, but at its base the universal TOL rests on an unproven assumption about pattern that, given what we know about process, is unlikely to be broadly true. This is not to say that similarities and differences between organisms are not to be accounted for by evolutionary mechanisms, but descent with modification is only one of these mechanisms, and a single tree-like pattern is not the necessary (or expected) result of their collective operation. Pattern pluralism (the recognition that different evolutionary models and representations of relationships will be appropriate, and true, for different taxa or at different scales or for different purposes) is an attractive alternative to the quixotic pursuit of a single true TOL.
Affiliation(s)
- W Ford Doolittle
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada B3H 1X5.
|