51
|
Catanach TA, Sweet AD, Nguyen NPD, Peery RM, Debevec AH, Thomer AK, Owings AC, Boyd BM, Katz AD, Soto-Adames FN, Allen JM. Fully automated sequence alignment methods are comparable to, and much faster than, traditional methods in large data sets: an example with hepatitis B virus. PeerJ 2019; 7:e6142. [PMID: 30627489 PMCID: PMC6321758 DOI: 10.7717/peerj.6142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/07/2018] [Accepted: 11/14/2018] [Indexed: 01/05/2023] Open
Abstract
Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important, but increasingly computationally expensive step with the recent surge in DNA sequence data. Much of this sequence data is publicly available, but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected "by eye" prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear if these methods produce alignments that result in compatible phylogenies when compared to more traditional alignment approaches that combined automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set comprised exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach including by eye manual editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared resulting tree topologies based on these different alignment methods using multiple metrics. We further determined if the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. 
Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated algorithms for MSA are fully compatible with older methods even in extremely difficult to align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily-sound groups, regardless of alignment type and support threshold. This suggests there may be errors in genotype classification in the database or that HBV genotypes may need a revision.
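The monophyly test mentioned in this abstract can be illustrated with a minimal sketch. It assumes a rooted tree written as nested tuples, uses hypothetical helper names (leaf_set, clades, is_monophyletic) and made-up taxon labels, and ignores branch support thresholds; it is not the authors' pipeline.

```python
# Minimal sketch: does a set of taxa form a clade in a rooted tree?
# Trees are nested tuples of leaf labels; all names here are hypothetical.

def leaf_set(node):
    """Return the set of leaf labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def clades(node):
    """Yield the leaf set of every subtree, including leaves and the root."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """A group is monophyletic if some subtree contains exactly its members."""
    return frozenset(group) in set(clades(tree))

# Toy tree with invented sequence labels for three invented genotypes.
tree = ((("A1", "A2"), "B1"), ("C1", ("C2", "B2")))
genotypes = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
results = {name: is_monophyletic(tree, members)
           for name, members in genotypes.items()}
```

In this toy tree only genotype A forms a clade, which mirrors the kind of mismatch between diagnostic genotypes and evolutionary groups that the abstract reports.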
Affiliation(s)
- Therese A. Catanach
- Ornithology Department, Academy of Natural Sciences of Drexel University, Philadelphia, PA, United States of America
- Illinois Natural History Survey, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Department of Wildlife and Fisheries Sciences, Texas A&M University, College Station, TX, United States of America
- Andrew D. Sweet
- Illinois Natural History Survey, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Department of Entomology, Purdue University, West Lafayette, IN, United States of America
- Nam-phuong D. Nguyen
- Computer Science and Engineering, University of California, San Diego, La Jolla, CA, United States of America
- Rhiannon M. Peery
- Department of Biology, University of Alberta, Edmonton, Alberta, Canada
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Andrew H. Debevec
- School of Integrative Biology, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Andrea K. Thomer
- School of Information, University of Michigan—Ann Arbor, Ann Arbor, MI, United States of America
- Amanda C. Owings
- Program in Ecology, Evolution, and Conservation Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Bret M. Boyd
- Illinois Natural History Survey, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Department of Entomology, University of Georgia, Athens, GA, United States of America
- Aron D. Katz
- Illinois Natural History Survey, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Department of Entomology, University of Illinois at Urbana-Champaign, Champaign, IL, United States of America
- Felipe N. Soto-Adames
- Florida State Collection of Arthropods, Florida Department of Agriculture and Consumer Services, Gainesville, FL, United States of America
- Department of Entomology and Nematology, University of Florida, Gainesville, FL, United States of America
- Julie M. Allen
- Biology Department, University of Nevada, Reno, Reno, NV, United States of America
|
52
|
Puigbò P, Wolf YI, Koonin EV. Genome-Wide Comparative Analysis of Phylogenetic Trees: The Prokaryotic Forest of Life. Methods Mol Biol 2019; 1910:241-269. [PMID: 31278667 DOI: 10.1007/978-1-4939-9074-0_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the boot-split distance (BSD) method is introduced as an extension of the previously developed split distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting treelike and netlike evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the methods used to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
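The idea of comparing trees by their splits, optionally weighted by bootstrap support, can be sketched as follows. This is an illustration under stated assumptions: each tree is given as a mapping from a split to its support in [0, 1], the normalisation shown for the split distance is one common choice, and the weighted variant is an invented stand-in that is NOT the published BSD formula. All function names are hypothetical.

```python
# Sketch of split-based tree comparison. A split (bipartition) is stored as
# the frozenset of taxa on the side that excludes a fixed reference taxon.

def canonical_split(side, all_taxa):
    """Pick one side of the bipartition so equal splits compare equal."""
    side = frozenset(side)
    reference = min(all_taxa)
    return all_taxa - side if reference in side else side

def split_distance(splits_a, splits_b):
    """Fraction of splits present in only one tree (0.0 means identical)."""
    a, b = set(splits_a), set(splits_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0

def weighted_split_distance(splits_a, splits_b):
    """Illustrative support-weighted variant; not the published BSD."""
    a, b = set(splits_a), set(splits_b)
    unshared = (sum(splits_a[s] for s in a - b)
                + sum(splits_b[s] for s in b - a))
    total = sum(splits_a.values()) + sum(splits_b.values())
    return unshared / total if total else 0.0

taxa = frozenset("ABCDE")
# Tree 1: ((A,B),C,(D,E)); tree 2: ((A,C),B,(D,E)); supports are invented.
tree1 = {canonical_split("AB", taxa): 0.9, canonical_split("DE", taxa): 0.6}
tree2 = {canonical_split("AC", taxa): 0.3, canonical_split("DE", taxa): 0.8}

sd = split_distance(tree1, tree2)
wsd = weighted_split_distance(tree1, tree2)
```

In the toy example the poorly supported conflicting split in the second tree contributes less to the weighted value than it does to the unweighted one, which is the intuition the abstract describes.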
Affiliation(s)
- Pere Puigbò
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
53
|
Baele G, Ayres DL, Rambaut A, Suchard MA, Lemey P. High-Performance Computing in Bayesian Phylogenetics and Phylodynamics Using BEAGLE. Methods Mol Biol 2019; 1910:691-722. [PMID: 31278682 DOI: 10.1007/978-1-4939-9074-0_23] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/28/2022]
Abstract
In this chapter, we focus on the computational challenges associated with statistical phylogenomics and how use of the broad-platform evolutionary analysis general likelihood evaluator (BEAGLE), a high-performance library for likelihood computation, can help to substantially reduce computation time in phylogenomic and phylodynamic analyses. We discuss computational improvements brought about by the BEAGLE library on a variety of state-of-the-art multicore hardware, and for a range of commonly used evolutionary models. For data sets of varying dimensions, we specifically focus on comparing performance in the Bayesian evolutionary analysis by sampling trees (BEAST) software between multicore central processing units (CPUs) and a wide range of graphics processing cards (GPUs). We put special emphasis on computational benchmarks from the field of phylodynamics, which combines the challenges of phylogenomics with those of modelling trait data associated with the observed sequence data. In conclusion, we show that for increasingly large molecular sequence data sets, GPUs can offer tremendous computational advancements through the use of the BEAGLE library, which is available for software packages for both Bayesian inference and maximum-likelihood frameworks.
Affiliation(s)
- Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium.
- Daniel L Ayres
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
- Marc A Suchard
- Department of Human Genetics and Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Philippe Lemey
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium
|
54
|
Zhao Z, Hu J, Chen S, Luo Z, Luo D, Wen J, Tu T, Zhang D. Evolution of CYCLOIDEA-like genes in Fabales: Insights into duplication patterns and the control of floral symmetry. Mol Phylogenet Evol 2018; 132:81-89. [PMID: 30508631 DOI: 10.1016/j.ympev.2018.11.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/18/2018] [Revised: 11/08/2018] [Accepted: 11/15/2018] [Indexed: 11/29/2022]
Abstract
Cycloidea-like (CYC-like) genes are the key regulatory factors in the development of flower symmetry. Duplication and/or reduction of CYC-like genes have occurred several times in various angiosperm groups and are hypothesized to be correlated with the evolution of flower symmetry, which in turn has contributed to the evolutionary success of these groups. However, less is known about the evolutionary scenario of CYC-like genes in the whole Fabales, which contains four families with either zygomorphic or actinomorphic flowers. Here we investigated the evolution of CYC-like genes in all four families of Fabales and recovered one to nine CYC-like genes (CYC1, CYC2, and CYC3) depending on the lineage, but the CYC3 genes were most likely lost in the ancestor of Leguminosae. Phylogenetic analysis suggested that the CYC-like genes could have undergone multiple duplications and losses in different plant lineages and formed distinct paralogous/orthologous clades. The ancestor of the Papilionoideae and Caesalpinioideae may have possessed two paralogs of CYC1 genes, but one of them was subsequently lost in Papilionoideae and was retained only in several species of Caesalpinioideae. CYC2 genes were more frequently duplicated in Papilionoideae than in other legumes. We propose that the diversification patterns of both CYC1 and CYC2 genes are not related to floral symmetry in non-papilionoid Fabales groups; however, gene duplication and functional divergence of CYC2 are essential for the floral zygomorphy of Papilionoideae. This is the first systematic analysis of the CYC-like genes in Fabales and could form the basis for further study of molecular mechanisms controlling floral symmetry in non-model plants of Fabales.
Affiliation(s)
- Zhongtao Zhao
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Jin Hu
- Guangdong Eco-engineering Polytechnic, Guangzhou 510520, China
- Shi Chen
- Beneficial Insects Institute, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Zhonglai Luo
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Da Luo
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin 150081, China
- Jun Wen
- Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Tieyao Tu
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China.
- Dianxiang Zhang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
|
55
|
Prabh N, Roeseler W, Witte H, Eberhardt G, Sommer RJ, Rödelsperger C. Deep taxon sampling reveals the evolutionary dynamics of novel gene families in Pristionchus nematodes. Genome Res 2018; 28:1664-1674. [PMID: 30232197 PMCID: PMC6211646 DOI: 10.1101/gr.234971.118] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 01/23/2018] [Accepted: 09/05/2018] [Indexed: 01/20/2023]
Abstract
The widespread identification of genes without detectable homology in related taxa is a hallmark of genome sequencing projects in animals, together with the abundance of gene duplications. Such genes have been called novel, young, taxon-restricted, or orphans, but little is known about the mechanisms accounting for their origin, age, and mode of evolution. Phylogenomic studies relying on deep and systematic taxon sampling and using the comparative method can provide insight into the evolutionary dynamics acting on novel genes. We used a phylogenomic approach for the nematode model organism Pristionchus pacificus and sequenced six additional Pristionchus and two outgroup species. This resulted in 10 genomes with a ladder-like phylogeny, sequenced in one laboratory using the same platform and analyzed by the same bioinformatic procedures. Our analysis revealed that 68%-81% of genes are assignable to orthologous gene families, the majority of which defined nine age classes with presence/absence patterns that can be explained by single evolutionary events. Contrasting different age classes, we find that older age classes are concentrated at chromosome centers, whereas novel gene families preferentially arise at the periphery, are weakly expressed, evolve rapidly, and have a high propensity of being lost. Over time, they increase in expression and become more constrained. Thus, the detailed phylogenetic resolution allowed a comprehensive characterization of the evolutionary dynamics of Pristionchus genomes indicating that distribution of age classes and their associated differences shape chromosomal divergence. This study establishes the Pristionchus system for future research on the mechanisms that drive the formation of novel genes.
Affiliation(s)
- Neel Prabh
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Waltraud Roeseler
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Hanh Witte
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Gabi Eberhardt
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Ralf J Sommer
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Christian Rödelsperger
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
|
56
|
Pizarro D, Divakar PK, Grewe F, Leavitt SD, Huang JP, Dal Grande F, Schmitt I, Wedin M, Crespo A, Lumbsch HT. Phylogenomic analysis of 2556 single-copy protein-coding genes resolves most evolutionary relationships for the major clades in the most diverse group of lichen-forming fungi. Fungal Divers 2018. [DOI: 10.1007/s13225-018-0407-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023]
|
57
|
Yu X, Yang D, Guo C, Gao L. Plant phylogenomics based on genome-partitioning strategies: Progress and prospects. Plant Diversity 2018; 40:158-164. [PMID: 30740560 PMCID: PMC6137260 DOI: 10.1016/j.pld.2018.06.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 04/24/2018] [Revised: 06/26/2018] [Accepted: 06/27/2018] [Indexed: 05/26/2023]
Abstract
The rapid expansion of next-generation sequencing (NGS) has generated a powerful array of approaches to address fundamental questions in biology. Several genome-partitioning strategies to sequence selected subsets of the genome have emerged in the fields of phylogenomics and evolutionary genomics. In this review, we summarize the applications, advantages and limitations of four NGS-based genome-partitioning approaches in plant phylogenomics: genome skimming, transcriptome sequencing (RNA-seq), restriction site associated DNA sequencing (RAD-Seq), and targeted capture (Hyb-seq). Of these four genome-partitioning approaches, targeted capture (especially Hyb-seq) shows the greatest promise for plant phylogenetics over the next few years. This review will aid researchers in their selection of appropriate genome-partitioning approaches to address questions of evolutionary scale, where we anticipate continued development and expansion of whole-genome sequencing strategies in the fields of plant phylogenomics and evolutionary biology research.
Affiliation(s)
- Xiangqin Yu
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Dan Yang
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Cen Guo
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
- Lianming Gao
- Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China
|
58
|
Griesmann M, Chang Y, Liu X, Song Y, Haberer G, Crook MB, Billault-Penneteau B, Lauressergues D, Keller J, Imanishi L, Roswanjaya YP, Kohlen W, Pujic P, Battenberg K, Alloisio N, Liang Y, Hilhorst H, Salgado MG, Hocher V, Gherbi H, Svistoonoff S, Doyle JJ, He S, Xu Y, Xu S, Qu J, Gao Q, Fang X, Fu Y, Normand P, Berry AM, Wall LG, Ané JM, Pawlowski K, Xu X, Yang H, Spannagl M, Mayer KFX, Wong GKS, Parniske M, Delaux PM, Cheng S. Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science 2018; 361:science.aat1743. [DOI: 10.1126/science.aat1743] [Citation(s) in RCA: 198] [Impact Index Per Article: 28.3] [Received: 02/12/2018] [Accepted: 05/16/2018] [Indexed: 12/20/2022]
|
59
|
Ma S, Wu Q, Hu Y, Wei F. Patterns and effects of GC3 heterogeneity and parsimony informative sites on the phylogenetic tree of genes. Gene 2018; 655:56-60. [DOI: 10.1016/j.gene.2018.02.037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/02/2017] [Revised: 01/27/2018] [Accepted: 02/12/2018] [Indexed: 11/29/2022]
|
60
|
Shpynov SN, Fournier PE, Pozdnichenko NN, Gumenuk AS, Skiba AA. New approaches in the systematics of rickettsiae. New Microbes New Infect 2018; 23:93-102. [PMID: 29692912 PMCID: PMC5913362 DOI: 10.1016/j.nmni.2018.02.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/21/2017] [Revised: 01/27/2018] [Accepted: 02/07/2018] [Indexed: 11/24/2022] Open
Abstract
The development of a formal order analysis (FOA) allowed constructing a classification of 49 genomes of Rickettsiaceae family representatives. Recently FOA has been extended with new tools—‘Map of genes,’ ‘Matrix of similarity’ and ‘Locality-sensitive hashing’—for a more in-depth study of the structure of rickettsial genomes. The new classification confirmed and supplemented the previously constructed one by determining the position of Rickettsia africae str. ESF-5, R. heilongjiangensis 054, R. monacensis str. IrR/Munich, R. montanensis str. OSU 85-930, R. raoultii str. Khabarovsk, R. rhipicephali str. 3-7-female6-CWPP and Rickettsiales bacterium str. Ac37b. The ‘Map of genes’ demonstrated the complete genomes and their components in a graphical form. The ‘Matrix of similarity’ was applied for an in-depth classification to a subtaxonomic category of the strain within the species R. rickettsii (11 strains) and R. prowazekii (ten strains). The ‘Matrix of similarity’ determines the degree of homology of complete genomes by pairwise comparison of their components and identification of those being identical and similar in the arrangement of nucleotides. A new genomosystematics approach is proposed for the study of complete genomes and their components through the development and application of FOA tools. Its applications include the development of principles for the classification of microorganisms, based on the analysis of complete genomes and their annotations. This approach may help in the taxonomic classification and characterization of some Candidatus Rickettsia spp. that are found in large numbers in arthropods worldwide.
Affiliation(s)
- P-E Fournier
- UMR VITROME, Aix-Marseille Université, IRD, Service de Santé des Armées, Assistance Publique-Hôpitaux de Marseille, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France
|
61
|
UBCG: Up-to-date bacterial core gene set and pipeline for phylogenomic tree reconstruction. J Microbiol 2018; 56:280-285. [PMID: 29492869 DOI: 10.1007/s12275-018-8014-6] [Citation(s) in RCA: 999] [Impact Index Per Article: 142.7] [Received: 01/08/2018] [Revised: 01/27/2018] [Accepted: 01/28/2018] [Indexed: 10/17/2022]
Abstract
Genome-based phylogeny plays a central role in the future taxonomy and phylogenetics of Bacteria and Archaea by replacing 16S rRNA gene phylogeny. Concatenated core gene alignments are frequently used for this purpose. The bacterial core genes are defined as single-copy, homologous genes that are present in most of the known bacterial species. There have been several studies describing such a gene set, but the number of species considered was rather small. Here we present the up-to-date bacterial core gene set, named UBCG, and software suites to accommodate the steps necessary to generate and evaluate phylogenetic trees. The method was successfully used to infer the phylogenomic relationships of Escherichia and related taxa and can be used for a set of genomes at any taxonomic rank of Bacteria. The UBCG pipeline and file viewer are freely available at https://www.ezbiocloud.net/tools/ubcg and https://www.ezbiocloud.net/tools/ubcg_viewer, respectively.
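The concatenation of per-gene alignments mentioned in this abstract can be sketched in a few lines. This is a generic illustration, not the UBCG pipeline: it assumes each gene alignment is a dictionary from genome name to an already aligned sequence, pads genomes that lack a gene with gap characters, and uses a hypothetical function name and invented toy sequences.

```python
# Generic sketch of building a concatenated (supermatrix) alignment from
# per-gene alignments. Not the UBCG implementation; names are hypothetical.

def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments; genomes lacking a gene get gap padding."""
    genomes = sorted({g for aln in gene_alignments for g in aln})
    parts = {g: [] for g in genomes}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must share a length")
        width = lengths.pop()
        for g in genomes:
            parts[g].append(aln.get(g, missing * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}

# Two toy gene alignments over three invented genomes.
genes = [
    {"genome1": "ATG-", "genome2": "ATGC"},
    {"genome1": "CC", "genome3": "CT"},
]
supermatrix = concatenate_alignments(genes)
```

Every row of the result has the same length, so the output can be passed to a tree-building program as a single alignment.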
|
62
|
Allaby RG, Woodwark M. Phylogenomic Analysis Reveals Extensive Phylogenetic Mosaicism in the Human GPCR Superfamily. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023] Open
Abstract
A novel high throughput phylogenomic analysis (HTP) was applied to the rhodopsin G-protein coupled receptor (GPCR) family. Instances of phylogenetic mosaicism between receptors were found to be frequent, often as instances of correlated mosaicism and repeated mosaicism. A null data set was constructed with the same phylogenetic topology as the rhodopsin GPCRs. Comparison of the two data sets revealed that mosaicism was found in GPCRs in a higher frequency than would be expected by homoplasy or the effects of topology alone. Various evolutionary models of differential conservation, recombination and homoplasy are explored which could result in the patterns observed in this analysis. We find that the results are most consistent with frequent recombination events. A complex evolutionary history is illustrated in which it is likely frequent recombination has endowed GPCRs with new functions. The pattern of mosaicism is shown to be informative for functional prediction for orphan receptors. HTP analysis is complementary to conventional phylogenomic analyses revealing mosaicism that would not otherwise have been detectable through conventional phylogenetics.
Affiliation(s)
- Robin G. Allaby
- Warwick HRI, University of Warwick, Wellesbourne, CV35 9EF, UK
- Mathew Woodwark
- Cambridge Antibody Technology Ltd., Milstein Building, Granta Park, Cambridge CB1 6GH, UK
|
63
|
Abstract
The study of evolutionary relationships among protein sequences was one of the first applications of bioinformatics. Since then, and accompanying the wealth of biological data produced by genome sequencing and other high-throughput techniques, the use of bioinformatics in general and phylogenetics in particular has been gaining ground in the study of protein and proteome evolution. Nowadays, the use of phylogenetics is instrumental not only to infer the evolutionary relationships among species and their genome sequences, but also to reconstruct ancestral states of proteins and proteomes and hence trace the paths followed by evolution. Here I survey recent progress in the elucidation of mechanisms of protein and proteome evolution in which phylogenetics has played a determinant role.
Affiliation(s)
- Toni Gabaldón
- Bioinformatics Department, Centro de Investigación Principe Felipe
|
64
|
Song J, Zheng S, Nguyen N, Wang Y, Zhou Y, Lin K. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family. BMC Bioinformatics 2017; 18:439. [PMID: 28974198 PMCID: PMC5627428 DOI: 10.1186/s12859-017-1850-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/17/2017] [Accepted: 09/26/2017] [Indexed: 11/28/2022] Open
Abstract
Background: Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method.
Results: On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes.
Conclusions: By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Affiliation(s)
- Jia Song
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Sisi Zheng
- Beijing Key Laboratory of Gene Resources and Molecular Development College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Nhung Nguyen
- Center for Translational Cancer Research, Institute of Biosciences and Technology, Department of Medical Physiology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Youjun Wang
- Beijing Key Laboratory of Gene Resources and Molecular Development College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Yubin Zhou
- Center for Translational Cancer Research, Institute of Biosciences and Technology, Department of Medical Physiology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
|
65
|
Zhang SD, Jin JJ, Chen SY, Chase MW, Soltis DE, Li HT, Yang JB, Li DZ, Yi TS. Diversification of Rosaceae since the Late Cretaceous based on plastid phylogenomics. The New Phytologist 2017; 214:1355-1367. [PMID: 28186635 DOI: 10.1111/nph.14461] [Citation(s) in RCA: 207] [Impact Index Per Article: 25.9] [Received: 11/01/2016] [Accepted: 12/26/2016] [Indexed: 05/18/2023]
Abstract
Phylogenetic relationships in Rosaceae have long been problematic because of frequent hybridisation, apomixis and presumed rapid radiation, and their historical diversification has not been clarified. With 87 genera representing all subfamilies and tribes of Rosaceae and six of the other eight families of Rosales (outgroups), we analysed 130 newly sequenced plastomes together with 12 from GenBank in an attempt to reconstruct deep relationships and reveal temporal diversification of this family. Our results highlight the importance of improving sequence alignment and the use of appropriate substitution models in plastid phylogenomics. Three subfamilies and 16 tribes (as previously delimited) were strongly supported as monophyletic, and their relationships were fully resolved and strongly supported at most nodes. Rosaceae were estimated to have originated during the Late Cretaceous with evidence for rapid diversification events during several geological periods. The major lineages rapidly diversified in warm and wet habitats during the Late Cretaceous, and the rapid diversification of genera from the early Oligocene onwards occurred in colder and drier environments. Plastid phylogenomics offers new and important insights into deep phylogenetic relationships and the diversification history of Rosaceae. The robust phylogenetic backbone and time estimates we provide establish a framework for future comparative studies on rosaceous evolution.
Affiliation(s)
- Shu-Dong Zhang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Jian-Jun Jin
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- Si-Yun Chen
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Mark W Chase
- Science Directorate, Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3DS, UK
- School of Plant Biology, University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
- Douglas E Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611-7800, USA
- Department of Biology, University of Florida, Gainesville, FL, 32611, USA
- Genetics Institute, University of Florida, Gainesville, FL, 32608, USA
- Hong-Tao Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Jun-Bo Yang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- De-Zhu Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Ting-Shuang Yi
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
|
66
|
Snir S. Ordered orthology as a tool in prokaryotic evolutionary inference. Mob Genet Elements 2017; 6:e1120576. [PMID: 28090377 DOI: 10.1080/2159256x.2015.1120576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2015] [Revised: 10/27/2015] [Accepted: 11/10/2015] [Indexed: 10/22/2022] Open
Abstract
Molecular data is accumulated at an exponentially increasing pace. This deluge of information should have brought us closer to resolving one of the most fundamental issues in biology - deciphering the history of life on Earth. So far, however, this abundance of data only seems to blur our understanding of the problem. This is largely due to horizontal gene transfer (HGT), the transfer of genetic material between evolutionarily unrelated organisms that transforms the prokaryotic tree into a network of relationships. Recently, we developed a method to infer evolutionary relationships among closely related species where the conventional evolutionary markers do not provide a strong enough signal. The method relies on the loss of synteny (gene order conservation among species), which provides a stronger signal, sufficient to classify even strains of a given species. Here we elaborate on this method and suggest further uses of it in the context of detecting HGT events and genome architecture.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
|
67
|
Frenkel Z, Kiat Y, Izhaki I, Snir S. Convex recoloring as an evolutionary marker. Mol Phylogenet Evol 2016; 107:209-220. [PMID: 27818264 DOI: 10.1016/j.ympev.2016.10.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/21/2016] [Revised: 10/16/2016] [Accepted: 10/25/2016] [Indexed: 11/27/2022]
Abstract
With the availability of enormous quantities of genetic data it has become common to construct very accurate trees describing the evolutionary history of the species under study, as well as every single gene of these species. These trees allow us to examine the evolutionary compliance of given markers (characters). A marker compliant with the history of the species investigated has undergone mutations along the species tree branches, such that every subtree of that tree exhibits a different state. Convex recoloring (CR) uses combinatorial representation to measure the adequacy of a taxonomic classifier to a given tree. Despite its biological origins, research on CR has been almost exclusively dedicated to mathematical properties of the problem, or variants of it with little, if any, relationship to taxonomy. In this work we return to the origins of CR. We put CR in a statistical framework and introduce and learn the notion of the statistical significance of a character. We apply this measure to two data sets - Passerine birds and prokaryotes, and four examples. These examples demonstrate various applications of CR, from evolutionary relatedness, through lateral evolution, to supertree construction. The above study was done with new software that we provide, containing algorithmic improvements and a graphical output of an (optimally) recolored tree. AVAILABILITY: Code implementing the features and a README are available at http://research.haifa.ac.il/ssagi/software/convexrecoloring.zip.
Affiliation(s)
- Zeev Frenkel
- Department of Ecology and Evolutionary Biology, University of Haifa, Israel
- Yosef Kiat
- Israeli Bird Ringing Center, Society for the Protection of Nature in Israel, Israel
- Ido Izhaki
- Department of Ecology and Evolutionary Biology, University of Haifa, Israel
- Sagi Snir
- Department of Ecology and Evolutionary Biology, University of Haifa, Israel
|
68
|
Ziemert N, Alanjary M, Weber T. The evolution of genome mining in microbes - a review. Nat Prod Rep 2016; 33:988-1005. [PMID: 27272205 DOI: 10.1039/c6np00025h] [Citation(s) in RCA: 439] [Impact Index Per Article: 48.8] [Indexed: 12/23/2022]
Abstract
Covering: 2006 to 2016. The computational mining of genomes has become an important part in the discovery of novel natural products as drug leads. Thousands of bacterial genome sequences are publicly available these days containing an even larger number and diversity of secondary metabolite gene clusters that await linkage to their encoded natural products. With the development of high-throughput sequencing methods and the wealth of DNA data available, a variety of genome mining methods and tools have been developed to guide discovery and characterisation of these compounds. This article reviews the development of these computational approaches during the last decade and shows how the revolution of next generation sequencing methods has led to an evolution of various genome mining approaches, techniques and tools. After a short introduction and brief overview of important milestones, this article will focus on the different approaches of mining genomes for secondary metabolites, from detecting biosynthetic genes to resistance based methods and "evo-mining" strategies including a short evaluation of the impact of the development of genome mining methods and tools on the field of natural products and microbial ecology.
Affiliation(s)
- Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tübingen (IMIT), Microbiology and Biotechnology, University of Tuebingen, Germany.
|
69
|
McInerney J, Pisani D, O'Connell MJ. The ring of life hypothesis for eukaryote origins is supported by multiple kinds of data. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140323. [PMID: 26323755 DOI: 10.1098/rstb.2014.0323] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open
Abstract
The literature is replete with manuscripts describing the origin of eukaryotic cells. Most of the models for eukaryogenesis are either autogenous (sometimes called slow-drip), or symbiogenic (sometimes called big-bang). In this article, we use large and diverse suites of 'Omics' and other data to make the inference that autogenous hypotheses are a very poor fit to the data and the origin of eukaryotic cells occurred in a single symbiosis.
Affiliation(s)
- James McInerney
- Department of Biology, National University of Ireland Maynooth, Co. Kildare, Republic of Ireland
- Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Davide Pisani
- School of Biological Sciences and School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TG, UK
- Mary J O'Connell
- School of Biotechnology, Dublin City University, Glasnevin, Dublin 9, Republic of Ireland
|
70
|
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. GENOMICS INSIGHTS 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detecting speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
|
71
|
Cannon JT, Kocot KM. Phylogenomics Using Transcriptome Data. Methods Mol Biol 2016; 1452:65-80. [PMID: 27460370 DOI: 10.1007/978-1-4939-3774-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
This chapter presents a generalized protocol for conducting phylogenetic analyses using large-scale molecular datasets, specifically using transcriptome data from the Illumina sequencing platform. The general molecular lab bench protocol consists of RNA extraction, cDNA synthesis, and sequencing, in this case via Illumina. After sequences have been obtained, bioinformatics methods are used to assemble raw reads, identify coding regions, and categorize sequences from different species into groups of orthologous genes (OGs). The specific OGs to be used for phylogenetic inference are selected using a custom shell script. Finally, the selected orthologous groups are concatenated into a supermatrix. Generalized methods for phylogenomic inference using maximum likelihood and Bayesian inference software are presented.
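The concatenation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the chapter's own script (the abstract mentions a custom shell script for OG selection); the function name `concatenate_supermatrix`, the input layout (a dict mapping each OG name to a dict of taxon to aligned sequence), and the choice of `-` for missing taxa are all assumptions made for illustration.

```python
def concatenate_supermatrix(alignments, missing="-"):
    """Concatenate per-OG alignments into one supermatrix.

    alignments: dict of OG name -> {taxon: aligned sequence} (hypothetical layout).
    Taxa absent from an OG are padded with the `missing` character.
    Returns (supermatrix dict, list of (OG name, start, end) partitions, 1-based).
    """
    # Collect every taxon seen in any OG, in a stable order.
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


if __name__ == "__main__":
    toy = {
        "OG1": {"a": "MK-", "b": "MKL"},
        "OG2": {"a": "AA", "c": "AG"},
    }
    matrix, partitions = concatenate_supermatrix(toy)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(partitions)
```

The returned partition coordinates are the kind of information that partitioned maximum likelihood or Bayesian analyses typically need alongside the supermatrix.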
Affiliation(s)
- Johanna Taylor Cannon
- Department of Zoology, Naturhistoriska Riksmuseet, 50007, SE-104 05, Stockholm, Sweden.
- Kevin Michael Kocot
- Department of Biological Sciences and Alabama Museum of Natural History, The University of Alabama, 307 Mary Harmon Bryant Hall, Tuscaloosa, AL, 35487, USA
|
72
|
Davidson R, Vachaspati P, Mirarab S, Warnow T. Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC Genomics 2015; 16 Suppl 10:S1. [PMID: 26450506 PMCID: PMC4603753 DOI: 10.1186/1471-2164-16-s10-s1] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. RESULTS We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. CONCLUSION Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.
Affiliation(s)
- Ruth Davidson
- Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green Street, 61801 Urbana, IL, USA
- Pranjal Vachaspati
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, 61801 Urbana, IL, USA
- Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, 2317 Speedway, Stop D9500, 78712 Austin, TX, USA
- Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, 92093, La Jolla, CA, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, 61801 Urbana, IL, USA
- Department of Bioengineering, University of Illinois at Urbana-Champaign, 1270 Digital Computer Laboratory, MC-278, 61801 Urbana, IL, USA
|
73
|
New animal phylogeny: future challenges for animal phylogeny in the age of phylogenomics. ORG DIVERS EVOL 2015. [DOI: 10.1007/s13127-015-0236-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
|
74
|
Xie XH, Yu ZG, Han GS, Yang WF, Anh V. Whole-proteome based phylogenetic tree construction with inter-amino-acid distances and the conditional geometric distribution profiles. Mol Phylogenet Evol 2015; 89:37-45. [PMID: 25882834 DOI: 10.1016/j.ympev.2015.04.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/18/2014] [Revised: 03/29/2015] [Accepted: 04/06/2015] [Indexed: 11/18/2022]
Abstract
There has been a growing interest in alignment-free methods for whole genome comparison and phylogenomic studies. In this study, we propose an alignment-free method for phylogenetic tree construction using whole-proteome sequences. Based on the inter-amino-acid distances, we first convert the whole-proteome sequences into inter-amino-acid distance vectors, which are called observed inter-amino-acid distance profiles. Then, we propose to use conditional geometric distribution profiles (the distributions of sequences where the amino acids are placed randomly and independently) as the reference distribution profiles. Last, the relative deviation between the observed and reference distribution profiles is used to define a simple metric that reflects the phylogenetic relationships between whole-proteome sequences of different organisms. We name our method inter-amino-acid distances and conditional geometric distribution profiles (IAGDP). We evaluate our method on two data sets: the benchmark dataset including 29 genomes used in previously published papers, and another one including 67 mammal genomes. Our results demonstrate that the new method is useful and efficient.
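The three steps described in this abstract (observed distance profiles, a geometric reference, and a relative-deviation metric) can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the published IAGDP formulas: the distance cutoff `max_dist`, the truncation of the geometric distribution at that cutoff, the Euclidean distance between deviation vectors, and all function names are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def observed_profile(seq, residue, max_dist):
    """Relative frequency of gaps 1..max_dist between consecutive copies of residue."""
    positions = [i for i, c in enumerate(seq) if c == residue]
    gaps = [b - a for a, b in zip(positions, positions[1:]) if b - a <= max_dist]
    counts = Counter(gaps)
    total = len(gaps)
    # Residues with no qualifying gaps yield an all-zero profile.
    return [counts[d] / total if total else 0.0 for d in range(1, max_dist + 1)]


def reference_profile(p, max_dist):
    """Geometric law P(d) = p(1-p)^(d-1), conditioned on d <= max_dist."""
    if p <= 0.0:
        return [0.0] * max_dist
    norm = 1.0 - (1.0 - p) ** max_dist
    return [p * (1.0 - p) ** (d - 1) / norm for d in range(1, max_dist + 1)]


def deviation_vector(seq, max_dist=5):
    """Relative deviation (observed - reference) / reference for all 20 residues."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    vec = []
    for aa in AMINO_ACIDS:
        p = seq.count(aa) / len(seq)  # residue frequency sets the reference
        obs = observed_profile(seq, aa, max_dist)
        ref = reference_profile(p, max_dist)
        vec.extend((o - r) / r if r > 0 else 0.0 for o, r in zip(obs, ref))
    return vec


def profile_distance(seq_a, seq_b, max_dist=5):
    """Euclidean distance between two deviation vectors (an assumed metric)."""
    va = deviation_vector(seq_a, max_dist)
    vb = deviation_vector(seq_b, max_dist)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

In practice each organism's proteome would be processed protein by protein and the resulting pairwise distance matrix passed to a distance-based tree builder such as neighbor joining; that step is omitted here.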
Affiliation(s)
- Xian-Hua Xie
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; School of Mathematics and Computer Science, Gannan Normal University, Jiangxi 341000, PR China.
- Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
- Guo-Sheng Han
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China.
- Wei-Feng Yang
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China.
- Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
|
75
|
Duarte M, Jauregui R, Vilchez-Vargas R, Junca H, Pieper DH. AromaDeg, a novel database for phylogenomics of aerobic bacterial degradation of aromatics. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau118. [PMID: 25468931 PMCID: PMC4250580 DOI: 10.1093/database/bau118] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 12/03/2022]
Abstract
Understanding prokaryotic transformation of recalcitrant pollutants and the in-situ metabolic nets requires the integration of massive amounts of biological data. Decades of biochemical studies together with novel next-generation sequencing data have exponentially increased information on aerobic aromatic degradation pathways. However, the majority of protein sequences in public databases have not been experimentally characterized and homology-based methods are still the most routinely used approach to assign protein function, allowing the propagation of misannotations. AromaDeg is a web-based resource targeting aerobic degradation of aromatics that comprises recently updated (September 2013) and manually curated databases constructed based on a phylogenomic approach. Grounded in phylogenetic analyses of protein sequences of key catabolic protein families and of proteins of documented function, AromaDeg allows query and data mining of novel genomic, metagenomic or metatranscriptomic data sets. Essentially, each query sequence that matches a given protein family of AromaDeg is associated with a specific cluster of a given phylogenetic tree, and further function annotation and/or substrate specificity may be inferred from the neighboring cluster members with experimentally validated function. This allows a detailed characterization of individual protein superfamilies as well as high-throughput functional classifications. Thus, AromaDeg addresses the deficiencies of homology-based protein function prediction, combining phylogenetic tree construction and integration of experimental data to obtain more accurate annotations of new biological data related to aerobic aromatic biodegradation pathways. In future, we will pursue the expansion of AromaDeg to other enzyme families involved in aromatic degradation, as well as its regular update. Database URL: http://aromadeg.siona.helmholtz-hzi.de
Affiliation(s)
- Márcia Duarte
- Microbial Interactions and Processes Research Group, HZI-Helmholtz Centre for Infection Research, Inhoffenstr. 7, D-38124 Braunschweig, Germany, Research Group Microbial Ecology, Metabolism, Genomics and Evolution of Communities of Environmental Microorganisms, CorpoGen. Carrera 5 No. 66A-35, Bogotá, Colombia and Faculty of Basic and Applied Sciences, Universidad Militar Nueva Granada-UMNG, Campus Cajicá, Bogotá DC, Colombia
- Ruy Jauregui
- Microbial Interactions and Processes Research Group, HZI-Helmholtz Centre for Infection Research, Inhoffenstr. 7, D-38124 Braunschweig, Germany, Research Group Microbial Ecology, Metabolism, Genomics and Evolution of Communities of Environmental Microorganisms, CorpoGen. Carrera 5 No. 66A-35, Bogotá, Colombia and Faculty of Basic and Applied Sciences, Universidad Militar Nueva Granada-UMNG, Campus Cajicá, Bogotá DC, Colombia
- Ramiro Vilchez-Vargas
- Microbial Interactions and Processes Research Group, HZI-Helmholtz Centre for Infection Research, Inhoffenstr. 7, D-38124 Braunschweig, Germany, Research Group Microbial Ecology, Metabolism, Genomics and Evolution of Communities of Environmental Microorganisms, CorpoGen. Carrera 5 No. 66A-35, Bogotá, Colombia and Faculty of Basic and Applied Sciences, Universidad Militar Nueva Granada-UMNG, Campus Cajicá, Bogotá DC, Colombia
- Howard Junca
- Microbial Interactions and Processes Research Group, HZI-Helmholtz Centre for Infection Research, Inhoffenstr. 7, D-38124 Braunschweig, Germany, Research Group Microbial Ecology, Metabolism, Genomics and Evolution of Communities of Environmental Microorganisms, CorpoGen. Carrera 5 No. 66A-35, Bogotá, Colombia and Faculty of Basic and Applied Sciences, Universidad Militar Nueva Granada-UMNG, Campus Cajicá, Bogotá DC, Colombia
- Dietmar H Pieper
- Microbial Interactions and Processes Research Group, HZI-Helmholtz Centre for Infection Research, Inhoffenstr. 7, D-38124 Braunschweig, Germany, Research Group Microbial Ecology, Metabolism, Genomics and Evolution of Communities of Environmental Microorganisms, CorpoGen. Carrera 5 No. 66A-35, Bogotá, Colombia and Faculty of Basic and Applied Sciences, Universidad Militar Nueva Granada-UMNG, Campus Cajicá, Bogotá DC, Colombia
|
76
|
Joice R, Yasuda K, Shafquat A, Morgan XC, Huttenhower C. Determining microbial products and identifying molecular targets in the human microbiome. Cell Metab 2014; 20:731-741. [PMID: 25440055 PMCID: PMC4254638 DOI: 10.1016/j.cmet.2014.10.003] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 02/08/2023]
Abstract
Human-associated microbes are the source of many bioactive microbial products (proteins and metabolites) that play key functions both in human host pathways and in microbe-microbe interactions. Culture-independent studies now provide an accelerated means of exploring novel bioactives in the human microbiome; however, intriguingly, a substantial fraction of the microbial metagenome cannot be mapped to annotated genes or isolate genomes and is thus of unknown function. Meta'omic approaches, including metagenomic sequencing, metatranscriptomics, metabolomics, and integration of multiple assay types, represent an opportunity to efficiently explore this large pool of potential therapeutics. In combination with appropriate follow-up validation, high-throughput culture-independent assays can be combined with computational approaches to identify and characterize novel and biologically interesting microbial products. Here we briefly review the state of microbial product identification and characterization and discuss possible next steps to catalog and leverage the large uncharted fraction of the microbial metagenome.
Affiliation(s)
- Regina Joice
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Koji Yasuda
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Afrah Shafquat
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Xochitl C Morgan
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
77
|
Sharma AR, Chakraborty C, Lee SS, Sharma G, Yoon JK, George Priya Doss C, Song DK, Nam JS. Computational biophysical, biochemical, and evolutionary signature of human R-spondin family proteins, the member of canonical Wnt/β-catenin signaling pathway. BIOMED RESEARCH INTERNATIONAL 2014; 2014:974316. [PMID: 25276837 PMCID: PMC4172882 DOI: 10.1155/2014/974316] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/04/2014] [Revised: 07/12/2014] [Accepted: 07/12/2014] [Indexed: 12/27/2022]
Abstract
In humans, the Wnt/β-catenin signaling pathway plays a significant role in cell growth, cell development, and disease pathogenesis. Four human (Rspo)s are known to activate the canonical Wnt/β-catenin signaling pathway. Presently, (Rspo)s serve as therapeutic targets for several human diseases. Hence, a basic understanding of the molecular properties of (Rspo)s is essential. We approached this issue by interpreting the biochemical and biophysical properties, along with the molecular evolution, of (Rspo)s through computational methods. Our analysis shows that signal peptide length is roughly similar across the (Rspo)s family, as is the amino acid (aa) distribution pattern. In Rspo3, four N-glycosylation sites were noted. All members are hydrophilic in nature and showed approximately similar GRAVY values. In contrast, Rspo3 contains the most positively charged residues while Rspo4 contains the fewest. Four highly aligned blocks were recorded through Gblocks. Phylogenetic analysis shows that Rspo4 is rooted with Rspo2 and, similarly, that Rspo3 and Rspo1 share a common point of origin. Through a phylogenomics study, we developed a phylogenetic tree of sixty proteins (n = 60) with the ortholog and paralog seed sequences. A protein-protein network was also illustrated. The results demonstrated in our study may help future researchers to unfold significant physiological and therapeutic properties of (Rspo)s in various disease models.
Affiliation(s)
- Ashish Ranjan Sharma
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University Hospital, College of Medicine, Chuncheon-si, Gangwon-do 200-704, Republic of Korea
- Chiranjib Chakraborty
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Department of Bioinformatics, School of Computer Sciences, Galgotias University, Greater Noida 203201, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Garima Sharma
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Jeong Kyo Yoon
- Center for Molecular Medicine, Maine Medical Center Research Institute, 81 Research Drive, Scarborough, ME 04074, USA
- C. George Priya Doss
- Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu 632014, India
- Dong-Keun Song
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Ju-Suk Nam
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
|
78
|
A daily-updated tree of (sequenced) life as a reference for genome research. Sci Rep 2014; 3:2015. [PMID: 23778980 PMCID: PMC6504836 DOI: 10.1038/srep02015] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 03/04/2013] [Accepted: 05/10/2013] [Indexed: 11/08/2022] Open
Abstract
We report a daily-updated sequenced/species Tree Of Life (sTOL) as a reference for the increasing number of cellular organisms with their genomes sequenced. The sTOL builds on a likelihood-based weight calibration algorithm to consolidate NCBI taxonomy information in concert with unbiased sampling of molecular characters from whole genomes of all sequenced organisms. Via quantifying the extent of agreement between taxonomic and molecular data, we observe there are many potential improvements that can be made to the status quo classification, particularly in the Fungi kingdom; we also see that the current state of many animal genomes is rather poor. To augment the use of sTOL in providing evolutionary contexts, we integrate an ontology infrastructure and demonstrate its utility for evolutionary understanding on: nuclear receptors, stem cells and eukaryotic genomes. The sTOL (http://supfam.org/SUPERFAMILY/sTOL) provides a binary tree of (sequenced) life, and contributes to an analytical platform linking genome evolution, function and phenotype.
|
79
|
Som A. Causes, consequences and solutions of phylogenetic incongruence. Brief Bioinform 2014; 16:536-48. [PMID: 24872401 DOI: 10.1093/bib/bbu015] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 01/24/2014] [Accepted: 04/05/2014] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic analysis is used to recover the evolutionary history of species, genes or proteins. Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. Moreover, it is important because of its wide applications that include understanding genome organization, epidemiological investigations, predicting protein functions, and deciding the genes to be analyzed in comparative studies. Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that create uncertainty with respect to the true evolutionary relationships of the species or genes analyzed. One of the most notable difficulties is the widespread occurrence of incongruence among methods and also among individual genes or different genomic regions. Presence of widespread incongruence inhibits successful revealing of evolutionary relationships and applications of phylogenetic analysis. In this article, I concisely review the effect of various factors that cause incongruence in molecular phylogenies, the advances in the field that resolved some factors, and explore unresolved factors that cause incongruence along with possible ways for tackling them.
|
80
|
Seo JH, Park J, Kim EM, Kim J, Joo K, Lee J, Kim BG. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm. Comput Biol Chem 2014; 48:64-70. [PMID: 24378653 DOI: 10.1016/j.compbiolchem.2013.11.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/10/2013] [Revised: 10/12/2013] [Accepted: 11/23/2013] [Indexed: 11/28/2022]
Abstract
Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' (SA), was constructed. First, the tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. The sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. The representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. Average sequence similarities are calculated for all nodes, and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is determined to be the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping.
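The node-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pooled-variance form of the t statistic, the function names `student_t` and `optimum_node`, the root-to-tip ordering of nodes, and the similarity values are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def optimum_node(similarities_per_node):
    """Return the index of the node with the largest t-value against its predecessor.

    similarities_per_node[i] is a hypothetical list of average subgroup
    similarities observed at node i, with nodes ordered from root to tips.
    """
    t_values = [
        student_t(similarities_per_node[i - 1], similarities_per_node[i])
        for i in range(1, len(similarities_per_node))
    ]
    best = max(range(len(t_values)), key=lambda i: t_values[i])
    return best + 1  # t_values[0] compares node 0 with node 1


# Made-up similarity values: the jump between node 1 and node 2 is the largest.
nodes = [
    [0.30, 0.32, 0.31],
    [0.33, 0.35, 0.34],
    [0.70, 0.72, 0.71],
    [0.73, 0.74, 0.75],
]
print(optimum_node(nodes))
```

The sketch would fail with a division by zero if two adjacent nodes both had zero variance; a real implementation would need to handle that case.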
Affiliation(s)
- Joo-Hyun Seo
- School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea; School of Computational Sciences, Korea Institute of Advanced Study, Seoul 130-722, Republic of Korea
- Jihyang Park
- School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea
- Eun-Mi Kim
- School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea
- Juhan Kim
- School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 130-722, Republic of Korea
- Jooyoung Lee
- School of Computational Sciences, Korea Institute of Advanced Study, Seoul 130-722, Republic of Korea
- Byung-Gee Kim
- School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea.
|
81
|
Merging Ecology and Genomics to Dissect Diversity in Wild Tomatoes and Their Relatives. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 781:273-98. [DOI: 10.1007/978-94-007-7347-9_14] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/09/2023]
|
82
|
Wang X, Wang R, Zhang Y, Zhang H. Evolutionary survey of druggable protein targets with respect to their subcellular localizations. Genome Biol Evol 2013; 5:1291-7. [PMID: 23749117 PMCID: PMC3730344 DOI: 10.1093/gbe/evt092] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022] Open
Abstract
The druggable subset of the human genome, termed the “druggable genome,” provides the pharmaceutical industry with a unique opportunity for the advancement of new therapeutic interventions for a multitude of diseases and disorders. To date, there is no systematic assessment of the evolutionary history and nature of the defined druggable proteins derived from the contemporary druggable genome (i.e., proteins that bind or are predicted to bind with high affinity to a biologic). An understanding of drug–protein target interactions in specific cellular compartments is crucial for the optimal therapeutic delivery of pharmaceutical agents, as well as for preclinical drug trials in model animals. This study applied the concept of pharmacophylogenomics, the study of genes, evolution, and drug targets, to conduct an evolutionary survey of drug targets with respect to their subcellular localizations. Using multiple models and modes of druggable genome comparison, the results concordantly indicated that orthologous drug targets with a nuclear localization in the human, macaque, mouse, and rat showed a higher trend for evolutionary conservation compared with drug targets in the cell membrane and the extracellular compartment. As such, this study provides important information regarding druggable protein targets and the druggable genome at the pharmacophylogenomics level.
Affiliation(s)
- Xiaotong Wang
- School of Agriculture, Ludong University, Yantai, China
|
83
|
Huerta-Cepas J, Capella-Gutiérrez S, Pryszcz LP, Marcet-Houben M, Gabaldón T. PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome. Nucleic Acids Res 2013; 42:D897-902. [PMID: 24275491 PMCID: PMC3964985 DOI: 10.1093/nar/gkt1177] [Citation(s) in RCA: 190] [Impact Index Per Article: 15.8] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes.
Affiliation(s)
- Jaime Huerta-Cepas
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88. 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
|
84
|
Yang JB, Yang SX, Li HT, Yang J, Li DZ. Comparative chloroplast genomes of camellia species. PLoS One 2013; 8:e73053. [PMID: 24009730 PMCID: PMC3751842 DOI: 10.1371/journal.pone.0073053] [Citation(s) in RCA: 112] [Impact Index Per Article: 9.3] [Received: 04/18/2013] [Accepted: 07/16/2013] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Camellia, comprising more than 200 species, is a valuable economic commodity due to its enormously popular commercial products: tea leaves, flowers, and high-quality edible oils. It is the largest and most important genus in the family Theaceae. However, phylogenetic resolution of the species has proven to be difficult. Consequently, the interspecies relationships of the genus Camellia are still hotly debated. Phylogenomics is an attractive avenue that can be used to reconstruct the tree of life, especially at low taxonomic levels. METHODOLOGY/PRINCIPAL FINDINGS Seven complete chloroplast (cp) genomes were sequenced from six species representing different subdivisions of the genus Camellia using Illumina sequencing technology. Four junctions between the single-copy segments and the inverted repeats were confirmed and genome assemblies were validated by PCR-based product sequencing using 123 pairs of primers covering preliminary cp genome assemblies. The length of the Camellia cp genome was found to be about 157kb, which contained 123 unique genes and 23 were duplicated in the IR regions. We determined that the complete Camellia cp genome was relatively well conserved, but contained enough genetic differences to provide useful phylogenetic information. Phylogenetic relationships were analyzed using seven complete cp genomes of six Camellia species. We also identified rapidly evolving regions of the cp genome that have the potential to be used for further species identification and phylogenetic resolution. CONCLUSIONS/SIGNIFICANCE In this study, we wanted to determine if analyzing completely sequenced cp genomes could help settle these controversies of interspecies relationships in Camellia. The results demonstrate that cp genome data are beneficial in resolving species definition because they indicate that organelle-based "barcodes", can be established for a species and then used to unmask interspecies phylogenetic relationships. 
It reveals that phylogenomics based on cp genomes is an effective approach for achieving phylogenetic resolution between Camellia species.
Affiliation(s)
- Jun-Bo Yang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Shi-Xiong Yang
- Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Hong-Tao Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Jing Yang
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- De-Zhu Li
- Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
|
85
|
Roch S, Snir S. Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. J Comput Biol 2013; 20:93-112. [PMID: 23383996 DOI: 10.1089/cmb.2012.0234] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
Lateral gene transfer (LGT) is a common mechanism of nonvertical evolution, during which genetic material is transferred between two more or less distantly related organisms. It is particularly common in bacteria where it contributes to adaptive evolution with important medical implications. In evolutionary studies, LGT has been shown to create widespread discordance between gene trees as genomes become mosaics of gene histories. In particular, the Tree of Life has been questioned as an appropriate representation of bacterial evolutionary history. Nevertheless a common hypothesis is that prokaryotic evolution is primarily treelike, but that the underlying trend is obscured by LGT. Extensive empirical work has sought to extract a common treelike signal from conflicting gene trees. Here we give a probabilistic perspective on the problem of recovering the treelike trend despite LGT. Under a model of randomly distributed LGT, we show that the species phylogeny can be reconstructed even in the presence of surprisingly many (almost linear number of) LGT events per gene tree. Our results, which are optimal up to logarithmic factors, are based on the analysis of a robust, computationally efficient reconstruction method and provides insight into the design of such methods. Finally, we show that our results have implications for the discovery of highways of gene sharing.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics and Bioinformatics Program, University of California at Los Angeles, Los Angeles, CA, USA.
|
86
|
Renner SS, Piednoël M. Phylogenomics: A Primer. By Rob DeSalle and Jeffrey A. Rosenfeld. Syst Biol 2013. [DOI: 10.1093/sysbio/syt046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Susanne S. Renner
- Systematic Botany and Mycology, Department of Biology, Munich University, Menzingerstr. 67, Munich 80638, Germany
- Mathieu Piednoël
- Systematic Botany and Mycology, Department of Biology, Munich University, Menzingerstr. 67, Munich 80638, Germany
|
87
|
Fang G, Passalacqua KD, Hocking J, Llopis PM, Gerstein M, Bergman NH, Jacobs-Wagner C. Transcriptomic and phylogenetic analysis of a bacterial cell cycle reveals strong associations between gene co-expression and evolution. BMC Genomics 2013; 14:450. [PMID: 23829427 PMCID: PMC3829707 DOI: 10.1186/1471-2164-14-450] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 12/26/2012] [Accepted: 05/13/2013] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND The genetic network involved in the bacterial cell cycle is poorly understood even though it underpins the remarkable ability of bacteria to proliferate. How such network evolves is even less clear. The major aims of this work were to identify and examine the genes and pathways that are differentially expressed during the Caulobacter crescentus cell cycle, and to analyze the evolutionary features of the cell cycle network. RESULTS We used deep RNA sequencing to obtain high coverage RNA-Seq data of five C. crescentus cell cycle stages, each with three biological replicates. We found that 1,586 genes (over a third of the genome) display significant differential expression between stages. This gene list, which contains many genes previously unknown for their cell cycle regulation, includes almost half of the genes involved in primary metabolism, suggesting that these "house-keeping" genes are not constitutively transcribed during the cell cycle, as often assumed. Gene and module co-expression clustering reveal co-regulated pathways and suggest functionally coupled genes. In addition, an evolutionary analysis of the cell cycle network shows a high correlation between co-expression and co-evolution. Most co-expression modules have strong phylogenetic signals, with broadly conserved genes and clade-specific genes predominating different substructures of the cell cycle co-expression network. We also found that conserved genes tend to determine the expression profile of their module. CONCLUSION We describe the first phylogenetic and single-nucleotide-resolution transcriptomic analysis of a bacterial cell cycle network. In addition, the study suggests how evolution has shaped this network and provides direct biological network support that selective pressure is not on individual genes but rather on the relationship between genes, which highlights the importance of integrating phylogenetic analysis into biological network studies.
Affiliation(s)
- Gang Fang
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511, USA.
|
88
|
Romiguier J, Ranwez V, Delsuc F, Galtier N, Douzery EJP. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals. Mol Biol Evol 2013; 30:2134-44. [PMID: 23813978 DOI: 10.1093/molbev/mst116] [Citation(s) in RCA: 122] [Impact Index Per Article: 10.2] [Indexed: 12/24/2022] Open
Abstract
Despite the rapid increase in the size of phylogenomic data sets, a number of important nodes on animal phylogeny are still unresolved. Among these, the rooting of the placental mammal tree is still a controversial issue. One difficulty lies in the pervasive phylogenetic conflicts among genes, with each one telling its own story, which may be reliable or not. Here, we identified a simple criterion, that is, the GC content, which substantially helps in determining which gene trees best reflect the species tree. We assessed the ability of 13,111 coding sequence alignments to correctly reconstruct the placental phylogeny. We found that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree. We interpret this GC effect mainly as a consequence of genome-wide variations in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. When we focused on the AT-richest fraction of the data set, the resolution level of the placental phylogeny was greatly increased, and strong support was obtained in favor of an Afrotheria rooting, that is, Afrotheria as the sister group of all other placentals. We show that in mammals most conflicts among gene trees, which have so far hampered the resolution of the placental tree, are concentrated in the GC-rich regions of the genome. We argue that the GC content, because it is a reliable indicator of the long-term recombination rate, is an informative criterion that could help in identifying the most reliable molecular markers for species tree inference.
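The GC-content criterion can be illustrated with a short sketch that ranks gene alignments by GC content and keeps the AT-richest fraction. The function names `gc_content` and `at_richest`, the pooling of all sequences in an alignment, and the toy data are assumptions for illustration only; the abstract does not specify how the authors computed or thresholded GC content.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (gaps and N are ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def at_richest(alignments, fraction):
    """Return gene names from the AT-richest `fraction` of the data set.

    `alignments` maps a gene name to its list of aligned sequences;
    GC content is pooled over all sequences of a gene (an assumption).
    """
    ranked = sorted(alignments, key=lambda g: gc_content("".join(alignments[g])))
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy alignments, invented for this example.
genes = {
    "g1": ["ATAT", "ATAA"],
    "g2": ["GCGC", "GGCC"],
    "g3": ["ATGC", "AAGC"],
}
print(at_richest(genes, 0.5))
```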
Affiliation(s)
- Jonathan Romiguier
- CNRS, Université Montpellier, Institut des Sciences de l'Evolution, Montpellier, France.
|
89
|
Baele G, Lemey P. Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency. Bioinformatics 2013; 29:1970-9. [DOI: 10.1093/bioinformatics/btt340] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Indexed: 01/02/2023] Open
|
90
|
Gallo A, Ferrara M, Perrone G. Phylogenetic study of polyketide synthases and nonribosomal peptide synthetases involved in the biosynthesis of mycotoxins. Toxins (Basel) 2013; 5:717-42. [PMID: 23604065 PMCID: PMC3705289 DOI: 10.3390/toxins5040717] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 02/27/2013] [Revised: 03/22/2013] [Accepted: 04/10/2013] [Indexed: 01/07/2023] Open
Abstract
Polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs) are large multimodular enzymes involved in the biosynthesis of polyketide and peptide toxins produced by fungi. Furthermore, hybrid enzymes, in which a reducing PKS region is fused to a single NRPS module, are also responsible for the synthesis of peptide-polyketide metabolites in fungi. The genes encoding PKSs and NRPSs have been exposed to complex evolutionary mechanisms, which have determined the great number and diversity of metabolites. In this study, we considered the most important polyketide and peptide mycotoxins and, for the first time, a phylogenetic analysis of both PKSs and NRPSs involved in their biosynthesis was performed using two domains for each enzyme: β-ketosynthase (KS) and acyl-transferase (AT) for PKSs; adenylation (A) and condensation (C) for NRPSs. The analysis of both KS and AT domains confirmed the differentiation of the three classes of highly, partially and non-reducing PKSs. Hybrid PKS-NRPSs involved in mycotoxin biosynthesis grouped together in the phylogenetic trees of all the domains analyzed. For most mycotoxins, the corresponding biosynthetic enzymes from distinct fungal species grouped together, except for the PKS and NRPS involved in ochratoxin A biosynthesis, for which a different evolutionary process could be hypothesized in different species.
Affiliation(s)
- Antonia Gallo
- Institute of Sciences of Food Production ISPA, National Research Council CNR, Bari, Italy.
|
91
|
Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 2013; 41:e121. [PMID: 23598997 PMCID: PMC3695513 DOI: 10.1093/nar/gkt263] [Citation(s) in RCA: 1029] [Impact Index Per Article: 85.8] [Indexed: 12/31/2022] Open
Abstract
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13,000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.
Affiliation(s)
- Jaina Mistry
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
92
|
Gu X, Zou Y, Su Z, Huang W, Zhou Z, Arendsee Z, Zeng Y. An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol 2013; 30:1713-9. [PMID: 23589455 DOI: 10.1093/molbev/mst069] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Indexed: 12/19/2022] Open
Abstract
DIVERGE is a software system for phylogeny-based analyses of protein family evolution and functional divergence. It provides a suite of statistical tools for selection and prioritization of the amino acid sites that are responsible for the functional divergence of a gene family. The synergistic efforts of DIVERGE and other methods have convincingly demonstrated that the pattern of rate change at a particular amino acid site may contain insightful information about the underlying functional divergence following gene duplication. These predicted sites may be used as candidates for further experiments. We are now releasing an updated version of DIVERGE with the following improvements: 1) a feasible approach to examining functional divergence in nearly complete sequences by including deletions and insertions (indels); 2) the calculation of the false discovery rate of functionally diverging sites; 3) estimation of the effective number of functional divergence-related sites that is reliable and insensitive to cutoffs; 4) a statistical test for asymmetric functional divergence; and 5) a new method to infer functional divergence specific to a given duplicate cluster. In addition, we have made efforts to improve software design and produce a well-written software manual for the general user.
Affiliation(s)
- Xun Gu
- State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China.
|
93
|
|
94
|
Schreiber F, Sonnhammer ELL. Hieranoid: hierarchical orthology inference. J Mol Biol 2013; 425:2072-2081. [PMID: 23485417 DOI: 10.1016/j.jmb.2013.02.018] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 12/06/2012] [Revised: 02/13/2013] [Accepted: 02/16/2013] [Indexed: 12/13/2022]
Abstract
An accurate inference of orthologs is essential in many research fields such as comparative genomics, molecular evolution, and genome annotation. Existing methods for genome-scale orthology inference are mostly based on all-versus-all similarity searches that scale quadratically with the number of species. This limits their application to the increasing number of available large-scale datasets. Here, we present Hieranoid, a new orthology inference method using a hierarchical approach. Hieranoid performs pairwise orthology analysis using InParanoid at each node in a guide tree as it progresses from its leaves to the root. This concept reduces the total runtime complexity from a quadratic to a linear function of the number of species. The tree hierarchy provides a natural structure in multi-species ortholog groups, and the aggregation of multiple sequences allows for multiple alignment similarity searching techniques, which can yield more accurate ortholog groups. Using the recently published orthobench benchmark, Hieranoid showed the overall best performance. Our progressive approach presents a new way to infer orthologs that combines efficient graph-based methodology with aspects of compute-intensive tree-based methods. The linear scaling with the number of species is a major advantage for large-scale applications and makes Hieranoid well suited to cope with vast amounts of sequenced genomes in the future. Hieranoid is open source and can be downloaded at Hieranoid.sbc.su.se.
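The quadratic-versus-linear scaling claim can be made concrete by counting analyses. The sketch below assumes a strictly binary guide tree written as nested tuples, with one pairwise analysis per internal node; the function names and the example tree are hypothetical, and the count says nothing about the cost of each individual analysis.

```python
def all_vs_all_comparisons(n_species):
    """Number of species pairs in an all-versus-all scheme: n(n-1)/2."""
    return n_species * (n_species - 1) // 2


def count_leaves(tree):
    """A leaf is a species name (str); an internal node is a 2-tuple of subtrees."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)


def guide_tree_analyses(tree):
    """One pairwise analysis per internal node, visited from leaves to root."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + guide_tree_analyses(left) + guide_tree_analyses(right)


# Hypothetical guide tree over five species.
guide_tree = ((("human", "chimp"), "mouse"), ("fly", "worm"))
n = count_leaves(guide_tree)
print(n, all_vs_all_comparisons(n), guide_tree_analyses(guide_tree))
```

For a binary tree with n leaves there are n - 1 internal nodes, so the hierarchical count grows linearly while the all-versus-all count grows as n(n-1)/2.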
Affiliation(s)
- Fabian Schreiber
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-10691 Stockholm, Sweden.
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-10691 Stockholm, Sweden; Swedish e-Science Research Center, SE-10044 Stockholm, Sweden
|
95
|
Research proceedings on primate comparative genomics. Zool Res 2013; 33:108-18. [DOI: 10.3724/sp.j.1141.2012.01108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
|
96
|
Applying Shannon's information theory to bacterial and phage genomes and metagenomes. Sci Rep 2013; 3:1033. [PMID: 23301154 PMCID: PMC3539204 DOI: 10.1038/srep01033] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 09/11/2012] [Accepted: 11/20/2012] [Indexed: 01/12/2023] Open
Abstract
All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapid screening for sequences with matches in available database, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.
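As a rough illustration of how a Shannon index depends on sequence word size, the sketch below computes the entropy in bits over overlapping words of a chosen length. The function name `shannon_index`, the use of overlapping words, and the base-2 logarithm are assumptions; the abstract does not state the exact formulation used in the study.

```python
import math
from collections import Counter


def shannon_index(seq, word_size=1):
    """Shannon uncertainty (bits) over overlapping words of length `word_size`."""
    seq = seq.upper()
    words = [seq[i:i + word_size] for i in range(len(seq) - word_size + 1)]
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A homopolymer carries no uncertainty; four equally frequent bases carry 2 bits.
print(shannon_index("AAAA"), shannon_index("ACGT"), shannon_index("ACGT", word_size=2))
```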
|
97
|
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
|
98
|
Chan JZM, Halachev MR, Loman NJ, Constantinidou C, Pallen MJ. Defining bacterial species in the genomic era: insights from the genus Acinetobacter. BMC Microbiol 2012; 12:302. [PMID: 23259572 PMCID: PMC3556118 DOI: 10.1186/1471-2180-12-302] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.8] [Received: 07/31/2012] [Accepted: 12/18/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Microbial taxonomy remains a conservative discipline, relying on phenotypic information derived from growth in pure culture and techniques that are time-consuming and difficult to standardize, particularly when compared to the ease of modern high-throughput genome sequencing. Here, drawing on the genus Acinetobacter as a test case, we examine whether bacterial taxonomy could abandon phenotypic approaches and DNA-DNA hybridization and, instead, rely exclusively on analyses of genome sequence data. RESULTS In pursuit of this goal, we generated a set of thirteen new draft genome sequences, representing ten species, combined them with other publicly available genome sequences and analyzed these 38 strains belonging to the genus. We found that analyses based on 16S rRNA gene sequences were not capable of delineating accepted species. However, a core genome phylogenetic tree proved consistent with the currently accepted taxonomy of the genus, while also identifying three misclassifications of strains in collections or databases. Among rapid distance-based methods, we found average-nucleotide identity (ANI) analyses delivered results consistent with traditional and phylogenetic classifications, whereas gene content based approaches appear to be too strongly influenced by the effects of horizontal gene transfer to agree with previously accepted species. CONCLUSION We believe a combination of core genome phylogenetic analysis and ANI provides an appropriate method for bacterial species delineation, whereby bacterial species are defined as monophyletic groups of isolates with genomes that exhibit at least 95% pair-wise ANI.
The proposed method is backwards compatible; it provides a scalable and uniform approach that works for both culturable and non-culturable species; is faster and cheaper than traditional taxonomic methods; is easily replicable and transferable among research institutions; and lastly, falls in line with Darwin's vision of classification becoming, as far as is possible, genealogical.
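The pairwise ANI part of this species definition can be sketched as a simple check over a precomputed ANI table. The function name `meets_ani_criterion`, the isolate labels, and the ANI values are invented for illustration, and the sketch does not test monophyly, which the abstract also requires and which would need the core genome tree.

```python
from itertools import combinations


def meets_ani_criterion(isolates, ani, threshold=95.0):
    """True if every pair of isolates has ANI at or above `threshold` percent.

    `ani` maps a frozenset of two isolate names to their pairwise ANI.
    """
    return all(
        ani[frozenset(pair)] >= threshold for pair in combinations(isolates, 2)
    )


# Made-up pairwise ANI values (percent).
ani = {
    frozenset(("s1", "s2")): 97.8,
    frozenset(("s1", "s3")): 96.1,
    frozenset(("s2", "s3")): 96.5,
    frozenset(("s1", "s4")): 88.0,
    frozenset(("s2", "s4")): 87.4,
    frozenset(("s3", "s4")): 88.9,
}
print(meets_ani_criterion(["s1", "s2", "s3"], ani))
print(meets_ani_criterion(["s1", "s2", "s3", "s4"], ani))
```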
Affiliation(s)
- Jacqueline Z-M Chan
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, UK
- Mihail R Halachev
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, UK
- Nicholas J Loman
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, UK
- Chrystala Constantinidou
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, UK
- Mark J Pallen
- Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, UK
|
99
|
Gevers D, Pop M, Schloss PD, Huttenhower C. Bioinformatics for the Human Microbiome Project. PLoS Comput Biol 2012; 8:e1002779. [PMID: 23209389 PMCID: PMC3510052 DOI: 10.1371/journal.pcbi.1002779] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 01/24/2023] Open
Affiliation(s)
- Dirk Gevers
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- * E-mail: (DG); (MP); (PS); (CH)
- Mihai Pop
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Patrick D. Schloss
- Department of Microbiology & Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Curtis Huttenhower
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
100
|
Moroz LL. Phylogenomics meets neuroscience: how many times might complex brains have evolved? Acta Biologica Hungarica 2012; 63 Suppl 2:3-19. [PMID: 22776469 DOI: 10.1556/abiol.63.2012.suppl.2.1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 01/10/2023]
Abstract
The origin of complex centralized brains is one of the major evolutionary transitions in the history of animals. Monophyly (i.e., presence of a centralized nervous system in urbilateria) vs. polyphyly (i.e., multiple origins by parallel centralization of nervous systems within several lineages) are two historically conflicting scenarios to explain such transitions. However, recent phylogenomic and cladistic analyses suggest that complex brains may have evolved independently at least 9 times within different animal lineages. Indeed, even within the phylum Mollusca, cephalization might have occurred at least 5 times. Emerging molecular data further suggest that, at the genomic level, such transitions might have been achieved by changes in the expression of just a few transcription factors, which is not surprising, since such events might have happened multiple times over 700 million years of animal evolution. Both cladistic and genomic analyses also imply that neurons themselves evolved more than once. Ancestral polarized secretory cells were likely involved in the coordination of ciliated locomotion in early animals, and these cells can be considered evolutionary precursors of neurons within different lineages. Under this scenario, the origins of neurons can be linked to adaptations to stress/injury factors in the form of an integrated regeneration-type cellular response, with secretory signaling peptides as early neurotransmitters. To further reconstruct the parallel evolution of nervous systems, genomic approaches are essential to probe enigmatic neurons of basal metazoans, selected lophotrochozoans (e.g., phoronids, brachiopods), and deuterostomes.
Affiliation(s)
- L L Moroz
- The Whitney Laboratory for Marine Bioscience, University of Florida, 9505 Ocean Shore Blvd., St. Augustine, Florida 32080, USA.
|