1
Bae YA. In silico identification and structural characterization of telomerase reverse transcriptases in parasitic platyhelminths. Gene 2025; 962:149558. [PMID: 40360013 DOI: 10.1016/j.gene.2025.149558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2025] [Revised: 04/28/2025] [Accepted: 05/08/2025] [Indexed: 05/15/2025]
Abstract
Telomere shortening during eukaryotic cell division can lead to severe problems such as inactivation of neighboring genes and aberrant chromosomal fusion. To protect chromosome ends from these replicative errors, most eukaryotes have evolved an enzymatic defense mechanism called telomerase, in which telomerase reverse transcriptase (TERT) plays a central role. This enzymatic activity is highly elevated in consecutively dividing somatic cells of regenerating, and probably asexually reproducing, platyhelminths. Therefore, flatworms can be powerful models to investigate the biological implications of TERT in these non-embryonic developments. Current information on the protein is largely limited to a handful of representative species within the phylum Platyhelminthes. This study characterizes the structural features of TERT proteins and their encoding genes in flatworms, aiming to expand our knowledge of the telomere-protecting protein in this lower animal taxon. The platyhelminth genes exhibited exon-intron architectures that were highly divergent from their orthologs in the other lophotrochozoans, and their protein products lacked some TERT-specific domains such as the telomerase essential N-terminal and repeat addition processivity domains. Nevertheless, the unique gene and protein structures were tightly conserved among the flatworm homologs. Analysis of the tert transcripts showed that use of alternative splice acceptors or donors in a minor AT-AC intron, as well as intron retention and exon exclusion, contributes to the generation of aberrant mRNAs. The present findings demonstrate that the tert gene underwent structural changes soon after the emergence of the platyhelminth lineage, which might have been coordinated with those of its functional counterpart, the telomerase RNA molecule.
Affiliation(s)
- Young-An Bae
- Department of Microbiology and Lee Gil Ya Cancer and Diabetes Institute, Gachon University College of Medicine, Incheon 21999, Korea.
2
Chung HC, Friedberg I, Bromberg Y. Assembling bacterial puzzles: piecing together functions into microbial pathways. NAR Genom Bioinform 2024; 6:lqae109. [PMID: 39184378 PMCID: PMC11344244 DOI: 10.1093/nargab/lqae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 07/24/2024] [Accepted: 08/07/2024] [Indexed: 08/27/2024]
Abstract
Functional metagenomics enables the study of unexplored bacterial diversity, gene families, and pathways essential to microbial communities. However, discovering biological insights with these data is impeded by the scarcity of quality annotations. Here, we use a co-occurrence-based analysis of predicted microbial protein functions to uncover pathways in genomic and metagenomic biological systems. Our approach, based on phylogenetic profiles, improves the identification of functional relationships, or participation in the same biochemical pathway, between enzymes over a comparable homology-based approach. We optimized the design of our profiles to identify potential pathways using minimal data, clustered functionally related enzyme pairs into multi-enzymatic pathways, and evaluated our predictions against reference pathways in the KEGG database. We then demonstrated a novel extension of this approach to predict inter-bacterial protein interactions amongst members of a marine microbiome. Most significantly, we show our method predicts emergent biochemical pathways between known and unknown functions. Thus, our work establishes a basis for identifying the potential functional capacities of the entire metagenome, capturing previously unknown and abstract functions into discrete putative pathways.
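The co-occurrence idea behind phylogenetic profiles can be illustrated with a minimal sketch. The abstract does not state which similarity measure or threshold the authors use, so the Jaccard index, the 0.7 cutoff, and the toy profiles below are all assumptions made purely for illustration.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two binary presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def linked_pairs(profiles, threshold=0.7):
    """Return function pairs whose profiles co-occur at or above the threshold."""
    names = sorted(profiles)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if jaccard(profiles[x], profiles[y]) >= threshold
    ]


# Hypothetical toy data: each list marks presence (1) or absence (0)
# of an enzyme function across six genomes.
profiles = {
    "enzyme_A": [1, 1, 0, 1, 0, 1],
    "enzyme_B": [1, 1, 0, 1, 0, 0],
    "enzyme_C": [0, 0, 1, 0, 1, 0],
}

for pair in linked_pairs(profiles):
    print(pair)
```

Pairs that pass the cutoff are candidates for participation in the same pathway; clustering such pairs into multi-enzyme groups would be a separate step.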
Affiliation(s)
- Henri C Chung
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Yana Bromberg
- Department of Computer Science, Emory University, Atlanta, GA 30307, USA
- Department of Biology, Emory University, Atlanta, GA 30322, USA
3
Giangrazi F, Buffa D, Lloyd AT, Redmond AK, Glover LE, O'Farrelly C. Evolutionary Analysis of the Mammalian IL-17 Cytokine Family Suggests Conserved Roles in Female Fertility. Am J Reprod Immunol 2024; 92:e13907. [PMID: 39177066 DOI: 10.1111/aji.13907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/26/2024] [Accepted: 07/01/2024] [Indexed: 08/24/2024]
Abstract
PROBLEM: The interleukin-17 (IL-17) family includes pro-inflammatory cytokines IL-17A-F with important roles in mucosal defence, barrier integrity and tissue regeneration. IL-17A can be dysregulated in fertility complications, including pre-eclampsia, endometriosis and miscarriage. Because mammalian subclasses (eutherian, metatherian, and prototherian) have different reproductive strategies, IL-17 genes and proteins were investigated in the three mammalian subclasses to explore their involvement in female fertility.
METHOD OF STUDY: Gene and protein sequences for IL-17s were found in eutherian, metatherian and prototherian mammals. Through synteny and multiple sequence protein alignment, the relationships among mammalian IL-17s were inferred. Publicly available datasets of early pregnancy stages and female fertility in therian mammals were collected and analysed to retrieve information on IL-17 expression.
RESULTS: Synteny mapping and phylogenetic analyses allowed the classification of mammalian IL-17 family orthologs of human IL-17. Despite differences in their primary amino acid sequence, metatherian and prototherian IL-17s share the same tertiary structure as human IL-17s, suggesting similar functions. The analysis of available datasets for female fertility in therian mammals shows up-regulation of IL-17A and IL-17D during placentation. IL-17B and IL-17D are also found to be over-expressed in human fertility complication datasets, such as endometriosis or recurrent implantation failure.
CONCLUSIONS: The conservation of the IL-17 gene and protein across mammals suggests similar functions in all the analysed species. Despite significant differences, the upregulation of IL-17 expression is associated with the establishment of pregnancy in eutherian and metatherian mammals. The dysregulation of IL-17s in human reproductive disorders suggests them as a potential therapeutic target.
Affiliation(s)
- Federica Giangrazi
- School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
- Dafne Buffa
- Department of Biology, Maynooth University, Maynooth, Ireland
- Andrew T Lloyd
- Department of Science and Health, Institute of Technology, Carlow, Ireland
- Louise E Glover
- School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
- Department of Reproductive Medicine, Merrion Fertility Clinic, Dublin 2, Ireland
- Cliona O'Farrelly
- School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
- School of Medicine, Trinity College Dublin, Dublin, Ireland
4
Vassilieff H, Geering ADW, Choisne N, Teycheney PY, Maumus F. Endogenous Caulimovirids: Fossils, Zombies, and Living in Plant Genomes. Biomolecules 2023; 13:1069. [PMID: 37509105 PMCID: PMC10377300 DOI: 10.3390/biom13071069] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/31/2023] [Revised: 06/26/2023] [Accepted: 06/28/2023] [Indexed: 07/30/2023]
Abstract
The Caulimoviridae is a family of double-stranded DNA viruses that infect plants. The genomes of most vascular plants contain endogenous caulimovirids (ECVs), a class of repetitive DNA elements that is abundant in some plant genomes, resulting from the integration of viral DNA in the chromosomes of germline cells during episodes of infection that have sometimes occurred millions of years ago. In this review, we reflect on 25 years of research on ECVs that has shown that members of the Caulimoviridae have occupied an unprecedented range of ecological niches over time and shed light on their diversity and macroevolution. We highlight gaps in knowledge and prospects of future research fueled by increased access to plant genome sequence data and new tools for genome annotation for addressing the extent, impact, and role of ECVs on plant biology and the origin and evolutionary trajectories of the Caulimoviridae.
Affiliation(s)
- Andrew D W Geering
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia
- Pierre-Yves Teycheney
- CIRAD, UMR PVBMT, F-97410 Saint-Pierre de La Réunion, France
- UMR PVBMT, Université de la Réunion, F-97410 Saint-Pierre de La Réunion, France
- Florian Maumus
- INRAE, URGI, Université Paris-Saclay, 78026 Versailles, France
5
OBI: A computational tool for the analysis and systematization of the positive selection in proteins. MethodsX 2022; 9:101786. [PMID: 35910305 PMCID: PMC9334345 DOI: 10.1016/j.mex.2022.101786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/10/2022] [Indexed: 11/25/2022]
Abstract
There are multiple tools for positive selection analysis, including vaccine design and detection of variants of circulating drug-resistant pathogens in population selection. However, applying these tools to analyze a large number of protein families or as part of a comprehensive phylogenomics pipeline could be challenging. Since many standard bioinformatics tools are only available as executables, integrating them into complex Bioinformatics pipelines may not be possible. We have developed OBI, an open-source tool aimed to facilitate positive selection analysis on a large scale. It can be used as a stand-alone command-line app that can be easily installed and used as a Conda package. Some advantages of using OBI are:
- It speeds up the analysis by automating the entire process
- It allows multiple starting points and customization for the analysis
- It allows the retrieval and linkage of structural and evolutive data for a protein through
We hope to provide with OBI a solution for reliably speeding up large-scale protein evolutionary and structural analysis.
6
Abstract
The COVID-19 pandemic has given the study of virus evolution and ecology new relevance. Although viruses were first identified more than a century ago, we likely know less about their diversity than that of any other biological entity. Most documented animal viruses have been sampled from just two phyla - the Chordata and the Arthropoda - with a strong bias towards viruses that infect humans or animals of economic and social importance, often in association with strong disease phenotypes. Fortunately, the recent development of unbiased metagenomic next-generation sequencing is providing a richer view of the animal virome and shedding new light on virus evolution. In this Review, we explore our changing understanding of the diversity, composition and evolution of the animal virome. We outline the factors that determine the phylogenetic diversity and genomic structure of animal viruses on evolutionary timescales and show how this impacts assessment of the risk of disease emergence in the short term. We also describe the ongoing challenges in metagenomic analysis and outline key themes for future research. A central question is how major events in the evolutionary history of animals, such as the origin of the vertebrates and periodic mass extinction events, have shaped the diversity and evolution of the viruses they carry.
7
Bansod S, Raj N, R A, Nair AS, Bhattacharyya S. Molecular docking and molecular dynamics simulation identify a novel Radicicol derivative that predicts exclusive binding to Plasmodium falciparum Topoisomerase VIB. J Biomol Struct Dyn 2021; 40:6939-6951. [PMID: 33650468 DOI: 10.1080/07391102.2021.1891970] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/22/2022]
Abstract
Plasmodium falciparum harbors a unique type II topoisomerase, Topoisomerase VIB (PfTopoVIB), expressed specifically at the actively replicating stage of the parasite. An earlier study showed that Radicicol inhibits the decatenation activity of PfTopoVIB and thereby arrests the parasites at the schizont stage. Radicicol targets a unique ATP-binding fold called the Bergerat fold, which is also present in the N-terminal domain of the heat shock protein 90 (PfHsp90). Hence, Radicicol may manifest off-target activity within the parasite. We speculate that the affinity of Radicicol towards PfTopoVIB could be enhanced by modifying its structure so that it shows preferential binding towards PfTopoVIB but not to PfHsp90. Here, we have performed the docking and affinity studies of 97 derivatives (structural analogs) of Radicicol and have identified 3 analogs that show selective binding only to PfTopoVIB and no binding with PfHsp90 at all. Molecular dynamics simulation study was performed for 50 ns in triplicate with those 3 analogs and we find that one of them shows a stable association with Radicicol. This study identifies the structural molecule which could be instrumental in blocking the function of PfTopoVIB and hence can serve as an important inhibitor for malaria pathogenesis. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shephali Bansod
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Navya Raj
- Department of Health Informatics, College of Health Sciences, Saudi Electronic University, Dammam, Kingdom of Saudi Arabia
- Amjesh R
- Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India
- Achuthsankar S Nair
- Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram, Kerala, India
- Sunanda Bhattacharyya
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
8
Dijkstra JM. A method for making alignments of related protein sequences that share very little similarity; shark interleukin 2 as an example. Immunogenetics 2021; 73:35-51. [PMID: 33512550 DOI: 10.1007/s00251-020-01191-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/26/2020] [Accepted: 11/11/2020] [Indexed: 02/07/2023]
Abstract
An optimized alignment of related protein sequences helps to see their important shared features and to deduce their phylogenetic relationships. At low levels of sequence similarity, there are no suitable computer programs for making the best possible alignment. This review summarizes some guidelines for how insightful alignments can nevertheless be made in such instances. The method involves, basically, the understanding of molecular family features at both the protein and intron-exon level, and the collection of many related sequences so that gradual differences may be observed. The method is exemplified by identifying and aligning interleukin 2 (IL-2) and related sequences in Elasmobranchii (sharks/rays) and coelacanth, as other authors have expressed difficulty with their identification. From the standpoint of general immunology, it is interesting that the unusually long "leader" sequence of IL-15, already known in other species, is even more impressively conserved in cartilaginous fish. Furthermore, sequence comparisons suggest that IL-2 in cartilaginous fish has lost its ability to bind an IL-2Rα/15Rα receptor chain, which would prohibit the existence of a mechanism for regulatory T cell regulation identical to that in mammals.
Affiliation(s)
- Johannes M Dijkstra
- Institute for Comprehensive Medical Science, Fujita Health University, Dengaku-gakubo 1-98, Toyoake-shi, Aichi-ken, 470-1192, Japan.
9
Korotkov EV, Suvorova YM, Kostenko DO, Korotkova MA. Multiple Alignment of Promoter Sequences from the Arabidopsis thaliana L. Genome. Genes (Basel) 2021; 12:135. [PMID: 33494278 PMCID: PMC7909805 DOI: 10.3390/genes12020135] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/18/2020] [Revised: 01/15/2021] [Accepted: 01/18/2021] [Indexed: 11/16/2022]
Abstract
In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from -499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
Affiliation(s)
- Eugene V. Korotkov
- Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Bld.2, 33 Leninsky Ave., 119071 Moscow, Russia
- National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), 31 Kashirskoye Shosse, 115409 Moscow, Russia
- Yulia M. Suvorova
- Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Bld.2, 33 Leninsky Ave., 119071 Moscow, Russia
- Dmitrii O. Kostenko
- National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), 31 Kashirskoye Shosse, 115409 Moscow, Russia
- Maria A. Korotkova
- National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), 31 Kashirskoye Shosse, 115409 Moscow, Russia
10
Abstract
Transposable elements (TEs) are mobile DNA sequences that propagate within genomes. Through diverse invasion strategies, TEs have come to occupy a substantial fraction of nearly all eukaryotic genomes, and they represent a major source of genetic variation and novelty. Here we review the defining features of each major group of eukaryotic TEs and explore their evolutionary origins and relationships. We discuss how the unique biology of different TEs influences their propagation and distribution within and across genomes. Environmental and genetic factors acting at the level of the host species further modulate the activity, diversification, and fate of TEs, producing the dramatic variation in TE content observed across eukaryotes. We argue that cataloging TE diversity and dissecting the idiosyncratic behavior of individual elements are crucial to expanding our comprehension of their impact on the biology of genomes and the evolution of species.
Affiliation(s)
- Jonathan N Wells
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850
- Cédric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850
11
Athira K, Gopakumar G. An integrated method for identifying essential proteins from multiplex network model of protein-protein interactions. J Bioinform Comput Biol 2020; 18:2050020. [PMID: 32795133 DOI: 10.1142/s0219720020500201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
Abstract
Cell survival requires the presence of essential proteins. Detection of essential proteins is relevant not only because of the critical biological functions they perform but also the role played by them as a drug target against pathogens. Several computational techniques are in place to identify essential proteins based on protein-protein interaction (PPI) network. Essential protein detection using only physical interaction data of proteins is challenging due to its inherent uncertainty. Hence, in this work, we propose a multiplex network-based framework that incorporates multiple protein interaction data from their physical, coexpression and phylogenetic profiles. An extended version termed as multiplex eigenvector centrality (MEC) is used to identify essential proteins from this network. The methodology integrates the score obtained from the multiplex analysis with subcellular localization and Gene Ontology information and is implemented using Saccharomyces cerevisiae datasets. The proposed method outperformed many recent essential protein prediction techniques in the literature.
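As a rough illustration of centrality on a multi-layer interaction network, the sketch below sums the layers into one weighted adjacency matrix and ranks nodes by ordinary eigenvector centrality via power iteration. This is a simplification chosen for illustration and is not the multiplex eigenvector centrality (MEC) formulation of the paper, whose details the abstract does not give; the layer matrices are hypothetical toy data.

```python
def aggregate_layers(layers):
    """Sum several same-sized adjacency matrices into one weighted matrix."""
    n = len(layers[0])
    return [
        [sum(layer[i][j] for layer in layers) for j in range(n)]
        for i in range(n)
    ]


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I); scores are normalized to sum to 1.

    Adding the identity keeps the leading eigenvector of A but avoids
    oscillation on bipartite graphs. Assumes a connected, undirected graph.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical toy layers over four proteins (symmetric 0/1 matrices).
physical = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
coexpression = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

scores = eigenvector_centrality(aggregate_layers([physical, coexpression]))
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In a fuller pipeline, such scores would be combined with other evidence (the abstract mentions subcellular localization and Gene Ontology information) before calling a protein essential.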
Affiliation(s)
- K Athira
- Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikkode, Kerala 673601, India
- G Gopakumar
- Department of Computer Science and Engineering, National Institute of Technology Calicut, Kozhikkode, Kerala 673601, India
12
Abe T, Ikarashi R, Mizoguchi M, Otake M, Ikemura T. A strategy for predicting gene functions from genome and metagenome sequences on the basis of oligopeptide frequency distance. Genes Genet Syst 2020; 95:11-19. [DOI: 10.1266/ggs.19-00041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Takashi Abe
- Department of Information Engineering, Faculty of Engineering, Niigata University
- Ryo Ikarashi
- Department of Information Engineering, Faculty of Engineering, Niigata University
- Masaya Mizoguchi
- Department of Information Engineering, Faculty of Engineering, Niigata University
- Masashi Otake
- Department of Information Engineering, Faculty of Engineering, Niigata University
- Toshimichi Ikemura
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology
13
Identification, Cloning, and Characterization of Staphylococcus pseudintermedius Coagulase. Infect Immun 2018; 86:IAI.00027-18. [PMID: 29891539 DOI: 10.1128/iai.00027-18] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/01/2018] [Accepted: 05/31/2018] [Indexed: 11/20/2022]
Abstract
Coagulase activation of prothrombin by staphylococcus induces the formation of fibrin deposition that facilitates the establishment of infection by Staphylococcus species. Coagulase activity is a key characteristic of Staphylococcus pseudintermedius; however, no coagulase gene or associated protein has been studied to characterize this activity. We report a recombinant protein sharing 40% similarity to Staphylococcus aureus coagulase produced from a putative S. pseudintermedius coagulase gene. Prothrombin activation by the protein was measured with a chromogenic assay using thrombin tripeptide substrate. Stronger interaction with bovine prothrombin than with human prothrombin was observed. The S. pseudintermedius coagulase protein also bound complement C3 and immunoglobulin. Recombinant coagulase facilitated the escape of S. pseudintermedius from phagocytosis, presumably by forming a bridge between opsonizing antibody, complement, and fibrinogen. Evidence from this work suggests that S. pseudintermedius coagulase has multifunctional properties that contribute to immune evasion that likely plays an important role in virulence.
14
Keel BN, Deng B, Moriyama EN. MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks. Bioinformatics 2018; 34:1270-1277. [PMID: 29186344 DOI: 10.1093/bioinformatics/btx755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/31/2016] [Accepted: 11/23/2017] [Indexed: 11/14/2022]
Abstract
Motivation: Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate clustering refinement procedure.
Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families.
Availability and implementation: MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot.
Contact: emoriyama2@unl.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Brittney N Keel
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
- Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Bo Deng
- Department of Mathematics, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
- School of Biological Sciences and Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
15
Arkhipova IR. Using bioinformatic and phylogenetic approaches to classify transposable elements and understand their complex evolutionary histories. Mob DNA 2017; 8:19. [PMID: 29225705 PMCID: PMC5718144 DOI: 10.1186/s13100-017-0103-2] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 10/12/2017] [Accepted: 11/28/2017] [Indexed: 12/11/2022]
Abstract
In recent years, much attention has been paid to comparative genomic studies of transposable elements (TEs) and the ensuing problems of their identification, classification, and annotation. Different approaches and diverse automated pipelines are being used to catalogue and categorize mobile genetic elements in the ever-increasing number of prokaryotic and eukaryotic genomes, with little or no connectivity between different domains of life. Here, an overview of the current picture of TE classification and evolutionary relationships is presented, updating the diversity of TE types uncovered in sequenced genomes. A tripartite TE classification scheme is proposed to account for their replicative, integrative, and structural components, and the need to expand in vitro and in vivo studies of their structural and biological properties is emphasized. Bioinformatic studies have now become front and center of novel TE discovery, and experimental pursuits of these discoveries hold great promise for both basic and applied science.
Affiliation(s)
- Irina R Arkhipova
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
16
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
17
Škunca N, Dessimoz C. Phylogenetic profiling: how much input data is enough? PLoS One 2015; 10:e0114701. [PMID: 25679783 PMCID: PMC4332489 DOI: 10.1371/journal.pone.0114701] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 07/01/2014] [Accepted: 11/10/2014] [Indexed: 12/04/2022]
Abstract
Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Most recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy in terms of the contribution of additional genomes and of additional annotations, we observed diminishing returns in adding more than ∼100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings are supported in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
Affiliation(s)
- Nives Škunca
  - ETH Zürich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland
  - University College London, Gower St, London WC1E 6BT, UK
  - * E-mail: (NS), (CD)
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland
  - University College London, Gower St, London WC1E 6BT, UK
  - * E-mail: (NS), (CD)
|
18
|
Bhardwaj G, Ko KD, Hong Y, Zhang Z, Ho NL, Chintapalli SV, Kline LA, Gotlin M, Hartranft DN, Patterson ME, Dave F, Smith EJ, Holmes EC, Patterson RL, van Rossum DB. PHYRN: a robust method for phylogenetic analysis of highly divergent sequences. PLoS One 2012; 7:e34261. [PMID: 22514627 PMCID: PMC3325999 DOI: 10.1371/journal.pone.0034261] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 11/30/2011] [Accepted: 02/24/2012] [Indexed: 11/19/2022]
Abstract
Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.
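To make the "twilight zone" threshold concrete, the sketch below computes percent identity for two already-aligned sequences and checks it against the ≤ 25% cutoff quoted in the abstract. The function names and the example sequences are hypothetical, and counting only ungapped columns in the denominator is an assumption: other conventions exist, and the cited paper may use a different one.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(identity, threshold=25.0):
    """True when identity is at or below the 25% cutoff given in the abstract."""
    return identity <= threshold


# Hypothetical aligned pair: 6 ungapped columns, 4 of them identical.
identity = percent_identity("MKT-LLVA", "MRTA-LIA")
```

With these toy sequences the pair sits well above the twilight zone; the simulated data sets described in the abstract lie far below it.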
Affiliation(s)
- Gaurav Bhardwaj
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biochemistry and Molecular Medicine, School of Medicine, University of California Davis, Davis, California, United States of America
  - Center for Translational Bioscience and Computing, University of California Davis, Davis, California, United States of America
- Kyung Dae Ko
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Yoojin Hong
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Zhenhai Zhang
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Ngai Lam Ho
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Sree V. Chintapalli
  - Department of Physiology and Membrane Biology, School of Medicine, University of California Davis, Davis, California, United States of America
  - Center for Translational Bioscience and Computing, University of California Davis, Davis, California, United States of America
- Lindsay A. Kline
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Matthew Gotlin
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- David Nicholas Hartranft
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Morgen E. Patterson
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Foram Dave
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Evan J. Smith
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Edward C. Holmes
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America
- Randen L. Patterson
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biochemistry and Molecular Medicine, School of Medicine, University of California Davis, Davis, California, United States of America
  - Department of Physiology and Membrane Biology, School of Medicine, University of California Davis, Davis, California, United States of America
  - Center for Translational Bioscience and Computing, University of California Davis, Davis, California, United States of America
- Damian B. van Rossum
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Center for Translational Bioscience and Computing, University of California Davis, Davis, California, United States of America
|
19
|
A widespread class of reverse transcriptase-related cellular genes. Proc Natl Acad Sci U S A 2011; 108:20311-6. [PMID: 21876125 DOI: 10.1073/pnas.1100266108] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 01/18/2023]
Abstract
Reverse transcriptases (RTs) polymerize DNA on RNA templates. They fall into several structurally related but distinct classes and form an assemblage of RT-like enzymes that, in addition to RTs, also includes certain viral RNA-dependent RNA polymerases (RdRP) synthesizing RNA on RNA templates. It is generally believed that most RT-like enzymes originate from retrotransposons or viruses and have no specific function in the host cell, with telomerases being the only notable exception. Here we report on the discovery and properties of a unique class of RT-related cellular genes collectively named rvt. We present evidence that rvts are not components of retrotransposons or viruses, but single-copy genes with a characteristic domain structure that may contain introns in evolutionarily conserved positions, occur in syntenic regions, and evolve under purifying selection. These genes can be found in all major taxonomic groups including protists, fungi, animals, plants, and even bacteria, although they exhibit patchy phylogenetic distribution in each kingdom. We also show that the RVT protein purified from one of its natural hosts, Neurospora crassa, exists in a multimeric form and has the ability to polymerize NTPs as well as dNTPs in vitro, with a strong preference for NTPs, using Mn(2+) as a cofactor. The existence of a previously unknown class of single-copy RT-related genes calls for reevaluation of the current views on evolution and functional roles of RNA-dependent polymerases in living cells.
|
20
|
Ko KD, Bhardwaj G, Hong Y, Chang GS, Kiselyov K, van Rossum DB, Patterson RL. Phylogenetic profiles reveal structural/functional determinants of TRPC3 signal-sensing antennae. Commun Integr Biol 2011; 2:133-7. [PMID: 19704910 DOI: 10.4161/cib.7746] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 12/28/2008] [Accepted: 01/02/2009] [Indexed: 11/19/2022]
Abstract
Biochemical assessment of channel structure/function is incredibly challenging. Developing computational tools that provide these data would enable translational research, accelerating mechanistic experimentation for the bench scientist studying ion channels. Starting with the premise that protein sequence encodes information about structure, function and evolution (SF&E), we developed a unified framework for inferring SF&E from sequence information using a knowledge-based approach. The Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) provides phylogenetic profiles that can model, ab initio, SF&E relationships of biological sequences at the whole protein, single domain and single-amino acid level [1,2]. In our recent paper [4], we have applied GDDA-BLAST analysis to study canonical TRP (TRPC) channels [1] and empirically validated predicted lipid-binding and trafficking activities contained within the TRPC3 TRP_2 domain of unknown function. Overall, our in silico, in vitro, and in vivo experiments support a model in which TRPC3 has signal-sensing antennae which are adorned with lipid-binding, trafficking and calmodulin regulatory domains. In this Addendum, we correlate our functional domain analysis with the cryo-EM structure of TRPC3 [3]. In addition, we synthesize recent studies with our new findings to provide a refined model on the mechanism(s) of TRPC3 activation/deactivation.
Affiliation(s)
- Kyung Dae Ko
  - Department of Biology, The Pennsylvania State University, University Park, PA, USA
|
21
|
Abstract
Despite recent advances in our understanding of diverse aspects of virus evolution, particularly on the epidemiological scale, revealing the ultimate origins of viruses has proven to be a more intractable problem. Herein, I review some current ideas on the evolutionary origins of viruses and assess how well these theories accord with what we know about the evolution of contemporary viruses. I note the growing evidence for the theory that viruses arose before the last universal cellular ancestor (LUCA). This ancient origin theory is supported by the presence of capsid architectures that are conserved among diverse RNA and DNA viruses and by the strongly inverse relationship between genome size and mutation rate across all replication systems, such that pre-LUCA genomes were probably both small and highly error prone and hence RNA virus-like. I also highlight the advances that are needed to come to a better understanding of virus origins, most notably the ability to accurately infer deep evolutionary history from the phylogenetic analysis of conserved protein structures.
Affiliation(s)
- Edward C Holmes
  - Center for Infectious Disease Dynamics, Department of Biology, The Pennsylvania State University, Mueller Laboratory, University Park, Pennsylvania 16802, USA.
|
22
|
Shortridge MD, Triplet T, Revesz P, Griep MA, Powers R. Bacterial protein structures reveal phylum dependent divergence. Comput Biol Chem 2011; 35:24-33. [PMID: 21315656 DOI: 10.1016/j.compbiolchem.2010.12.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/27/2010] [Revised: 12/28/2010] [Accepted: 12/29/2010] [Indexed: 01/26/2023]
Abstract
Protein sequence space is vast compared to protein fold space. This raises important questions about how structures adapt to evolutionary changes in protein sequences. A growing trend is to regard protein fold space as a continuum rather than a series of discrete structures. From this perspective, homologous protein structures within the same functional classification should reveal a constant rate of structural drift relative to sequence changes. The clusters of orthologous groups (COG) classification system was used to annotate homologous bacterial protein structures in the Protein Data Bank (PDB). The structures and sequences of proteins within each COG were compared against each other to establish their relatedness. As expected, the analysis demonstrates a sharp structural divergence between the bacterial phyla Firmicutes and Proteobacteria. Additionally, each COG had a distinct sequence/structure relationship, indicating that different evolutionary pressures affect the degree of structural divergence. However, our analysis also shows the relative drift rate between sequence identity and structure divergence remains constant.
Affiliation(s)
- Matthew D Shortridge
  - Department of Chemistry, University of Nebraska-Lincoln, 68588-0304, United States
|
23
|
Hong Y, Kang J, Lee D, van Rossum DB. Adaptive GDDA-BLAST: fast and efficient algorithm for protein sequence embedding. PLoS One 2010; 5:e13596. [PMID: 21042584 PMCID: PMC2962639 DOI: 10.1371/journal.pone.0013596] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/10/2010] [Accepted: 09/28/2010] [Indexed: 11/28/2022]
Abstract
A major computational challenge in the genomic era is annotating structure/function to the vast quantities of sequence information that is now available. This problem is illustrated by the fact that most proteins lack comprehensive annotations, even when experimental evidence exists. We previously theorized that embedded-alignment profiles (simply "alignment profiles" hereafter) provide a quantitative method that is capable of relating the structural and functional properties of proteins, as well as their evolutionary relationships. A key feature of alignment profiles lies in the interoperability of data format (e.g., alignment information, physio-chemical information, genomic information, etc.). Indeed, we have demonstrated that the Position Specific Scoring Matrices (PSSMs) are an informative M-dimension that is scored by quantitatively measuring the embedded or unmodified sequence alignments. Moreover, the information obtained from these alignments is informative, and remains so even in the "twilight zone" of sequence similarity (<25% identity). Although our previous embedding strategy was powerful, it suffered from contaminating alignments (embedded AND unmodified) and high computational costs. Herein, we describe the logic and algorithmic process for a heuristic embedding strategy named "Adaptive GDDA-BLAST." Adaptive GDDA-BLAST is, on average, up to 19 times faster than, but has similar sensitivity to our previous method. Further, data are provided to demonstrate the benefits of embedded-alignment measurements in terms of detecting structural homology in highly divergent protein sequences and isolating secondary structural elements of transmembrane and ankyrin-repeat domains. Together, these advances allow further exploration of the embedded alignment data space within sufficiently large data sets to eventually induce relevant statistical inferences. 
We show that sequence embedding could serve as one of the vehicles for measurement of low-identity alignments and for incorporation thereof into high-performance PSSM-based alignment profiles.
Affiliation(s)
- Yoojin Hong
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Jaewoo Kang
  - Department of Computer Science and Engineering, Korea University, Seoul, Korea
  - Department of Biostatistics, College of Medicine, Korea University, Seoul, Korea
- Dongwon Lee
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - College of Information Sciences and Technology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Damian B. van Rossum
  - Center for Computational Proteomics, The Pennsylvania State University, University Park, Pennsylvania, United States of America
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
|
24
|
Domazet-Loso T, Tautz D. Phylostratigraphic tracking of cancer genes suggests a link to the emergence of multicellularity in metazoa. BMC Biol 2010; 8:66. [PMID: 20492640 PMCID: PMC2880965 DOI: 10.1186/1741-7007-8-66] [Citation(s) in RCA: 211] [Impact Index Per Article: 14.1] [Received: 12/11/2009] [Accepted: 05/21/2010] [Indexed: 01/08/2023]
Abstract
Background Phylostratigraphy is a method used to correlate the evolutionary origin of founder genes (that is, functional founder protein domains) of gene families with particular macroevolutionary transitions. It is based on a model of genome evolution that suggests that the origin of complex phenotypic innovations will be accompanied by the emergence of such founder genes, the descendants of which can still be traced in extant organisms. The origin of multicellularity can be considered to be a macroevolutionary transition, for which new gene functions would have been required. Cancer should be tightly connected to multicellular life since it can be viewed as a malfunction of interaction between cells in a multicellular organism. A phylostratigraphic tracking of the origin of cancer genes should, therefore, also provide insights into the origin of multicellularity. Results We find two strong peaks of the emergence of cancer related protein domains, one at the time of the origin of the first cell and the other around the time of the evolution of the multicellular metazoan organisms. These peaks correlate with two major classes of cancer genes, the 'caretakers', which are involved in general functions that support genome stability and the 'gatekeepers', which are involved in cellular signalling and growth processes. Interestingly, this phylogenetic succession mirrors the ontogenetic succession of tumour progression, where mutations in caretakers are thought to precede mutations in gatekeepers. Conclusions A link between multicellularity and formation of cancer has often been predicted. However, this has not so far been explicitly tested. Although we find that a significant number of protein domains involved in cancer predate the origin of multicellularity, the second peak of cancer protein domain emergence is, indeed, connected to a phylogenetic level where multicellular animals have emerged. 
The fact that we can find a strong and consistent signal for this second peak in the phylostratigraphic map implies that a complex multi-level selection process has driven the transition to multicellularity.
Affiliation(s)
- Tomislav Domazet-Loso
  - Max-Planck Institut für Evolutionsbiologie, August-Thienemannstrasse 2, 24306 Plön, Germany
|
25
|
Freilich S, Goldovsky L, Gottlieb A, Blanc E, Tsoka S, Ouzounis CA. Stratification of co-evolving genomic groups using ranked phylogenetic profiles. BMC Bioinformatics 2009; 10:355. [PMID: 19860884 PMCID: PMC2775751 DOI: 10.1186/1471-2105-10-355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/18/2009] [Accepted: 10/27/2009] [Indexed: 01/12/2023]
Abstract
BACKGROUND Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
Affiliation(s)
- Shiri Freilich
  - The Blavatnik School of Computer Sciences and School of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel.
|
26
|
Abe T, Kanaya S, Uehara H, Ikemura T. A novel bioinformatics strategy for function prediction of poorly-characterized protein genes obtained from metagenome analyses. DNA Res 2009; 16:287-97. [PMID: 19801558 PMCID: PMC2762413 DOI: 10.1093/dnares/dsp018] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
As a result of remarkable progresses of DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or fewer of genes can be predicted when a novel genome is decoded. A prediction method independent of the homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map ‘BLSOM’ that clustered genomic fragments according to phylotype with no advance knowledge of phylotype. Using BLSOM for di-, tri- and tetrapeptide compositions, we developed a system to enable separation (self-organization) of proteins by function. Analyzing oligopeptide frequencies in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs could faithfully reproduce the COG classifications. This indicated that proteins, whose functions are unknown because of lack of significant sequence similarity with function-known proteins, can be related to function-known proteins based on similarity in oligopeptide composition. BLSOM was applied to predict functions of vast quantities of proteins derived from mixed genomes in environmental samples.
Affiliation(s)
- Takashi Abe
  - Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Shiga-ken, Japan.
|
27
|
Arkhipova IR. Reverse transcriptases of retroviruses and retroelements: an evolutionary perspective. Retrovirology 2009. [PMCID: PMC2766974 DOI: 10.1186/1742-4690-6-s2-o2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
|
28
|
Dwivedi B, Gadagkar SR. Phylogenetic inference under varying proportions of indel-induced alignment gaps. BMC Evol Biol 2009; 9:211. [PMID: 19698168 PMCID: PMC2746219 DOI: 10.1186/1471-2148-9-211] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Received: 05/11/2009] [Accepted: 08/23/2009] [Indexed: 01/19/2023]
Abstract
Background The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results (1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML & "MLε, " a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps. 
Conclusion When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy.
Affiliation(s)
- Bhakti Dwivedi
  - Department of Biology, University of Dayton, 300 College Park, Dayton, OH 46469-2320, USA.
|
29
|
Haimel M, Pröll K, Rebhan M. ProteinArchitect: protein evolution above the sequence level. PLoS One 2009; 4:e6176. [PMID: 19603068 PMCID: PMC2705671 DOI: 10.1371/journal.pone.0006176] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/04/2009] [Accepted: 05/23/2009] [Indexed: 12/02/2022]
Abstract
BACKGROUND While many authors have discussed models and tools for studying protein evolution at the sequence level, molecular function is usually mediated by complex, higher order features such as independently folding domains and linear motifs that are based on or embedded in a particular arrangement of features such as secondary structure elements, transmembrane domains and regions with intrinsic disorder. This 'protein architecture' can, in its most simplistic representation, be visualized as domain organization cartoons that can be used to compare proteins in terms of the order of their mostly globular domains. METHODOLOGY Here, we describe a visual approach and a webserver for protein comparison that extend the domain organization cartoon concept. By developing an information-rich, compact visualization of different protein features above the sequence level, potentially related proteins can be compared at the level of propensities for secondary structure, transmembrane domains and intrinsic disorder, in addition to PFAM domains. A public Web server is available at www.proteinarchitect.net, while the code is provided at protarchitect.sourceforge.net. CONCLUSIONS/SIGNIFICANCE Due to recent advances in sequencing technologies we are now flooded with millions of predicted proteins that await comparative analysis. In many cases, mature tools focused on revealing hits with considerable global or local similarity to well-characterized proteins will not be able to lead us to testable hypotheses about a protein's function, or the function of a particular region. The visual comparison of different types of protein features with ProteinArchitect will be useful when assessing the relevance of similarity search hits, to discover subgroups in protein families and superfamilies, and to understand protein regions with conserved features outside globular regions. 
Therefore, this approach is likely to help researchers to develop testable hypotheses about a protein's function even if it is somewhat distant from the more characterized proteins, by facilitating the discovery of features that are conserved above the sequence level for comparison and further experimental investigation.
Affiliation(s)
- Matthias Haimel
  - Department of Bioinformatics, Upper Austrian University of Applied Sciences, Hagenberg, Austria
- Karin Pröll
  - Department of Bioinformatics, Upper Austrian University of Applied Sciences, Hagenberg, Austria
- Michael Rebhan
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
|
30
|
Piednoël M, Bonnivard E. DIRS1-like retrotransposons are widely distributed among Decapoda and are particularly present in hydrothermal vent organisms. BMC Evol Biol 2009; 9:86. [PMID: 19400949 PMCID: PMC2685390 DOI: 10.1186/1471-2148-9-86] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 11/27/2008] [Accepted: 04/28/2009] [Indexed: 01/23/2023]
Abstract
BACKGROUND Transposable elements (TEs) are major constituents of eukaryote genomes and have a great impact on genome structure and stability. Considering their mutational abilities, TEs can contribute to the genetic diversity and evolution of organisms. Knowledge of their distribution among several genomes is an essential condition to study their dynamics and to better understand their role in species evolution. DIRS1-like retrotransposons are a distinct group of retrotransposons whose mode of transposition involves a tyrosine recombinase. To date, they have been described in a restricted number of species in comparison with the LTR retrotransposons. In this paper, we determine the distribution of DIRS1-like elements among 25 decapod species, 10 of them living in hydrothermal vents that correspond to particularly unstable environments. RESULTS Using PCR approaches, we have identified 15 new DIRS1-like families in 15 diverse decapod species (shrimps, lobsters, crabs and galatheid crabs). Hydrothermal organisms show a particularly great diversity of DIRS1-like elements, with 5 families characterized among Alvinocarididae shrimps and 3 in the galatheid crab Munidopsis recta. Phylogenetic analyses show that these elements are divergent from the DIRS1-like families previously described in other crustaceans and arthropods and form a new clade called AlDIRS1. At a larger scale, the distribution of DIRS1-like retrotransposons appears more or less patchy depending on the taxa considered. Indeed, a scattered distribution can be observed in the infraorder Brachyura, whereas all the species tested in the infraorders Caridea and Astacidea harbor some DIRS1-like elements. CONCLUSION Our results nearly double both the number of DIRS1-like elements described to date and the number of species known to harbor them. In this study, we provide the first degenerate primers designed to look specifically for DIRS1-like retrotransposons. They reveal for the first time a widespread distribution of these elements across a large taxon, here the order Decapoda. They also suggest some peculiar features of these retrotransposons in hydrothermal organisms, where a great diversity of elements is already observed. Finally, this paper constitutes an essential first step toward further studies of the dynamics of DIRS1-like retrotransposons among several genomes.
Affiliation(s)
- Mathieu Piednoël
- UMR 7138 Systématique Adaptation Evolution, Equipe Génétique et Evolution, Université Pierre et Marie Curie Paris 6, Case 5, Bâtiment A, porte 427, 7 quai St Bernard, 75252 Paris Cedex 05, France
- Eric Bonnivard
- UMR 7138 Systématique Adaptation Evolution, Equipe Génétique et Evolution, Université Pierre et Marie Curie Paris 6, Case 5, Bâtiment A, porte 427, 7 quai St Bernard, 75252 Paris Cedex 05, France
|
31
|
Hong Y, Chalkia D, Ko KD, Bhardwaj G, Chang GS, van Rossum DB, Patterson RL. Phylogenetic Profiles Reveal Structural and Functional Determinants of Lipid-binding. J Proteomics Bioinform 2009; 2:139-149. [PMID: 19946567 DOI: 10.4172/jpb.1000071] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
One of the major challenges in the genomic era is assigning structure/function annotations to the vast quantities of sequence information now available. Indeed, most of the protein sequence database lacks comprehensive annotation, even when experimental evidence exists. Further, within structurally resolved and functionally annotated protein domains, additional functionalities contained in these domains are not apparent. To add further complication, small changes in the amino-acid sequence can lead to profound changes in both structure and function, underscoring the need for rapid and reliable methods to analyze these types of data. Phylogenetic profiles provide a quantitative method that can relate the structural and functional properties of proteins, as well as their evolutionary relationships. Using all of the structurally resolved Src-Homology-2 (SH2) domains, we demonstrate that knowledge-bases can be used to create single-amino acid phylogenetic profiles which reliably annotate lipid-binding. Indeed, these measures isolate the known phosphotyrosine and hydrophobic pockets as integral to lipid-binding function. In addition, we determined that the SH2 domains of Tec family kinases bind to lipids with varying affinity and specificity. Simulating mutations in Bruton's tyrosine kinase (BTK) that cause X-Linked Agammaglobulinemia (XLA) predicts that these mutations alter lipid-binding, which we confirm experimentally. In light of these results, we propose that XLA-causing mutations in the SH3-SH2 domain of BTK alter lipid-binding, which could play a causative role in the XLA phenotype. Overall, our study suggests that the number of lipid-binding proteins is drastically underestimated and that, with further development, phylogenetic profiles can provide a method for rapidly increasing the functional annotation of protein sequences.
Affiliation(s)
- Yoojin Hong
- Center for Computational Proteomics, The Pennsylvania State University
|
32
|
Glutamatergic regulation of serine racemase via reversal of PIP2 inhibition. Proc Natl Acad Sci U S A 2009; 106:2921-6. [PMID: 19193859 DOI: 10.1073/pnas.0813105106] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022] Open
Abstract
D-serine is a physiologic coagonist with glutamate at NMDA-subtype glutamate receptors. As D-serine is localized in glia, synaptically released glutamate presumably stimulates the glia to form and release D-serine, enabling glutamate/D-serine cotransmission. We show that serine racemase (SR), which generates D-serine from L-serine, is physiologically inhibited by the presence of phosphatidylinositol (4,5)-bisphosphate (PIP2) in membranes where SR is localized. Activation of metabotropic glutamate receptors (mGluR5) on glia leads to phospholipase C-mediated degradation of PIP2, relieving SR inhibition. Thus, mutants of SR that cannot bind PIP2 lose their membrane localization and display a 4-fold enhancement of catalytic activity. Moreover, mGluR5 activation of SR activity is abolished by inhibiting phospholipase C.
|
33
|
Simon DM, Zimmerly S. A diversity of uncharacterized reverse transcriptases in bacteria. Nucleic Acids Res 2008; 36:7219-29. [PMID: 19004871 PMCID: PMC2602772 DOI: 10.1093/nar/gkn867] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Indexed: 02/01/2023] Open
Abstract
Retroelements are usually considered to be eukaryotic elements because of their large number and variety in eukaryotic genomes. By comparison, reverse transcriptases (RTs) are rare in bacteria, with only three characterized classes: retrons, group II introns and diversity-generating retroelements (DGRs). Here, we present the results of a bioinformatic survey that aims to define the landscape of RTs across eubacterial, archaeal and phage genomes. We identify and categorize 1021 RTs, of which the majority are group II introns (73%). Surprisingly, a plethora of novel RTs are found that do not belong to characterized classes. The RTs have 11 domain architectures and are classified into 20 groupings based on sequence similarity, phylogenetic analyses and open reading frame domain structures. Interestingly, group II introns are the only bacterial RTs to exhibit clear evidence for independent mobility, while five other groups have putative functions in defense against phage infection or promotion of phage infection. These examples suggest that additional beneficial functions will be discovered among uncharacterized RTs. The study lays the groundwork for experimental characterization of these highly diverse sequences and has implications for the evolution of retroelements.
Affiliation(s)
- Dawn M Simon
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
|