1
Yunes JM, Babbitt PC. Effusion: prediction of protein function from sequence similarity networks. Bioinformatics 2019; 35:442-451. [PMID: 30084920 PMCID: PMC6361244 DOI: 10.1093/bioinformatics/bty672] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/25/2018] [Revised: 07/24/2018] [Accepted: 07/30/2018] [Indexed: 12/26/2022] Open
Abstract
Motivation: Critical evaluation of methods for protein function prediction shows that data integration improves the performance of methods that predict protein function, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods. Results: We present a method for predicting protein function, Effusion, which uses a sequence similarity network to add context for homology transfer, a probabilistic model to account for the uncertainty in labels and function propagation, and the structure of the Gene Ontology (GO) to best utilize sparse input labels and make consistent output predictions. Effusion's model makes it practical to integrate rare experimental data and abundant primary sequence and sequence similarity. We demonstrate Effusion's performance using a critical evaluation method and provide an in-depth analysis. We also dissect the design decisions we used to address challenges for predicting protein function. Finally, we propose directions in which the framework of the method can be modified for additional predictive power. Availability and implementation: The source code for an implementation of Effusion is freely available at https://github.com/babbittlab/effusion. Supplementary information: Supplementary data are available at Bioinformatics online.
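To illustrate what "consistent output predictions" over the GO structure can mean, the sketch below applies a simple max-propagation rule: every ancestor term receives a score at least as high as any of its descendants. This is not Effusion's probabilistic model, only a minimal stand-in; the term names, the `parents` mapping, and both function names are hypothetical.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` in a DAG given as child -> list of parents."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def make_consistent(scores, parents):
    """Raise each ancestor's score to at least the score of its descendants.

    Assumption: a protein annotated with a term is also annotated with all
    of that term's ancestors, so an ancestor should never score lower.
    """
    consistent = dict(scores)
    for term, score in scores.items():
        for ancestor in ancestors(term, parents):
            if score > consistent.get(ancestor, 0.0):
                consistent[ancestor] = score
    return consistent


# Hypothetical three-level ontology: specific -> general -> root.
parents = {"specific": ["general"], "general": ["root"]}
raw_scores = {"specific": 0.9, "general": 0.4}
print(make_consistent(raw_scores, parents))
```

In this toy example the inconsistent score for the general term is lifted to match its more specific descendant, and the unscored root term is filled in.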
Affiliation(s)
- Jeffrey M Yunes
  - UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, CA, USA
2
Watson AK, Lannes R, Pathmanathan JS, Méheust R, Karkar S, Colson P, Corel E, Lopez P, Bapteste E. The Methodology Behind Network Thinking: Graphs to Analyze Microbial Complexity and Evolution. Methods Mol Biol 2019; 1910:271-308. [PMID: 31278668 DOI: 10.1007/978-1-4939-9074-0_9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
In the post-genomic era, large and complex molecular datasets from genome and metagenome sequencing projects expand the limits of what is possible for bioinformatic analyses. Network-based methods are increasingly used to complement phylogenetic analysis in studies of molecular evolution, including comparative genomics, classification, and ecological studies. Using network methods, the vertical and horizontal relationships between all genes or genomes, whether they are from cellular chromosomes or mobile genetic elements, can be explored in a single expandable graph. In recent years, development of new methods for the construction and analysis of networks has helped to broaden the availability of these approaches from programmers to a diversity of users. This chapter introduces the different kinds of networks based on sequence similarity that are already available to tackle a wide range of biological questions, including sequence similarity networks, gene-sharing networks and bipartite graphs, and provides a guide to their construction and analysis.
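As a rough illustration of how a sequence similarity network can be constructed and analyzed, the sketch below keeps an edge between two sequences when their pairwise score meets a threshold and then extracts connected components. The score table, the threshold value, and the function names are assumptions made for this example; they are not taken from the chapter.

```python
def build_ssn(pairwise_scores, threshold):
    """Build an undirected graph as {node: set(neighbors)}.

    `pairwise_scores` maps (seq_a, seq_b) to a similarity score, for example
    a percent identity from an all-versus-all comparison (assumed input).
    """
    adjacency = {}
    for (a, b), score in pairwise_scores.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical scores for five sequences.
scores = {("A", "B"): 80, ("B", "C"): 75, ("C", "D"): 20, ("D", "E"): 90}
network = build_ssn(scores, threshold=50)
print(sorted(sorted(c) for c in connected_components(network)))
```

Raising or lowering the threshold changes how the network fragments, which is why clusters are usually inspected at more than one cutoff.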
Affiliation(s)
- Andrew K Watson
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Romain Lannes
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Jananan S Pathmanathan
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Raphaël Méheust
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Slim Karkar
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
  - Department of Ecology, Evolution, and Natural Resources, School of Environmental and Biological Sciences, Rutgers, The State University of NJ, New Brunswick, NJ, USA
- Philippe Colson
  - Fondation Institut Hospitalo-Universitaire Méditerranée Infection, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, Centre Hospitalo-Universitaire Timone, Assistance Publique-Hôpitaux de Marseille, Marseille, France
  - Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes (URMITE) UM63, CNRS 7278, IRD 198, INSERM U1095, Aix-Marseille University, Marseille, France
- Eduardo Corel
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Philippe Lopez
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
- Eric Bapteste
  - Sorbonne Universités, Institut de Biologie Paris-Seine, UPMC Université Paris 6, Paris, France
3
Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily. Proc Natl Acad Sci U S A 2017; 114:E9549-E9558. [PMID: 29078300 PMCID: PMC5692541 DOI: 10.1073/pnas.1706849114] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Indexed: 11/23/2022] Open
Abstract
Functionally diverse enzyme superfamilies are sets of homologs that conserve a structural fold and mechanistic details but perform various distinct chemical reactions. What are the evolutionary routes by which ancestral proteins diverge to produce extant enzymes? We present an approach that combines experimental data with computational tools to trace these sequence–structure–function transitions in a model system, the functionally diverse flavin mononucleotide-dependent nitroreductases (NTRs). Our results suggest an evolutionary model in which contemporary NTR classes have diverged in a radial manner from a minimal flavin-binding scaffold via insertions at key positions and fixation of functional residues, yielding the reaction versatility of contemporary enzymes. These principles will facilitate rational design of NTRs and advance general approaches for delineating the emergence of functional diversity in enzyme superfamilies. Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. 
These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold.
4
Ahmed FH, Carr PD, Lee BM, Afriat-Jurnou L, Mohamed AE, Hong NS, Flanagan J, Taylor MC, Greening C, Jackson CJ. Sequence-Structure-Function Classification of a Catalytically Diverse Oxidoreductase Superfamily in Mycobacteria. J Mol Biol 2015; 427:3554-3571. [PMID: 26434506 DOI: 10.1016/j.jmb.2015.09.021] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 07/28/2015] [Revised: 09/23/2015] [Accepted: 09/24/2015] [Indexed: 12/11/2022]
Abstract
The deazaflavin cofactor F420 enhances the persistence of mycobacteria during hypoxia, oxidative stress, and antibiotic treatment. However, the identities and functions of the mycobacterial enzymes that utilize F420 under these conditions have yet to be resolved. In this work, we used sequence similarity networks to analyze the distribution of the largest F420-dependent protein family in mycobacteria. We show that these enzymes are part of a larger split β-barrel enzyme superfamily (flavin/deazaflavin oxidoreductases, FDORs) that includes previously characterized pyridoxamine/pyridoxine-5'-phosphate oxidases and heme oxygenases. We show that these proteins variously utilize F420, flavin mononucleotide, flavin adenine dinucleotide, and heme cofactors. Functional annotation using phylogenetic, structural, and spectroscopic methods revealed their involvement in heme degradation, biliverdin reduction, fatty acid modification, and quinone reduction. Four novel crystal structures show that plasticity in substrate binding pockets and modifications to cofactor binding motifs enabled FDORs to carry out a variety of functions. This systematic classification and analysis provides a framework for further functional analysis of the roles of FDORs in mycobacterial pathogenesis and persistence.
Affiliation(s)
- F Hafna Ahmed
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
- Paul D Carr
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
- Brendon M Lee
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
- Livnat Afriat-Jurnou
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
- A Elaaf Mohamed
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
- Nan-Sook Hong
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
- Jack Flanagan
  - University of Auckland Faculty of Medical and Health Sciences, 85 Park Road, Grafton, Auckland 2013, New Zealand
- Matthew C Taylor
  - Commonwealth Scientific and Industrial Research Organisation Land and Water Flagship, Clunies Ross Street, Acton, ACT 2060, Australia
- Chris Greening
  - Commonwealth Scientific and Industrial Research Organisation Land and Water Flagship, Clunies Ross Street, Acton, ACT 2060, Australia
- Colin J Jackson
  - Australian National University Research School of Chemistry, Sullivans Creek Road, Acton, ACT 2601, Australia
5
Leuthaeuser JB, Knutson ST, Kumar K, Babbitt PC, Fetrow JS. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity. Protein Sci 2015; 24:1423-39. [PMID: 26073648 PMCID: PMC4570537 DOI: 10.1002/pro.2724] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 01/27/2023]
Abstract
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs depending on the edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding on an appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is automatable and can deliver greater precision in function annotation than current similarity-based methods.
Affiliation(s)
- Janelle B Leuthaeuser
  - Department of Molecular Genetics and Genomics, Wake Forest University, Winston-Salem, North Carolina, 27106
- Stacy T Knutson
  - Departments of Computer Science and Physics, Wake Forest University, Winston-Salem, North Carolina, 27106
- Kiran Kumar
  - Departments of Computer Science and Physics, Wake Forest University, Winston-Salem, North Carolina, 27106
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, Institute for Quantitative Biosciences University of California San Francisco, San Francisco, California, 94158
  - Department of Pharmaceutical Chemistry, Institute for Quantitative Biosciences University of California San Francisco, San Francisco, California, 94158
- Jacquelyn S Fetrow
  - Department of Molecular Genetics and Genomics, Wake Forest University, Winston-Salem, North Carolina, 27106
  - Departments of Computer Science and Physics, Wake Forest University, Winston-Salem, North Carolina, 27106
  - Office of the Provost, Maryland Hall 202, University of Richmond, VA, 23173
6
Bastos HP, Sousa L, Clarke LA, Couto FM. GRYFUN: a web application for GO term annotation visualization and analysis in protein sets. PLoS One 2015; 10:e0119631. [PMID: 25794277 PMCID: PMC4368792 DOI: 10.1371/journal.pone.0119631] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/31/2014] [Accepted: 01/31/2015] [Indexed: 11/29/2022] Open
Abstract
Functional context for biological sequence is provided in the form of annotations. However, within a group of similar sequences there can be annotation heterogeneity in terms of coverage and specificity. This in turn can introduce issues regarding the interpretation of actual functional similarity and overall functional coherence of such a group. One way to mitigate such issues is through the use of visualization and statistical techniques. Therefore, in order to help interpret this annotation heterogeneity we created a web application that generates Gene Ontology annotation graphs for protein sets and their associated statistics from simple frequencies to enrichment values and Information Content based metrics. The publicly accessible website http://xldb.di.fc.ul.pt/gryfun/ currently accepts lists of UniProt accession numbers in order to create user-defined protein sets for subsequent annotation visualization and statistical assessment. GRYFUN is a freely available web application that allows GO annotation visualization of protein sets and which can be used for annotation coherence and cohesiveness analysis and annotation extension assessments within under-annotated protein sets.
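The statistics mentioned above, from simple frequencies to Information Content based metrics, can be sketched as follows for a small protein set. The formula used here, IC(t) = -log2 of the fraction of proteins annotated with term t, is one common definition and is an assumption, not necessarily the one GRYFUN implements; the accession-like keys, term labels, and function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(annotations):
    """Fraction of proteins in the set annotated with each term.

    `annotations` maps a protein identifier to the set of its GO terms.
    """
    counts = Counter(term for terms in annotations.values() for term in terms)
    total = len(annotations)
    return {term: count / total for term, count in counts.items()}


def information_content(frequencies):
    """IC(t) = -log2(frequency); rarer terms are more informative."""
    return {term: -math.log2(freq) for term, freq in frequencies.items()}


# Hypothetical protein set with placeholder term labels.
annotations = {
    "P1": {"GO:general", "GO:specific"},
    "P2": {"GO:general"},
    "P3": {"GO:general", "GO:other"},
    "P4": {"GO:general"},
}
freqs = term_frequencies(annotations)
for term, ic in sorted(information_content(freqs).items()):
    print(term, freqs[term], ic)
```

A term shared by every protein in the set carries no information under this definition, while a term seen in a single protein scores highest, which is one way to expose annotation heterogeneity within a group.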
Affiliation(s)
- Hugo P. Bastos
  - LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Lisete Sousa
  - Departamento de Estatística e Investigação Operacional e Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Luka A. Clarke
  - BioFIG - Centre for Biodiversity, Functional and Integrative Genomics, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Francisco M. Couto
  - LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
7
Giollo M, Martin AJM, Walsh I, Ferrari C, Tosatto SCE. NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics 2014; 15 Suppl 4:S7. [PMID: 25057121 PMCID: PMC4083412 DOI: 10.1186/1471-2164-15-s4-s7] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The rapid growth of un-annotated missense variants poses challenges requiring novel strategies for their interpretation. From the thermodynamic point of view, amino acid changes can lead to a change in the internal energy of a protein and induce structural rearrangements. This is of great relevance for the study of diseases and protein design, justifying the development of prediction methods for variant-induced stability changes. RESULTS: Here we propose NeEMO, a tool for the evaluation of stability changes using an effective representation of proteins based on residue interaction networks (RINs). RINs are used to extract useful features describing interactions of the mutant amino acid with its structural environment. Benchmarking shows NeEMO to be very effective, allowing reliable predictions in different parts of the protein such as β-strands and buried residues. Validation on a previously published independent dataset shows that NeEMO has a Pearson correlation coefficient of 0.77 and a standard error of 1 kcal/mol, outperforming nine recent methods. The NeEMO web server can be freely accessed at http://protein.bio.unipd.it/neemo/. CONCLUSIONS: NeEMO offers an innovative and reliable tool for the annotation of amino acid changes. A key contribution is the use of RINs, which can be used for modeling proteins and their interactions effectively. Interestingly, the approach is very general, and can motivate the development of a new family of RIN-based protein structure analyzers. NeEMO may suggest innovative strategies for bioinformatics tools beyond protein stability prediction.